convert .data file to .csv
Matthew Harrington
I have found the following data set named ecoli.data and available in:
I would like to open it in R for making a classification task, but I would prefer to convert this document into a csv file. When I open it in word I notice that is not tab delimited, because there are like tree spaces between each row; so bottomline question is how to convert this file into csv using Excel or maybe Python.
29 Answers
Rename the file to ecoli.txt then open it in Excel. This way you will be using the "Text Import Wizard" of Microsoft Excel that enables you to chose options like "Fixed width". Just click on "next" a few times and "finish" and you will have the data in the Excel grid. Now save it again as CSV.
Using Python 2.7:
import csv
with open('ecoli.data.txt') as input_file: lines = input_file.readlines() newLines = [] for line in lines: newLine = line.strip().split() newLines.append( newLine )
with open('output.csv', 'wb') as test_file: file_writer = csv.writer(test_file) file_writer.writerows( newLines ) Rename it in the file folder from
ecoli.data to
ecoli.csvThen you can use it in your code with the standard import code for csv, without any adding. No more to look back on it. It worked for me with adult.data!
Here are two ways to actually do that in R (that work):
library(readr)
url <- ""with base R
df <- read.table(url)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ V1 (fctr) AAT_ECOLI, ACEA_ECOLI, ACEK_ECOLI, ACKA_ECOLI, ADI_ECOLI, ...
## $ V2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ V3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ V4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ V5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ V6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ V7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ V8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ V9 (fctr) cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp...
write.csv(df, "ecoli.csv", row.names=FALSE)with readr functions
df <- read_table(url, col_names=FALSE)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ X1 (chr) "AAT_ECOLI", "ACEA_ECOLI", "ACEK_ECOLI", "ACKA_ECOLI", "ADI...
## $ X2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ X3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ X4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ X5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ X6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ X7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ X8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ X9 (chr) "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp",...
write_csv(df, "ecoli.csv") Use
pandas.read_table(' delim_whitespace=True) 1 The simplest way I got, is just renaming the file from ecoli.data to ecoli.csv, but not in PC. Rename in your mobile. It will automatically convert the extension.
An alternative to solve your problem could be to read your .data file on R using the read.table command.
ecoli <- read.table("ecoli.data",header=F) Using Window Power shell command,
cat german.data | %{$_ -replace " ",","} | sc german.csv
Using UXIX SED command,
sed 's/ /,/g' german.data > german.csv
It's very simple, click the actual dataset name ex: xyz.data and rename it with XYZ.csv this will be converted into CSV format.