Velvet Star Monitor

Standout celebrity highlights with iconic style.

general

convert .data file to .csv

Writer Matthew Harrington

I have found the following data set named ecoli.data and available in:

I would like to open it in R for making a classification task, but I would prefer to convert this document into a csv file. When I open it in word I notice that is not tab delimited, because there are like tree spaces between each row; so bottomline question is how to convert this file into csv using Excel or maybe Python.

2

9 Answers

Rename the file to ecoli.txt then open it in Excel. This way you will be using the "Text Import Wizard" of Microsoft Excel that enables you to chose options like "Fixed width". Just click on "next" a few times and "finish" and you will have the data in the Excel grid. Now save it again as CSV.

Using Python 2.7:

import csv
with open('ecoli.data.txt') as input_file: lines = input_file.readlines() newLines = [] for line in lines: newLine = line.strip().split() newLines.append( newLine )
with open('output.csv', 'wb') as test_file: file_writer = csv.writer(test_file) file_writer.writerows( newLines )

Rename it in the file folder from

ecoli.data 

to

ecoli.csv

Then you can use it in your code with the standard import code for csv, without any adding. No more to look back on it. It worked for me with adult.data!

Here are two ways to actually do that in R (that work):

library(readr)
url <- ""

with base R

df <- read.table(url)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ V1 (fctr) AAT_ECOLI, ACEA_ECOLI, ACEK_ECOLI, ACKA_ECOLI, ADI_ECOLI, ...
## $ V2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ V3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ V4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ V5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ V6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ V7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ V8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ V9 (fctr) cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp, cp...
write.csv(df, "ecoli.csv", row.names=FALSE)

with readr functions

df <- read_table(url, col_names=FALSE)
dplyr::glimpse(df)
## Observations: 336
## Variables:
## $ X1 (chr) "AAT_ECOLI", "ACEA_ECOLI", "ACEK_ECOLI", "ACKA_ECOLI", "ADI...
## $ X2 (dbl) 0.49, 0.07, 0.56, 0.59, 0.23, 0.67, 0.29, 0.21, 0.20, 0.42,...
## $ X3 (dbl) 0.29, 0.40, 0.40, 0.49, 0.32, 0.39, 0.28, 0.34, 0.44, 0.40,...
## $ X4 (dbl) 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48, 0.48,...
## $ X5 (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,...
## $ X6 (dbl) 0.56, 0.54, 0.49, 0.52, 0.55, 0.36, 0.44, 0.51, 0.46, 0.56,...
## $ X7 (dbl) 0.24, 0.35, 0.37, 0.45, 0.25, 0.38, 0.23, 0.28, 0.51, 0.18,...
## $ X8 (dbl) 0.35, 0.44, 0.46, 0.36, 0.35, 0.46, 0.34, 0.39, 0.57, 0.30,...
## $ X9 (chr) "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp", "cp",...
write_csv(df, "ecoli.csv")

Use

pandas.read_table(' delim_whitespace=True)
1

The simplest way I got, is just renaming the file from ecoli.data to ecoli.csv, but not in PC. Rename in your mobile. It will automatically convert the extension.

An alternative to solve your problem could be to read your .data file on R using the read.table command.

ecoli <- read.table("ecoli.data",header=F)

Using Window Power shell command,

cat german.data | %{$_ -replace " ",","} | sc german.csv

Using UXIX SED command,

sed 's/ /,/g' german.data > german.csv

It's very simple, click the actual dataset name ex: xyz.data and rename it with XYZ.csv this will be converted into CSV format.

Your Answer

Sign up or log in

Sign up using Google Sign up using Facebook Sign up using Email and Password

Post as a guest

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy