How to read in a large mixed csv data file

2 views (last 30 days)
Noah
Noah on 30 Jun 2015
Commented: Walter Roberson on 1 Jul 2015
Hi,
Please excuse me if this is difficult to understand - I am new to Matlab and coding in general.
I’m trying to read in a large mixed data file that I can then manipulate. The file is 53 columns by about 8.8 million observations (rows) and is in csv format separated by commas. It is arranged as follows:
Numbers only: Columns 1-28, 30-34, 36, 50, 53
All other columns are text only (e.g. Johannesburg) or mixed text and numbers (e.g. E44). Some include spaces (e.g.Cape Town) and others symbols like slashes (e.g. A00-A09).
It is not clear to me if the first row is headings or not.
I’m assuming I need to use either readtable or textscan, but so far have been unsuccessful with the code.
Thanks for the help! Noah
  2 Comments
Image Analyst
Image Analyst on 1 Jul 2015
Crop out just a few rows, say 20, and upload the cropped file here with the paper clip icon. I think readtable should work, unless it got confused at some columns that have both numbers only in them and letters only in them and perhaps a mix. Post your readtable() call.
per isakson
per isakson on 1 Jul 2015
Edited: per isakson on 1 Jul 2015
There are many alternatives. Did you try A = importdata(filename)?

Sign in to comment.

Answers (1)

Walter Roberson
Walter Roberson on 1 Jul 2015
fmts = repmat({'%^[,]'}, 1, 53); % %^[,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%g'};
fmt = [fmts{:}];
headers = 1; %maybe 0?
fid = fopen('YouFile.csv','rt');
datacell = textscan(fids, fmt, 'Headerlines', headers, 'Delimiter', ',');
fclose(fid);
and now datacell{1} is the first column, datacell{2} is the second column, and so on. The text columns will be a cell array of strings, one per row.
  2 Comments
Noah
Noah on 1 Jul 2015
Hi Walter,
Thanks for your time on this.
I had success as far as the penultimate command, (datacell = textscan...) At that point I got the error:
Error using textscan
Badly formed format string
If you have suggestions that would be great, but otherwise will keep trying.
Thanks again.
Walter Roberson
Walter Roberson on 1 Jul 2015
Sorry, I had two errors. The correction is
fmts = repmat({'%[^,]'}, 1, 53); % %[^,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%f'};

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!