File Exchange

image thumbnail

csv2cell

version 1.0 (6.52 KB) by

Parse CSV files from Excel, mixed data types, to cell array.

3.88889
8 Ratings

43 Downloads

Updated

No License

% CSV2CELL - parses a Windows CSV file into an NxM cell array, where N is
% the number of lines in the CSV text and M is the number of fields in the
% longest line of the CSV file. Lines are delimited by carriage returns
% and/or newlines.
%
% A Windows CSV file format allows for commas (,) and double quotes (") to
% be contained within fields of the CSV file. Regular fields are just text
% separated by commas (e.g. foo,bar,hello world). Fields containing commas
% or double quotes are surrounded by double quotes (e.g.
% foo,bar,"item1,item2,item3",hello world). In the previous example,
% "item1,item2,item3" is one field in the CSV text. For double quotes to be
% represented, they are written in pairs in the file, and contained within
% a quoted field, (e.g. foo,"this field contains ""quotes""",bar). Spaces
% within fields (even leading and trailing) are preserved.
%
% All fields from the CSV file are returned as strings. If the CSV text
% contains lines with different numbers of fields, then the "missing"
% fields with appear as empty arrays, [], in the returned data. You can
% easily convert the data you expect to be numeric using str2num() and
% num2cell().
%
% Examples:
% >> csv2cell('foo.csv','fromfile') % loads and parses entire file
% >> csv2cell(',,,') % returns cell array {'','','',''}
% >> csv2cell(',,,','text') % same as above, declaring text input
% >> csv2cell(sprintf('%s\r\n%s',...
% '"Ten Thousand",10000,,"10,000","""It''s ""10 Grand"", baby",10k',...
% ',foo,bar,soo'))
% ans =
% 'Ten Thousand' '10000' '' '10,000' [1x22 char] '10k'
% '' 'foo' 'bar' 'soo' [] []
% >> % note the two empty [] cells, because the second line has two fewer
% >> % fields than the first. The empty field '' at the beginning of the
% >> % second line is due to the leading comma on that line, which is
% >> % correct behavior. A trailing comma will do the same to the end of a
% >> % line.
%
% Limitations/Exceptions:
% * This code is untested on large files. It may take a long time due to
% variables growing inside loops (yes, poor practice, but easy coding).
% * This code has been minimally tested to work with a variety of weird
% Excel files that I have.
% * Behavior with improperly formatted CSV files is untested.
% * Technically, CSV files from Excel always separate lines with the pair
% of characters \r\n. This parser will also separate lines that have only
% \r or \n as line terminators.
% * Line separation is the first operation. I don't think the Excel CSV
% format has any allowance for newlines or carriage returns within
% fields. If it does, then this parser does not support it and would not
% return bad output.

Comments and Ratings (16)

Seems to work just fine (and relatively quickly) with large files that end up producing arrays with over 10,000 rows and dozens of columns. Thanks!

Richard Scott

Thank you. This function helped me parse CSV data I extracted from an element in a XML file.

grega

grega (view profile)

Jerry

Jerry (view profile)

eric larrieux

the function interprets NaNs as character strings rather than doubles... is there any simple work around to fix this?

eric larrieux

Ivan Brezani

works wxcellent for me, thanks!

Arthur Hebert

Arthur Hebert (view profile)

Hi Luigi,

After a bit of searching, it looks like the 'split' argument that is built-in to the regexp command was introduced in 2007b.

That line of code is splitting the file into a cell array of lines, which are then split into tokens based on other delimiters (commas, quotes, etc.).

I see that your comment was made six months ago. If you still want to fix it, I can probably help with a solution that works on older versions of MATLAB.

It would be an excellent tool, but... I tried to use it and I got the following message:

>> csv2cell('cfg_param.csv','fromfile')
??? Error using ==> regexp
Invalid option for regexp: split.

Error in ==> csv2cell at 76
lines = regexp(text,'(\r\n|[\r\n])','split'); % lines should now be a cell array of text split by newlines

I am using MATLAB 7.0.1. Is there something wrong with the regexp function that I have installed? Is it a built-in function? Obviously if i cut the 'split' option, than lines is not a cell array and I got another error below...

Robin

Robin (view profile)

ci xiang

Arthur Hebert

I hadn't noticed that XLSREAD would take CSV files, which is helpful. I'm not sure if you need MS Office installed as well for the COM server to work in 'raw' mode. I don't have any Windows machines without Office to test that.

CSV2CELL remains valuable in my mind for dealing simply with a ubiquitous file format in a platform independent way.

Dimitri Shvorob

Hmm.. XLSREAD does read CSV files, and I have not realized that the 3-output syntax of XLSREAD - [numeric,text,raw], with 'raw' a cell array containing all data - depended on the ability to start a COM server, although, as a Windows user, I might take it for granted.

Arthur Hebert

Hi Dimitri,

Both XLSREAD and CSVREAD deal poorly with mixed data types (i.e. text and numeric) in a file. CSVREAD only returns numeric data, and XLSREAD returns text and numeric data separately, unless your machine has the ability to start a COM server. Furthermore, XLSREAD works on Excel files directly and not CSV files.

Although this code specifically parses the Microsoft CSV format (quote delimiters, etc.), it also works on CSV files from most other sources. CSV2CELL is a more general solution than XLSREAD.

Dimitri Shvorob

Why not just use XLSREAD? (And there's a CSVREAD, right?)

MATLAB Release
MATLAB 7.6 (R2008a)

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

» Watch video