No BSD License  

4.83333

4.8 | 6 ratings Rate this file 31 Downloads (last 30 days) File Size: 6.52 KB File ID: #20836

csv2cell

by

 

23 Jul 2008 (Updated )

Parse CSV files from Excel, mixed data types, to cell array.

| Watch this File

File Information
Description

% CSV2CELL - parses a Windows CSV file into an NxM cell array, where N is
% the number of lines in the CSV text and M is the number of fields in the
% longest line of the CSV file. Lines are delimited by carriage returns
% and/or newlines.
%
% A Windows CSV file format allows for commas (,) and double quotes (") to
% be contained within fields of the CSV file. Regular fields are just text
% separated by commas (e.g. foo,bar,hello world). Fields containing commas
% or double quotes are surrounded by double quotes (e.g.
% foo,bar,"item1,item2,item3",hello world). In the previous example,
% "item1,item2,item3" is one field in the CSV text. For double quotes to be
% represented, they are written in pairs in the file, and contained within
% a quoted field, (e.g. foo,"this field contains ""quotes""",bar). Spaces
% within fields (even leading and trailing) are preserved.
%
% All fields from the CSV file are returned as strings. If the CSV text
% contains lines with different numbers of fields, then the "missing"
% fields with appear as empty arrays, [], in the returned data. You can
% easily convert the data you expect to be numeric using str2num() and
% num2cell().
%
% Examples:
% >> csv2cell('foo.csv','fromfile') % loads and parses entire file
% >> csv2cell(',,,') % returns cell array {'','','',''}
% >> csv2cell(',,,','text') % same as above, declaring text input
% >> csv2cell(sprintf('%s\r\n%s',...
% '"Ten Thousand",10000,,"10,000","""It''s ""10 Grand"", baby",10k',...
% ',foo,bar,soo'))
% ans =
% 'Ten Thousand' '10000' '' '10,000' [1x22 char] '10k'
% '' 'foo' 'bar' 'soo' [] []
% >> % note the two empty [] cells, because the second line has two fewer
% >> % fields than the first. The empty field '' at the beginning of the
% >> % second line is due to the leading comma on that line, which is
% >> % correct behavior. A trailing comma will do the same to the end of a
% >> % line.
%
% Limitations/Exceptions:
% * This code is untested on large files. It may take a long time due to
% variables growing inside loops (yes, poor practice, but easy coding).
% * This code has been minimally tested to work with a variety of weird
% Excel files that I have.
% * Behavior with improperly formatted CSV files is untested.
% * Technically, CSV files from Excel always separate lines with the pair
% of characters \r\n. This parser will also separate lines that have only
% \r or \n as line terminators.
% * Line separation is the first operation. I don't think the Excel CSV
% format has any allowance for newlines or carriage returns within
% fields. If it does, then this parser does not support it and would not
% return bad output.

Acknowledgements

This file inspired Flotation Kinetics Equation Fitting and Rosin Rammler Diagram Plotting Tool.

MATLAB release MATLAB 7.6 (R2008a)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (13)
10 Dec 2014 Fernando Esteves  
25 Jun 2014 Jerry  
22 Feb 2012 eric larrieux

the function interprets NaNs as character strings rather than doubles... is there any simple work around to fix this?

22 Feb 2012 eric larrieux  
29 Aug 2010 Ivan Brezani

works wxcellent for me, thanks!

24 Jun 2010 Arthur Hebert

Hi Luigi,

After a bit of searching, it looks like the 'split' argument that is built-in to the regexp command was introduced in 2007b.

That line of code is splitting the file into a cell array of lines, which are then split into tokens based on other delimiters (commas, quotes, etc.).

I see that your comment was made six months ago. If you still want to fix it, I can probably help with a solution that works on older versions of MATLAB.

26 Jan 2010 Luigi Mazzalai

It would be an excellent tool, but... I tried to use it and I got the following message:

>> csv2cell('cfg_param.csv','fromfile')
??? Error using ==> regexp
Invalid option for regexp: split.

Error in ==> csv2cell at 76
lines = regexp(text,'(\r\n|[\r\n])','split'); % lines should now be a cell array of text split by newlines

I am using MATLAB 7.0.1. Is there something wrong with the regexp function that I have installed? Is it a built-in function? Obviously if i cut the 'split' option, than lines is not a cell array and I got another error below...

17 Dec 2009 Robin  
11 Jan 2009 ci xiang  
06 Aug 2008 Arthur Hebert

I hadn't noticed that XLSREAD would take CSV files, which is helpful. I'm not sure if you need MS Office installed as well for the COM server to work in 'raw' mode. I don't have any Windows machines without Office to test that.

CSV2CELL remains valuable in my mind for dealing simply with a ubiquitous file format in a platform independent way.

29 Jul 2008 Dimitri Shvorob

Hmm.. XLSREAD does read CSV files, and I have not realized that the 3-output syntax of XLSREAD - [numeric,text,raw], with 'raw' a cell array containing all data - depended on the ability to start a COM server, although, as a Windows user, I might take it for granted.

28 Jul 2008 Arthur Hebert

Hi Dimitri,

Both XLSREAD and CSVREAD deal poorly with mixed data types (i.e. text and numeric) in a file. CSVREAD only returns numeric data, and XLSREAD returns text and numeric data separately, unless your machine has the ability to start a COM server. Furthermore, XLSREAD works on Excel files directly and not CSV files.

Although this code specifically parses the Microsoft CSV format (quote delimiters, etc.), it also works on CSV files from most other sources. CSV2CELL is a more general solution than XLSREAD.

26 Jul 2008 Dimitri Shvorob

Why not just use XLSREAD? (And there's a CSVREAD, right?)

Contact us