Path: news.mathworks.com!not-for-mail
From: "Branko " <bogunovic@mbss.org>
Newsgroups: comp.soft-sys.matlab
Subject: reading an annoying ascii text file
Date: Tue, 27 Oct 2009 08:13:04 +0000 (UTC)
Organization: National Institute of Biology
Lines: 33
Message-ID: <hc6a2g$2f7$1@fred.mathworks.com>
References: <hc2gsv$375$1@fred.mathworks.com>
Reply-To: "Branko " <bogunovic@mbss.org>


"Derik " <d.nospam.schupbach@lombardodier.please.com> wrote in message <hc2gsv$375$1@fred.mathworks.com>...
> Dear Sunday readers,
> I am trying to read the file format below. I tried textscan but I must be missing something... I get either errors or an empty cell (I run version 7.5.0, R2007b)
> I have several difficulties as a beginner:
> * all these double quotes do not seem to be handled well
> * Unfortunately the comma delimiter is also the thousands separator
> * I would like to have the first line transformed as the variable names of the columns
> * I would like to change the date string "MM/DD/YYYY" to matlab dates
> * the file is around 7000 lines and 70 variables
> 
> extract of the file:
> "Fund_ID","Fund","Firm","Structure","Minimum_Investment","Additional_Investment","Inception","Reporting"
> "10003","Enterprise Fund Ltd. (Class E) - Emerging Markets","Advantage Management Limited","Corporation","10,000","","06/01/2003","Monthly"
> 
> Thank you very much in advance
> derik

Another approach using regexp:

fid = fopen(filename,'rt');
val = textscan(fid,'%s','delimiter','\n','headerlines',0);   % Read the file as one string per line
fclose(fid);

Header = regexp(val{1}{1},'\w+','match');   % Column names from the header line
as = regexprep(val{1}{2},'\d*,\d{3}','${strrep($&,'','','''')}');   % Replace 10,000 with 10000
as = regexprep(as,'\d{2}/\d{2}/\d{4}','${num2str(datenum($&,''mm/dd/yyyy''))}');   % Convert mm/dd/yyyy to a MATLAB serial date number
as = regexprep(as,'"','');        % Remove double quotes
Data = regexp(as,',','split');    % Split on the remaining commas (assumes no text field contains a comma)
Data{5} = str2double(Data{5});    % Minimum_Investment as a number
Data{7} = str2double(Data{7});    % Inception as a serial date number
DATA = cell2struct(Data,Header,2);
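
The lines above convert only the first data line (val{1}{2}). To process every line, the same steps can be put in a loop. What follows is only a sketch under some assumptions: every field in the file is wrapped in double quotes, each line starts and ends with a quote, and the names lines, rows and nRows are made up for the illustration. The second data row is invented to show a comma inside a text field and a multi-group number; in practice lines would be val{1} from the textscan call.

```matlab
% Sample input standing in for val{1}. The first two lines come from the
% original post; the third is a hypothetical row added for illustration.
lines = { ...
    '"Fund_ID","Fund","Firm","Structure","Minimum_Investment","Additional_Investment","Inception","Reporting"'; ...
    '"10003","Enterprise Fund Ltd. (Class E) - Emerging Markets","Advantage Management Limited","Corporation","10,000","","06/01/2003","Monthly"'; ...
    '"10004","Example Fund, Class A","Example Firm","Corporation","1,250,000","","12/15/2004","Monthly"'};

% Column names from the header line.
header = regexp(lines{1}, '\w+', 'match');

nRows = numel(lines) - 1;
rows  = cell(nRows, numel(header));
for k = 1:nRows
    s = strtrim(lines{k+1});
    % Drop the outer quotes, then split on the "," sequence between fields,
    % so that commas inside a field are left alone.
    rows(k, :) = regexp(s(2:end-1), '","', 'split');
end

% nRows-by-1 struct array, one field per column.
DATA = cell2struct(rows, header, 2);

% Convert the numeric and date columns after splitting.
for k = 1:nRows
    % Remove thousands separators; an empty field becomes NaN.
    DATA(k).Minimum_Investment = str2double(strrep(DATA(k).Minimum_Investment, ',', ''));
    % MATLAB serial date number from an mm/dd/yyyy string.
    DATA(k).Inception = datenum(DATA(k).Inception, 'mm/dd/yyyy');
end
```

Splitting on the quote-comma-quote sequence before touching the numbers means the thousands separators no longer need a regular expression, and a fund name such as "Example Fund, Class A" stays in one piece. The assignment into rows(k, :) will raise an error if a line does not have the same number of fields as the header, which is probably what you want for a malformed line.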

Branko