Thread Subject: reading an annoying ascii text file

From: Derik

Date: 25 Oct, 2009 21:45:03

Message: 1 of 3

Dear Sunday readers,
I am trying to read the file format below. I tried textscan, but I must be missing something: I get either errors or an empty cell (I run version 7.5.0, R2007b).
I have several difficulties as a beginner:
* the double quotes do not seem to be handled correctly
* unfortunately, the comma delimiter is also the thousands separator
* I would like to use the first line as the variable names of the columns
* I would like to convert the date strings "MM/DD/YYYY" to MATLAB dates
* the file is around 7000 lines and 70 variables

extract of the file:
"Fund_ID","Fund","Firm","Structure","Minimum_Investment","Additional_Investment","Inception","Reporting"
"10003","Enterprise Fund Ltd. (Class E) - Emerging Markets","Advantage Management Limited","Corporation","10,000","","06/01/2003","Monthly"

Thank you very much in advance
derik

From: Doug Schwarz

Date: 26 Oct, 2009 03:43:59

Message: 2 of 3

In article <hc2gsv$375$1@fred.mathworks.com>,
 "Derik " <d.nospam.schupbach@lombardodier.please.com> wrote:

> Dear Sunday readers,
> I am trying to read the file format below. I tried textscan, but I must be
> missing something: I get either errors or an empty cell (I run version 7.5.0, R2007b)
> I have several difficulties as a beginner:
> * all these doublequotes seem not to be well understood

Use the %q format with textscan.
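
As a minimal sketch of what %q does, assuming a shortened sample row held in a string rather than read from the file (the names row, c and amount are illustrative only):

```matlab
% Shortened sample row, based on the extract in the question.
row = '"10003","Advantage Management Limited","10,000","06/01/2003"';

% %q reads a double-quoted field as a single string and drops the quotes,
% so the comma inside "10,000" is not treated as a field delimiter.
c = textscan(row, '%q%q%q%q', 'Delimiter', ',');

% Each output cell holds a cell array of strings; take the third field.
amount = c{3}{1};
disp(amount)
```

The thousands comma is still in the string at this point; it only has to be handled when the field is converted to a number.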


> * Unfortunately the comma delimiter is also the thousand delimiter
> * I would like to have the first line transformed as the variable names of
> the columns

Don't do this; it's more trouble than it's worth. Instead, use the
column headers as field names for a structure array.
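
A small sketch of the idea, under the assumption that some headers in the full 70-column file may not be valid MATLAB identifiers. The header names and values below are made up for illustration; genvarname is used here only as one way to obtain valid, unique field names.

```matlab
% Hypothetical header names; the second and third are not valid identifiers.
headers = {'Fund_ID', 'Minimum Investment', '3Y_Return'};

% genvarname returns a valid, unique MATLAB identifier for each name.
fields = genvarname(headers);

% Build a 1-by-1 structure from one row of values (also made up).
row = {'10003', '10,000', '0.12'};
s = cell2struct(row, fields, 2);

% Dynamic field names give access without a separate variable per column.
disp(s.(fields{1}))
```

Keeping the names in a cell array such as fields means the same code works for any number of columns.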


> * I would like to change the date string "MM/DD/YYYY" to matlab dates
> * the file is around 7000 lines and 70 variables
>
> extract of the file:
> "Fund_ID","Fund","Firm","Structure","Minimum_Investment","Additional_Investment","Inception","Reporting"
> "10003","Enterprise Fund Ltd. (Class E) - Emerging Markets","Advantage Management Limited","Corporation","10,000","","06/01/2003","Monthly"
>
> Thank you very much in advance
> derik

Here's what I would do (assume your data is in a file called derik.dat):

% Read in entire file.
fid = fopen('derik.dat');
header = textscan(fid,'%q%q%q%q%q%q%q%q',1,'Delimiter',',');
raw = textscan(fid,'%q%q%q%q%q%q%q%q','Delimiter',',');
fclose(fid);
 
% Store data in a structure array, data.
fields = [header{:}];
raw_array = [raw{:}];
data = cell2struct(raw_array,fields,2);
 
% Convert column 5 (Minimum_Investment) from string to numeric.
% str2double accepts the thousands comma; empty strings become NaN.
min_invest_str = {data.(fields{5})};
min_invest = str2double(min_invest_str);
min_invest_cell = num2cell(min_invest);
[data.(fields{5})] = min_invest_cell{:};
 
% Convert column 7 (Inception) into date numbers.
date_str = {data.(fields{7})};
date_num = datenum(date_str,'mm/dd/yyyy');
date_num_cell = num2cell(date_num);
[data.(fields{7})] = date_num_cell{:};
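
For the full file with about 70 columns, the format string need not be typed by hand. The following is a sketch only, under several assumptions: a shortened four-column sample is written to a temporary file so the example runs on its own, every field is double-quoted, and the columns are counted by looking for the "," sequence in the header line. The names fname, first_line, ncols and fmt are illustrative.

```matlab
% Write a shortened two-line sample to a temporary file.
fname = [tempname '.csv'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', '"Fund_ID","Fund","Minimum_Investment","Inception"');
fprintf(fid, '%s\n', '"10003","Enterprise Fund Ltd.","10,000","06/01/2003"');
fclose(fid);

% Count the columns from the header line, then build the format string.
fid = fopen(fname, 'rt');
first_line = fgetl(fid);
ncols = numel(strfind(first_line, '","')) + 1;
fmt = repmat('%q', 1, ncols);

% Rewind and read the header and the data as in the code above.
frewind(fid);
header = textscan(fid, fmt, 1, 'Delimiter', ',');
raw = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
delete(fname);

fields = [header{:}];
data = cell2struct([raw{:}], fields, 2);

% Fields are then reachable by name, for example the inception date.
disp(datestr(datenum(data(1).Inception, 'mm/dd/yyyy')))
```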

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

From: Branko

Date: 27 Oct, 2009 08:13:04

Message: 3 of 3

Another approach using regexp:

fid = fopen(filename,'rt');
val = textscan(fid,'%s','delimiter','\n','headerlines',0); % One string per line
fclose(fid);

Header = regexp(val{1}{1},'(\w+)','match'); % Extract the column names from line 1
as = regexprep(val{1}{2},'\d*,\d{3}','${strrep($&,'','','''')}'); % Replace 10,000 with 10000 in line 2
as = regexprep(as,'\d{2}/\d{2}/\d{4}','${num2str(datenum($&,''mm/dd/yyyy''))}'); % Convert date strings to MATLAB serial date numbers
as = regexprep(as,'"',''); % Remove double quotes
Data = regexp(as,',','split'); % Split into fields (assumes no commas remain inside fields)
Data{5} = str2double(Data{5}); % Convert string to numeric
Data{7} = str2double(Data{7}); % Convert string to numeric
DATA = cell2struct(Data,Header,2);
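
The code above handles one data line. A possible extension to all lines is sketched below; it is a variation and not the code above: it uses lookaround patterns instead of a dynamic expression, and it splits on the "," sequence so that commas inside text fields do not break the split. It assumes every field is double-quoted. The two sample rows and the names rows, parts, amounts and dates are made up for illustration.

```matlab
% Two sample data rows, one string per line of the file.
rows = {'"10003","Fund A","10,000","06/01/2003"'; ...
        '"10004","Fund B","1,250,000","12/15/1998"'};

% Remove commas that sit between a digit and a group of three digits.
rows = regexprep(rows, '(?<=\d),(?=\d{3})', '');

% Strip the outer quotes, then split on the "," sequence between fields.
rows = regexprep(rows, '^"|"$', '');
parts = regexp(rows, '","', 'split');

% Convert column 3 to numbers and column 4 to serial date numbers.
amounts = cellfun(@(p) str2double(p{3}), parts);
dates = cellfun(@(p) datenum(p{4}, 'mm/dd/yyyy'), parts);
```

regexprep and regexp accept a cell array of strings, so no explicit loop over the lines is needed.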

Branko
