<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264075</link>
    <title>MATLAB Central Newsreader - reading an annoying ascii text file</title>
    <description>Feed for thread: reading an annoying ascii text file</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Sun, 25 Oct 2009 21:45:03 -0400</pubDate>
      <title>reading an annoying ascii text file</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264075#689616</link>
      <author>Derik </author>
      <description>Dear Sunday readers,&lt;br&gt;
I am trying to read the file format shown below. I tried textscan but I must be missing something: I get either errors or an empty cell (I run version 7.5.0, R2007b).&lt;br&gt;
As a beginner I have several difficulties:&lt;br&gt;
* the double quotes do not seem to be parsed correctly&lt;br&gt;
* unfortunately the comma delimiter is also the thousands separator&lt;br&gt;
* I would like to use the first line as the variable names of the columns&lt;br&gt;
* I would like to convert the date strings &quot;MM/DD/YYYY&quot; to MATLAB dates&lt;br&gt;
* the file has around 7000 lines and 70 variables&lt;br&gt;
&lt;br&gt;
extract of the file:&lt;br&gt;
&quot;Fund_ID&quot;,&quot;Fund&quot;,&quot;Firm&quot;,&quot;Structure&quot;,&quot;Minimum_Investment&quot;,&quot;Additional_Investment&quot;,&quot;Inception&quot;,&quot;Reporting&quot;&lt;br&gt;
&quot;10003&quot;,&quot;Enterprise Fund Ltd. (Class E) - Emerging Markets&quot;,&quot;Advantage Management Limited&quot;,&quot;Corporation&quot;,&quot;10,000&quot;,&quot;&quot;,&quot;06/01/2003&quot;,&quot;Monthly&quot;&lt;br&gt;
&lt;br&gt;
Thank you very much in advance&lt;br&gt;
derik</description>
    </item>
    <item>
      <pubDate>Mon, 26 Oct 2009 03:43:59 -0400</pubDate>
      <title>Re: reading an annoying ascii text file</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264075#689656</link>
      <author>Doug Schwarz</author>
      <description>In article &amp;lt;hc2gsv$375$1@fred.mathworks.com&amp;gt;,&lt;br&gt;
&amp;nbsp;&quot;Derik &quot; &amp;lt;d.nospam.schupbach@lombardodier.please.com&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; Dear Sunday readers,&lt;br&gt;
&amp;gt; I am trying to read the file format shown below. I tried textscan but I must be &lt;br&gt;
&amp;gt; missing something: I get either errors or an empty cell (I run version 7.5.0, R2007b).&lt;br&gt;
&amp;gt; As a beginner I have several difficulties:&lt;br&gt;
&amp;gt; * the double quotes do not seem to be parsed correctly&lt;br&gt;
&lt;br&gt;
Use the %q format with textscan.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;gt; * unfortunately the comma delimiter is also the thousands separator&lt;br&gt;
&amp;gt; * I would like to use the first line as the variable names of &lt;br&gt;
&amp;gt; the columns&lt;br&gt;
&lt;br&gt;
Don't do this; it's more trouble than it's worth.  Instead, use the &lt;br&gt;
column headers as field names for a structure array.&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;gt; * I would like to convert the date strings &quot;MM/DD/YYYY&quot; to MATLAB dates&lt;br&gt;
&amp;gt; * the file has around 7000 lines and 70 variables&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; extract of the file:&lt;br&gt;
&amp;gt; &quot;Fund_ID&quot;,&quot;Fund&quot;,&quot;Firm&quot;,&quot;Structure&quot;,&quot;Minimum_Investment&quot;,&quot;Additional_Investment&quot;,&quot;Inception&quot;,&quot;Reporting&quot;&lt;br&gt;
&amp;gt; &quot;10003&quot;,&quot;Enterprise Fund Ltd. (Class E) - Emerging Markets&quot;,&quot;Advantage Management Limited&quot;,&quot;Corporation&quot;,&quot;10,000&quot;,&quot;&quot;,&quot;06/01/2003&quot;,&quot;Monthly&quot;&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Thank you very much in advance&lt;br&gt;
&amp;gt; derik&lt;br&gt;
&lt;br&gt;
Here's what I would do (assume your data is in a file called derik.dat):&lt;br&gt;
&lt;br&gt;
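% Read the header line first, then the remaining rows.&lt;br&gt;
% %q reads a field and strips the enclosing double quotes.&lt;br&gt;
% The format needs one %q per column: 8 for this extract, more for the full file.&lt;br&gt;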
fid = fopen('derik.dat');&lt;br&gt;
header = textscan(fid,'%q%q%q%q%q%q%q%q',1,'Delimiter',',');&lt;br&gt;
raw = textscan(fid,'%q%q%q%q%q%q%q%q','Delimiter',',');&lt;br&gt;
fclose(fid);&lt;br&gt;
&amp;nbsp;&lt;br&gt;
% Store data in a structure array, data.&lt;br&gt;
fields = [header{:}];&lt;br&gt;
raw_array = [raw{:}];&lt;br&gt;
data = cell2struct(raw_array,fields,2);&lt;br&gt;
&amp;nbsp;&lt;br&gt;
% Convert column 5 (Minimum_Investment) from string to numeric.&lt;br&gt;
% str2double accepts the comma as a thousands separator, so '10,000' becomes 10000.&lt;br&gt;
min_invest_str = {data.(fields{5})};&lt;br&gt;
min_invest = str2double(min_invest_str);&lt;br&gt;
min_invest_cell = num2cell(min_invest);&lt;br&gt;
[data.(fields{5})] = min_invest_cell{:};&lt;br&gt;
&amp;nbsp;&lt;br&gt;
% Convert column 7 (Inception) into date numbers.&lt;br&gt;
date_str = {data.(fields{7})};&lt;br&gt;
date_num = datenum(date_str,'mm/dd/yyyy');&lt;br&gt;
date_num_cell = num2cell(date_num);&lt;br&gt;
[data.(fields{7})] = date_num_cell{:};&lt;br&gt;
&lt;br&gt;
-- &lt;br&gt;
Doug Schwarz&lt;br&gt;
dmschwarz&amp;ieee,org&lt;br&gt;
Make obvious changes to get real email address.</description>
    </item>
    <item>
      <pubDate>Tue, 27 Oct 2009 08:13:04 -0400</pubDate>
      <title>reading an annoying ascii text file</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/264075#689961</link>
      <author>Branko </author>
      <description>&quot;Derik &quot; &amp;lt;d.nospam.schupbach@lombardodier.please.com&amp;gt; wrote in message &amp;lt;hc2gsv$375$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; Dear Sunday readers,&lt;br&gt;
&amp;gt; I am trying to read the file format shown below. I tried textscan but I must be missing something: I get either errors or an empty cell (I run version 7.5.0, R2007b).&lt;br&gt;
&amp;gt; As a beginner I have several difficulties:&lt;br&gt;
&amp;gt; * the double quotes do not seem to be parsed correctly&lt;br&gt;
&amp;gt; * unfortunately the comma delimiter is also the thousands separator&lt;br&gt;
&amp;gt; * I would like to use the first line as the variable names of the columns&lt;br&gt;
&amp;gt; * I would like to convert the date strings &quot;MM/DD/YYYY&quot; to MATLAB dates&lt;br&gt;
&amp;gt; * the file has around 7000 lines and 70 variables&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; extract of the file:&lt;br&gt;
&amp;gt; &quot;Fund_ID&quot;,&quot;Fund&quot;,&quot;Firm&quot;,&quot;Structure&quot;,&quot;Minimum_Investment&quot;,&quot;Additional_Investment&quot;,&quot;Inception&quot;,&quot;Reporting&quot;&lt;br&gt;
&amp;gt; &quot;10003&quot;,&quot;Enterprise Fund Ltd. (Class E) - Emerging Markets&quot;,&quot;Advantage Management Limited&quot;,&quot;Corporation&quot;,&quot;10,000&quot;,&quot;&quot;,&quot;06/01/2003&quot;,&quot;Monthly&quot;&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Thank you very much in advance&lt;br&gt;
&amp;gt; derik&lt;br&gt;
&lt;br&gt;
Another approach using regexp:&lt;br&gt;
&lt;br&gt;
fid = fopen(filename,'rt');&lt;br&gt;
val=textscan(fid,'%s','delimiter','\n','headerlines', 0); % Read each line as one string&lt;br&gt;
fclose(fid);&lt;br&gt;
&lt;br&gt;
Header=regexp(val{:}{1},'(\w+)','match'); % Extract the column names from the header line&lt;br&gt;
as=regexprep(val{:}{2},'\d*,\d{3}','${strrep($&amp;,'','','''')}'); % Replace 10,000 with 10000&lt;br&gt;
as=regexprep(as,'\d{2}/\d{2}/\d{4}','${num2str(datenum($&amp;, ''mm/dd/yyyy''))}'); % Convert MM/DD/YYYY dates to MATLAB serial date numbers&lt;br&gt;
as=regexprep(as,'&quot;','');        % Remove double quotes&lt;br&gt;
Data=regexp(as, ',', 'split');   % Split data&lt;br&gt;
Data{5}=str2num(Data{5});   % Convert string to numeric&lt;br&gt;
Data{7}=str2num(Data{7});   % Convert string to numeric&lt;br&gt;
DATA = cell2struct(Data,Header,2);&lt;br&gt;
&lt;br&gt;
Branko</description>
    </item>
  </channel>
</rss>

