Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: how to read text file row/column info
Date: Sun, 16 Mar 2008 00:45:05 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 136
Message-ID: <frhqih$8lb$1@fred.mathworks.com>
References: <frgs93$1ov$1@fred.mathworks.com> <47dc2815$0$289$b45e6eb0@senator-bedfellow.mit.edu>
Reply-To: <HIDDEN>



Arthur G <gorramfreak+news@gmail.com> wrote in message 
<47dc2815$0$289$b45e6eb0@senator-bedfellow.mit.edu>...
> On 2008-03-15 12:08:03 -0400, "Bruce Eddy" <sailboats@cfl.rr.com> said:
> 
> > "Pekka " <pekka.nospam.kumpulainen@tut.please.fi> wrote in
> > message <frg668$im6$1@fred.mathworks.com>...
> >> "Bruce Eddy" <sailboats@cfl.rr.com> wrote in message
> >> <frfjcc$it5$1@fred.mathworks.com>...
> >>> Hi,
> >>> I am trying to read a large .txt file with 5 columns and
> >>> over 12000 rows of data.  I'm having trouble getting the
> >>> (row, column) numbers right.  The code looks like this:
> >>> 
> >>> loop = 0;
> >>> for i = 1:2
> >>> line1 = fgets(fid);
> >>> end
> >>> while feof(fid) == 0
> >>> loop = loop+1;
> >>> line1 = fgets(fid);
> >>> mass_prop(loop).name = line1(1, 1: 66);
> >>> mass_prop(loop).matl = line1(1,67: 102);
> >>> mass_prop(loop).volm = line1(1,103: 119);
> >>> mass_prop(loop).dens = line1(1,120:133);
> >>> mass_prop(loop).wght = line1(1,134:149);
> >>> end
> >>> 
> >>> Can anyone help?
> >>> Thanks.
> >> 
> >> Really hard to help much without knowing the structure of
> >> your file. Hard coding the indexing is generally not a good
> >> idea. That will fail if any of the lines has different
> >> length in any of the fields.
> >> 
> >> Take a look at
> >> doc textscan
> >> that should read the entire file without loops and hard
> >> coded indexing.
> >> 
> > I thought about that after the post.  The file is too big
> > to show but it is a mix of text and numbers in a columnar
> > form.  The first column is text with various indenting, the
> > second column is text, and the last 3 columns are numbers.
> > I've seen the limitation of the hard coding but didn't know
> > of another way.  My goal is to have a code that will read
> > different text files of the same general format.  Thanks
> > for the replies.
> 
> I'm not sure what you mean by "various indenting", and that detail
> could potentially make things more complicated. But one approach I've
> used is to read a line at a time using fgetl, use textscan to parse
> each column into strings (specifying the "Delimiter" as whatever
> separates your columns), and then store everything in a cell.
> Afterward, I try to convert every element of the cell into a
> number (using sscanf seems fastest), and replace the string with a
> number if the conversion is successful. That seems to be my best
> compromise between speed and flexibility.
> 
> --Arthur
> 


Here is a sample of the file I'm trying to read; maybe it
will make more sense.  The info is separated into columns
with spaces between, no tabs or commas.  The first column
is indented in groups (af's) and there are no spaces in the
text.  The second column is text that has spaces in the
phrase, which complicates it some.  The other columns are
all numbers.  Ultimately I need to read several files like
this that get pretty large, so I'm hoping to come up with
code that will fit all sizes.  After reading the file, the
goal is to process it to eliminate duplicates, replacing
them with a single line and a quantity instead.  Next I
need to sum one of the columns for all rows containing .prt.
Hopefully this is doable and not too ambitious.
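
One possible sketch of the two processing steps described above, assuming the
rows have already been read into a cell array of names and a numeric vector
for the column to be summed. The variable names (names, wght, qty,
total_prt_wght) and the small data set are illustrative only and not taken
from any real file.

```matlab
% Hypothetical parsed rows: part/assembly names and one numeric column.
names = {'HDW_MS51830_203.PRT', 'MU_R_RFRST_RC_LWR_FTG.PRT', ...
         'HDW_MS51830_203.PRT', 'af_e_RFRST_RC_LWR_FTG_AS.ASM'};
wght  = [0.004314, 0.437095, 0.004314, 0.430540];

% Step 1: collapse duplicates into one entry per distinct name plus a quantity.
% grp(k) is the index into uniq_names of the k-th original row.
[uniq_names, ~, grp] = unique(names);
qty = accumarray(grp(:), 1);          % number of occurrences of each name

% Step 2: sum the numeric column over every row whose name contains '.prt'
% (case-insensitive, so '.PRT' counts as well).
is_prt = ~cellfun('isempty', regexpi(names, '\.prt'));
total_prt_wght = sum(wght(is_prt));

% Show the collapsed list.
for k = 1:numel(uniq_names)
    fprintf('%-40s qty %d\n', uniq_names{k}, qty(k));
end
fprintf('Sum over .prt rows: %g\n', total_prt_wght);
```

Note that unique returns the names sorted, so the original file order is lost.
If the order matters, the second output of unique could be used to restore it.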

Thanks for the inputs.

af_e_top_TIP_CURVE_WASHER.PRT                 NOT ASSIGNED                        0.003586       0.286000    0.001091
   af_e_RC_TIP_DBLR_STRIP.PRT                    NOT ASSIGNED                        0.040392       0.101000    0.004080
   af_e_top_TIP_CARD_CLVS.PRT                    NOT ASSIGNED                        0.014307       0.101000    0.001443
   af_e_RFRST_RC_LWR_FTG_BALL_AS.ASM            NOT ASSIGNED                        4.231699       1.000000    0.438248
     af_e_RFRST_RC_LWR_FTG_AS.ASM               NOT ASSIGNED                        4.203855       1.000000    0.430540
       MU_R_RFRST_RC_LWR_FTG.PRT                NOT ASSIGNED                        4.171692       0.101000    0.437095
       HDW_MS51830_203.PRT                       NOT ASSIGNED                        0.032163       0.286000    0.004314
     dt_s_RFRST_TETHER_BAL_af.ASM               NOT ASSIGNED                        0.026765       1.000000    0.007515
       dt_s_RFRST_TETHER_BALL_SK.PRT            NOT ASSIGNED                        0.000000       1.000000    0.000000
       dt_s_RFRST_TETHER_BALL.PRT               NOT ASSIGNED                        0.026288       0.280000    0.007361
       dt_SCD_RFRST_WIRE_ROP_SHT.PRT            NOT ASSIGNED                        0.000477       0.286000    0.000154
     AF_S_RFRST_T_BALL_SPR_FER.PRT              NOT ASSIGNED                        0.000516       0.098000    0.000051
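
For a layout like the sample above, one way to avoid hard-coded column
positions is to split each line with a regular expression. This is only a
sketch under stated assumptions: the name contains no spaces, the second
column is free text that may contain spaces, and the line ends with exactly
three numbers. The field names volm, dens, and wght follow the original loop;
the indent field and the pattern itself are additions for illustration. In a
real reader the lines would come from fgetl in a loop rather than from a
literal cell array.

```matlab
% Three lines copied from the sample, with their leading indentation.
lines = { ...
    'af_e_top_TIP_CURVE_WASHER.PRT                 NOT ASSIGNED                        0.003586       0.286000    0.001091', ...
    '     af_e_RFRST_RC_LWR_FTG_AS.ASM               NOT ASSIGNED                        4.203855       1.000000    0.430540', ...
    '       HDW_MS51830_203.PRT                       NOT ASSIGNED                        0.032163       0.286000    0.004314'};

% Name (no spaces), free text (lazy match), then three trailing fields.
pat = '^\s*(\S+)\s+(.*?)\s+(\S+)\s+(\S+)\s+(\S+)\s*$';

mass_prop = struct('name', {}, 'indent', {}, 'matl', {}, ...
                   'volm', {}, 'dens', {}, 'wght', {});
for k = 1:numel(lines)
    tok = regexp(lines{k}, pat, 'tokens', 'once');
    if isempty(tok)
        continue                       % skip lines that do not fit the assumed layout
    end
    vals = str2double(tok(3:5));       % NaN for anything that is not a number
    if any(isnan(vals))
        continue                       % e.g. a header line
    end
    n = numel(mass_prop) + 1;
    mass_prop(n).name   = tok{1};
    mass_prop(n).indent = find(~isspace(lines{k}), 1) - 1;  % leading spaces = nesting depth
    mass_prop(n).matl   = tok{2};
    mass_prop(n).volm   = vals(1);
    mass_prop(n).dens   = vals(2);
    mass_prop(n).wght   = vals(3);
end
```

Because the three numeric fields are anchored to the end of the line, the
varying indentation and the spaces inside "NOT ASSIGNED" do not shift the
result the way fixed character ranges would.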