Thread Subject:
Pull out specific numbers from unstructured text file

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 7 Feb, 2013 22:57:08

Message: 1 of 14

Here is an unstructured text file (chess_mov.out) that I have:

abc
abc a134
def r5234sdgh
gsgfs 6y856 words

Nmoves=84 (chess win)
Nrequired=101 (chess win maximum moves requested)

ghsdfg564
assdg 656756
text ddg3434t5

I need to:
a. ignore all header lines that come before the line starting with 'Nmoves'. For this part, I need to first search for the line where the string 'Nmoves' occurs and ignore all lines before it. The number of lines before the string 'Nmoves' is 4 for this example only, but it is not always fixed.
b. extract the numbers that come after 'Nmoves' and 'Nrequired'. So, in this case, the numbers I need are 84 and 101.

I have tried this with textscan. The problem is that this is an unstructured text file and textscan is having a lot of problems picking up a fixed pattern.

Question:
How can I extract the numbers that come after 'Nmoves' and 'Nrequired'?

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 7 Feb, 2013 23:34:35

Message: 2 of 14

On 2/7/2013 4:57 PM, Stan wrote:
> Here is an unstructured text file (chess_mov.out) that I have:
>
> abc
> abc a134
> def r5234sdgh
> gsgfs 6y856 words
>
> Nmoves=84 (chess win)
> Nrequired=101 (chess win maximum moves requested)
>
> ghsdfg564
> assdg 656756
> text ddg3434t5
>
> I need to:
> a. ignore all header lines that come before the line starting with
> 'Nmoves'. For this part, I need to first search for the line where the
> string 'Nmoves' occurs and ignore all lines before it. The number of
> lines before the string 'Nmoves' is 4 for this example only, but it is
> not always fixed.
> b. extract the numbers that come after 'Nmoves' and 'Nrequired'. So, in
> this case, the numbers I need are 84 and 101.
>
> I have tried this with textscan. The problem is that this is an
> unstructured text file and textscan is having a lot of problems picking
> up a fixed pattern.
>
> Question:
> How can I extract the numbers that come after 'Nmoves' and 'Nrequired'?

fgetl() combined w/ string matching and sscanf()

Or, undoubtedly one could use regexp()

Perl, perhaps, would be the simpler option...
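For the regexp() route mentioned above, here is a minimal sketch rather than a tested solution. It assumes the whole file fits in memory; in practice the text would come from fileread('chess_mov.out'), but the sample is inlined here so the snippet runs on its own. The patterns and variable names are only illustrative.

```matlab
% Sample text copied from the original post (normally: txt = fileread('chess_mov.out');)
txt = sprintf(['abc\nabc a134\ndef r5234sdgh\ngsgfs 6y856 words\n\n' ...
               'Nmoves=84 (chess win)\n' ...
               'Nrequired=101 (chess win maximum moves requested)\n\n' ...
               'ghsdfg564\nassdg 656756\ntext ddg3434t5\n']);

% 'tokens' with 'once' returns a cell array holding the captured digits
tok = regexp(txt, 'Nmoves\s*=\s*(\d+)', 'tokens', 'once');
Nmoves = str2double(tok{1});

tok = regexp(txt, 'Nrequired\s*=\s*(\d+)', 'tokens', 'once');
Nrequired = str2double(tok{1});
```

Because the pattern is anchored on the label, the number of header lines before 'Nmoves' does not matter.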

--

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 8 Feb, 2013 02:54:22

Message: 3 of 14

On 2/7/2013 5:34 PM, dpb wrote:
> On 2/7/2013 4:57 PM, Stan wrote:
>> Here is an unstructured text file (chess_mov.out) that I have:
>>
>> abc
>> abc a134
>> def r5234sdgh
>> gsgfs 6y856 words
>>
>> Nmoves=84 (chess win)
>> Nrequired=101 (chess win maximum moves requested)
...

>>
>> Question:
>> How can I extract the numbers that come after 'Nmoves' and 'Nrequired'?
>
> fgetl() combined w/ string matching and sscanf()
>
...

Actually, it's not bad at all w/ straight ol' Matlab i/o...

fid=fopen(....,'rt');
l=' ';
while 1
   l=fgetl(fid);
   if strfind(l,'Nmoves')>0,break,end
end
Nmoves=sscanf(l,'Nmoves=%d');
Nrequired=fscanf(fid,'Nrequired=%d');
fid=fclose(fid);

Wrap in a function and voila! (Did it at command line to check...)

--

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 8 Feb, 2013 05:00:20

Message: 4 of 14

It worked exactly as you said. I had to put in a leading space, but no other changes were needed:

> Nmoves=sscanf(l,' Nmoves=%d');
> Nrequired=fscanf(fid,' Nrequired=%d');

------------------X--------------------

Now, there are a few other lines immediately following the lines above. I would like to pull out those lines similarly. The entire file is:

 Nmoves=84 (chess win)
 Nrequired=101 (chess win maximum moves requested)
 TotRL = 0.26918E+08 (total IQ (RaL) in move)

 tolamax= 0.89606 ras
 chess one= 0.25 sh
 chess two = 0.80 sh

 Results for 0.27000E-01RaL moves

 Throw dice = 0.2183E-03+- 0.46E-05 %
 Total throw = 0.2839 +- 0.17E-03 %


So, I've added to your relevant commands as follows:

Nmoves=sscanf(l,' Nmoves=%d');
Nrequired=fscanf(fid,' Nrequired=%d');
Tot_RL=fscanf(fid,' TotRL=%15f');
tolamax=fscanf(fid,' tolamax=%11f');
Moves=fscanf(fid,' Results for%16f');
Throw_dice=fscanf(fid,' Throw dice = %f+-%f');
Total_throw=fscanf(fid,' Total throw = %f +-%f');

Yet, these don't seem to be working. Am I missing something in here, or is something more than this addition required to get the extra numbers TotRL, tolamax, number before "Moves", Throw dice and Total throw?

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 8 Feb, 2013 13:47:15

Message: 5 of 14

On 2/7/2013 11:00 PM, Stan wrote:
> It worked exactly as you said. I had to put in a leading space, but no
> other changes were needed:
>
>> Nmoves=sscanf(l,' Nmoves=%d');
>> Nrequired=fscanf(fid,' Nrequired=%d');
>
> ------------------X--------------------
>
> Now, there are a few other lines immediately following the lines above.
> I would like to pull out those lines similarly. The entire file is:
>
> Nmoves=84 (chess win)
> Nrequired=101 (chess win maximum moves requested)
> TotRL = 0.26918E+08 (total IQ (RaL) in move)
>
> tolamax= 0.89606 ras
> chess one= 0.25 sh
> chess two = 0.80 sh
>
> Results for 0.27000E-01RaL moves
>
> Throw dice = 0.2183E-03+- 0.46E-05 %
> Total throw = 0.2839 +- 0.17E-03 %
>
>
> So, I've added to your relevant commands as follows:
>
> Nmoves=sscanf(l,' Nmoves=%d');
> Nrequired=fscanf(fid,' Nrequired=%d');
> Tot_RL=fscanf(fid,' TotRL=%15f');
> tolamax=fscanf(fid,' tolamax=%11f');
> Moves=fscanf(fid,' Results for%16f');
> Throw_dice=fscanf(fid,' Throw dice = %f+-%f');
> Total_throw=fscanf(fid,' Total throw = %f +-%f');
>
> Yet, these don't seem to be working. Am I missing something in here, or
> is something more than this addition required to get the extra numbers
> TotRL, tolamax, number before "Moves", Throw dice and Total throw?

Yeah, the shortcut I took doesn't go to the next line after the fscanf()
call. Since it's so irregular, to get more lines that are
near-contiguous (assuming the number of blank lines is also fixed) I'd
just continue w/ the fgetl() tack instead of writing an explicit number
of fields, like so:

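l=fgetl(fid);                       % next record: the Nrequired line
Nreq=sscanf(l,' Nrequired = %d');
l=fgetl(fid);                       % TotRL line
Tot_RL=sscanf(l,' TotRL = %f');     % a blank in the format matches zero or more blanks ('TotRL =')
l=fgetl(fid);                       % blank line (layout as in the posted file)
l=fgetl(fid);                       % tolamax line
tolamax=sscanf(l,' tolamax = %f');
for k=1:4, l=fgetl(fid); end        % 'chess one', 'chess two', blank, then the Results line
Moves=sscanf(l,' Results for %f');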

etc...

NB you've got to read past the blank line(s), too...

You should be able to get fscanf() (or even textscan() ) to work as well
w/ the inclusion of counting the fields and a %*[^\n] string to skip the
\n characters (be sure to use 'rt' of the fopen in Windows to ensure the
proper NL characters are recognized assuming the file also comes from
Windows) but it's simpler to just let fgetl() take care of each reading
getting to the next record imo...

Untested, salt to suit...
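If counting blank lines gets tedious, one alternative is to let each line identify itself: read every record with fgetl() and try a small table of sscanf() formats on it. This is only a sketch under assumptions: the sample lines are copied from the post and written to a temporary file so it runs on its own, and the field names (ThrowDice, TotalThrow, etc.) are made up for illustration.

```matlab
% Sample records from the post, written to a temporary file (assumed name)
sample = {' Nmoves=84 (chess win)', ...
          ' Nrequired=101 (chess win maximum moves requested)', ...
          ' TotRL = 0.26918E+08 (total IQ (RaL) in move)', ...
          '', ...
          ' tolamax= 0.89606 ras', ...
          ' chess one= 0.25 sh', ...
          ' chess two = 0.80 sh', ...
          '', ...
          ' Results for 0.27000E-01RaL moves', ...
          '', ...
          ' Throw dice = 0.2183E-03+- 0.46E-05 %', ...
          ' Total throw = 0.2839 +- 0.17E-03 %'};
fname = [tempname '.out'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', strjoin(sample, sprintf('\n')));
fclose(fid);

% One format per wanted value. A blank in a format matches zero or more
% blanks in the input, so ' TotRL = %f' accepts 'TotRL=' and 'TotRL ='.
names = {'Nmoves', 'Nrequired', 'TotRL', 'tolamax', ...
         'Moves', 'ThrowDice', 'TotalThrow'};
fmts  = {' Nmoves = %d', ' Nrequired = %d', ' TotRL = %f', ' tolamax = %f', ...
         ' Results for %f', ' Throw dice = %f +- %f', ' Total throw = %f +- %f'};

vals = struct();
fid = fopen(fname, 'rt');
while ~feof(fid)
    L = fgetl(fid);                  % one whole record per call
    if ~ischar(L), break, end        % fgetl returns -1 at end of file
    for k = 1:numel(fmts)
        v = sscanf(L, fmts{k});      % empty when the literal text does not match
        if ~isempty(v)
            vals.(names{k}) = v.';   % store as a row vector
        end
    end
end
fclose(fid);
delete(fname);
```

Blank lines and the unwanted 'chess one' / 'chess two' records simply match none of the formats and are skipped, so the number of lines between the wanted records no longer matters.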

--

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 8 Feb, 2013 19:51:08

Message: 6 of 14

Ok, I get the idea. Here is what I tried:

fid=fopen(....,'rt');
l=' ';
while 1
   l=fgetl(fid);
   if strfind(l,' Nmoves')>0,break,end
end
Nmoves=sscanf(l,' Nmoves=%d');
Nrequired=fscanf(fid,' Nrequired=%d'); %up until here, no changes
l=fgetl(fid); %added this line
Tot_RL=sscanf(l,' TotRL= %f'); %added this line
fid=fclose(fid);

I am getting:
> Tot_RL =

           [ ]

Is something missing with the second fgetl() line?

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 9 Feb, 2013 17:12:09

Message: 7 of 14

^^^Could the above problem be because I need to open and close a file repeatedly? I am not doing this in the code. If this is the case, wouldn't this be rather time consuming just to get those additional numbers, compared to what was required to get the first two?

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 9 Feb, 2013 17:36:59

Message: 8 of 14

On 2/9/2013 11:12 AM, Stan wrote:
> ^^^Could the above problem be because I need to open and close a file
> repeatedly?...

No.

It's as I explained previously--the fscanf() only processes as much as
needed to satisfy the format string. You have to get each record
(including blank ones) fully to progress from one to the next.

Work thru at the command line and examine the results and it should
become clear what happens (and thereby what is needed to fix it). This
should be a learning experience here, too... :)

--

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 9 Feb, 2013 19:12:10

Message: 9 of 14

^^^^^Okay I think I don't understand Lines 5,7,8 in your shortcut code:

Line 1: > fid=fopen(....,'rt');
Line 2: > l=' ';
Line 3: > while 1
Line 4: > l=fgetl(fid);
Line 5: > if strfind(l,'Nmoves')>0,break,end
Line 6: > end
Line 7: > Nmoves=sscanf(l,'Nmoves=%d');
Line 8: > Nrequired=fscanf(fid,'Nrequired=%d');
Line 9: > fid=fclose(fid);

My explanation is:

while 1
.
.
.
end

This is for lines 4-6 and this reads the file. If fgetl encounters the end-of-file indicator, it returns -1. So, as long as it returns 1 (i.e. anywhere before the end of the file), this statement is saying the while loop should perform the actions inside the if statement.

My explanation for line 5:
If 'Nmoves' is found in the string l (where l is the contents of the file that have been read up to that point) then stop reading at that line.

My explanations for lines 7 and 8:
7: Scan l for 'Nmoves=%d'.
8. Scan fid for 'Nrequired=%d'.

Questions:
In line 8, why did you change from l to fid?
What is the connection between line 5 and lines 7,8?
How does it know, after line 5 (i.e. after reaching the end of the line containing Nmoves), that it needs to search for the next two lines?

Subject: Pull out specific numbers from unstructured text file

From: Nasser M. Abbasi

Date: 9 Feb, 2013 20:53:35

Message: 10 of 14

On 2/9/2013 1:12 PM, Stan wrote:
> ^^^^^Okay I think I don't understand Lines 5,7,8 in your shortcut code:
>
> Line 1: > fid=fopen(....,'rt');
> Line 2: > l=' ';
> Line 3: > while 1
> Line 4: > l=fgetl(fid);

Do not use l as a variable name; it is hard to tell apart from 1.
Use L, not l.

--Nasser

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 9 Feb, 2013 21:50:08

Message: 11 of 14

On 2/9/2013 1:12 PM, Stan wrote:
> ^^^^^Okay I think I don't understand Lines 5,7,8 in your shortcut code:
>
> Line 1: > fid=fopen(....,'rt');
> Line 2: > l=' ';
> Line 3: > while 1
> Line 4: > l=fgetl(fid);
> Line 5: > if strfind(l,'Nmoves')>0,break,end
> Line 6: > end
> Line 7: > Nmoves=sscanf(l,'Nmoves=%d');
> Line 8: > Nrequired=fscanf(fid,'Nrequired=%d');
> Line 9: > fid=fclose(fid);
>
> My explanation is:
>
> while 1
> .
> .
> .
> end
>
> This is for lines 4-6 and this reads the file. If fgetl encounters the
> end-of-file indicator, it returns -1. So, as long as it returns 1 (i.e.
> anywhere before the end of the file), this statement is saying the while
> loop should perform the actions inside the if statement.

Not quite--the '1' in the WHILE construct is a constant and never
changes--only finding the string 'Nmoves=' somewhere in the file will
break the loop.

The condition in the WHILE would have to be something on the variable l
after returned by fgetl() if it were to have any effect. I chose to not
do that 'cuz I presumed you'd only use this on an appropriate file and
it would take reading the first line outside the loop or to otherwise
initialize the loop at the beginning. An alternate that would be a
little cleaner in case the string weren't to be in the file would be to
use while ~feof(fid) which would at least die gracefully on the EOF
(eventually).
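A sketch of that while ~feof(fid) variant, under the assumption that a missing 'Nmoves' line should just leave the results empty. The sample file is written to a temporary location so the snippet is self-contained; the variable names found and sample are only illustrative.

```matlab
% Sample file from the original post, written to a temporary file
sample = {'abc', 'abc a134', 'def r5234sdgh', 'gsgfs 6y856 words', '', ...
          'Nmoves=84 (chess win)', ...
          'Nrequired=101 (chess win maximum moves requested)'};
fname = [tempname '.out'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', strjoin(sample, sprintf('\n')));
fclose(fid);

fid = fopen(fname, 'rt');
found = false;
while ~feof(fid)                     % stops at end of file instead of looping forever
    L = fgetl(fid);
    if ischar(L) && ~isempty(strfind(L, 'Nmoves'))
        found = true;                % L now holds the Nmoves record
        break
    end
end
if found
    Nmoves    = sscanf(L, ' Nmoves = %d');
    Nrequired = sscanf(fgetl(fid), ' Nrequired = %d');  % assumes the next record exists
else
    Nmoves = [];  Nrequired = [];    % target string was not in the file
end
fclose(fid);
delete(fname);
```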

> My explanation for line 5:
> If 'Nmoves' is found in the string l (where l is the contents of the
> file that have been read up to that point) then stop reading at that line.

Essentially--it breaks the loop having found the desired string and
therefore the first line to parse (on the assumption the string pattern
only exists for the line desired or at least it is the first
occurrence). At that point 'l' holds the content of the line read--the
strfind() simply scans the content for a match and returns.

>
> My explanations for lines 7 and 8:
> 7: Scan l for 'Nmoves=%d'.
> 8. Scan fid for 'Nrequired=%d'.

Well, depends on what you mean by "scan" -- they both do input
conversion matching the formatting string according to the rules
therefore. The rule for a literal string is to match that string in the
input and essentially ignore those matching characters. %d is to
convert a field as decimal number. sscanf() works from a string
variable ('l' in this case which we filled w/ the desired line from the
file previously so now we're getting the desired value to a variable)
while fscanf takes input from the file which has been connected via
fopen() and associated w/ a valid file handle (fid is just a convenient
variable name for that).
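A small illustration of the literal-matching rule on a string, using the line with the leading blank that Stan reported in message 4. The comments describe the behaviour as reported there, not a separately verified result.

```matlab
L = ' Nmoves=84 (chess win)';     % record with a leading blank

a = sscanf(L, 'Nmoves=%d');       % literal 'N' meets the blank first: no match, a is empty
b = sscanf(L, ' Nmoves=%d');      % blank in the format skips the leading blank: b is 84
```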

> Questions:
> In line 8, why did you change from l to fid?

Because we need to scan another line and it's done w/ one source code
line directly from the file via fscanf(), whereas before we had used
fgetl() to suck up a record in its entirety while searching for the
target first line. By your file, the next line was the location for the
next value wanted, so it didn't need any more searching to find another
randomly placed record--it was given to be the next.

> What is the connection between line 5 and lines 7,8?
> How does it know, after line 5 (i.e. after reaching the end of the line
> containing Nmoves), that it needs to search for the next two lines?

You described the file format and said the next line after the one
containing "Nmoves" was the next desired field to be parsed.

You still don't seem to grasp that the fgetl() reads a whole record,
consuming the \n (newline), and returns the text (minus the newline) in
the character variable 'l', and the first sscanf() is parsing that
string--nothing else has happened in the file at that point (after the
sscanf() that is). _THEN_, we went back to the file and got as much of
the next record as required to get the next variable by the use of
fscanf().

fscanf(), however, unlike fgetl() does _NOT_ automagically read the
entire record _UNLESS_ and _IFF_ the format string provided tells it to
do that. Your initial description didn't say anything about reading
anything except these two values so I did just that--read records until
found the first one desired, then read just what was needed to get the
variable value requested from the following record. Period. End of
story. That's why later when you came back and said "Oh, that's not the
end of what's needed" I said what I gave you was a shortcut specifically
for the first problem outlined.

Now, the problem is that to read the rest of the desired records you've
got to either write specific formatting strings to handle them (a pita
since they're not symmetric in much of any useful way) and continue on
w/ fscanf() (allowing for the fact that the file position marker is in
the middle of the Nrequired record as above), or go back to reading
whole records.

So, as noted in my previous response, given you want to do the other
stuff I'd suggest it's simpler to revert to fgetl/sscanf pairs.

Again, take the sample code and your example file and just type the
while loop in at the command line and look at what the contents of 'l'
are and then what happens if you follow the fscanf() call w/ a fgetl()
to understand the difference...
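The experiment described above can be sketched like this. It assumes a cut-down copy of the example file written to a temporary location; leftover and nextRec are hypothetical names chosen for the illustration.

```matlab
% Cut-down sample file from the thread, written to a temporary file
sample = {'abc', ...
          ' Nmoves=84 (chess win)', ...
          ' Nrequired=101 (chess win maximum moves requested)', ...
          ' TotRL = 0.26918E+08 (total IQ (RaL) in move)'};
fname = [tempname '.out'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', strjoin(sample, sprintf('\n')));
fclose(fid);

fid = fopen(fname, 'rt');
L = fgetl(fid);
while ischar(L) && isempty(strfind(L, 'Nmoves'))
    L = fgetl(fid);                        % whole records until the Nmoves line
end
Nmoves    = sscanf(L, ' Nmoves=%d');       % parses the stored string; the file is untouched
Nrequired = fscanf(fid, ' Nrequired=%d');  % reads only as far as the format needs
leftover  = fgetl(fid);                    % the rest of the SAME Nrequired record
nextRec   = fgetl(fid);                    % only now do we reach the TotRL record
Tot_RL    = sscanf(nextRec, ' TotRL = %f');
fclose(fid);
delete(fname);
```

Displaying leftover at the command line shows why a single fgetl() right after the fscanf() call does not yet return the TotRL line.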

Also read

doc fscanf
doc fgetl

and friends carefully...

--

Subject: Pull out specific numbers from unstructured text file

From: dpb

Date: 9 Feb, 2013 22:38:02

Message: 12 of 14

On 2/9/2013 3:50 PM, dpb wrote:
> On 2/9/2013 1:12 PM, Stan wrote:
>> ^^^^^Okay I think I don't understand Lines 5,7,8 in your shortcut code:
>>
>> Line 1: > fid=fopen(....,'rt');
>> Line 2: > l=' ';
>> Line 3: > while 1
>> Line 4: > l=fgetl(fid);
>> Line 5: > if strfind(l,'Nmoves')>0,break,end
>> Line 6: > end
>> Line 7: > Nmoves=sscanf(l,'Nmoves=%d');
>> Line 8: > Nrequired=fscanf(fid,'Nrequired=%d');
>> Line 9: > fid=fclose(fid);
>>
...

>> Questions:
>> In line 8, why did you change from l to fid?
>
...

> You still don't seem to grasp that the fgetl() reads a whole record,
> consuming the \n (newline), and returns the text (minus the newline) in
> the character variable 'l', and the first sscanf() is parsing that
> string--nothing else has happened in the file at that point (after the
> sscanf() that is). _THEN_, we went back to the file and got as much of
> the next record as required to get the next variable by the use of
> fscanf().
>
...

Also intended to point out explicitly that you seem to be overlooking
that fgetl() uses fid to get l -- the same fid as in fscanf(). One has
to read the file one place or the other--in one case the characters are
simply stored without being parsed, whereas the other converts to
numeric internal form for a given format while reading...

--

Subject: Pull out specific numbers from unstructured text file

From: Stan

Date: 11 Feb, 2013 15:26:14

Message: 13 of 14

Update:

I used the fgetl() and sscanf() commands as you had done earlier. I inserted searches for blank lines as required. It worked as you indicated.

Thanks for all your assistance. The explanations were very useful and corrected the way I was thinking.

Subject: Pull out specific numbers from unstructured text file

From: ba

Date: 8 Aug, 2014 10:51:11

Message: 14 of 14

I'm also reading this kind of file, but with a different format. Any help would be appreciated.

# Bundle file v0.3
269 31913
3.0107383211e+003 1.2457159171e-001 -3.2061110722e-001
6.3846423609e-001 -2.0702958325e-002 7.6937299585e-001
-3.4926013391e-001 8.8299277113e-001 3.1359388546e-001
-6.8584311478e-001 -4.6892979608e-001 5.5652858709e-001
8.8146166784e+001 1.0107379995e+001 2.3539905552e+001
- - - - - - - - - - - - - - - - - - - - - - - -- - --- -- - - - - - - - - - - - - - - -
7.2331177348e+000 2.8864575461e+001 -1.0624011395e+002
10 10 18
103 80 2114 665.0500 74.7900 81 2646 629.6399 52.3000 76 2086

So, I need to parse this file starting with the number of cameras, i.e. 269.
In the 2nd line: <focal length> <radial_distortion1> <radial_distortion2>
3rd to 5th lines: a 3x3 matrix representing the camera rotation
6th line: a 3-vector describing the camera translation

This continues until the number of lines is 5 times the number of cameras (e.g. in this case it would be 1345).

Then from line no. 1346, the data is in another format:
line 1: 3-vector describing the 3D position
line 2: 3-vector describing the RGB color
line 3: view list of unknown length.

So I don't see how to read data in this format.
Can anyone suggest anything?
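Since this layout is purely numeric after the header line, one possible approach is fscanf() with explicit counts, which ignores line breaks between numbers. The sketch below uses a tiny synthetic file (made-up numbers: 1 camera, 1 point, 1 view) rather than the real data. It also assumes that the view list starts with its own length followed by four numbers per view, which matches the sample line above but should be checked against the Bundler documentation. The struct field names are hypothetical.

```matlab
% Tiny synthetic bundle file (made-up values) written to a temporary file
sample = {'# Bundle file v0.3', ...
          '1 1', ...
          '1000 0.1 -0.2', ...
          '1 0 0', ...
          '0 1 0', ...
          '0 0 1', ...
          '0.5 1.5 2.5', ...
          '7.0 8.0 -9.0', ...
          '10 20 30', ...
          '1 0 5 12.5 -3.5'};
fname = [tempname '.out'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', strjoin(sample, sprintf('\n')));
fclose(fid);

fid = fopen(fname, 'rt');
header = fgetl(fid);                   % '# Bundle file v0.3'
counts = fscanf(fid, '%d', 2);         % [number of cameras; number of points]
ncam = counts(1);
npts = counts(2);

cams = struct('f', cell(1, ncam), 'k', [], 'R', [], 't', []);
for c = 1:ncam
    blk = fscanf(fid, '%f', [3 5]);    % 15 numbers; each COLUMN is one file line
    cams(c).f = blk(1, 1);             % focal length
    cams(c).k = blk(2:3, 1).';         % two radial distortion terms
    cams(c).R = blk(:, 2:4).';         % transpose so file rows become matrix rows
    cams(c).t = blk(:, 5).';           % translation
end

pts = struct('pos', cell(1, npts), 'rgb', [], 'views', []);
for p = 1:npts
    pts(p).pos   = fscanf(fid, '%f', 3).';
    pts(p).rgb   = fscanf(fid, '%d', 3).';
    nviews       = fscanf(fid, '%d', 1);          % assumed: list length comes first
    pts(p).views = fscanf(fid, '%f', [4 nviews]).';  % one row per view
end
fclose(fid);
delete(fname);
```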
