Scanning a text file for bits and pieces of information

Hi.
I'm new to MATLAB and a bit stuck on where to begin coding a program that opens a text file, finds specific strings, and extracts information from them. Here's my general outline and an example of the text format I'm trying to read.
Example format (the wanted information has asterisks on either side of it):
<solution solution_id="telemetry" name="SITE_FRAME" add_date="2004-01-26T12:04:23Z" index1="1">
<reference_frame name="SITE_FRAME" index1=" *0* "/>
<offset x=" *-0.0* " y=" *0.0* " z=" *0.0* "/>
<orientation s="1.0" v1="0.0" v2="0.0" v3="0.0"/>
</solution>
<alias>
<old index1="0" index2="0" index3="1" index4="59"/>
<new index1="1"/>
</alias>
<solution solution_id="telemetry" name="SITE_FRAME" add_date="2004-01-26T12:04:25Z" index1="2">
<reference_frame name="SITE_FRAME" index1=" *1* "/>
<offset x=" *-0.0* " y=" *0.0* " z=" *0.0* "/>
<orientation s="1.0" v1="0.0" v2="0.0" v3="0.0"/>
</solution>
<alias>
<old index1="1" index2="0" index3="0" index4="13"/>
<new index1="2"/>
Outline:
Open the file 'mer_site' (the file that contains this format of information)
Look for the first kind of * information, i.e. the parts that say 'name="SITE_FRAME" index1="*"/>'
Export the * information into a vector (total size [1x158])
Look for the second kind of * information, i.e. the parts that say '<offset x="*" y="*" z="*"/>'
Export the * information into an array (total size [3x158])
Close the file
So far, all I know is that I need:
fid = fopen('mer_site')
something with textscan
something to put the results from textscan into a cell array
to close the file
I'm not sure which arguments textscan needs because the information in the file is mixed.
I'd be super grateful if anyone could help with this!
  9 Comments
per isakson on 20 Jul 2015
I found an "issue" with mer1_master.
<solution solution_id="telemetry" name="SITE_FRAME" add_date="2013-12-03T17:01:12Z" index1="182">
<reference_frame name="SITE_FRAME" index1="181"/>
....
<solution add_date="2014-02-16T19:02:11Z" index1="183" name="SITE_FRAME" solution_id="telemetry">
<reference_frame index1="182" name="SITE_FRAME"/>
....
The order of the attributes changes from index1="183" onward. I guess that is not significant in an XML file.
Walter Roberson on 20 Jul 2015
The directory above says that the files are indeed XML files.
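Because attribute order carries no meaning in XML, one way to stay robust against the reordering noted above is to look each attribute up by name rather than by position. A minimal sketch, assuming the tag text has already been isolated; the two tags are copied from the comment above, and the variable names are only illustrative:

```matlab
% Two reference_frame tags with the same attributes in a different order
tags = { '<reference_frame name="SITE_FRAME" index1="181"/>' ...
       , '<reference_frame index1="182" name="SITE_FRAME"/>' };

index1 = nan( 1, numel(tags) );
for k = 1 : numel( tags )
    % Match the attribute by name, wherever it sits inside the tag
    tok = regexp( tags{k}, '\sindex1="(\d+)"', 'tokens', 'once' );
    if ~isempty( tok )
        index1(k) = str2double( tok{1} );   % tok is a 1x1 cell holding the digits
    end
end
% index1 is now [181 182]
```

A tag without a matching attribute leaves NaN in its slot, which makes gaps easy to spot afterwards.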


Accepted Answer

per isakson on 20 Jul 2015
Edited: per isakson on 20 Jul 2015
I'm surprised that xml2struct by Wouter Falkena failed with your file. Did it throw an error or a warning message?
Instead of trying it myself, I did an exercise with regular expressions.
>> out = cssm('c:\m\cssm\mer1_master.txt')
out =
1x191 struct array with fields:
index1
x
y
z
>> out(5)
ans =
index1: 4
x: 12.3513
y: 4.1437
z: -0.8949
>> out(185)
ans =
index1: 184
x: -438.7025
y: -0.5040
z: -10.6850
>>
where
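function out = cssm( filespec )
    % Read the whole file into one character vector.
    str = fileread( filespec );
    % Grab the text between each "<solution" and its closing "</solution>".
    xpr = '(?<=<solution).+?(?=</solution>)';
    cac = regexp( str, xpr, 'match' );
    % Preallocate one struct per solution block; index1 starts as NaN.
    out = struct( 'index1', num2cell(nan(1,length(cac))) ...
                , 'x',[], 'y',[], 'z',[] );
    % Named tokens: index1 of the <reference_frame> tag (the '.*' parts allow
    % other attributes around it), then x, y, z of the <offset> tag, in that order.
    xpr = cat( 2 ...
             , '<reference_frame' ...
             , '.*' ...
             , ' index1="(?<index1>\d+)"' ...
             , '.*' ...
             , '/>' ...
             , '\s*' ...
             , '<offset' ...
             , ' x="(?<x>[\-\d\.]+)"' ...
             , ' y="(?<y>[\-\d\.]+)"' ...
             , ' z="(?<z>[\-\d\.]+)"' ...
             , '/>' );
    % Convert the captured strings of each block to numbers.
    for jj = 1 : length( cac )
        sas = regexp( cac{jj}, xpr, 'names' );
        out(jj).index1 = str2double( sas.index1 );
        out(jj).x = str2double( sas.x );
        out(jj).y = str2double( sas.y );
        out(jj).z = str2double( sas.z );
    end
end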
Caveat: This function is based on reverse engineering of a single file and was tested only with that same file. It may fail with a file whose layout differs.
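The question asks for a 1xN vector of index1 values and a 3xN array of offsets rather than a struct array. A minimal sketch of how the result might be flattened into those shapes, assuming every block matched so that no field is empty; the sample struct below is a made-up stand-in for the output of cssm:

```matlab
% Hypothetical stand-in for the struct array returned by cssm()
out = struct( 'index1', { 0,   1,    2   } ...
            , 'x',      { 0.0, 12.5, 3.0 } ...
            , 'y',      { 0.0,  4.1, 1.0 } ...
            , 'z',      { 0.0, -0.9, 2.0 } );

% out.field expands to a comma-separated list, so brackets concatenate it
index1 = [ out.index1 ];             % 1xN row vector
offset = [ out.x; out.y; out.z ];    % 3xN array, one column per solution
```

If any block failed to match, the corresponding fields stay empty and the rows of offset would no longer have equal length, so it may be worth checking for empties before concatenating.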
  1 Comment
Elena H. on 21 Jul 2015
This worked!!! Thank you so much! I'm super psyched and will give you credit if I ever publish the results of my data or reference this program. Thanks! I tried using something to process xml files and got an extremely confusing structure that I was trying to sift through, bit by bit. This helped so much!
