Parsing a file with comments and several blocks

Hello, I'm fairly new to MATLAB and I'm having some trouble parsing a complex text file. The structure is as follows:
;I-333-PVC Reload
;38mm/36 Inch Hardware
;Uses Fast Nozzle
;460cc Nitrous Oxide
I333 38 914.4 0 0.0679998 0.929 Contrail
0.00894855 855.427
0.0290828 881.09
0.0536913 504.702
0.604027 342.171
0.796421 461.931
1.7 0
;
;I-400-HP
;38mm/36 Inch Hardware
;Uses Fast/X-Fast Nozzle
;460cc Nitrous Oxide
I400 38 914.4 0 0.0860002 0.925 Contrail
0.00447427 667.233
0.0782998 898.199
0.116331 598.799
0.297539 521.811
0.420582 410.605
0.559284 487.594
0.738255 367.834
1 0
;
;
The file is a repetition of hundreds of these blocks of data, where lines starting with ';' are comments that I don't care about. (For those curious, this is a collection of motor files for high-power hobby rockets.)
Knowns: Each block starts with a header line, which is the first line after the comments. The header is a list of 7 fields with the data types '%s %f %f %s %f %f %s'. After the header line, it's repeating sets of %f (arbitrary number) up until the next comment ';'.
I was looking at using
A = textscan(fn,'CommentStyle',';')
or
A = importdata(fn);
with less than desirable results.
Ideally, the data would be organized as one two-dimensional array holding the 7-cell header of each motor, and another array holding, for each motor, a two-dimensional array of the time and thrust data (columns one and two respectively). I'm not sure this entirely makes sense in MATLAB, but it's how I would structure it in Python. I was also trying to find a way to use regular expressions to parse each block (in Python, because I'm much more familiar with it), but couldn't come up with an expression that would split the blocks well.
I was able to successfully parse this data when the file contained just one motor:
function [time, thrust, designations] = parse_motor(fn)
    A = importdata(fn);
    time = A.data(:,1);
    thrust = A.data(:,2);
    raw_designations = strsplit(char(A.textdata(end)),' ');
    % designations = zeros(1,length(raw_designations))
    designations = {1, 2, 3, 4, 5, 6, 7};
    x = 0;
    % conversions to correct data type
    for i = raw_designations
        x = x + 1;
        % true if the string is not a number
        if isequal(str2num(i{1}),[])
            designations{x} = i{1};
        else
            designations{x} = str2num(i{1});
        end
    end
end
(There is probably a much better way to set the correct data type for each field of the 'designations' header line as well.) At this point I'm going to try to develop a regular expression that will parse each block into a header row and data, and then feed each result into the parse_motor function (modified to not take in the filename). I'm sure that there are more elegant ways to do it, though.
Any help would be greatly appreciated, Thanks, Adam

Accepted Answer

per isakson on 29 May 2017
Edited: per isakson on 29 May 2017
"... a RE that will parse each block into a header row and data ..."
str = fileread( 'cssm.txt'); % cssm.txt contains your sample data
cac = regexp( str, '(?m)^[^;]+(?=;)', 'match' );
and check what's in cac
>> header = textscan( cac{1}, '%s', 1, 'Delimiter', '\n' );
>> header{1}
ans =
'I333 38 914.4 0 0.0679998 0.929 Contrail '
>> data = textscan( cac{1}, '%f%f', 'Headerlines', 1, 'CollectOutput',true );
>> data{1}
ans =
0.0089 855.4270
0.0291 881.0900
0.0537 504.7020
0.6040 342.1710
0.7964 461.9310
1.7000 0
Caveat: I don't understand "arbitrary" in "it's repeating sets of %f (arbitrary number)...". If it means an arbitrary number of columns, the undocumented empty format string
data = textscan( cac{1}, '', 'Headerlines',1, 'CollectOutput',true );
reads any number of columns.
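To cover every motor in the file rather than only cac{1}, the two textscan calls can be wrapped in a loop. The following is a minimal sketch and not part of the original answer: the variable names headers and data, the shortened inline sample string, and the use of the asker's format '%s %f %f %s %f %f %s' for the header line are all assumptions.

```matlab
% Shortened inline sample standing in for fileread('cssm.txt') (assumption).
str = sprintf([ ...
    ';I-333-PVC Reload\n' ...
    ';460cc Nitrous Oxide\n' ...
    'I333 38 914.4 0 0.0679998 0.929 Contrail\n' ...
    '0.00894855 855.427\n' ...
    '1.7 0\n' ...
    ';\n' ...
    ';I-400-HP\n' ...
    'I400 38 914.4 0 0.0860002 0.925 Contrail\n' ...
    '0.00447427 667.233\n' ...
    '1 0\n' ...
    ';\n' ...
    ';\n']);

% Split into blocks: header line plus data rows, up to the next ';'.
cac = regexp(str, '(?m)^[^;]+(?=;)', 'match');

n       = numel(cac);
headers = cell(n, 7);   % one row of 7 header fields per motor
data    = cell(n, 1);   % one [time thrust] matrix per motor

for k = 1:n
    % Read the header line once, using the asker's field types.
    hdr = textscan(cac{k}, '%s %f %f %s %f %f %s', 1);
    for j = 1:7
        if iscell(hdr{j})
            headers{k, j} = hdr{j}{1};   % %s fields come back as cells
        else
            headers{k, j} = hdr{j};      % %f fields come back as doubles
        end
    end
    % Skip the header line and collect the two numeric columns.
    d = textscan(cac{k}, '%f%f', 'HeaderLines', 1, 'CollectOutput', true);
    data{k} = d{1};
end
```

With this layout, headers{k,1} holds the designation of motor k, and data{k}(:,1) and data{k}(:,2) hold its time and thrust columns.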
  1 Comment
Adam Poirier on 29 May 2017
Works beautifully, thanks for the help! Seems I need to brush up on my regexps a bit, I was unaware of the [^;] usage.
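For anyone else unfamiliar with the [^;] construct, the pattern can be read piece by piece. This is a small sketch on a made-up string; the variable names txt and blocks and the sample text are hypothetical and not taken from the motor file.

```matlab
% Hypothetical miniature input: comment lines start with ';'.
txt = sprintf(';comment\nA 1\n2 3\n;\nB 4\n5 6\n;\n');

% (?m)   lineanchors: ^ matches at the start of every line
% ^      a block must begin at the start of a line
% [^;]+  one or more characters that are not ';' (newlines included)
% (?=;)  lookahead: the block must be followed by ';', which is not consumed
blocks = regexp(txt, '(?m)^[^;]+(?=;)', 'match');

disp(numel(blocks))   % number of blocks found
```

The line anchor matters here: without it, the text following the first ';' on a comment line would be absorbed into the match along with the block after it.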
