text file data processing

Mohammad on 30 Jan 2015
Edited: Mohammad on 3 Feb 2015
Dear Experts, it would be a great help if somebody could assist with this.
I have a text file that I want to read. It contains a couple of measurement data sets. The file starts with text, followed by a data set, then text again and another data set, and so on. I want to read each data set; I don't care about the text. The text file looks like this:
if true
:MSR 2 # No. of measurement in file
:SYS BDS 0 # Beam Data Scanner System
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
...............
..............
!
#
# X Y Z Dose
#
= -123.9 0.0 32.0 11.8
= -123.6 0.0 32.0 11.9
= -123.2 0.0 32.0 12.1
= -122.7 0.0 32.0 12.2
= -122.2 0.0 32.0 12.5
= -121.7 0.0 32.0 12.6
= -121.4 0.0 32.0 12.6
:EOM # End of Measurement
#
# RFA300 ASCII Measurement Dump ( BDS format )
#
# Measurement number 2
#
%VNR 1.0
!
#
# X Y Z Dose
#
= 132.0 0.0 100.0 8.1
= 131.7 0.0 100.0 8.2
= 131.3 0.0 100.0 8.2
= 130.8 0.0 100.0 8.3
= 130.3 0.0 100.0 8.4
= 129.8 0.0 100.0 8.6
= 129.3 0.0 100.0 8.8
= 129.0 0.0 100.0 8.8
= 128.5 0.0 100.0 8.9
= 128.0 0.0 100.0 9.2
= 127.5 0.0 100.0 9.3
= 127.2 0.0 100.0 9.4
:EOM # End of Measurement
:EOF # End of File
end
From this file I want to read the numerical data under the column header, i.e., X Y Z Dose, for each measurement. Attached is my text file.
I would greatly appreciate any help. Thanks. Rafiq
  2 Comments
Hikaru on 30 Jan 2015
There's no file attached. Have you tried the function textscan?
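A minimal sketch of the textscan route, assuming every data row starts with "=" and every measurement ends with ":EOM", as in the excerpt in the question. The demo text and the variable names (txt, blocks, rows, data) are made up for illustration; for a real file, txt would come from fileread instead.

```matlab
% Demo text that mimics the layout shown in the question (not the real file).
txt = sprintf('%s\n', ...
    '# X Y Z Dose', ...
    '= -123.9 0.0 32.0 11.8', ...
    '= -123.6 0.0 32.0 11.9', ...
    ':EOM # End of Measurement', ...
    '# X Y Z Dose', ...
    '= 132.0 0.0 100.0 8.1', ...
    ':EOM # End of Measurement', ...
    ':EOF # End of File');

% Split at each :EOM marker and drop whatever follows the last marker.
blocks = regexp(txt, ':EOM', 'split');
blocks(end) = [];

% Keep only the lines starting with "=" and parse four numbers per line.
data = cell(size(blocks));
for k = 1:numel(blocks)
    rows = regexp(blocks{k}, '^=.*$', 'match', 'lineanchors', 'dotexceptnewline');
    data{k} = cell2mat(textscan(strjoin(rows, sprintf('\n')), '=%f%f%f%f'));
end
```

Each cell of data then holds one measurement as a numeric matrix with the columns X, Y, Z, Dose. All header text and parameter lines are ignored.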
Mohammad on 30 Jan 2015
I am sorry Hikaru. Attached is the file.

Accepted Answer

Stephen23 on 31 Jan 2015
Edited: Stephen23 on 31 Jan 2015
This code works with your original data file (which I uploaded here too). It parses most of the file into a structure, which has size 1-by-(number of measurements). Save the code below in a script and run it:
% Read file into a string:
str = fileread('test.txt');
% Check the number of measurements:
N = sscanf(str,':MSR %d');
[S,E] = regexp(str,'(?<=^# Measurement number).*?^:EOM','lineanchors');
assert(all(N==[numel(S),numel(E)]),'This file is incomplete or corrupted.')
% Preallocate structure:
out(N) = struct();
% Loop over each measurement in the file:
for n = 1:N
    sub = str(S(n):E(n));
    out(n).num = sscanf(sub,'%d');
    % Assign fields and values:
    tkn = regexp(sub,'^%(\w{3})([^#]*?)(#\s.*?)?\s*$','lineanchors','tokens');
    for m = 1:numel(tkn)
        tmp = tkn{m};
        out(n).(tmp{1}) = strtrim(tmp{2});
        if ~isempty(tmp{3})
            out(n).([tmp{1},'_note']) = tmp{3}(3:end);
        end
    end
    % Assign header:
    tmp = regexp(sub,'^#(\s+\w+)+\s+#\s+=','tokens','lineanchors','once');
    out(n).hdr = regexp(strtrim(tmp),'\s+','split');
    % Assign data:
    tmp = regexp(sub,'^=\s+(\s+[\d\.-]+)+\s*$','match','lineanchors');
    out(n).dat = cell2mat(textscan([tmp{:}],'=%f%f%f%f'));
end
%
Explore the structure in your variable viewer. It should be fairly self-explanatory, as it uses the same field names as your data file. There are only three new fields: "num" (measurement number), "hdr" (column headers of the numeric matrix), and "dat" (the numeric matrix itself).
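As a rough illustration of working with the "dat" field, here is a small sketch. It uses a hand-built stand-in structure (demo, a hypothetical name) holding a few rows copied from the sample in the question, and it assumes the column order X, Y, Z, Dose given by the file's header line.

```matlab
% Stand-in for the parsed structure, holding a few rows from the sample file.
demo(1).dat = [-123.9 0.0  32.0 11.8; -123.6 0.0  32.0 11.9; -123.2 0.0  32.0 12.1];
demo(2).dat = [ 132.0 0.0 100.0  8.1;  131.7 0.0 100.0  8.2;  131.3 0.0 100.0  8.2];

peakX = zeros(1, numel(demo));
for n = 1:numel(demo)
    x    = demo(n).dat(:,1);        % column 1: X position
    dose = demo(n).dat(:,4);        % column 4: Dose
    [peakDose, idx] = max(dose);    % max returns the first index on ties
    peakX(n) = x(idx);
    fprintf('Measurement %d: %d rows, peak dose %.1f at X = %.1f\n', ...
        n, numel(x), peakDose, peakX(n));
end
```

With the real structure, replacing demo by out should give the same kind of summary for every measurement in the file.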
  4 Comments
Stephen23 on 2 Feb 2015
Edited: Stephen23 on 2 Feb 2015
You can access any of the parameter values by using the fieldname and the out structure, e.g. to get the FSZ values:
>> out(1).FSZ % only the first measurement
>> out.FSZ % all measurements
>> Z = {out.FSZ} % all in a cell array
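The difference between these three forms can be tried on a small stand-in structure. In this sketch the name s and the FSZ values are hypothetical placeholders, not values taken from the data file.

```matlab
% Hypothetical 1-by-2 structure array with one string field.
s = struct('FSZ', {'100 100', '200 200'});

firstOnly = s(1).FSZ;    % a single value: the first element's field
asCell    = {s.FSZ};     % s.FSZ expands to a comma-separated list,
                         % so the braces collect it into a 1-by-2 cell array

% Loop over the collected values, one per measurement.
for k = 1:numel(asCell)
    fprintf('Measurement %d: FSZ = %s\n', k, asCell{k});
end
```

Typing s.FSZ on its own at the prompt displays each element's value in turn, which is why wrapping it in braces is usually the more convenient form.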
Note that currently all parameter values are stored as strings. If you wish to convert all of the exclusively numeric parameters to numeric arrays, then you can try this version (it also converts the date and time into a date vector):
% Read file into a string:
str = fileread('test.txt');
% Check the number of measurements:
N = sscanf(str,':MSR %d');
[S,E] = regexp(str,'(?<=^# Measurement number).*?^:EOM','lineanchors');
assert(numel(S)==N&&numel(E)==N,'This file is incomplete or corrupted.')
% Preallocate structure:
out(N) = struct();
% Loop over each measurement in the file:
for n = 1:N
    sub = str(S(n):E(n));
    out(n).num = sscanf(sub,'%d');
    % Assign fields and values:
    tkn = regexp(sub,'^%(\w{3})([^#]*?)(#\s.*?)?\s*$','lineanchors','tokens');
    for m = 1:numel(tkn)
        if ~isempty(tkn{m}{3})
            out(n).([tkn{m}{1},'_note']) = tkn{m}{3}(3:end);
        end
        % Convert to numeric array OR keep string parameter:
        tmp = sscanf(tkn{m}{2},'%f',[1,Inf]);
        if any(strcmpi(tkn{m}{1},{'DAT','TIM'})) || isempty(tmp)
            out(n).(tkn{m}{1}) = strtrim(tkn{m}{2});
        else
            out(n).(tkn{m}{1}) = tmp;
        end
    end
    % Add timestamp:
    out(n).dtv([3,2,1]) = sscanf(out(n).DAT,'%f-%f-%f',[1,Inf]);
    out(n).dtv = [out(n).dtv,sscanf(out(n).TIM,'%f:%f:%f',[1,Inf])];
    % Assign header:
    tmp = regexp(sub,'^#(\s+\w+)+\s+#\s+=','tokens','lineanchors','once');
    out(n).hdr = regexp(strtrim(tmp),'\s+','split');
    % Assign data:
    tmp = regexp(sub,'^=\s+(\s+[\d\.-]+)+\s*$','match','lineanchors');
    out(n).dat = cell2mat(textscan([tmp{:}],'=%f%f%f%f'));
end
%
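Once "dtv" holds a six-element date vector, it can be passed to the standard date functions. The sketch below assumes the order [year month day hour minute second], which is what the [3,2,1] indexing above produces if DAT is written day-month-year. The two vectors t1 and t2 are made-up values, not taken from the file.

```matlab
% Hypothetical date vectors standing in for out(1).dtv and out(2).dtv.
t1 = [2015 1 30 14 5 9];
t2 = [2015 1 30 14 7 39];

% Format one timestamp as text.
stamp = datestr(t1, 'yyyy-mm-dd HH:MM:SS');

% Seconds elapsed between the two measurements (later vector first).
gap = etime(t2, t1);

fprintf('First measurement at %s, second one %g s later\n', stamp, gap);
```

If the date in the file uses a different field order, the index vector in the "Add timestamp" step would need to change accordingly.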
Mohammad on 3 Feb 2015
Edited: Mohammad on 3 Feb 2015
Thank you very much Stephen. This helps a lot !!!