Reading mixed-format data from a '.txt' file in MATLAB
george
23.91 29.70 19.08 48.00 23.33 10.25 2.28 3.36
23.84 29.88 19.78 48.75 21.76 8.30 2.62
0.08 -0.18 -0.70 -0.75 1.57 1.94 -0.34 5.6 2.3 4.9 5.68
sams
18.90 29.30 15.12 43.20 19.71 8.87 2.22
18.76 31.28 15.15 50.18 16.15 5.96 2.71 21.76 8.30 2.62
0.14 -1.98 -0.03 -6.98 3.56 2.91 -0.49
peter
22.71 78.30 18.27 82.90 21.28 36.08 0.59
21.60 73.83 17.03 84.30 20.11 39.14 0.51
1.10 4.47 1.24 -1.40 1.17 -3.07 0.08
jack
18.56 40.70 14.85 45.30 19.13 11.34 1.69 78.30 18.27 82.90
19.12 26.06 15.30 47.38 16.90 5.71 2.96
-0.56 14.64 -0.45 -2.08 2.23 5.63 -1.27
This is a sample. I want to know how to read the data when the file contains different kinds of lines and different data formats.
Thank you for your time,
Ashok.
2 Comments
Walter Roberson
on 16 Aug 2020
Is the number of numeric lines between names always the same?
I notice that the number of numeric items is not the same for every line. Do you want it to be loaded in as a cell array with a vector for every line, so that the length of the lines can be preserved? Do you want shorter lines to be padded out with zeros so that every line is stored as the same length? Do you want shorter lines to be padded with NaN?
For the above sample, what output would you want?
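The three storage options asked about above can be compared on a small block. This is only a sketch: the sample lines are made up for illustration, and reading each line with sscanf() is an assumption of this example, not the method used in the answers.

```matlab
% Hypothetical ragged block: three lines holding 3, 2 and 4 numbers.
lines = { '23.91 29.70 19.08'
          '23.84 29.88'
          '0.08 -0.18 -0.70 -0.75' };

% Option 1: cell array with one row vector per line (line lengths preserved).
asCell = cell( numel(lines), 1 );
for k = 1 : numel(lines)
    asCell{k} = sscanf( lines{k}, '%f' ).';
end

% Option 2: rectangular matrix, shorter lines padded with NaN.
nCols = max( cellfun( @numel, asCell ) );
asNaN = NaN( numel(lines), nCols );
for k = 1 : numel(lines)
    asNaN( k, 1:numel(asCell{k}) ) = asCell{k};
end

% Option 3: same layout, padded with zeros instead of NaN.
asZero = asNaN;
asZero( isnan(asZero) ) = 0;
```

NaN padding keeps "no value" distinguishable from a genuine zero in the file; zero padding does not.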
JAMMI ASHOK
on 16 Aug 2020
Accepted Answer
More Answers (1)
per isakson
on 17 Aug 2020
Edited: per isakson
on 18 Aug 2020
Here is an alternative
Running
>> out = cssm('cssm.txt')
outputs
out =
struct with fields:
george: [3×11 double]
sams: [3×10 double]
peter: [3×7 double]
jack: [3×10 double]
>> out.peter
ans =
22.71 78.3 18.27 82.9 21.28 36.08 0.59
21.6 73.83 17.03 84.3 20.11 39.14 0.51
1.1 4.47 1.24 -1.4 1.17 -3.07 0.08
>> out.peter(2,4)
ans =
84.3
>> out.sams(:,6:end)
ans =
8.87 2.22 NaN NaN NaN
5.96 2.71 21.76 8.3 2.62
2.91 -0.49 NaN NaN NaN
where
function out = cssm( ffs )
    %%
    chr = fileread( ffs );
    % Getting rid of carriage returns simplifies the following code.
    chr = strrep( chr, char(13), '' );
    % Split the text string, using a single name on a row as the delimiter.
    % Convert from class char to string, because I want to use strings.
    % Allow the name to be a valid Matlab variable name; allow trailing space (\x20)
    [ data, names ] = strsplit( string(chr), '(?m)^[a-zA-Z]\w+\x20*$' ...
                        , 'DelimiterType','RegularExpression' );
    % Trailing spaces are not allowed in field names. Remove them.
    names = strip( names );
    % Since the file starts with a delimiter (name) there will be a leading
    % empty data block. Delete it.
    data(1) = [];
    % The first character of the data blocks will be newline. Skip it.
    % Would "extractAfter(data,1)" be better?
    data = extractAfter( data, newline );
    %%
    % The lines of a data block contain different numbers of columns. One way
    % to cope with this is to add many empty columns and read the first fifteen
    % columns. textscan() can handle too many but not too few (I thought).
    data = strrep( data, newline, ",,,,,,,,,,,,,,,"+newline );
    for jj = 1 : numel(names)
        % Read the first fifteen columns and skip the rest. Fifteen is a
        % magic number that I chose. Using both white-space and comma as
        % delimiter seems to work fine. However, I'm not sure whether the
        % documentation says so.
        num = textscan( data(jj), '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%*[^\n]' ...
                    , 'Delimiter',{','}, 'CollectOutput',true );
        num = num{1};
        % Delete columns with only NaNs (and hope such columns cannot occur
        % intentionally).
        num( :, all(isnan(num),1) ) = [];
        % Assign the result of a block to an output structure.
        out(1).(names(jj)) = num;
    end
end
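The padding trick in the function can be tried in isolation on a small inline block. This is a sketch under assumptions: the two-line sample text and the choice of four columns (instead of fifteen) are made up for illustration.

```matlab
% Hypothetical block: the first line has 3 numbers, the second has 2.
blk = string( sprintf( '1 2 3\n4 5\n' ) );

% Append four empty comma-separated fields to every line, so that each
% line holds at least four fields.
blk = strrep( blk, newline, ",,,,"+newline );

% Read the first four columns and skip the rest of each line.
num = textscan( blk, '%f%f%f%f%*[^\n]' ...
            , 'Delimiter',{','}, 'CollectOutput',true );
num = num{1};

% Drop the columns that contain only padding.
num( :, all(isnan(num),1) ) = [];
disp( num )
```

The empty fields come back as NaN, which is why the all-NaN columns can be removed afterwards.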
In response to comments
I've modified my function with the following goal:
- It shall be straightforward to add code to handle new types of blocks without breaking the handling of existing types.
I've also added the text "<missing>" to the jack block of the file cssm.txt. Now
>> out = cssm('cssm.txt');
>> out.sams(:,6:end)
ans =
8.87 2.22 NaN NaN NaN
5.96 2.71 21.76 8.3 2.62
2.91 -0.49 NaN NaN NaN
>> out.jack
ans =
" 18.56 40.70 14.85 45.30 19.13 11.34 1.69 78.30 18.27 82.90
19.12 26.06 15.30 47.38 16.90 5.71 2.96 <missing>
-0.56 14.64 -0.45 -2.08 2.23 5.63 -1.27
"
>>
where
function out = cssm( ffs )
    %%
    chr = fileread( ffs );
    % Getting rid of carriage returns simplifies the following code.
    chr = strrep( chr, char(13), '' );
    % Split the text string, using a single name on a row as the delimiter.
    % Convert from class char to string, because I want to use strings.
    % Allow the name to be a valid Matlab variable name; allow trailing space (\x20)
    [ data, names ] = strsplit( string(chr), '(?m)^[a-zA-Z]\w+\x20*$' ...
                        , 'DelimiterType','RegularExpression' );
    % Trailing spaces are not allowed in field names. Remove them.
    names = strip( names );
    % Since the file starts with a delimiter (name) there will be a leading
    % empty data block. Delete it.
    data(1) = [];
    % The first character of the data blocks will be newline. Skip it.
    % Would "extractAfter(data,1)" be better?
    data = extractAfter( data, newline );
    for jj = 1 : numel(names)
        % A block is "pure numeric" if it contains nothing but digits, signs,
        % decimal points, spaces and newlines.
        if all( ismember( char(data(jj)), [newline,' +-.0123456789']' ) )
            block_type = "pure_numeric";
        else
            block_type = "unidentified";
        end
        switch block_type
            case "pure_numeric"
                block = pure_numeric_data_( data(jj) );
            otherwise
                block = unidentified_data_( data(jj) );
        end
        % Assign the result of a block to an output structure.
        out(1).(names(jj)) = block;
    end
end
function num = pure_numeric_data_( data )
    %%
    % The lines of a data block contain different numbers of columns. One way
    % to cope with this is to add many empty columns and read the first fifteen
    % columns. textscan() can handle too many but not too few (I thought).
    %
    % Read the first fifteen columns and skip the rest. Fifteen is a
    % magic number that I chose. Using both white-space and comma as
    % delimiter seems to work fine. However, I'm not sure whether the
    % documentation says so.
    data = strrep( data, newline, ",,,,,,,,,,,,,,,"+newline );
    num = textscan( data, '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%*[^\n]' ...
                , 'Delimiter',{','}, 'CollectOutput',true );
    num = num{1};
    % Delete columns with only NaNs (and hope such columns cannot occur
    % intentionally).
    num( :, all(isnan(num),1) ) = [];
end
function blk = unidentified_data_( data )
    % Return the block unchanged as text.
    blk = data;
end
With the functions folded in the editor, the code looks like this (screenshot not preserved).

8 Comments
Walter Roberson
on 17 Aug 2020
This has the same weakness as my code: it assumes numeric fields.
per isakson
on 17 Aug 2020
"when we have different lines and different formats of data" I was about to say something about this sentence, but failed.
I often find it difficult to envision the context of the question. I suspect that it's an XY problem. In that case I write my answer more as an exercise for my own interest. This time using strings.
Regarding "assuming numeric fields", I don't want to write a new import() function. Over how many releases has TMW struggled with that function? It is reasonable to assume that OP prioritized numerics, since she/he provided four such blocks and no other.
One tactic is to give a limited answer and see if OP returns and explains what she/he actually wants.
Walter Roberson
on 17 Aug 2020
I got most of the way in handling text inside of columns, but stopped when I realized that I would either have to write a real loop or else end up evaluating some expressions twice in a cellfun inside a cellfun, or else write a real function.
Walter Roberson
on 17 Aug 2020
And I didn't feel like redeveloping my csv2table function that I posted a few years ago, which worried about datetime objects as well, and worried about whether the user had a new enough matlab release for various purposes. Decided I would wait to see if the user even needed those things.
JAMMI ASHOK
on 17 Aug 2020
per isakson
on 17 Aug 2020
Edited: per isakson
on 18 Aug 2020
I'm not familiar with file formats for input and output of FEM analysis.
Searching for "tag:FEM" in the File Exchange gives 100+ hits, one of them being ANSYSimport, "Imports ANSYS mesh and results data".
The requirements on an import function ought to say something (more is better) about how the results should be organised in Matlab variables. How shall the data be used; what are the typical use cases?
You are welcome to ask questions about details. To me, the entire function is too large for this forum.
per isakson
on 18 Aug 2020
Edited: per isakson
on 18 Aug 2020
I added some kind of response to my answer. It's more about programming style.
In your case the statement
[ data, names ] = strsplit( string(chr), '(?m)^[a-zA-Z]\w+\x20*$' ...
, 'DelimiterType','RegularExpression' );
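must be modified. Maybe it suffices to match lines that start with "#", e.g. '(?m)^#[^\n]+$'. (A plain '.+' would not do here: in MATLAB regular expressions '.' also matches newline by default, so the match would run on past the end of the line.) The output name, names, is now misleading.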
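A minimal sketch of splitting on "#" header lines follows. The sample text and the header wording are hypothetical, and the variable is called headers rather than names. The pattern uses [^\n] instead of '.', because '.' in MATLAB regular expressions matches newline by default.

```matlab
% Hypothetical file content in which every block starts with a "#" line.
chr = sprintf( '# nodes\n1 2 3\n4 5 6\n# elements\n7 8\n' );

% Split on whole lines that start with "#".
[ data, headers ] = strsplit( string(chr), '(?m)^#[^\n]*$' ...
                    , 'DelimiterType','RegularExpression' );

% The text starts with a header, so the first block is empty. Delete it.
data(1) = [];
% Each block starts with the newline that ended its header line. Skip it.
data = extractAfter( data, newline );
```

Here headers holds the two "#" lines and data holds the text of the two blocks, in the same order.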
The code block
if all( ismember( char(data(jj)), [newline,' +-.0123456789']' ) )
block_type = "pure_numeric";
else
block_type = "unidentified";
end
can be replaced by code that deduces the value of block_type and some appropriate field names from the now misnamed variable names.
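One way this could look is sketched below. Everything specific in it is an assumption made for illustration: the header lines, the keywords "nodes" and "elements", and the rule that maps them to a block type. The real rules depend on the actual file format.

```matlab
% Hypothetical header lines, as returned by strsplit().
headers = [ "# nodes", "# elements", "# comment" ];

block_type = strings( size(headers) );
field_name = strings( size(headers) );
for jj = 1 : numel(headers)
    % Remove the "#" marker and surrounding blanks to get a keyword.
    key = char( lower( strip( erase( headers(jj), "#" ) ) ) );
    % Use the keyword as field name, made valid if necessary.
    field_name(jj) = matlab.lang.makeValidName( key );
    % Assumed mapping from keyword to block type.
    switch key
        case { 'nodes', 'elements' }
            block_type(jj) = "pure_numeric";
        otherwise
            block_type(jj) = "unidentified";
    end
end
```

New block types can then be added as further case branches without touching the existing ones.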
Use profile() to decide whether the function is becoming too slow.
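A typical profiling session might look like the sketch below. The sort() call is only a stand-in workload so that the snippet runs on its own; in practice it would be replaced by the call to the import function, e.g. out = cssm('cssm.txt').

```matlab
profile on
% Stand-in workload; replace with the call to be measured.
x = sort( rand( 1e6, 1 ) );
profile off

% Collected statistics as a structure; FunctionTable holds one entry per
% profiled function.
stats = profile( 'info' );

% Uncomment to open the interactive report.
% profile viewer
```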
JAMMI ASHOK
on 18 Aug 2020