Import irregular and nonpatterned text data into a matrix form

I am sure this is covered elsewhere but I have been unable to find it. I am interested in creating a parser that will extract data from a text file and put it into a matrix where it will be viewable in Excel. A sample of my data is below:
Failure Report
Time: 00:12:34
Fault ID: Converter fail
Fault Description: Channel 4 of the converter has failed during tuning
\n
Failure Report
Time: 00:12:37
Fault ID: Comparator 4 Fail
Fault Description: comparator 4 has failed
\n
Failure Report
Time: 00:12:39
Fault ID: Converter fail in practice
Fault Description: Channel 4 of the converter has failed when in mode 5
\n
Failure Report
Time: 00:12:45
Fault ID: Converter 12 Fail
Fault Description: Converter 12 has failed because x = -2 and y = -4
So far I have used an if statement with regexp to find the 'Failure Report' and then go into that message. I can easily extract the time (because it is consistent) with textscan, but I am having problems wrapping my head around how to pull out the Fault ID and the Fault Description because they are irregular in size and type of data in them. Does anyone have a suggestion on how I could go about getting this information in a format like this?
Time________________Fault ID________________Failure Report
00:12:34____________Converter fail__________Channel 4 of the converter has failed during tuning
00:12:37____________Comparator 4 Fail________Comparator 4 has failed
Any help would be greatly appreciated.
~Jenn
  2 Comments
Jennifer on 24 Oct 2013
I finally got a chance to try this in my code and I got the error: "Error using cellfun. Input #2 expected to be a cell array, was char instead."
There is no value for x but there are values for ii, len and isin. Suggestions?
Kelly Kearney on 28 Oct 2013
How did you read in your file? I assumed that your data would be a cell array, with one string per line, such as would be created by, say
fid = fopen('file.txt');
data = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
filetext = data{1};
but it seems you have a character array instead, from, maybe
filetext = fileread('file.txt');
Try running:
filetext = regexp(filetext, '\n', 'split')';
then trying the parse portion of my code.
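A minimal sketch of that conversion step, so the parse code receives a cell array either way. The variable name rawtext, the sample string, and the '\r?\n' pattern (to tolerate Windows line endings) are assumptions for illustration, not part of the suggestion above:

```matlab
% Hypothetical sample standing in for the output of fileread('file.txt')
rawtext = sprintf('Failure Report\r\nTime: 00:12:34\r\nFault ID: Converter fail\r\n');

% Make sure we end up with a cell array of strings, one line per cell
if ischar(rawtext)
    filetext = regexp(rawtext, '\r?\n', 'split')';   % column cell array
else
    filetext = rawtext;                               % already a cell array
end

% Drop empty lines (e.g. the piece left after a trailing newline)
filetext = filetext(~cellfun(@isempty, filetext));
```

After this, filetext can be passed to strncmp and cellfun without the "expected to be a cell array" error.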


Answers (1)

Kelly Kearney on 20 Sep 2013
Assuming I've interpreted your file format correctly, you probably don't need regexp. My parsing below assumes that 1) each entry has the same fields (time, ID, description, etc.), 2) each field is on its own line, and 3) each field of interest consists of a key phrase followed by a colon, a space, and the string of interest.
% Data (assuming file is read into a cell array)
filetext = {...
'Failure Report'
'Time: 00:12:34'
'Fault ID: Converter fail'
'Fault Description: Channel 4 of the converter has failed during tuning'
'\n'
''
'Failure Report'
'Time: 00:12:37'
'Fault ID: Comparator 4 Fail'
'Fault Description: comparator 4 has failed'
'\n'
''
'Failure Report'
'Time: 00:12:39'
'Fault ID: Converter fail in practice'
'Fault Description: Channel 4 of the converter has failed when in mode 5'
'\n'};
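% Parse: for each key phrase, keep the lines that start with it and strip the prefix
markers = {'Time', 'Fault ID', 'Fault Description'};
for ii = 1:length(markers)
    len = length(markers{ii});
    isin = strncmp(filetext, markers{ii}, len);   % true for lines beginning with this key
    % Skip the key, the colon, and one separator character; keep the rest
    data(:,ii) = cellfun(@(x) x(len+3:end), filetext(isin), 'UniformOutput', false);
end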
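To get the parsed result into something Excel can open, one option is to convert the cell array to a table and write it out. This is only a sketch: it assumes a MATLAB release that has cell2table and writetable, and the output name 'failures.csv' and the column names are hypothetical.

```matlab
% Stand-in for the N-by-3 cell array produced by the parse loop above
data = {'00:12:34', 'Converter fail',    'Channel 4 of the converter has failed during tuning'
        '00:12:37', 'Comparator 4 Fail', 'comparator 4 has failed'};

% Column names must be valid MATLAB identifiers, so no spaces
T = cell2table(data, 'VariableNames', {'Time', 'FaultID', 'FaultDescription'});

% Hypothetical output file; Excel opens CSV files directly.
% Quoting the strings protects descriptions that contain commas.
outfile = fullfile(tempdir, 'failures.csv');
writetable(T, outfile, 'QuoteStrings', true);
```

On releases without tables, xlswrite with the cell array is the older alternative, though it typically relies on Excel being installed.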
  1 Comment
Jennifer on 20 Sep 2013
Thank you Kelly,
I will give it a shot and see if it works out for me. I did simplify the problem a little by removing the other reports that will be randomly interspersed in the text file. For instance there would be 'System Report' and 'Health Report' as well as the 'Failure Report' that I specified previously.
Your second and third assumptions are correct: each entry is on its own line, and each consists of its key phrase followed by a colon, then a tab (which didn't come through in my formatting), then the value of interest.
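With other report types mixed in, matching on 'Time' across the whole file would also pick up times from System and Health reports and misalign the columns. A rough sketch of one way around that is to split the lines into blocks and keep only the Failure Report blocks. It assumes header lines are the only lines that contain no colon and end in "Report", and that the other reports also carry a Time line; the sample data below is made up for illustration.

```matlab
% Hypothetical input: one line per cell, several report types mixed together
filetext = {
    'System Report'
    sprintf('Time:\t00:12:30')
    'Failure Report'
    sprintf('Time:\t00:12:34')
    sprintf('Fault ID:\tConverter fail')
    sprintf('Fault Description:\tChannel 4 of the converter has failed during tuning')
    'Health Report'
    sprintf('Time:\t00:12:36')
    'Failure Report'
    sprintf('Time:\t00:12:37')
    sprintf('Fault ID:\tComparator 4 Fail')
    sprintf('Fault Description:\tcomparator 4 has failed')};

markers = {'Time', 'Fault ID', 'Fault Description'};

% A block starts at a header line (no colon, ends in "Report") and runs
% until the line before the next header
starts = find(~cellfun(@isempty, regexp(filetext, '^[^:]*Report\s*$', 'once')));
stops  = [starts(2:end) - 1; numel(filetext)];
keep   = strcmp(strtrim(filetext(starts)), 'Failure Report');
starts = starts(keep);
stops  = stops(keep);

data = repmat({''}, numel(starts), numel(markers));   % '' marks a missing field
for k = 1:numel(starts)
    block = filetext(starts(k):stops(k));
    for ii = 1:numel(markers)
        % Capture everything after "Key:" plus any spaces or tabs
        tok = regexp(block, ['^' markers{ii} ':\s*(.*)$'], 'tokens', 'once');
        hit = find(~cellfun(@isempty, tok), 1);
        if ~isempty(hit)
            data{k, ii} = tok{hit}{1};
        end
    end
end
```

Because each row is filled from its own block, a Failure Report that lacks one of the fields leaves an empty string in that column instead of shifting later rows.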

