Using TEXTSCAN to import an ASCII file with a header and blank lines between different data sets
I have several text files, each representing a house, and each file contains several data sets, one per room (zone) in that house.
The text file looks similar to the following, with most of the data removed. Each zone has 1440 lines of data and each house has a different number of zones:
project: House1_1 Tue Mar 19 12:30:42 2013
description:
date time time Ozone
of day [s] [kg/kg]
level: firstfloor zone: bedroom1
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.487e-009
Jan01 00:02:00 120 5.330e-009
Jan01 00:03:00 180 1.084e-008
Jan01 23:57:00 86220 1.575e-007
Jan01 23:58:00 86280 1.575e-007
Jan01 23:59:00 86340 1.575e-007
Jan01 24:00:00 86400 1.575e-007
level: firstfloor zone: kitchen
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.483e-009
Jan01 00:02:00 120 5.315e-009
Jan01 00:03:00 180 1.081e-008
Jan01 23:57:00 86220 1.564e-007
Jan01 23:58:00 86280 1.564e-007
Jan01 23:59:00 86340 1.564e-007
Jan01 24:00:00 86400 1.564e-007
level: firstfloor zone: bedroom2
Jan01 00:00:00 0 0.000e+000
Jan01 00:01:00 60 1.486e-009
Jan01 00:02:00 120 5.321e-009
Jan01 00:03:00 180 1.081e-008
Jan01 23:57:00 86220 1.549e-007
Jan01 23:58:00 86280 1.549e-007
Jan01 23:59:00 86340 1.549e-007
Jan01 24:00:00 86400 1.549e-007
The final goal is to generate a graph of ozone concentration versus time for each house that contains all of the zones for that house. Presently I am having trouble importing the data. I can use the following code to open the first zone in one file. I only need the data from the fourth column. I do not need the first 9 lines (header info) or the 3 lines in between zones but I need the data for each zone to be its own data set.
fid=fopen('House1-1.txt');
temp=textscan(fid,'%*s %*s %*d %f','Headerlines',9);
fclose(fid);
I cannot figure out how to create a loop that reads to the end of each file and puts the data for each zone into its own array. I also need the loop to read every house file within the folder. Any help would be appreciated.
Accepted Answer
More Answers (3)
per isakson
on 27 Mar 2013
Edited: per isakson
on 27 Mar 2013
Here is one of many alternate solutions.
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
''
block_head =
' level: firstfloor zone: bedroom1'
'zone: kitchen'
'zone: bedroom2'
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
>>
The values of block_head are obviously corrupted.
where cssm is
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-1, 1 );
    for ii = 1 : ixs(1)-1
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 );
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_head{iib} = fgetl( fid );
        block_data(iib) = textscan(fid,'%*s%*s%*d%f', ixs(iib+1)-ixs(iib) );
    end
    fclose( fid );
end
and cssm.txt consists of the data lines in your question.
.
Next try without reading block_head:
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
block_head =
[]
[]
[]
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
where cssm is
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-2, 1 );
    for ii = 1 : ixs(1)-2
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 ) + 2;
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_data(iib) = textscan( fid, '%*s%*s%*d%f' ...
                                  , ixs(iib+1)-ixs(iib)-3 ...
                                  , 'Headerlines', 3 );
    end
    fclose( fid );
end
.
One more try:
>> [ header, block_head, block_data ] = cssm()
header =
' project: House1_1 Tue Mar 19 12:30:42 2013'
''
' description: '
''
' date time time Ozone'
' of day [s] [kg/kg]'
block_head =
{3x1 cell}
{3x1 cell}
{3x1 cell}
block_data =
[8x1 double]
[8x1 double]
[8x1 double]
>> block_head{1}
ans =
''
'level: firstfloor zone: bedroom1'
''
>> block_head{2}
ans =
''
''
'level: firstfloor zone: kitchen'
>> block_head{3}
ans =
''
''
'level: firstfloor zone: bedroom2'
block_head contains two successive empty "lines" in blocks 2 and 3. However, nowhere in the data file does one empty line follow another. I find this strange.
where
function [ header, block_head, block_data ] = cssm()
    fid = fopen( 'cssm.txt' );
    % cac = textscan( fid, '%[^\n]' ); swallows empty lines
    cac = textscan( fid, '%s', 'Delimiter', '\n' );
    fclose( fid );
    ixs = find( strncmp( 'level:', cac{:}, 6 ) );
    fid = fopen( 'cssm.txt' );
    header = cell( ixs(1)-2, 1 );
    for ii = 1 : ixs(1)-2
        header{ii} = fgetl( fid );
    end
    nnblock = numel( ixs );
    ixs(end+1) = size( cac{:}, 1 ) + 2;
    block_head = cell( nnblock, 1 );
    block_data = cell( nnblock, 1 );
    for iib = 1 : nnblock
        block_head(iib) = textscan( fid, '%s', 3, 'Delimiter', '\n' );
        block_data(iib) = textscan( fid, '%*s%*s%*d%f' ...
                                  , ixs(iib+1)-ixs(iib)-3 ...
                                  , 'Headerlines', 0 );
    end
    fclose( fid );
end
.
Discussion:
There must be a better way to handle empty lines.
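One possible way around the empty lines, offered only as a sketch and not as a tested replacement for the functions above, is to split the text into lines, discard the blank ones first, and then cut the remaining lines at each 'level:' header. The sample text, the sentinel index, and the variable names txt, lines, rows and nblock are assumptions made for illustration; for a real file, txt would come from fileread.

```matlab
% Small in-memory sample in the layout of the question (blank lines included).
txt = sprintf([ ...
    'project: House1_1\n\n' ...
    'date time time Ozone\n\n' ...
    'level: firstfloor zone: bedroom1\n\n' ...
    'Jan01 00:00:00 0 0.000e+000\n' ...
    'Jan01 00:01:00 60 1.487e-009\n\n' ...
    'level: firstfloor zone: kitchen\n\n' ...
    'Jan01 00:00:00 0 0.000e+000\n' ...
    'Jan01 00:01:00 60 1.483e-009\n']);

lines = strtrim(regexp(txt, '\r?\n', 'split'));   % one cell per line
lines = lines(~cellfun(@isempty, lines));         % drop blank lines up front
ixs   = find(strncmp('level:', lines, 6));        % positions of block headers
ixs(end+1) = numel(lines) + 1;                    % sentinel just past the last line

nblock     = numel(ixs) - 1;
block_head = cell(nblock, 1);
block_data = cell(nblock, 1);
for iib = 1 : nblock
    block_head{iib} = lines{ixs(iib)};
    rows = lines(ixs(iib)+1 : ixs(iib+1)-1);      % data lines of this block
    C = textscan(strjoin(rows, sprintf('\n')), '%*s %*s %*d %f');
    block_data{iib} = C{1};                       % fourth column only
end
```

Because the blank lines are removed before any indexing, no line-count offsets are needed, at the cost of holding the whole file in memory.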
Kristia
on 27 Mar 2013
6 Comments
Don't copy the >> from our code; they represent the prompt in the command window. For my code, execute the following
buffer = fileread('House1-1.txt') ;
pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
blocks = regexp(buffer, pattern, 'names') ;
If it works, the variable blocks is a struct array, which is an array of structs (variables with fields).
length(blocks)
will give you the number of structs present in the array, and you can do as follows for accessing e.g. the field level of struct 1:
blocks(1).level
There are two other fields: zone and data. EDIT: You can process data as follows:
D = textscan(blocks(1).data, '%s %d:%d:%d %d %f') ;
and you will see that D is a cell array that contains the data parsed.
If you are just beginning with MATLAB, the difficulty is that the file has a two-level structure (a file header, then a header plus data for each zone), which is not the easiest thing to manage.
Per's solution is the standard approach I would say for files with some structure. My approach is based on pattern matching (using regular expressions); it is less standard for files with some structure, but I thought that the outcome of REGEXP would be simpler for you to process (I'm not sure about that though).
=== EDIT ===
Here is a more complete (working) example.
buffer = fileread('House1-1.txt') ;
pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
blocks = regexp(buffer, pattern, 'names') ;
for k = 1 : length(blocks)
    D = textscan(blocks(k).data, '%s %d:%d:%d %d %f') ;
    figure(k) ;
    plot(D{5}, D{6}) ;
    grid on ;
    title(sprintf('Level = %s, zone = %s\n', blocks(k).level, blocks(k).zone));
    xlabel('Time [s]') ;
    ylabel('Ozone [kg/kg]') ;
end
But again, it won't be simple if you just started MATLAB, as it mixes regular expressions, struct arrays, cell arrays, etc.
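The original question asked for one graph per house containing all of its zones, and for every house file in the folder to be read. A minimal sketch of that, reusing the same pattern and textscan format, could look as follows. The file mask 'House*.txt', the use of the current folder, and the legend and title choices are assumptions, not something stated in the question.

```matlab
% Assumption: every house file sits in the current folder and matches this mask.
files   = dir('House*.txt');
pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))';

for f = 1 : numel(files)
    buffer = fileread(files(f).name);
    blocks = regexp(buffer, pattern, 'names');
    if isempty(blocks)
        continue;                          % no zones found, skip this file
    end
    figure;                                % one figure per house
    hold on;
    for k = 1 : numel(blocks)
        D = textscan(blocks(k).data, '%s %d:%d:%d %d %f');
        plot(double(D{5}), D{6});          % time [s] versus ozone
    end
    hold off;
    grid on;
    legend({blocks.zone}, 'Location', 'best');
    title(files(f).name, 'Interpreter', 'none');
    xlabel('Time [s]');
    ylabel('Ozone [kg/kg]');
end
```

Each zone's data is also still available as D{6} inside the inner loop, so it can be stored in a cell array if separate data sets are needed later.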
per isakson
on 27 Mar 2013
Edited: per isakson
on 27 Mar 2013
Kristia
on 29 Mar 2013
Cedric
on 29 Mar 2013
You're welcome. If you copy and paste the code that I wrote in my EDIT above in an M-file, it should be working directly as it is (if the M-file is saved in the same directory as the file House1-1.txt).
Kristia
on 1 Apr 2013
You're welcome! Don't forget to [ Accept ] one of the answers if it helped, and if you accept mine, don't forget to vote for Per Isakson's answer as well, because he took the time to write and test a quite complete answer that is indeed the standard way of processing this kind of file structure (my answer is more compact, but less standard).
Gabriel Felix
on 24 May 2020
I had to use \n at the end of each line. Without it I couldn't make textscan() work properly, even though "HeaderLines" was configured according to the text file lines. This was the only solution I found after struggling with the code for an entire day.
This was the text:
!
!
! alfa (graus) = 5.0
!
! Id. x/s z/s alfai cl c*cl/cmed cdi cmc/4
! (graus)
1 .246 .050 -1.209 .255 .332 .00538 .0170
2 .292 .150 -1.098 .259 .319 .00496 .0545
3 .339 .250 -.925 .254 .297 .00410 .0944
4 .385 .350 -.741 .243 .268 .00315 .1341
5 .432 .450 -.561 .227 .235 .00223 .1714
6 .479 .550 -.393 .206 .199 .00141 .2034
7 .525 .650 -.238 .181 .163 .00075 .2266
8 .572 .750 -.101 .152 .126 .00027 .2362
9 .619 .850 .014 .116 .089 -.00003 .2236
10 .659 .938 .103 .074 .052 -.00013 .1693
!
! CL asa = .208
! CDi asa = .00258
! e (%) = 88.9
! CMc/4 asa = .1339
My code:
%! alfa (graus) = 5.0
P = textscan(fid,'! alfa (graus) = %f','Delimiter',' ','MultipleDelimsAsOne',true,'headerLines',2,'CollectOutput',1);
alpha(1) = P{1};
%! CL asa = .208
P = textscan(fid,'! CL asa = %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerLines',4+n);
CL(1) = P{1};
%! CDi asa = .00258
P = textscan(fid,'! CDi asa = %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerlines',0);
CDi(1) = P{1};
%! CMc/4 asa = .1339
P = textscan(fid,'! CMc/4 asa = %f','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'HeaderLines',2);
Cmc4(1) = P{1};
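As an alternative to counting header lines, the labelled values could also be picked out by pattern matching, in the spirit of the regexp answer earlier in this thread. This is only a sketch of a different technique, not a fix of the textscan calls above. The helper name getval is hypothetical, the sample holds only a few lines of the text, and the labels are assumed to be spelled with single spaces exactly as shown.

```matlab
% A few lines copied from the sample text above, held in memory for the demo.
txt = sprintf([ ...
    '! alfa (graus) = 5.0\n' ...
    '!\n' ...
    '1 .246 .050 -1.209 .255 .332 .00538 .0170\n' ...
    '! CL asa = .208\n' ...
    '! CDi asa = .00258\n' ...
    '! CMc/4 asa = .1339\n']);

% Hypothetical helper: capture the number that follows "<label> =".
% regexptranslate escapes the parentheses and slash in the labels.
getval = @(label) str2double(regexp(txt, ...
    [regexptranslate('escape', label) '\s*=\s*(\S+)'], 'tokens', 'once'));

alpha = getval('alfa (graus)');
CL    = getval('CL asa');
CDi   = getval('CDi asa');
Cmc4  = getval('CMc/4 asa');
```

For a real file, txt would come from fileread, and the order or number of lines between the labels would no longer matter.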