Importing line-by-line from a text file

18 views (last 30 days)
Hi I have a problem choosing what kind of tool to import data from a text file. The data looks something like this although the lines are like 4000 and
35, 0401-BA1,1,6,8,00000,0,01,053,50530,'Channel ',,
35, 0401-CB1,1,6,8,00000,0,01,051,50510,'Spillway ',,
35, 0401-EB1,1,3,2,00052,0,01,051,50510,'RioGrande ',,
35, 0401-FS1,1,6,8,00203,0,01,011,27340,'Spillway ',,
35, 0401-MB1,1,6,8,00000,0,01,037,0 ,'Canal ',,
I used the fscanf , textscan but I don't understand how to tackle the spaces. for the second space is it like %15c ?? Also How do I switch to another line should I use /n? Please help me here guys. Thank you
  1 Comment
Star Strider
Star Strider on 3 Apr 2015
Attaching the text file (or a representative part of it) would be nice. (Use the ‘paperclip’ icon.)

Sign in to comment.

Accepted Answer

per isakson
per isakson on 3 Apr 2015
Edited: per isakson on 4 Apr 2015
Hint
str = '35, 0401-BA1,1,6,8,00000,0,01,053,50530,''Channel '',,';
fmt = '%d%s%d%d%d%d%d%d%d%d%s%s';
cac = textscan( str, fmt ...
, 'Delimiter' , ',' ...
, 'CollectOutput', true );
  • " how to tackle the spaces" &nbsp depends on how you want the spaces in the resulting string variable values
  • "line-by-line" &nbsp what exactly do you mean?
&nbsp
Addendum I
This illustrates an approach. However, due to various mismatches between the header row and the data rows the result is not useful.
sas = cssm()
returns
1x4000 struct array with fields:
STATE_CODE_001
STRUCTURE_NUMBER_008
RECORD_TYPE_005A
ROUTE_PREFIX_005B
SERVICE_LEVEL_005C
ROUTE_NUMBER_005D
DIRECTION_005E
HIGHWAY_DISTRICT_002
COUNTY_CODE_003
PLACE_CODE_004
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
col_head_list = parse_header( str );
sas = repmat( cell2struct( repmat({''},1,112), col_head_list(1:112), 2 ), 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf(1:112), col_head_list(1:112), 2 );
end
fclose( fid );
end
function cac = parse_header( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
&nbsp
Addendum II
I was fooled by the function, strsplit, which as default treats multiple delimiters as one. In my world that only applies to space. Anyhow, now I have fixed the function cssm.
Read the data to a structure
>> sas=cssm;
find all bridges, which have STATUS_WITH_10YR_RULE equal to '2'.
>> is_sw10yrEq2 = strcmp('2',{sas.STATUS_WITH_10YR_RULE});
>> sum(is_sw10yrEq2)
ans =
324
there are 324 of them. List their building year
>> {sas(is_sw10yrEq2).YEAR_BUILT_027}
ans =
Columns 1 through 9
'1915' '1936' '1928' '1931' '1920' '1951' '1933' '1939' '1939'
Columns 10 through 18
'1924' '1931' '1931' '1931' '1932' '1932' '1936' '1937' '1936'
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
colhead_list = parse_header( str );
n_colhead = length( colhead_list );
sas = repmat( cell2struct ...
( repmat({''},1,n_colhead), colhead_list, 2 ) ...
, 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf, colhead_list, 2 );
end
fclose( fid );
sas(1,jj+1:end) = [];
end
function cac = parse_header( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
&nbsp
Addendum III
Caveat: I hardly used the new (couple of years old), data type table. Nevertheless, table is targeted at problems like yours and I think you should
  • use Import Data interactively to read from a file of format B (one header line, comma separated, varying width) to a table. Make sure to uncheck Multiple Delimiters as One and select table. I've tested; it worked fine.
  • spend time with the documentation of table instead of fighting file formats.
&nbsp
Addendum IV
It is straightforward to read the file of format B (one header line, comma separated, varying width) with textscan
>> cac = cssm
cac =
{3951x134 cell}
where
function cac = cssm
ffs = 'c:\m\cssm\NM14.txt'; % one header line, comma separated, varying width
fid = fopen( ffs );
str = fgetl( fid );
chl = strtrim( strsplit( str, ',', 'CollapseDelimiters',false ) );
len = length( chl );
frm = repmat( '%s', 1, len );
cac = textscan( fid, frm, 'CollectOutput',true );
fclose( fid );
end
  7 Comments
adrooney
adrooney on 3 Apr 2015
The C option is much more easy. It has got fixed widths. I tried to uses textscan but not successful. Could you give your ideas on the C option. I am attaching the field widths in an xls sheet
per isakson
per isakson on 3 Apr 2015
Edited: per isakson on 4 Apr 2015
"[...] ideas on the C option" &nbsp With Matlab, fixed widths should be avoided and when one cannot:
cac = textscan( fid, '%w1s%w2s%w3s%w4s%w5s...', .... );
where w1,w2,w3, etc. are the widths of the columns, e.g. '%8s%12s%30s'. Convert the "numeric columns" to numerical variables in a second step.

Sign in to comment.

More Answers (0)

Categories

Find more on Data Import and Export in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!