Textscan: how to ignore single '-' characters, while preserving '-' in negative numbers?

I use the following code to read the block below.
fid = fopen('data.csv');
C = textscan(fid,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
fclose(fid);
Because of the single '-' characters in data.csv this does not work yet. I want to ignore single '-' characters from the input and use NaN values there.
How can I read single '-' characters as NaN? I tried 'TreatAsEmpty', but then negative values are turned into positive ones, because negative values also contain a '-' character and 'TreatAsEmpty' removes those too.
Block:
Headerline
01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239
01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108
01-01-2006 (02 uur);-;-1.68;-;-0.75;-;1;-;1827
01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-
01-01-2006 (04 uur);-;-1.62;-;-0.74;-;-;-;-
01-01-2006 (05 uur);-;-1.61;-;-0.74;-;1;-;2053
01-01-2006 (06 uur);-;-1.66;-;-0.75;-;-;-;2870
01-01-2006 (07 uur);-;-1.68;-;-0.80;-;0;-;3585
01-01-2006 (08 uur);-;-1.64;-;-0.80;-;-;-;-
01-01-2006 (09 uur);-;-1.63;-;-0.79;-;-;-;-
01-01-2006 (10 uur);-;-1.62;-;-0.77;-;-;-;-
01-01-2006 (11 uur);-;-1.62;-;-0.74;-;1;-;3967
[EDITED, Jan, code and file contents formatted]

Accepted Answer

per isakson on 13 Sep 2012
Edited: per isakson on 13 Sep 2012
Try this
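str = fileread('cssm.txt');                          % read the whole file as text
str = strrep( str, '-;', 'nan;' );                   % missing values followed by a delimiter
nl = [char(13),char(10)];                            % CR LF; must match the file's line endings
str = regexprep( str, [';-\s*',nl], [';nan',nl] );   % missing values at the end of a line
C = textscan( str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');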
Here, cssm.txt contains the rows of text from the question.
The approach is
  1. read the whole file as text
  2. replace the "-", which stands for missing, with NaN
  3. parse the modified string with textscan
Note: the value of the variable, nl, must match the end of line characters in your file.
Jan, thanks for formatting the question.
  4 Comments
per isakson on 13 Sep 2012
Edited: per isakson on 13 Sep 2012
@Roel, I guess you need to change
nl = [char(13),char(10)];
to
nl = [char(10)];
according to my "Note:". You could check with
double( str(1:80) )
and look for the number "10". Is it preceded by "13" or not?
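The same check can be done programmatically instead of by eye. This is only a minimal sketch: the sample text is made up for illustration, and in practice str would come from fileread.

```matlab
% Hypothetical sample text with Windows (CR LF) line endings
str = sprintf('Headerline\r\n01-01-2006 (00 uur);-;-1.61\r\n');

% Use CR LF if that pair occurs anywhere in the text, otherwise assume LF only
if ~isempty(strfind(str, [char(13), char(10)]))
    nl = [char(13), char(10)];   % CR LF (Windows)
else
    nl = char(10);               % LF only (Unix)
end
```

This assumes the file uses one line-ending style throughout; a file with mixed endings would need a different approach.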
@Matt, your expression is better; it is shorter and more robust. It actually checks whether "-" is followed by a digit. I tried that approach myself, but made a mistake :(. So replace
str = strrep( str, '-;', 'nan;' );
nl = [char(13),char(10)];
str = regexprep( str, [';-\s*',nl], [';nan',nl] );
by
str = regexprep(str,'-(?!\d)','nan');
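To see the effect of this single expression end to end, here is a small sketch on an in-memory string. The sample rows are copied from the question; building them with sprintf and LF line endings is an assumption made for illustration, and with a real file str would come from fileread.

```matlab
% Hypothetical sample: header line plus two rows in the question's format
str = sprintf(['Headerline\n', ...
               '01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239\n', ...
               '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-\n']);

% Replace every '-' that is not immediately followed by a digit with 'nan'.
% The dashes in the dates and in negative numbers are followed by digits,
% so they are left alone.
str = regexprep(str, '-(?!\d)', 'nan');

% Parse the modified text; the 'nan' fields are read as NaN by %f
C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'HeaderLines', 1, 'Delimiter', ';');
```

With this input, C{3} should keep the negative values (-1.61 and -1.64), while the columns that contained only '-' should come back as NaN. Note that the lookahead only tests for a digit, so a value written like -.5 (no leading zero) would be turned into 'nan' as well; the data in the question does not contain such values.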
