MATLAB Answers


Textscan: how to ignore single '-' characters, while preserving '-' in negative numbers?

Asked by R V
on 13 Sep 2012

I use the following code to read the block below.

    fid = fopen('data.csv');
    C = textscan(fid,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
    fclose(fid);

Because data.csv contains lone '-' characters, this does not work yet: textscan stops at the first field it cannot convert to a number. I want to treat each lone '-' as a missing value and read it as NaN.

How can I read lone '-' characters as NaN? I tried 'TreatAsEmpty', but then negative values come out positive: negative numbers also start with a '-' character, and 'TreatAsEmpty' removes that sign as well.


    01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239
    01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108
    01-01-2006 (02 uur);-;-1.68;-;-0.75;-;1;-;1827
    01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-
    01-01-2006 (04 uur);-;-1.62;-;-0.74;-;-;-;-
    01-01-2006 (05 uur);-;-1.61;-;-0.74;-;1;-;2053
    01-01-2006 (06 uur);-;-1.66;-;-0.75;-;-;-;2870
    01-01-2006 (07 uur);-;-1.68;-;-0.80;-;0;-;3585
    01-01-2006 (08 uur);-;-1.64;-;-0.80;-;-;-;-
    01-01-2006 (09 uur);-;-1.63;-;-0.79;-;-;-;-
    01-01-2006 (10 uur);-;-1.62;-;-0.77;-;-;-;-
    01-01-2006 (11 uur);-;-1.62;-;-0.74;-;1;-;3967

[EDITED, Jan, code and file contents formatted]

  1 Comment

Jan Simon
on 13 Sep 2012

Yes, per.




1 Answer

Answer by per isakson
on 13 Sep 2012
Edited by per isakson
on 13 Sep 2012
 Accepted answer

Try this:

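    str = fileread('cssm.txt');                          % whole file as one char array
    str = strrep( str, '-;', 'nan;' );                   % lone '-' followed by ';'
    nl  = [char(13),char(10)];                           % end of line: CR LF (Windows)
    str = regexprep( str, [';-\s*',nl], [';nan',nl] );   % lone '-' at the end of a line
    C = textscan( str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');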

where cssm.txt contains the rows of text in the question

The approach is:

  1. read the whole file as text
  2. replace each lone "-", which marks a missing value, with NaN
  3. parse the modified string with textscan
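The three steps can be tried without a file. In this sketch the text that would come from cssm.txt is built in memory instead: a made-up header line (it only exists because of 'headerlines',1) plus two rows from the question, with Windows line endings (CR LF) so that per's nl = [char(13),char(10)] matches. The two replacements are per's, unchanged.

```matlab
% Step 1 stand-in: the "file" as one char array with CR LF line endings.
% The header line is a made-up placeholder, skipped by 'headerlines',1.
str = sprintf(['datum;a;b;c;d;e;f;g;h\r\n', ...
               '01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108\r\n', ...
               '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-\r\n']);

% Step 2: replace the missing-value markers with nan.
str = strrep(str, '-;', 'nan;');                    % lone '-' before ';'
nl  = [char(13), char(10)];                         % must match the file's line ends
str = regexprep(str, [';-\s*', nl], [';nan', nl]);  % lone '-' at the end of a line

% Step 3: parse the modified string.
C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'headerlines', 1, 'delimiter', ';');
M = [C{2:end}];   % 2-by-8 numeric matrix, NaN where the text had a lone '-'
disp(M)
```

Note that the second replacement only fires when a lone '-' is directly followed by nl, so a last line without a trailing line break would still end in '-'.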

Note: the value of the variable nl must match the end-of-line characters in your file: [char(13),char(10)] (CR LF) for Windows-style files, char(10) (LF) for Unix-style files.

Jan, thanks for formatting the question.


Matt Tearle
on 13 Sep 2012

How about this:

    str = fileread('hyphens.txt');
    str = regexprep(str,'-(?!\d)','nan');   % every '-' NOT followed by a digit becomes nan
    C = textscan(str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
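A self-contained way to try Matt's expression, with a few of the question's rows held in memory instead of hyphens.txt. The header text is a made-up placeholder (the real file's header is not shown) and is only there because of 'headerlines',1. The negative lookahead (?!\d) matches a '-' only when the next character is not a digit, so the separators in 01-01-2006 and the signs of -1.61 survive, while every lone '-' before ';', before a line break, or at the very end of the text becomes nan.

```matlab
% A few rows from the question, held in memory instead of hyphens.txt.
% The header line is a made-up placeholder, skipped by 'headerlines',1.
str = sprintf(['datum;a;b;c;d;e;f;g;h\n', ...
               '01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239\n', ...
               '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-\n']);

% Replace every '-' that is not followed by a digit with 'nan'.
% Minus signs (-1.61) and date separators (01-01-2006) are kept.
str = regexprep(str, '-(?!\d)', 'nan');

C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'headerlines', 1, 'delimiter', ';');
M = [C{2:end}];   % 2-by-8 numeric matrix, NaN for the missing values

disp(C{1})        % the date/hour strings
disp(M)
```

The pattern assumes every number starts with a digit after its sign, as in the question's data: a value written as -.5 would also match and turn into nan.5.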

@Roel, I guess you need to change

    nl  = [char(13),char(10)];

to

    nl  = char(10);   % LF only (Unix-style line endings)

according to my "Note:". You could check with

    double( str(1:80) )

and look for the number 10 (line feed). Is it preceded by 13 (carriage return) or not?
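Instead of checking by eye, nl can be chosen from the text itself. This is a small sketch, not part of per's answer; it assumes str holds the whole file as returned by fileread and simply looks for a CR LF pair.

```matlab
% Stand-in for str = fileread('cssm.txt'); this one has Unix (LF) endings.
str = sprintf('header\nrow1;-\nrow2;-\n');

% Pick the end-of-line sequence that the file actually uses.
if isempty(strfind(str, [char(13), char(10)]))
    nl = char(10);                  % LF only (Unix style)
else
    nl = [char(13), char(10)];      % CR LF (Windows style)
end

disp(double(nl))
```

A file with mixed line endings would get CR LF here and the LF-only lines would not be handled, which is one more reason the line-ending-independent regexprep(str,'-(?!\d)','nan') is the simpler choice.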


@Matt, your expression is better: it is shorter and more robust, because it checks whether a "-" is followed by a digit and does not depend on the end-of-line characters. I tried that approach myself but made a mistake :(. So replace

    str = strrep( str, '-;', 'nan;' );
    nl  = [char(13),char(10)];
    str = regexprep( str, [';-\s*',nl], [';nan',nl] );

with

    str = regexprep(str,'-(?!\d)','nan');

on 14 Sep 2012

Brilliant! This works very well. Thanks a lot, Per and Matt!
