
Textscan: how to ignore single '-' characters, while preserving '-' in negative numbers?

Asked by R V on 13 Sep 2012

I use the following code to read the block below.

    fid = fopen('data.csv');
    C = textscan(fid,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');
    fclose(fid);

Because data.csv contains single '-' characters, this does not work yet. I want to treat each single '-' as a missing value and read it as NaN.

How can I read single '-' characters as NaN? I tried 'TreatAsEmpty', but then negative values turn positive: negative numbers also contain a '-' character, and 'TreatAsEmpty' removes those as well.

Block:

Headerline
01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239
01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108
01-01-2006 (02 uur);-;-1.68;-;-0.75;-;1;-;1827
01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-
01-01-2006 (04 uur);-;-1.62;-;-0.74;-;-;-;-
01-01-2006 (05 uur);-;-1.61;-;-0.74;-;1;-;2053
01-01-2006 (06 uur);-;-1.66;-;-0.75;-;-;-;2870
01-01-2006 (07 uur);-;-1.68;-;-0.80;-;0;-;3585
01-01-2006 (08 uur);-;-1.64;-;-0.80;-;-;-;-
01-01-2006 (09 uur);-;-1.63;-;-0.79;-;-;-;-
01-01-2006 (10 uur);-;-1.62;-;-0.77;-;-;-;-
01-01-2006 (11 uur);-;-1.62;-;-0.74;-;1;-;3967

[EDITED, Jan, code and file contents formatted]

1 Comment

Jan Simon on 13 Sep 2012

Yes, per.

1 Answer

Answer by per isakson on 13 Sep 2012
Edited by per isakson on 13 Sep 2012
Accepted answer

Try this

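    %% Read the whole file as text
    str = fileread('cssm.txt');
    %% Replace each "-" that is followed by ";" with "nan"
    str = strrep( str, '-;', 'nan;' );
    %% Replace a "-" at the end of a line with "nan"
    nl  = [char(13),char(10)];   % CR LF; must match the line endings of the file
    str = regexprep( str, [';-\s*',nl], [';nan',nl] );
    %% Parse the modified text
    C = textscan( str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');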

where cssm.txt contains the rows of text from the question.

The approach is to:

  1. read the whole file as text
  2. replace each "-" that stands for a missing value with NaN
  3. parse the modified string with textscan
Note: the value of the variable nl must match the end-of-line characters in your file.
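A self-contained sketch of these three steps, under a few assumptions: two of the question's rows are written to a temporary file (the name from tempname is illustrative only), and instead of hard-coding nl the sketch guesses the line ending from the text, assuming the file uses a single convention throughout (CR LF if any CR character is present, otherwise LF).

```matlab
% Sample rows from the question, written with LF line endings to a
% temporary file (illustrative only; use your own file name instead).
sampleRows = {'Headerline', ...
              '01-01-2006 (00 uur);-;-1.61;-;-0.70;-;1;-;239', ...
              '01-01-2006 (03 uur);-;-1.64;-;-0.77;-;-;-;-'};
fname = [tempname, '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sampleRows{:});   % one row per line
fclose(fid);

% 1. Read the whole file as one character vector.
str = fileread(fname);
delete(fname);

% Assumption: one end-of-line convention for the whole file.
if any(str == char(13))
    nl = [char(13), char(10)];   % CR LF (Windows style)
else
    nl = char(10);               % LF only
end

% 2. Replace each lone "-": first those followed by ';', then the one
%    that ends a line.
str = strrep(str, '-;', 'nan;');
str = regexprep(str, [';-\s*', nl], [';nan', nl]);

% 3. Parse the modified text; the missing values are read as NaN.
C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'headerlines', 1, 'delimiter', ';');
```

The strrep step cannot touch the negative numbers or the dates, because in both the "-" is followed by a digit, never by ";".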

Jan, thanks for formatting the question.

4 Comments

Matt Tearle on 13 Sep 2012

How about this?

    str = fileread('hyphens.txt');
    str = regexprep(str,'-(?!\d)','nan');
    C = textscan(str,'%s%f%f%f%f%f%f%f%f','headerlines',1,'delimiter',';');

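A minimal sketch of this one-liner on an in-memory copy of three of the question's lines. Building the text with sprintf instead of reading hyphens.txt is an assumption made only to keep the example self-contained; textscan accepts the character vector directly.

```matlab
% Header plus two rows from the question, built in memory (LF line
% endings, no trailing newline) instead of being read from hyphens.txt.
str = sprintf(['Headerline\n', ...
               '01-01-2006 (01 uur);-;-1.66;-;-0.70;-;-;-;1108\n', ...
               '01-01-2006 (04 uur);-;-1.62;-;-0.74;-;-;-;-']);

% Replace a "-" only when the next character is NOT a digit. The negative
% lookahead (?!\d) keeps the minus signs in -1.66 and the hyphens in the
% date 01-01-2006 (each is followed by a digit), while a lone "-" followed
% by ';', a line break, or the end of the text becomes "nan".
str = regexprep(str, '-(?!\d)', 'nan');

% Parse as before; the lone "-" fields are now read as NaN.
C = textscan(str, '%s%f%f%f%f%f%f%f%f', 'headerlines', 1, 'delimiter', ';');
```

Because (?!\d) also succeeds at the very end of the text, a lone "-" in the last field of the last line is replaced even when the file has no trailing newline, a case that the [';-\s*',nl] pattern misses. One caveat for other data: a number written without a leading zero, such as -.5, would also be altered, since its "-" is followed by "." rather than a digit.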
per isakson on 13 Sep 2012

@Roel, I guess you need to change

    nl  = [char(13),char(10)];

to

    nl  = [char(10)];

according to my "Note:". You could check with

    double( str(1:80) )

and look for the number "10". Is it preceded by "13" or not?

@Matt, your expression is better: it is shorter and more robust, because it actually checks whether the "-" is followed by a digit. I tried that approach but made a mistake :(. So replace

    str = strrep( str, '-;', 'nan;' );
    %%
    nl  = [char(13),char(10)];
    str = regexprep( str, [';-\s*',nl], [';nan',nl] );

by

    str = regexprep(str,'-(?!\d)','nan');

R V on 14 Sep 2012

Brilliant! This works very well. Thanks a lot, Per and Matt!
