How to replace all 1x1 cells containing 'NAN' with 'NaN'?

Attached is a cell array that I need to convert to an array of doubles (for plotting). The values were pulled from a text file (into a table) where Not-a-Number values were represented by the string "NAN". That representation is inconvenient since MATLAB's str2num function only recognizes 'NaN' and 'nan' and will only convert those two specific strings to a NaN value.
I've tried a couple of things and was surprised this didn't work:
find(strcmp(bin1, 'NAN')); % 0×1 empty double column vector
find(strcmp(bin1, '"NAN"')); % 0×1 empty double column vector
So the tasks at hand are:
  1. Find all cells containing the string 'NAN' and replace it with 'NaN' or 'nan'.
  2. Convert the resulting cell array of 'NaN' strings and doubles into an array of doubles.
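A sketch of both steps, assuming each cell of bin1 holds either a 1x1 string such as "NAN" or a double (the type guard and the short-circuit && are needed because the cells mix types):

```matlab
% Step 1: find the text cells that read 'NAN' and normalize them.
isNAN = cellfun(@(c) (ischar(c) || isstring(c)) && strcmp(c, 'NAN'), bin1);
bin1(isNAN) = {'NaN'};

% Step 2: build a double vector; numeric cells pass straight through,
% everything else (including the 'NaN' text) stays NaN.
vals = nan(size(bin1));
num  = cellfun(@isnumeric, bin1);
vals(num) = cell2mat(bin1(num));
```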
________________
Unrelated, but this was the call I used to convert the table of strings to a cell array:
bin1 = cellfun(@str2num, table2cell(table.WH.bin1.speed), 'UniformOutput', false);
And this was how I got a table of string values from the text file:
table = readtable(filename, delimitedTextImportOptions('Delimiter', ','));

Accepted Answer

dpb
dpb on 7 Aug 2019
Edited: dpb on 9 Aug 2019
opt=detectImportOptions('Equinor.txt');
t=readtable('Equinor.txt',opt);
>> t.TIMESTAMP=datetime(t.TIMESTAMP,'InputFormat','uuuu-MM-dd HH:mm:ss')
t =
13×22 table
TIMESTAMP RECORD BinNum Depth EastComp NorthComp Speed Direction VerticalVel ErrorVel Corr_1 Corr_2 Corr_3 Corr_4 Echo_1 Echo_2 Echo_3 Echo_4 Pgp_1 Pgp_2 Pgp_3 Pgp_4
____________________ ______ ______ _____ ________ _________ _____ _________ ___________ ________ ______ ______ ______ ______ ______ ______ ______ ______ _____ _____ _____ _____
08-Jun-2018 14:09:58 806 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 5 33 34 39 40 0 0 100 0
08-Jun-2018 14:19:58 807 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 6 33 34 39 41 0 0 100 0
08-Jun-2018 14:29:58 808 1 -8.8 NaN NaN NaN NaN NaN NaN 5 6 6 6 33 34 39 40 0 0 100 0
08-Jun-2018 14:39:58 809 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 6 33 34 39 41 0 0 100 0
03-Oct-2018 09:29:54 1498 1 -9.1 NaN NaN NaN NaN NaN NaN 6 5 6 6 38 38 42 40 0 0 100 0
03-Oct-2018 10:29:54 1504 1 -9.1 NaN NaN NaN NaN NaN NaN 6 6 6 5 38 38 42 40 0 0 100 0
03-Oct-2018 11:29:54 1510 1 15.4 -25.7 8.4 27 288.1 -3.5 2.9 125 125 120 125 107 106 118 108 0 0 3 96
03-Oct-2018 12:09:54 1514 1 15.4 -33.1 15.7 36.6 295.4 1.5 5.1 126 126 123 126 106 105 118 106 0 0 0 99
03-Oct-2018 12:19:54 1515 1 15.4 -34.1 15.9 37.6 295 0.7 6.3 126 127 122 126 104 104 115 106 0 0 0 98
03-Oct-2018 12:29:54 1516 1 15.4 -36.6 10.8 38.2 286.4 1.2 7.1 127 126 121 127 104 103 117 105 0 0 0 98
03-Oct-2018 12:39:54 1517 1 15.4 -38.3 6.3 38.8 279.3 -1.3 6.6 127 126 122 127 104 103 117 104 2 0 0 98
03-Oct-2018 12:49:54 1518 1 15.4 -37.6 3.5 37.8 275.3 0.7 5.9 126 127 123 127 103 102 116 105 2 0 0 97
03-Oct-2018 12:59:54 1519 1 15.4 -41.3 2.8 41.4 273.9 -1.3 5.6 126 126 121 125 104 102 118 104 4 0 3 92
>>
NB: I trimmed a big chunk of the "NAN" section out of the posted file...

12 Comments

I need to preserve the NaNs. The dimension of the table should be equal to the number of rows and columns in the data file (minus the header rows). I appreciate the convenience of:
opt=detectImportOptions('Equinor.txt');
but I need to know how to search-and-substitute strings in a cell array.
Suppose, for whatever reason, that data file represented NaN values with the string "$$$". How would I find and substitute all rows from the 'Speed' column which contain '$$$' with 'NaN'?
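For that hypothetical '$$$' marker, a direct substitution might look like this sketch (assuming the Speed column came in as a cellstr):

```matlab
spd = t.Speed;                        % cellstr column, by assumption
spd(strcmp(spd, '$$$')) = {'NaN'};    % swap the placeholder for 'NaN'
t.Speed = str2double(spd);            % then convert to doubles
```

Note that str2double maps any unparseable text to NaN anyway, so the substitution step here is mostly for readability.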
" search-and-substitute strings in a cellarray. "
Why go at it the hard way? You can declare a missing value and substitute NaN in the import object if that's needed.
But the above retains the NaN; I just arbitrarily shortened the input file...there's no need for any more records to illustrate the same thing.
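Declaring the marker in the import object might look like this sketch, using the Speed column as an example:

```matlab
opt = detectImportOptions('Equinor.txt');
opt = setvaropts(opt, 'Speed', 'TreatAsMissing', 'NAN');  % the file's NaN marker
opt = setvartype(opt, 'Speed', 'double');                 % import as numeric
t = readtable('Equinor.txt', opt);                        % 'NAN' arrives as NaN
```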
"Why go at it the hard way?"
Something to be learned and something to fall back on, but I suppose the task of 'search and substitute' should be asked with a simpler example in a different question.
delimitedTextImportOptions does the job in this case but the call obscures a lot of what goes on.
The point is to read the file, correct? Use the tools MATLAB provides for the purpose...reading numeric data as text just makes it hard (as you've discovered).
If you just read the file with readtable, as you did, you confuse readtable royally, because the file has several header lines that aren't data, and you're reading them as if they were. readtable is pretty smart at deducing what is in a file and guessing what it should do, but it needs some bread crumbs along the way to get things right for files that aren't relatively clean. And while this one isn't really bad, it isn't "plain vanilla", either.
As demonstrated, the more thorough probing that detectImportOptions does got it figured out; readtable tried, but just isn't quite up to the task on its own.
>> table = readtable('Equinor.txt', 'Delimiter', ',')
table =
15×22 table
TIMESTAMP RECORD BinNum Depth EastComp NorthComp Speed Direction VerticalVel ErrorVel Corr_1 Corr_2 Corr_3 Corr_4 Echo_1 Echo_2 Echo_3 Echo_4 Pgp_1 Pgp_2 Pgp_3 Pgp_4
_____________________ ______ ______ ______ ________ _________ ______ _________ ___________ ________ ______ ______ ______ ______ ______ ______ ______ ______ _____ _____ _____ _____
'TS' 'RN' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' ''
'' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' ''
'2018-06-08 14:09:58' '806' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '5' '33' '34' '39' '40' '0' '0' '100' '0'
'2018-06-08 14:19:58' '807' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '6' '33' '34' '39' '41' '0' '0' '100' '0'
'2018-06-08 14:29:58' '808' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '5' '6' '6' '6' '33' '34' '39' '40' '0' '0' '100' '0'
'2018-06-08 14:39:58' '809' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '6' '33' '34' '39' '41' '0' '0' '100' '0'
'2018-10-03 09:29:54' '1498' '1' '-9.1' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '5' '6' '6' '38' '38' '42' '40'
Everything is a char variable and you've got those other lines in there that just don't belong at all.
If you would just be willing to use even one or two of the input options to match the file if you're still adamant about ignoring the more powerful tools TMW gave you for the purpose--
>> table = readtable('Equinor.txt', 'Delimiter', ',','headerlines',3,'ReadVariableNames',1)
table =
13×22 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22
_____________________ ____ ____ ____ _______ ______ ______ _______ ______ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
'2018-06-08 14:09:58' 806 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 33 34 39 40 0 0 100 0
'2018-06-08 14:19:58' 807 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-06-08 14:29:58' 808 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 5 6 6 6 33 34 39 40 0 0 100 0
'2018-06-08 14:39:58' 809 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-10-03 09:29:54' 1498 1 -9.1 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 5 6 6 38 38 42 40 0 0 100 0
'2018-10-03 10:29:54' 1504 1 -9.1 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 38 38 42 40 0 0 100 0
'2018-10-03 11:29:54' 1510 1 15.4 '-25.7' '8.4' '27' '288.1' '-3.5' '2.9' 125 125 120 125 107 106 118 108 0 0 3 96
'2018-10-03 12:09:54' 1514 1 15.4 '-33.1' '15.7' '36.6' '295.4' '1.5' '5.1' 126 126 123 126 106 105 118 106 0 0 0 99
'2018-10-03 12:19:54' 1515 1 15.4 '-34.1' '15.9' '37.6' '295' '0.7' '6.3' 126 127 122 126 104 104 115 106 0 0 0 98
'2018-10-03 12:29:54' 1516 1 15.4 '-36.6' '10.8' '38.2' '286.4' '1.2' '7.1' 127 126 121 127 104 103 117 105 0 0 0 98
'2018-10-03 12:39:54' 1517 1 15.4 '-38.3' '6.3' '38.8' '279.3' '-1.3' '6.6' 127 126 122 127 104 103 117 104 2 0 0 98
'2018-10-03 12:49:54' 1518 1 15.4 '-37.6' '3.5' '37.8' '275.3' '0.7' '5.9' 126 127 123 127 103 102 116 105 2 0 0 97
'2018-10-03 12:59:54' 1519 1 15.4 '-41.3' '2.8' '41.4' '273.9' '-1.3' '5.6' 126 126 121 125 104 102 118 104 4 0 3 92
>> table.Var5=str2double(table.Var5)
table =
13×22 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22
_____________________ ____ ____ ____ _____ ______ ______ _______ ______ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
'2018-06-08 14:09:58' 806 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 33 34 39 40 0 0 100 0
'2018-06-08 14:19:58' 807 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-06-08 14:29:58' 808 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 5 6 6 6 33 34 39 40 0 0 100 0
'2018-06-08 14:39:58' 809 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-10-03 09:29:54' 1498 1 -9.1 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 5 6 6 38 38 42 40 0 0 100 0
'2018-10-03 10:29:54' 1504 1 -9.1 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 38 38 42 40 0 0 100 0
'2018-10-03 11:29:54' 1510 1 15.4 -25.7 '8.4' '27' '288.1' '-3.5' '2.9' 125 125 120 125 107 106 118 108 0 0 3 96
'2018-10-03 12:09:54' 1514 1 15.4 -33.1 '15.7' '36.6' '295.4' '1.5' '5.1' 126 126 123 126 106 105 118 106 0 0 0 99
'2018-10-03 12:19:54' 1515 1 15.4 -34.1 '15.9' '37.6' '295' '0.7' '6.3' 126 127 122 126 104 104 115 106 0 0 0 98
'2018-10-03 12:29:54' 1516 1 15.4 -36.6 '10.8' '38.2' '286.4' '1.2' '7.1' 127 126 121 127 104 103 117 105 0 0 0 98
'2018-10-03 12:39:54' 1517 1 15.4 -38.3 '6.3' '38.8' '279.3' '-1.3' '6.6' 127 126 122 127 104 103 117 104 2 0 0 98
'2018-10-03 12:49:54' 1518 1 15.4 -37.6 '3.5' '37.8' '275.3' '0.7' '5.9' 126 127 123 127 103 102 116 105 2 0 0 97
'2018-10-03 12:59:54' 1519 1 15.4 -41.3 '2.8' '41.4' '273.9' '-1.3' '5.6' 126 126 121 125 104 102 118 104 4 0 3 92
>>
is much easier than trying to do character substitutions...particularly when you've managed to get the data out of the table and into a more deeply nested cell array that mixes string() components with non-character cells, which have to be treated differently by logical tests.
It's possible to write such things using cellfun and isstr() combined with isstring() (yes, Virginia, they're different!), but it's just complexity introduced for no good reason.
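The difference is easy to see at the command line (ischar is the current name for isstr):

```matlab
ischar('NAN')       % true:  'NAN' is a char vector
ischar("NAN")       % false: "NAN" is a string scalar
isstring("NAN")     % true
iscellstr({"NAN"})  % false: a cell holding a string is not a cellstr
```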
I repeat...use the tools made for the job!!! :)
"The point is to read the file, correct? Use the tools MATLAB provides for the purpose...reading numeric data as text just makes it hard (as you've discovered) ... If you would just be willing to use even one or two of the input options to match the file if you're still adamant about ignoring the more powerful tools TMW gave you for the purpose"
Point taken. I'd rather not muck around with code to parse data. Thanks again for showing me the error of my ways :)
dpb
dpb on 8 Aug 2019
Edited: dpb on 11 Aug 2019
You're welcome...sometimes "tough love" is the right thing! <VBG>
But, it took TMW some 30 years to finally develop these tools for reading files beyond the crude fscanf and textread, which are powerful in their own right but take a lot of effort; particularly if there were fixed-width fields, because the C input parsing is far from ideal for numeric/scientific data at least as often as not...
So, when there are finally these tools available, unless there are truly other compelling reasons to not use them, it only makes sense to do so.
OK, now that it's clear we're not headed down the wrong path, converting from your cell array as you initially asked takes two steps...
>> bin1(850:860) % display the portion of array w/ "issues"...
ans =
11×1 cell array
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{[ 27]}
{[36.6000]}
{[37.6000]}
{[38.2000]}
{[38.8000]}
{[37.8000]}
>> b=bin1(850:860); % just save the interesting part--all needed to illustrate
>> is=cellfun(@(s) isstring(s) & s=="NAN",b)
Error using ==
Comparison between double and string is not supported.
Error in @(s)isstring(s)&s=="NAN"
>>
NB: this is the kind of thing you run into when you have mixed data types in the cell array and need to take actions that are valid for only one of those types. Also NB: isstring and NOT isstr -- the latter returns all false because you have a string, not a cellstr (note the "" rather than ''). Adding logic to detect that and/or handle both is yet more complexity, and why you don't want to go this route....
Anyway, back to regularly-scheduled broadcast:
>> is=cellfun(@(s) isstring(s) && s=="NAN",b) % NB '&&' short-circuit operator
is =
11×1 logical array
1
1
1
1
1
0
0
0
0
0
0
>> b(is)={'NaN'} % Finally, can set to a cellstr of fixed value
b =
11×1 cell array
{'NaN' }
{'NaN' }
{'NaN' }
{'NaN' }
{'NaN' }
{[ 27]}
{[36.6000]}
{[37.6000]}
{[38.2000]}
{[38.8000]}
{[37.8000]}
>>
Just for the expedient of trying to avoid detectImportOptions -- while I'll grant it is somewhat obtuse and complex, it's still far simpler than the alternative...
And this still doesn't deal with the double() inside a cell inside a cell combined with a cellstr()...
IOW, "don't do that!" :)
"NB: this is the kind of thing you run into when you have mixed data types in the cell array and need to take actions that only are valid for one of those types...also NB: isstring and NOT isstr -- the latter returns all false because you have a string, not a cellstr (note the "" and not ''). Having additional logic to determine that and/or deal with both is yet another complexity and why you don't want to go this route...."
I'll keep in mind to avoid mixed data types in tabulated data. It's a subtle point I've overlooked in the past but I can see how it complicates parsing.
In this case, I don't have control over the formatting of the table (which is decided by a 3rd party datalogger and its associated software) nor the values they contain (which comes from a 3rd party sensor).
Another table stored a single byte of data in ASCII: "00001000", which detectImportOptions interpreted as the number 1000. Luckily, I don't need to use that data for my immediate task.
"... I don't have control over the formatting of the table (which is decided by a 3rd party datalogger..."
That's often the case -- BUT the format for the table will be fixed, correct? If that is so, you go through the pain of building the correct import object once and save it -- then you have it for all time going forward.
"Another table stored a single of byte data in ASCII: "00001000" which detectImportOptions interpreted as the number 1000."
Again, as long as you know which table you're reading, you can fix up the initial options object to match how you wish the data to be imported. If the software wrote the quoted string, unless there was a record ahead of that in the column that was numeric, it would have been interpreted as character. If that were the case, then also fixing the beginning data line might be needed or whatever in order to read/skip the header properly. That's one of the features shown in the earlier example.
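Fixing up the header handling in the options object might look like this sketch (the line numbers here are hypothetical):

```matlab
opt = detectImportOptions('Equinor.txt');
opt.VariableNamesLine = 1;   % row holding the column names
opt.DataLine = 4;            % first row of actual data, past the header junk
t = readtable('Equinor.txt', opt);
```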
In setvartype, the choices for interpreting the data include:
  Data      MATLAB® Data Type
  Text      'char' or 'string'
  Numeric   'single', 'double', 'int8|16|32|64', 'uint8|16|32|64'
Since the ML function bin2dec expects text input, forcing that column to 'char' is the correct choice to override the default interpretation as a decimal number.
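A sketch of that, using the two-column minh.txt file shown later in this thread:

```matlab
opt = detectImportOptions('minh.txt');
opt = setvartype(opt, 'Var1', 'char');   % keep the binary field as text
t = readtable('minh.txt', opt);
t.Var1 = bin2dec(t.Var1);                % '00001000' -> 8, '00001011' -> 11
```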
You may need to develop a library of these import objects to fit the set of files you routinely deal with, but having done so your subsequent coding should become much cleaner.
"If the software wrote the quoted string, unless there was a record ahead of that in the column that was numeric, it would have been interpreted as character."
Well, that assumption actually isn't correct -- it seems that detectImportOptions is looking farther into the file before making that judgement. If the first record is string or character but the second is numeric, detectImportOptions returns numeric for the column. If the first two records are character, then numeric following that, the character form 'sticks'.
The conclusion: for such a file you'll have to ensure the correct format for each column if you can't guarantee the quoted string. And since the testing here was pretty minimal, other conditions could plausibly cause the same symptom, so it's basically required to set the VariableTypes option explicitly whenever there's any doubt about how a particular column could be interpreted, to ensure it reads as expected.
>> type minh.txt
00001000, 1.234
00001011, 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
"00001000", 1.234
00001011, 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'double', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
"00001000", 1.234
"00001011", 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
'00001000', 1.234
'00001011', 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>>
"The conclusion is for such a file you'll have to ensure the correct format for each column if you can't guarantee the quoted string...and, since the testing here is pretty minimal so is possible other conditions could cause the symptom, it's basically required to make sure the VariableTypes option is set correctly when there's any doubt about how a particular column could be interpreted to ensure one reads it as expected."
dpb, thank you for digging into this. I'll play around with those test cases you mentioned and the VariableTypes property to get a better feel for the parser.
"Since the ML function bin2dec does expect string input, then forcing that column to be 'char' would be the correct choice to override the default interpretation as decimal input. ... You may need to develop a library of these import objects to fit the set of files you routinely deal with, but having done so your subsequent coding should become much cleaner."
That is a fine strategy.
Remember the options object is just a structure, so you can simply save the variable once it's created and debugged. You can either name variables to be identifiable and load specific ones based on the input file name (or via an array or struct with field names), or use just one variable name and save multiple files to load...whatever best fits the rest of your application.
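A sketch of that save-and-reuse pattern (file names are hypothetical):

```matlab
% Build and debug the options object once, then persist it:
opt = detectImportOptions('Equinor.txt');
opt = setvartype(opt, 'Speed', 'double');
save('EquinorOpts.mat', 'opt');

% ...later sessions just reload it and read:
S = load('EquinorOpts.mat');
t = readtable('Equinor.txt', S.opt);
```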


More Answers (1)

>> load bin1
>> whos bin1
Name Size Bytes Class Attributes
bin1 8396x1 1122684 cell
>> bin1{1}
ans =
"NAN"
>> whos ans
Name Size Bytes Class Attributes
ans 1x1 134 string
>> dat=str2double(bin1);
>> whos dat
Name Size Bytes Class Attributes
dat 8396x1 67168 double
>> dat(1)
ans =
NaN
>>
"More better" would be to attach a short portion of the text file and let's read it correctly first instead of having to fix up a mess later...

1 Comment

I included an additional attachment: Equinor_EngData_WHBB_Bin1_partial.txt
str2double(bin1) converted all values to NaN.
The data become non-NaN on row bin1{855}.


Release: R2019a
Asked: 7 Aug 2019
Commented: dpb on 12 Aug 2019
