How to replace all 1x1 cells containing 'NAN' with 'NaN'?

Attached is a cell array that I need to convert to an array of doubles (for plotting). The values were pulled from a text file (into a table) where Not-a-Number values were represented by the string "NAN". That representation is inconvenient since MATLAB's str2num function only recognizes 'NaN' and 'nan' and will only convert those two specific strings to a NaN value.
I've tried a couple of things and was surprised this didn't work:
find(strcmp(bin1, 'NAN')); % 0×1 empty double column vector
find(strcmp(bin1, '"NAN"')); % 0×1 empty double column vector
So the tasks at hand are:
  1. Find all cells containing the string 'NAN' and replace it with 'NaN' or 'nan'.
  2. Convert the resulting cell array of 'NaN' strings and doubles into an array of doubles.
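A sketch of both steps, assuming each cell of bin1 holds either a 1x1 string such as "NAN" or a double (the type guard and the short-circuit && are needed because the cells mix types):

```matlab
% Step 1: find the text cells that read 'NAN' and normalize them.
isNAN = cellfun(@(c) (ischar(c) || isstring(c)) && strcmp(c, 'NAN'), bin1);
bin1(isNAN) = {'NaN'};

% Step 2: build a double vector; numeric cells pass straight through,
% everything else (including the 'NaN' text) stays NaN.
vals = nan(size(bin1));
num  = cellfun(@isnumeric, bin1);
vals(num) = cell2mat(bin1(num));
```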
________________
Unrelated, but this was the call I used to convert the table of strings to a cell array:
bin1 = cellfun(@str2num, table2cell(table.WH.bin1.speed), 'UniformOutput', false);
And this was how I got a table of string values from the text file:
table = readtable(filename, delimitedTextImportOptions('Delimiter', ','));

Accepted Answer

dpb
dpb on 7 Aug 2019
Edited: dpb on 9 Aug 2019
opt=detectImportOptions('Equinor.txt');
t=readtable('Equinor.txt',opt);
>> t.TIMESTAMP=datetime(t.TIMESTAMP,'InputFormat','uuuu-MM-dd HH:mm:ss')
t =
13×22 table
TIMESTAMP RECORD BinNum Depth EastComp NorthComp Speed Direction VerticalVel ErrorVel Corr_1 Corr_2 Corr_3 Corr_4 Echo_1 Echo_2 Echo_3 Echo_4 Pgp_1 Pgp_2 Pgp_3 Pgp_4
____________________ ______ ______ _____ ________ _________ _____ _________ ___________ ________ ______ ______ ______ ______ ______ ______ ______ ______ _____ _____ _____ _____
08-Jun-2018 14:09:58 806 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 5 33 34 39 40 0 0 100 0
08-Jun-2018 14:19:58 807 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 6 33 34 39 41 0 0 100 0
08-Jun-2018 14:29:58 808 1 -8.8 NaN NaN NaN NaN NaN NaN 5 6 6 6 33 34 39 40 0 0 100 0
08-Jun-2018 14:39:58 809 1 -8.8 NaN NaN NaN NaN NaN NaN 6 6 6 6 33 34 39 41 0 0 100 0
03-Oct-2018 09:29:54 1498 1 -9.1 NaN NaN NaN NaN NaN NaN 6 5 6 6 38 38 42 40 0 0 100 0
03-Oct-2018 10:29:54 1504 1 -9.1 NaN NaN NaN NaN NaN NaN 6 6 6 5 38 38 42 40 0 0 100 0
03-Oct-2018 11:29:54 1510 1 15.4 -25.7 8.4 27 288.1 -3.5 2.9 125 125 120 125 107 106 118 108 0 0 3 96
03-Oct-2018 12:09:54 1514 1 15.4 -33.1 15.7 36.6 295.4 1.5 5.1 126 126 123 126 106 105 118 106 0 0 0 99
03-Oct-2018 12:19:54 1515 1 15.4 -34.1 15.9 37.6 295 0.7 6.3 126 127 122 126 104 104 115 106 0 0 0 98
03-Oct-2018 12:29:54 1516 1 15.4 -36.6 10.8 38.2 286.4 1.2 7.1 127 126 121 127 104 103 117 105 0 0 0 98
03-Oct-2018 12:39:54 1517 1 15.4 -38.3 6.3 38.8 279.3 -1.3 6.6 127 126 122 127 104 103 117 104 2 0 0 98
03-Oct-2018 12:49:54 1518 1 15.4 -37.6 3.5 37.8 275.3 0.7 5.9 126 127 123 127 103 102 116 105 2 0 0 97
03-Oct-2018 12:59:54 1519 1 15.4 -41.3 2.8 41.4 273.9 -1.3 5.6 126 126 121 125 104 102 118 104 4 0 3 92
>>
NB: I trimmed a big chunk of the "NAN" section out of the posted file...

12 Comments

I need to preserve the NaNs. The dimension of the table should be equal to the number of rows and columns in the data file (minus the header rows). I appreciate the convenience of:
opt=detectImportOptions('Equinor.txt');
but I need to know how to search-and-substitute strings in a cell array.
Suppose, for whatever reason, that data file represented NaN values with the string "$$$". How would I find and substitute all rows from the 'Speed' column which contain '$$$' with 'NaN'?
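For that hypothetical '$$$' marker, a direct substitution might look like this sketch (assuming the Speed column came in as a cellstr):

```matlab
spd = t.Speed;                        % cellstr column, by assumption
spd(strcmp(spd, '$$$')) = {'NaN'};    % swap the placeholder for 'NaN'
t.Speed = str2double(spd);            % then convert to doubles
```

Note that str2double maps any unparseable text to NaN anyway, so the substitution step here is mostly for readability.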
" search-and-substitute strings in a cellarray. "
Why go at it the hard way? You can declare a missing value and substitute NaN in the import object if that's needed.
But the above retains the NaN; I just arbitrarily shortened the input file...there's no need for any more records to illustrate the same thing.
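Declaring the marker in the import object might look like this sketch, using the Speed column as an example:

```matlab
opt = detectImportOptions('Equinor.txt');
opt = setvaropts(opt, 'Speed', 'TreatAsMissing', 'NAN');  % the file's NaN marker
opt = setvartype(opt, 'Speed', 'double');                 % import as numeric
t = readtable('Equinor.txt', opt);                        % 'NAN' arrives as NaN
```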
"Why go at it the hard way?"
Something to be learned and something to fall back on, but I suppose the task of 'search and substitute' should be asked with a simpler example in a different question.
delimitedTextImportOptions does the job in this case but the call obscures a lot of what goes on.
The point is to read the file, correct? Use the tools MATLAB provides for the purpose...reading numeric data as text just makes it hard (as you've discovered).
If you just read the file with readtable, as you did, you confuse readtable royally, because the file has several header lines that aren't data, and you're reading them as if they were. readtable is pretty smart at deducing what is in a file and guessing what it should do, but it needs some bread crumbs along the way to get things right for files that aren't relatively clean. And while this one isn't really bad, it isn't "plain vanilla", either.
As demonstrated, the more thorough probing that detectImportOptions does got it figured out; readtable tried, but just isn't quite up to the task on its own.
>> table = readtable('Equinor.txt', 'Delimiter', ',')
table =
15×22 table
TIMESTAMP RECORD BinNum Depth EastComp NorthComp Speed Direction VerticalVel ErrorVel Corr_1 Corr_2 Corr_3 Corr_4 Echo_1 Echo_2 Echo_3 Echo_4 Pgp_1 Pgp_2 Pgp_3 Pgp_4
_____________________ ______ ______ ______ ________ _________ ______ _________ ___________ ________ ______ ______ ______ ______ ______ ______ ______ ______ _____ _____ _____ _____
'TS' 'RN' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' ''
'' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' '' ''
'2018-06-08 14:09:58' '806' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '5' '33' '34' '39' '40' '0' '0' '100' '0'
'2018-06-08 14:19:58' '807' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '6' '33' '34' '39' '41' '0' '0' '100' '0'
'2018-06-08 14:29:58' '808' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '5' '6' '6' '6' '33' '34' '39' '40' '0' '0' '100' '0'
'2018-06-08 14:39:58' '809' '1' '-8.8' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '6' '6' '6' '33' '34' '39' '41' '0' '0' '100' '0'
'2018-10-03 09:29:54' '1498' '1' '-9.1' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' '6' '5' '6' '6' '38' '38' '42' '40'
Everything is a char variable and you've got those other lines in there that just don't belong at all.
If you would just be willing to use even one or two of the input options to match the file if you're still adamant about ignoring the more powerful tools TMW gave you for the purpose--
>> table = readtable('Equinor.txt', 'Delimiter', ',','headerlines',3,'ReadVariableNames',1)
table =
13×22 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22
_____________________ ____ ____ ____ _______ ______ ______ _______ ______ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
'2018-06-08 14:09:58' 806 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 33 34 39 40 0 0 100 0
'2018-06-08 14:19:58' 807 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-06-08 14:29:58' 808 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 5 6 6 6 33 34 39 40 0 0 100 0
'2018-06-08 14:39:58' 809 1 -8.8 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-10-03 09:29:54' 1498 1 -9.1 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 5 6 6 38 38 42 40 0 0 100 0
'2018-10-03 10:29:54' 1504 1 -9.1 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 38 38 42 40 0 0 100 0
'2018-10-03 11:29:54' 1510 1 15.4 '-25.7' '8.4' '27' '288.1' '-3.5' '2.9' 125 125 120 125 107 106 118 108 0 0 3 96
'2018-10-03 12:09:54' 1514 1 15.4 '-33.1' '15.7' '36.6' '295.4' '1.5' '5.1' 126 126 123 126 106 105 118 106 0 0 0 99
'2018-10-03 12:19:54' 1515 1 15.4 '-34.1' '15.9' '37.6' '295' '0.7' '6.3' 126 127 122 126 104 104 115 106 0 0 0 98
'2018-10-03 12:29:54' 1516 1 15.4 '-36.6' '10.8' '38.2' '286.4' '1.2' '7.1' 127 126 121 127 104 103 117 105 0 0 0 98
'2018-10-03 12:39:54' 1517 1 15.4 '-38.3' '6.3' '38.8' '279.3' '-1.3' '6.6' 127 126 122 127 104 103 117 104 2 0 0 98
'2018-10-03 12:49:54' 1518 1 15.4 '-37.6' '3.5' '37.8' '275.3' '0.7' '5.9' 126 127 123 127 103 102 116 105 2 0 0 97
'2018-10-03 12:59:54' 1519 1 15.4 '-41.3' '2.8' '41.4' '273.9' '-1.3' '5.6' 126 126 121 125 104 102 118 104 4 0 3 92
>> table.Var5=str2double(table.Var5)
table =
13×22 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22
_____________________ ____ ____ ____ _____ ______ ______ _______ ______ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
'2018-06-08 14:09:58' 806 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 33 34 39 40 0 0 100 0
'2018-06-08 14:19:58' 807 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-06-08 14:29:58' 808 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 5 6 6 6 33 34 39 40 0 0 100 0
'2018-06-08 14:39:58' 809 1 -8.8 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 6 33 34 39 41 0 0 100 0
'2018-10-03 09:29:54' 1498 1 -9.1 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 5 6 6 38 38 42 40 0 0 100 0
'2018-10-03 10:29:54' 1504 1 -9.1 NaN 'NAN' 'NAN' 'NAN' 'NAN' 'NAN' 6 6 6 5 38 38 42 40 0 0 100 0
'2018-10-03 11:29:54' 1510 1 15.4 -25.7 '8.4' '27' '288.1' '-3.5' '2.9' 125 125 120 125 107 106 118 108 0 0 3 96
'2018-10-03 12:09:54' 1514 1 15.4 -33.1 '15.7' '36.6' '295.4' '1.5' '5.1' 126 126 123 126 106 105 118 106 0 0 0 99
'2018-10-03 12:19:54' 1515 1 15.4 -34.1 '15.9' '37.6' '295' '0.7' '6.3' 126 127 122 126 104 104 115 106 0 0 0 98
'2018-10-03 12:29:54' 1516 1 15.4 -36.6 '10.8' '38.2' '286.4' '1.2' '7.1' 127 126 121 127 104 103 117 105 0 0 0 98
'2018-10-03 12:39:54' 1517 1 15.4 -38.3 '6.3' '38.8' '279.3' '-1.3' '6.6' 127 126 122 127 104 103 117 104 2 0 0 98
'2018-10-03 12:49:54' 1518 1 15.4 -37.6 '3.5' '37.8' '275.3' '0.7' '5.9' 126 127 123 127 103 102 116 105 2 0 0 97
'2018-10-03 12:59:54' 1519 1 15.4 -41.3 '2.8' '41.4' '273.9' '-1.3' '5.6' 126 126 121 125 104 102 118 104 4 0 3 92
>>
is much easier than trying to do character substitutions...particularly when you've managed to get the data out of the table and into a more deeply nested cell array that mixes string() components with non-character cells, which have to be treated differently by logical tests.
It's possible to write such things using cellfun and isstr() combined with isstring() (yes, Virginia, they're different!), but it's just complexity introduced for no good reason.
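The difference is easy to see at the command line (ischar is the current name for isstr):

```matlab
ischar('NAN')       % true:  'NAN' is a char vector
ischar("NAN")       % false: "NAN" is a string scalar
isstring("NAN")     % true
iscellstr({"NAN"})  % false: a cell holding a string is not a cellstr
```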
I repeat...use the tools made for the job!!! :)
"The point is to read the file, correct? Use the tools MATLAB provides for the purpose...reading numeric data as text just makes it hard (as you've discovered) ... If you would just be willing to use even one or two of the input options to match the file if you're still adamant about ignoring the more powerful tools TMW gave you for the purpose"
Point taken. I'd rather not muck around with code to parse data. Thanks again for showing me the error of my ways :)
dpb
dpb on 8 Aug 2019
Edited: dpb on 11 Aug 2019
You're welcome...sometimes "tough love" is the right thing! <VBG>
But, it took TMW some 30 years to finally develop these tools for reading files beyond the crude fscanf and textread, which are powerful in their own right but take a lot of effort; particularly if there were fixed-width fields, because the C input parsing is far from ideal for numeric/scientific data at least as often as not...
So, when there are finally these tools available, unless there are truly other compelling reasons to not use them, it only makes sense to do so.
OK, now that it's clear we're not headed down the wrong path, converting from your cell array as you initially asked takes two steps...
>> bin1(850:860) % display the portion of array w/ "issues"...
ans =
11×1 cell array
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{["NAN" ]}
{[ 27]}
{[36.6000]}
{[37.6000]}
{[38.2000]}
{[38.8000]}
{[37.8000]}
>> b=bin1(850:860); % just save the interesting part--all needed to illustrate
>> is=cellfun(@(s) isstring(s) & s=="NAN",b)
Error using ==
Comparison between double and string is not supported.
Error in @(s)isstring(s)&s=="NAN"
>>
NB: this is the kind of thing you run into when you have mixed data types in the cell array and need to take actions that are valid for only one of those types. Also NB: isstring and NOT isstr -- the latter returns all false because you have a string, not a cellstr (note the "" rather than ''). Adding logic to detect that and/or handle both is yet more complexity, and why you don't want to go this route....
Anyway, back to regularly-scheduled broadcast:
>> is=cellfun(@(s) isstring(s) && s=="NAN",b) % NB '&&' short-circuit operator
is =
11×1 logical array
1
1
1
1
1
0
0
0
0
0
0
>> b(is)={'NaN'} % Finally, can set to a cellstr of fixed value
b =
11×1 cell array
{'NaN' }
{'NaN' }
{'NaN' }
{'NaN' }
{'NaN' }
{[ 27]}
{[36.6000]}
{[37.6000]}
{[38.2000]}
{[38.8000]}
{[37.8000]}
>>
Just for the expedient of trying to avoid detectImportOptions -- while I'll grant it is somewhat obtuse and complex, it's still far simpler than the alternative...
And this still doesn't deal with the double() inside a cell inside a cell combined with a cellstr()...
IOW, "don't do that!" :)
"NB: this is the kind of thing you run into when you have mixed data types in the cell array and need to take actions that only are valid for one of those types...also NB: isstring and NOT isstr -- the latter returns all false because you have a string, not a cellstr (note the "" and not ''). Having additional logic to determine that and/or deal with both is yet another complexity and why you don't want to go this route...."
I'll keep in mind to avoid mixed data types in tabulated data. It's a subtle point I've overlooked in the past but I can see how it complicates parsing.
In this case, I don't have control over the formatting of the table (which is decided by a 3rd party datalogger and its associated software) nor the values they contain (which comes from a 3rd party sensor).
Another table stored a single byte of data in ASCII: "00001000", which detectImportOptions interpreted as the number 1000. Luckily, I don't need to use that data for my immediate task.
"... I don't have control over the formatting of the table (which is decided by a 3rd party datalogger..."
That's often the case -- BUT the format for the table will be fixed, correct? If that is so, you go through the pain of building the correct import object once and save it -- then you have it for all time going forward.
"Another table stored a single of byte data in ASCII: "00001000" which detectImportOptions interpreted as the number 1000."
Again, as long as you know which table you're reading, you can fix up the initial options object to match how you wish the data to be imported. If the software wrote the quoted string, unless there was a record ahead of that in the column that was numeric, it would have been interpreted as character. If that were the case, then also fixing the beginning data line might be needed or whatever in order to read/skip the header properly. That's one of the features shown in the earlier example.
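Fixing up the header handling in the options object might look like this sketch (the line numbers here are hypothetical):

```matlab
opt = detectImportOptions('Equinor.txt');
opt.VariableNamesLine = 1;   % row holding the column names
opt.DataLine = 4;            % first row of actual data, past the header junk
t = readtable('Equinor.txt', opt);
```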
In setvartype, the choices for interpreting the data include:
  Data      MATLAB® Data Type
  Text      'char' or 'string'
  Numeric   'single', 'double', 'int8|16|32|64', 'uint8|16|32|64'
Since the ML function bin2dec expects text input, forcing that column to 'char' is the correct choice to override the default interpretation as a decimal number.
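A sketch of that, using the two-column minh.txt file shown later in this thread:

```matlab
opt = detectImportOptions('minh.txt');
opt = setvartype(opt, 'Var1', 'char');   % keep the binary field as text
t = readtable('minh.txt', opt);
t.Var1 = bin2dec(t.Var1);                % '00001000' -> 8, '00001011' -> 11
```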
You may need to develop a library of these import objects to fit the set of files you routinely deal with, but having done so your subsequent coding should become much cleaner.
"If the software wrote the quoted string, unless there was a record ahead of that in the column that was numeric, it would have been interpreted as character."
Well, that assumption actually isn't correct -- it seems that detectImportOptions is looking farther into the file before making that judgement. If the first record is string or character but the second is numeric, detectImportOptions returns numeric for the column. If the first two records are character, then numeric following that, the character form 'sticks'.
The conclusion: for such a file you'll have to ensure the correct format for each column if you can't guarantee the quoted string. And since the testing here was pretty minimal, other conditions could plausibly cause the same symptom, so it's basically required to set the VariableTypes option explicitly whenever there's any doubt about how a particular column could be interpreted, to ensure it reads as expected.
>> type minh.txt
00001000, 1.234
00001011, 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
"00001000", 1.234
00001011, 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'double', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
"00001000", 1.234
"00001011", 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
...
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>> type minh.txt
'00001000', 1.234
'00001011', 2.345
>> detectImportOptions('minh.txt')
ans =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2'}
VariableTypes: {'char', 'double'}
SelectedVariableNames: {'Var1', 'Var2'}
VariableOptions: Show all 2 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLine: 1
VariableNamesLine: 0
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
>>
"The conclusion is for such a file you'll have to ensure the correct format for each column if you can't guarantee the quoted string...and, since the testing here is pretty minimal so is possible other conditions could cause the symptom, it's basically required to make sure the VariableTypes option is set correctly when there's any doubt about how a particular column could be interpreted to ensure one reads it as expected."
dpb, thank you for digging into this. I'll play around with those test cases you mentioned and the VariableTypes property to get a better feel for the parser.
"Since the ML function bin2dec does expect string input, then forcing that column to be 'char' would be the correct choice to override the default interpretation as decimal input. ... You may need to develop a library of these import objects to fit the set of files you routinely deal with, but having done so your subsequent coding should become much cleaner."
That is a fine strategy.
Remember the options object is just a structure, so you can simply save the variable once it's created and debugged. You can either name variables to be identifiable and load specific ones based on the input file name (or via an array or struct with field names), or use just one variable name and save multiple files to load...whatever best fits the rest of your application.
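A sketch of that save-and-reuse pattern (file names are hypothetical):

```matlab
% Build and debug the options object once, then persist it:
opt = detectImportOptions('Equinor.txt');
opt = setvartype(opt, 'Speed', 'double');
save('EquinorOpts.mat', 'opt');

% ...later sessions just reload it and read:
S = load('EquinorOpts.mat');
t = readtable('Equinor.txt', S.opt);
```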


More Answers (1)

>> load bin1
>> whos bin1
Name Size Bytes Class Attributes
bin1 8396x1 1122684 cell
>> bin1{1}
ans =
"NAN"
>> whos ans
Name Size Bytes Class Attributes
ans 1x1 134 string
>> dat=str2double(bin1);
>> whos dat
Name Size Bytes Class Attributes
dat 8396x1 67168 double
>> dat(1)
ans =
NaN
>>
"More better" would be to attach a short portion of the text file and let's read it correctly first instead of having to fix up a mess later...

1 Comment

I included an additional attachment: Equinor_EngData_WHBB_Bin1_partial.txt
str2double(bin1) converted all values to NaN.
The data become non-NaN on row bin1{855}.


Release: R2019a
Asked: 7 Aug 2019
Commented: dpb on 12 Aug 2019
