Stats functions for tables containing "nan" and "inf" (mean, mode, std)

Hi, I am trying to find the mean and std of a column of a table. Unfortunately the presence of "NAN" and "inf" are giving me an infinite mean and NAN std.
Those data points i would like to be ignored, as i know what they mean with respect to the experiment, and are not relevant to the mean and std i am looking for.
What I have is:
clear all, clc, close all
%T9: Attempt at periodic mechanical mirror tilt (screws were able to be
%tilted, both for x and y by roughly 3pi/2 rads, forward and then back)
%Load data CSV
T9t=readtable('LF_BS_MECHTILT_T9.csv');
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
figure
subplot(211)
plot(T9{:,{'ms'}} , str2double(T9{:,{'X1_um_','Y1_um_'}}))
legend({'x1(um)', 'y1(um)'}, 'location', "best")
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)')
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)');grid on, grid minor;
subplot(212)
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}})
legend({'Intensity (Volts)'}, 'location', "best")
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
T9NCMX=mean(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCMY=mean(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9NCSTDX=std(str2double(T9{1:9590, {'X1_um_'}}), 'omitnan');
T9NCSTDY=std(str2double(T9{1:9590, {'Y1_um_'}}), 'omitnan');
T9CMX=mean(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CMY=mean(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
T9CSTDX=std(str2double(T9{9590:17869, {'X1_um_'}}), 'omitnan');
T9CSTDY=std(str2double(T9{9590:17869, {'Y1_um_'}}), 'omitnan');
You can probably tell i have already tried quite a lot of different things and have thus gotten myself lost in a long list of incremental problemsolving attempts.
The original problem and solution that lead me to this format before trying to do stats on it can be found here:
The context is likely to be valuable if you are kind enough to help.
The relevant data file is attached, if you require anything else to be able to help then please let me know.
Thank you.

Answers (2)

filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
T9t = readtable(filename);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T9 = rmmissing(T9t);
subset = T9{:,{'X1_um_','Y1_um_'}};
whos subset
Name Size Bytes Class Attributes subset 16346x2 261536 double
So you are already extracting double but you are trying to str2double() that, which is going to give you nan.
Note also the code repair for removing missing data.

4 Comments

I see you are running this on R2021a, this is not the behaviour in 2019b, which is what i am using. This was also explained in the link i provided at the bottom of the original post and is specified on the right hand side of the question itself under "Release"
I take it i am actually best just updating MATLAB? this was part of the issue in the first instance
str2double is actually necessary, in this version, i couldn't do anything with the data containing "nan"s until i converted from string to double. This does not return NAN for all elements when i am using the plot function
Tested in R2019b.
(The bit about USER is testing to see whether the code is being run in MATLAB Answers, in which case the file is to be read from the URL, and otherwise the file is to be read locally.)
You were getting NaN because of the str2double(), which were not necessary if you did a simple options detection.
You were also getting NaN because you had four lines with infinities, one of which is line 6702 of your file.
67460 inf inf 0.01 0.001 0.000 0.12 4.97 4.97 5.01 5.02
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nsize = 9590;
last = min(17869, height(T9));
T9NCMX = mean(T9{1:Nsize, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMX = -153.1838
T9NCMY = mean(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9NCMY = -235.9715
T9NCSTDX = std(T9{1:Nsize, {'X1_um_'}}, 'omitnan')
T9NCSTDX = 374.1832
T9NCSTDY = std(T9{1:Nsize, {'Y1_um_'}}, 'omitnan')
T9NCSTDY = 333.0656
T9CMX = mean(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CMX = -13.2349
T9CMY = mean(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
T9CMY = -10.2050
T9CSTDX = std(T9{Nsize+1:last, {'X1_um_'}}, 'omitnan')
T9CSTDX = 84.6210
T9CSTDY = std(T9{Nsize+1:last, {'Y1_um_'}}, 'omitnan')
T9CSTDY = 77.1382
Also, the file you supplied was shorter than how many you were trying to read... or at least was after removing missing items.
Speaking of which... you use a fixed 9590 to break the data, but since missing data is being removed, you should find probably find some other way. For example,
NAME = getenv('USER');
if strcmp(NAME, 'mluser')
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/668733/LF_BS_MECHTILT_T9.csv';
else
filename = 'LF_BS_MECHTILT_T9.csv';
end
opt = detectImportOptions(filename);
T9t = readtable(filename, opt);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
firstdataline = opt.DataLines(1);
T9t.LineNo = (1:height(T9t)).' + (firstdataline - 1);
T9t = standardizeMissing(T9t, inf, 'Datavariables', {'X1_um_', 'Y1_um_'});
T9 = rmmissing(T9t);
subplot(2,1,1);
plot(T9{:,{'ms'}} , T9{:,{'X1_um_','Y1_um_'}});
legend({'x1(um)', 'y1(um)'}, 'location', "best");
title('Test 9: Attempt at periodic mechanical mirror tilt (Delta t from human error)');
xlabel('Time (seconds)');
ylabel('Distance From Centre (microns)'); grid on; grid minor;
subplot(2,1,2);
plot(T9{:,{'ms'}} , T9{:,{'I1_V_'}});
legend({'Intensity (Volts)'}, 'location', "best");
title('Test 9 (I(V))')
xlabel('Time (seconds)');
ylabel('Intensity (Volts)');grid on, grid minor;
Nend = 9590; Mend = 17869;
Nend = find(T9.LineNo <= Nend, 1, 'last')
Nend = 8884
T9.LineNo(Nend)
ans = 9590
Mend = find(T9.LineNo <= Mend, 1, 'last')
Mend = 16338
T9.LineNo(Mend)
ans = 17869
Nrange = 1 : Nend;
Mrange = Nend+1:Mend;
T9NCMX = mean(T9{Nrange, {'X1_um_'}}, 'omitnan') %#ok<*NOPTS>
T9NCMX = -164.2345
T9NCMY = mean(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9NCMY = -253.0831
T9NCSTDX = std(T9{Nrange, {'X1_um_'}}, 'omitnan')
T9NCSTDX = 385.8182
T9NCSTDY = std(T9{Nrange, {'Y1_um_'}}, 'omitnan')
T9NCSTDY = 338.2043
T9CMX = mean(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CMX = -13.3237
T9CMY = mean(T9{Mrange, {'Y1_um_'}}, 'omitnan')
T9CMY = -11.1980
T9CSTDX = std(T9{Mrange, {'X1_um_'}}, 'omitnan')
T9CSTDX = 85.0398
T9CSTDY = std(T9{Mrange, {'Y1_um_'}}, 'omitnan')
T9CSTDY = 83.9901

Sign in to comment.

thank you,
What is "options detection" ?
"You were also getting NaN because you had four lines with infinities, one of which is line 6702 of your file.
67460 inf inf 0.01 0.001 0.000 0.12 4.97 4.97 5.01 5.02"
This i was aware of, telling matlab not count them was the issue.

5 Comments

great, thanks! I really appreciate your help.
I am currently upgrading to 2021a.
I appreciate your help, but fear that my smooth-brain will require further assistance once i am up and running.
"Speaking of which... you use a fixed 9590 to break the data, but since missing data is being removed, you should find probably find some other way. For example"
Oh this is interesting, i must not understand how MATLAB handles that then. My assumption had been:
if we have an array with 1000elements, and there are some nans and infs in the mix at, say, elements 4 and 8, that matlab was going through each element saying "yep i can count that one" until reaching elements 4 where it says: "no not you", and then moves on to element 5, still considering it to be element 5 and moving through the array like that.
So, what you're saying is that it looks at the whole array as one thing, removes elements 4 and 5, which leaves us with 998 elements, but it still counts them from 1-998 rather than 1-1000 while skipping nan and inf?
thanks
Go back to your original code. Right after reading the table, you had
T9 = T9t( ~any (isnan(T9t) | isinf(T9t), 2),:)
which selects only certain rows of T9t, discarding the rest. But it does not leave "holes" when it does that: the rows after the discarded entries "fall down" to fill the holes. So you might know something was line 9450 originally, but it has "fallen down" to lower-numbered slot, and MATLAB does not keep track of where it used to be. The kind of numeric indexing you are using is always relative to what is actually present.
The workaround is to attach row numbers before discarding rows, so you can figure out what row something was originally.
Ah, yes you are correct, i was being stupid, that line was the last thing i could think of trying before needing to ask for help, so i had kind of forgotten i had tried it there.
Thank you this has really helped contextualise what i have done, and not understood.

Sign in to comment.

Products

Release

R2019b

Asked:

on 29 Jun 2021

Commented:

on 30 Jun 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!