Repeat for-loop from the beginning after recalculating values

14 views (last 30 days)
Ok, I am having a hard time figuring this out. I have a 6 X 6 matrix of values, in this case, pressure measurements in psi (P31.mat attached). I'm trying to apply rejection criteria for the outliers in this set of pressures. Each column of P31 is from a separate pressure transducer (6 measurements from each). I calculate the mean, standard deviation, and the number of elements (not including NaN) in each column using the first loop. In the second loop I calculate the deviation of each measurement (delta_31) from the mean (of the column).
In the third loop I use if-statements to determine a value for tau, according to the number of elements in each column. Then I multiply that tau value by the standard deviation of that column and compare each deviation value (delta_31). If delta_31(i,j) is greater than the calculated value for tau*stddev, then replace that delta value with a NaN, as well as the corresponding (i,j) value in P31. I then recalculate the stdev, mean, and number of elements. The problem is I'm trying to repeat the loop again to determine if there is another outlier in that column and I can't seem to get it. I thought that by stating i = 1, the loop should begin again, but it doesn't. Can anyone help? I apologize if this is confusing.
for k = 1:size(P31,2)
x_31(k) = mean(P31(:,k),'omitnan');
SDP_31(k) = std(P31(:,k),'omitnan');
n_31(k) = numel(P31(:,k)) - sum(isnan(P31(:,k)));
nu_31(k) = n_31(k) - 1;
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
% Conditional if-statements to determine the values for tau
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% Calculate tau*S to compare with the delta's
tauS_31(j) = tau_31*SDP_31(j);
% if-statement to replace outliers with NaN and recalculate
% everything. This actually works, but only for eliminating one
% outlier from the column.
if delta_31(i,j) > tauS_31(j)
delta_31(i,j) = NaN;
P31(i,j) = NaN;
x_31(j) = mean(P31(:,j),'omitnan');
SDP_31(j) = std(P31(:,j),'omitnan');
n_31(j) = numel(P31(:,j)) - sum(isnan(P31(:,j)));
for k = 1:size(P31,1)
delta_31(k,j) = abs(P31(k,j) - x_31(j));
end
i = 1; % this is supposed to start the i-loop again, but doesn't
else
i = i + 1;
end
end
end
  3 Comments
dpb
dpb on 6 Sep 2018
"I thought that by stating i = 1, the loop should begin again..."
No, that is pretty-much a standard implementation in (most) all (modern) programming languages--the loop limit is stored before the loop begins and in Matlab any change to the loop index variable is overwritten by for for the next iteration from a local copy.
dpb
dpb on 6 Sep 2018
You could use one of those routines simply by saving the original index order to rearrange the order back to original when done.
The Thompson tau is one-at-a-time so it's sufficient on each iteration to only test the largest absolute difference; the ordering is some algorithm's way to look at the two end points each iteration rather than find max(abs(difference)) each pass.

Sign in to comment.

Accepted Answer

dpb
dpb on 6 Sep 2018
Edited: dpb on 6 Sep 2018
Directly addressing the Q?, here's a basic function to do the removal--
function x=thompsontau(x)
% Return array x w/ outliers replaced w/ NaN from input x
% Each column in x is assumed to be variable, rows are observations
%
% dpbozarth -- 5Sep2018
[r,c]=size(x); % row, col dimensions
cols=1:c; % later lookup for columns if outliers exist
mn=mean(x,'omitnan'); % mean by column
sd=std(x,'omitnan'); % std deviation
n=sum(isfinite(x)); % finite observations
[mx,ix]=max(abs(x-mn)); % value, location (row) of max deviation
isOut=mx>fntau(n).*sd; % column locations are outliers if T
if any(isOut)
indxOut=sub2ind([r c],ix(isOut),cols(isOut)); % locations in x of outliers
x(indxOut)=nan; % mark as outlier
end
end
function tau = fntau(n)
% returns critical Thompson tau for n observations
tcrit=tinv(1-0.05/2,n-2); % critical t value
tau=tcrit./sqrt(n)./sqrt(n-2+tcrit.*tcrit).*(n-1);
end
This vectorizes your loop approach for the first case; your mission, should you choose to accept, is to now encapsulate this to do the subsequent tests until any(isOut) is false.
This could be done by calling the above in a loop or calling it recursively inside this function; will leave that as "exercise for the student" (which seems only appropriate for a t-test based algorithm :) )
NB: This operates on the full input array; if there's reason to not operate on some columns of an array, simply pass the subset. Or, as in this case, the first two columns will never have an outlier show up so it doesn't matter much, but for some datasets obviously that might not be so.
But, don't build such "magic constants" into functions that makes them unsuitable for other use as a general practice.
PS: HINT The recursive solution is by far the neater here and only takes inserting three additional code lines, all of which are consecutive, and two of which are Matlab branching keywords. :) (And, actually, the latter two aren't absolutely mandatory but I think it makes the code more legible to include them).
NB2: fntau uses tinv which requires Statistics TB. Lacking that, will have to revert to using a lookup table or finding another replacement for the t-value.
  12 Comments
PATRICK WAYNE
PATRICK WAYNE on 7 Sep 2018
Seriously, it works for all of the data sets. I've gone through each one step by step using the debugging feature to make sure it is doing what it is supposed to. All of them come out exactly like they should. Some run through 7 times, some only 3, etc. Remember that it resets the value of i if it detects an outlier. Also, when I run for multiple data sets, i also resets back to 1 before the end of the j-th loop. It is most definitely working.
dpb
dpb on 8 Sep 2018
Ah! My old eyes had missed that, indeed.
I still think just
if any(isOut)
...
x=thompsontau(x); % recursively check for any more
end
is much simpler.

Sign in to comment.

More Answers (2)

dpb
dpb on 6 Sep 2018
Edited: dpb on 6 Sep 2018
The first loop can be eliminated entirely--use the Matlab array syntax:
x_31=mean(P31,'omitnan');
SDP_31=std(P31,'omitnan');
n_31=sum(isfinite(P31));
nu_31=n_31-1;
I edited your code above to clean up the indenting...looks to me there's a missing end at some point; I didn't try to figure out where that might belong; the trivial assumption that may or may not be right is it's the last one for the big for...need to check on that.
Moving right along...
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
Oh. You've got nested loops on j...perhaps the missing end belongs here before the second j loop?
Like
for j = 3:size(P31,2)
for i = 1:size(P31,1)
delta_31(i,j) = abs(P31(i,j) - x_31(j));
end
end
for j = 3:size(P31,2)
for i = 1:size(P31,1)
...
maybe? If that's so then again use vector syntax--
delta_31(:,3:end)=abs(P31(:,3:end)-x_31);
Now you're down to the removal criteria, but what's the deal with only using the 3:end sensors instead of all six? I don't see what that's all about??? If you have outliers, seems like all sensors would be wanted to be checked.
I'm running out of time/energy here this evening but one last little detail to make code a little easier --
if n_31(j) == 3
tau_31 = 1.150;
elseif n_31(j) == 4
tau_31 = 1.393;
elseif n_31(j) == 5
tau_31 = 1.572;
elseif n_31(j) ==6
tau_31 = 1.656;
elseif n_31(j) == 7
tau_31 = 1.711;
end
% put this at very beginning of your script/function...
tau=[1.150, 1.393, 1.572, 1.656, 1.711]; % lookup table values
Then to use, simply write
tau_31(j)=tau(n_31(j)-2);
I'm sure most if not all of the remainder can also be vectorized with, probably, one loop on the outside to do the multiple comparisons.
Take a look at the way the preceding loops were reduced and give the rest a shot; I'll try to look in again in the morning.
  1 Comment
dpb
dpb on 6 Sep 2018
For tau, go back to definition...
tcrit=@(n) tinv(1-0.05/2,n-2); % t distribution t(alpha/2,n-2)
fntau=@(n) tcrit(n)./sqrt(n)./sqrt(n-2+tcrit(n).*tcrit(n)).*(n-1);
Then when you need tau, just write
tau=fntau(n);
This has been vectorized so can pass a vector of N

Sign in to comment.


Steven Lord
Steven Lord on 7 Sep 2018
I'm not familiar with that particular method. Is it close enough to one of the methods supported by the isoutlier function that you can use that function?
If it is not but is used often in a particular application area to detect outliers in data associated with that area, consider filing an enhancement request through Technical Support (using the Contact Us link in the upper-right corner of this page) describing that application area, giving more information and/or a reference for that algorithm, and asking for it to be added to isoutlier.
  3 Comments
PATRICK WAYNE
PATRICK WAYNE on 10 Sep 2018
Edited: PATRICK WAYNE on 10 Sep 2018
I used a textbook from years ago to reference this method: "Introduction to Engineering Experimentation", by Wheeler and Ganji (3rd Ed.), chapter 6. The other Statistics textbooks I have do not cover it at all.
dpb
dpb on 11 Sep 2018
I never ran across that text; the similar subject-matter I used was Bevington, Data Reduction and Error Analysis for the Physical Sciences, but it didn't touch on the subject. I see there is a revised edition now, but it seems to be virtually identical in content with just some references to there now being such a thing as a personal computer and available software.

Sign in to comment.

Products


Release

R2018a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!