Apparently the command rmoutliers does not do the job correctly

Hello,
It seems to me that the command "rmoutliers" has some problems. To make things clear, I explain using the example below:
>> a=[1 2 3 4 1000 6;2 5000 3 4 0 1];a
[b1,idx1] = rmoutliers(a(1,:));[b2,idx2] = rmoutliers(a(2,:));[b,idx] = rmoutliers(a,2);
idx1
idx2
idx
a =
1 2 3 4 1000 6
2 5000 3 4 0 1
idx1 =
0 0 0 0 1 0
idx2 =
0 1 0 0 0 0
idx =
0 0 0 0 0 0
This does not make sense to me, so if I am right, this command needs to be corrected. The second problem is that the command apparently can no longer return 3 outputs (unlike what is written on the corresponding MATLAB documentation page). If you try the following, you get an error message:
[b,idx,u] = rmoutliers(a,2);
Any comment?
Thanks in advance!
Babak

3 Comments

I give you 2 numbers:
0.001 10000
Please tell me which one is the outlier. Can you do this? Or will you need at least a third number to decide which one is the outlier?
It is only possible to decide this with at least 3 numbers:
0.001 10000 0.006
or
0.001 10000 10029
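The point above can be checked numerically. The following is a minimal sketch, assuming the default "median" detection rule documented for isoutlier and rmoutliers (an element is an outlier if it lies more than three scaled median absolute deviations from the median). The helper name isOut is hypothetical and not a MATLAB built-in; it only handles vectors.

```matlab
% Scale factor that makes the MAD comparable to a standard deviation
% for normally distributed data (approximately 1.4826).
c = -1/(sqrt(2)*erfcinv(3/2));

% Hypothetical helper (not a MATLAB built-in): flag elements of a vector
% that lie more than three scaled MADs from the median.
isOut = @(x) abs(x - median(x)) > 3*c*median(abs(x - median(x)));

tfTwo   = isOut([0.001 10000])         % two values: both equally far from the median
tfSmall = isOut([0.001 10000 0.006])   % the third value sides with 0.001
tfLarge = isOut([0.001 10000 10029])   % the third value sides with 10000
```

With only two values, both sit at the same distance from their median, so neither can exceed the threshold. This is why a two-row matrix checked column by column reports no outliers.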
So, if I understood correctly, the command [b,idx] = rmoutliers(a,2); should find outliers in each row and then remove the corresponding columns? If my understanding is correct, then this command does not do the job correctly.
"So, if I understood correctly the command [b,idx] = rmoutliers(a,2); should find outliers in each row..."
No. Nowhere in the RMOUTLIERS documentation is it stated that RMOUTLIERS checks anything other than columns: the documentation states for a matrix "...then rmoutliers detects outliers in each column of A separately..."
"...and then remove the corresponding columns?"
Yes. The DIM argument is specifically described as "specifies the dimension of A for which to remove entries when an outlier is detected using any of the previous syntaxes. For example, rmoutliers(A,2) removes columns instead of rows for a matrix A" (bold emphasis added). Note that the DIM description does not state that it changes which dimension the MEDIAN is calculated over. All this option changes is whether rows or columns are removed.
Simple solution: transpose the input matrix.
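A minimal sketch of the transpose approach, using the matrix from the question. It relies only on the default behaviour quoted above (outliers are detected in each column and the corresponding rows are removed); the variable names bT and idxT are illustrative.

```matlab
a = [1 2 3 4 1000 6; 2 5000 3 4 0 1];

% Transpose so that each row of a becomes a column. rmoutliers then
% detects outliers per column of a.' and removes the matching rows of a.'.
[bT, idxT] = rmoutliers(a.');

% Transpose back: removed rows of a.' correspond to removed columns of a.
b   = bT.'
idx = idxT.'
```

Here idx marks the columns of a that contain an outlier in at least one row, which is the result the original question expected from rmoutliers(a,2).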


 Accepted Answer

Rather than going directly to rmoutliers, I recommend using isoutlier to detect the outliers and then processing the resulting logical array.
a=[1 2 3 4 1000 6;2 5000 3 4 0 1];
[b1,idx1] = rmoutliers(a(1,:));
[b2,idx2] = rmoutliers(a(2,:));
bRow = isoutlier(a, 2)
bRow = 2×6 logical array
0 0 0 0 1 0
0 1 0 0 0 0
columnsWithOutliers = any(bRow, 1)
columnsWithOutliers = 1×6 logical array
0 1 0 0 1 0
originalData = a % Make a copy so you can compare the original and processed data
originalData = 2×6
1 2 3 4 1000 6
2 5000 3 4 0 1
a(:, columnsWithOutliers) = []
a = 2×4
1 3 4 6
2 3 4 1

1 Comment

Hi Steven,
Thanks a lot. Indeed, 'isoutlier' is a very useful command and does the job very well.
I do not understand why the commands 'isoutlier' and 'rmoutliers' are not consistent with each other. The following backs up my claim:
>> a=[1 2 3 4 1000 6;2 5000 3 4 0 1];
isoutlier(a, 2)
[c,d]=rmoutliers(a,2)
0 0 0 0 1 0
0 1 0 0 0 0
c =
1 2 3 4 1000 6
2 5000 3 4 0 1
d =
0 0 0 0 0 0
Anyway, it is like this !!!
