File Exchange


version 1.0 (4.07 KB) by

code to remove outliers from a multivariate dataset using the median



No License

MEDOUTLIERFILT - remove outliers from a multivariate data set using the
median of each column

[stats_data, filtered_data] = medoutlierfilt(x,outlier_cut,plot_state)
removes possible outliers from a data set, X, by specifying a cut-off.
OUTLIER_CUT is the cut-off, given as a multiple of the interquartile range
above Q3 and below Q1; the default value is the same as in the BOXPLOT function.

Plot_state = 1 for on, 0 for off, DEFAULT = ON

load count.dat;
[stats, filtered_data] = medoutlierfilt(count,1,1)

Inspired by quartile.m by Chris D. Larson
Colin Clarke 2006
Cranfield University

Statistics Toolbox is required for boxplot; if you do not have it, specify boxplot off

As always comments and suggestions welcome!
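
The submission's source is not reproduced on this page, so the following is only a minimal sketch of the rule the help text describes: fences at Q1 and Q3 plus or minus a multiple of the interquartile range, applied column by column. The quartile convention (medians of the lower and upper halves of each sorted column), the choice to drop an entire row when any column is flagged, and the sample data are all assumptions, not the author's implementation. NaN values are not handled.

```matlab
% Sketch only, not the author's code. Made-up data: 100 is the outlier.
x = [1 10; 2 11; 3 12; 4 13; 5 14; 6 15; 7 16; 100 17];
outlier_cut = 1.5;                       % multiple of the IQR

keep = true(size(x,1), 1);               % rows that survive the filter
for j = 1:size(x,2)
    col   = sort(x(:,j));
    half  = floor(numel(col)/2);
    q1    = median(col(1:half));         % median of the lower half
    q3    = median(col(end-half+1:end)); % median of the upper half
    iqr_j = q3 - q1;
    lo    = q1 - outlier_cut*iqr_j;      % lower fence
    hi    = q3 + outlier_cut*iqr_j;      % upper fence
    keep  = keep & (x(:,j) >= lo) & (x(:,j) <= hi);
end
filtered = x(keep,:);                    % assumption: drop the whole row
```

With this data the first column has fences at -3.5 and 12.5, so only the last row is removed.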

Comments and Ratings (3)


What do I do if I don't want to delete the outliers but want to change the value...
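
One possible way to change flagged values rather than delete them, sketched for a single column. This is not a feature of medoutlierfilt; the fence calculation uses the same assumed quartile convention (medians of the sorted halves), and both replacement strategies shown are illustrative choices.

```matlab
% Sketch only: replace outliers instead of removing them.
x = [1 2 3 4 5 6 7 100]';                % made-up column of data
outlier_cut = 1.5;

col  = sort(x);
half = floor(numel(col)/2);
q1   = median(col(1:half));
q3   = median(col(end-half+1:end));
lo   = q1 - outlier_cut*(q3 - q1);       % lower fence
hi   = q3 + outlier_cut*(q3 - q1);       % upper fence

is_out = (x < lo) | (x > hi);            % logical mask of flagged values

x_med = x;
x_med(is_out) = median(x);               % option 1: substitute the column median

x_clip = min(max(x, lo), hi);            % option 2: clip to the nearest fence
```

Option 1 pulls outliers to the centre of the data; option 2 keeps them at the edge of the accepted range.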

Ido Y

In general the code produces the expected results. Except for the above comments, I would suggest changing the plot lines to boxplot(x,'notch','on', 'whisker',outlier_cut). This way the user's choice of the outlier cut is visualized.

John D'Errico

Fairly good help, missing at least one item of importance. The one I noted is that while the variable outlier_cut has a default value, this default is not indicated in the help. What good is an undocumented default? I'd also like to be told if the outlier_cut variable must be a scalar, and what legal range of values it can take on.

Next, suppose that someone wishes to supply a value for plot_state, but is willing to allow outlier_cut to take on its default value. They would like to call your code as

x_filt = medoutlierfilt(x,[],plot_state)

This would be consistent with the operation of most functions in MATLAB. The default checks in this code are purely in the form of

if nargin < 3
    plot_state = 1;
end

Better is to use a check like

if (nargin < 3) || isempty(plot_state)
    plot_state = 1;
end
if (nargin < 2) || isempty(outlier_cut)
    outlier_cut = 1.5;
end

I did like that an example was provided, as well as an H1 line, although the H1 line was wrapped, cutting off part of it from the sight of lookfor. I also liked that the author included his name, plus an attribution to a prior code. There are a reasonable number of internal comments to make this code readable, even by my standards.

One comment about the example provided - it fails to run. The example in the help has two output arguments, but medoutlierfilt only returns one. Oops. I'll bet that an earlier version of the code had two outputs.

The arguments could also benefit from error checking. What legal values can these variables take on? What sizes may they be? What if someone accidentally passes in a vector?
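
A minimal sketch of the kind of argument checking meant here, written as the lines one might place near the top of the function. The specific rules (real numeric 2-D matrix, positive scalar cut-off, plot_state of 0 or 1, row vectors reshaped to a column) and the error identifiers are assumptions for illustration; the submission does not define them.

```matlab
% Example inputs standing in for the function's arguments.
x = magic(4);
outlier_cut = [];                        % empty means "use the default"
plot_state  = 0;

if isempty(outlier_cut), outlier_cut = 1.5; end
if isempty(plot_state),  plot_state  = 1;   end

% X: assumed to be a non-empty, real, numeric, two-dimensional array.
if ~isnumeric(x) || ~isreal(x) || isempty(x) || ndims(x) ~= 2
    error('medoutlierfilt:badX', 'X must be a non-empty real numeric matrix.');
end
if size(x,1) == 1
    x = x(:);                            % treat a row vector as one column
end

% OUTLIER_CUT: assumed to be a positive scalar (the test also rejects NaN).
if ~isnumeric(outlier_cut) || ~isscalar(outlier_cut) || ~(outlier_cut > 0)
    error('medoutlierfilt:badCut', 'OUTLIER_CUT must be a positive scalar.');
end

% PLOT_STATE: assumed to be exactly 0 or 1.
if ~isscalar(plot_state) || ~any(plot_state == [0 1])
    error('medoutlierfilt:badPlotState', 'PLOT_STATE must be 0 or 1.');
end
```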

One of the things we all should do is look at the mlint output on our codes. Mlint flags quite a few lines in this code. (Mlint is a very helpful tool, and since mlint flags can now be seen in the editor, there is no reason to not use it.)

Mlint points out that a few variables are built without benefit of preallocation. Some of these variables are clearly of a known size, so preallocate them.
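
For illustration, preallocating a result whose size is known before the loop starts; the variable names are hypothetical and not taken from the submission.

```matlab
x = rand(50, 3);                         % made-up data
ncols = size(x, 2);

col_medians = zeros(1, ncols);           % preallocate: one slot per column
for j = 1:ncols
    col_medians(j) = median(x(:,j));     % fill in place, no array growth
end
```

For this particular quantity no loop is needed, since median(x) already works column by column; the pattern matters for per-column results that have no vectorized form.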

Finally, I'll note that while the user can turn off the plots when they are unwanted, it might be useful to allow the user to also turn off the stats display that is written out at the end. This could be done easily enough by allowing the plot_state variable to take on one of 4 values: [0,1,2,3]. Dec2bin will decode this input into two binary flags, which then allow any combination of plots and display to be generated.
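
A short sketch of that decoding. Which bit controls the plots and which controls the stats display is an assumption made here; the low bit is given to the plots so that 0 and 1 keep their current meaning for plotting.

```matlab
plot_state = 2;                          % example value in 0..3

bits = dec2bin(plot_state, 2) == '1';    % 2 -> '10' -> [true false]
show_stats = bits(1);                    % high bit: stats display (assumed)
show_plots = bits(2);                    % low bit: plots (assumed)
```

Under this mapping 0 turns both off, 1 gives plots only, 2 gives the stats display only, and 3 gives both.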

MATLAB Release
MATLAB 7.2 (R2006a)

Inspired by: Quartile & Percentile Calculation
