File Exchange

image thumbnail

Peirce's method

version 1.1.0.0 (49.7 KB) by Marc
Applies Peirce's method for outlier rejection

0 Downloads

Updated 17 Nov 2015

View License

One of several approaches to outlier rejection, Peirce's method is more general than Chauvenet's method.
As constructed this works on univariate data only.
Implementation of algorithm laid out here: Stephen M. Ross, Journal of Engineering Technology, Fall 2003.

Function is in one file.
Supporting files include:
example/test script
2 sample images of function output (eg, cleaned data).

Comments and Ratings (7)

ashegay

Works well! I've replaced a line of code and added a function to deal with samples sizes above 60.

Replace the n=60 line in the if function with:
peirce_r(n,:) = [n,getR(n)];

--------------------------------------------------------------- FUNCTION --------------------------------------------------------------------------
% This function gets the R values for sample sizes above 60 in order to use
% in the Peirce outlier criterion formulation. The equation to determine
% the R values comes from:
% Porter, K., Hamburger, R., & Kennedy, R. (2007). Practical development
% and application of fragility functions. In Structural engineering research frontiers (pp. 1-16).
%
% By Alex Shegay

function R_values = getR(n)
% n = number of points in the sample
% R_values = 1x10 array of R values for 1-10 outliers

a=[0.4094 0.4393 0.4565 0.4680 0.477 0.4842 0.4905 0.4973 0.5046];
b=[0.991 0.6069 0.3725 0.2036 0.0701 -0.0401 -0.1358 -0.2242 -0.3079];

R_values = a.*log(n)+b;

Jacky

Marc

Updated the file with the bug fix for n > 65. Thanks Andy and other for the feedback.

Andy

Hello group,

This code is missing a line that enables it to assume n=60 when n>60. While it may be inappropriate to make this assumption, at least the code will run to completion by adding this line. Here is the original code:
------------------------
% find row index to use in table for this sample
n_ind = [];
if n < 61 && n > 2
n_ind = find(peirce_r(:,1) == n);
else
if n >= 61
n = 60;
warning('n > 60; using r values for n==60');
end
if n < 2
error('this function does not support samples this small... please rethink what you are doing');
return;
end
end
--------------------------

add this line after the warning statement: n_ind = find(peirce_r(:,1) == n); and you end up with this:

------------------------------
% find row index to use in table for this sample
n_ind = [];
if n < 61 && n > 2
n_ind = find(peirce_r(:,1) == n);
else
if n >= 61
n = 60;
warning('n > 60; using r values for n==60');
n_ind = find(peirce_r(:,1) == n); % this line is missing from downloaded version - schauer added 140915
end
if n < 2
error('this function does not support samples this small... please rethink what you are doing');
return;
end
end
---------------------------------

which is lines 42 through 56.

andy

Most tables for Peirce's R-value only go up to N = 60. It would be great if you could implement the method without the use of the table.

David Reed

Hi Marc,

Just thought you should know that your code fails to define n_ind if the original data is over 60.

Other than that its a great function.

MarC .

Here is some code that uses the fxn. Copy and paste-able.

% generate random data:

pop_mean = 20;
pop_sd = 2;

% 1) to see how method works on random samples drawn from a normal distribution, use
% this dataset
sample_data1 = pop_mean + pop_sd*randn(1, 25);

% 2) to see how method works on random data that has built-in tendency to
% include outliers, use this method
sample_data2 = pop_mean + pop_sd*randn(1, 20);
% add in 1-4 obs of potentially outlying data
funny_data1 = pop_mean + 2*pop_sd + 3*pop_sd*rand(1, 2 + round(rand(1)));
funny_data2 = pop_mean + 4*pop_sd*randn(1,4);

sample_data2 = [sample_data2 funny_data2];

% 3) Ross example data; contains 2 points which will be rejected by Peirce's
% method

Ross_data = [101.2, 90.0, 99.0, 102.0, 103.0, 100.2, 89.0, 98.1, 101.5, 102.0];

% pick which dataset should be used

%mydata = Ross_data;
mydata = sample_data2;

% shuffle data, for the fun of it
mydata = mydata(randperm(length(mydata)));

% apply peirce criterion

[cleaned_data outlier_data] = apply_peirce(mydata);

% prep data for display

% make filter -- to help with display purposes -- (might want to make this
% part of apply_peirce.m)
pass_peirce = zeros(size(mydata));
for j=1:length(cleaned_data)
pass_peirce = pass_peirce | mydata == cleaned_data(j);
end

obs_index = 1:length(mydata);

% plot data (kept == blue circles) and outliers (red x's)

figure(1);

plot(obs_index, mydata, '.k');
hold on;
plot(obs_index(pass_peirce), mydata(pass_peirce), 'ob', 'MarkerSize', 10);
plot(obs_index(~pass_peirce), mydata(~pass_peirce), 'xr', 'MarkerSize', 15);
hold off;

set(gca, 'XLim', [0 length(mydata)+1]);
set(gca, 'YLim', [min(mydata)-1 max(mydata)+1]);

xlabel('Observation Index');
ylabel('Variable of Interest');

title('Sample Data as Categorized by Peirce Criterion')
legend('datapoint', 'included', 'outlier', 'Location', 'SouthOutside')

Updates

1.1.0.0

bug fix -- thanks community! Should now work properly with n > 65.

Also - added the sample script file as well.

MATLAB Release Compatibility
Created with R2008a
Compatible with any release
Platform Compatibility
Windows macOS Linux