One of several approaches to outlier rejection, Peirce's method is more general than Chauvenet's method.

As constructed this works on univariate data only.

Implementation of algorithm laid out here: Stephen M. Ross, Journal of Engineering Technology, Fall 2003.

Function is in one file.

Supporting files include:

example/test script

2 sample images of function output (eg, cleaned data).

Marc (2020). Peirce's method (https://www.mathworks.com/matlabcentral/fileexchange/21562-peirce-s-method), MATLAB Central File Exchange. Retrieved .

1.1.0.0 | bug fix -- thanks community! Should now work properly with n > 65. Also - added the sample script file as well. |

Created with
R2008a

Compatible with any release

Create scripts with code, output, and formatted text in a single executable document.

ashegayWorks well! I've replaced a line of code and added a function to deal with samples sizes above 60.

Replace the n=60 line in the if function with:

peirce_r(n,:) = [n,getR(n)];

--------------------------------------------------------------- FUNCTION --------------------------------------------------------------------------

% This function gets the R values for sample sizes above 60 in order to use

% in the Peirce outlier criterion formulation. The equation to determine

% the R values comes from:

% Porter, K., Hamburger, R., & Kennedy, R. (2007). Practical development

% and application of fragility functions. In Structural engineering research frontiers (pp. 1-16).

%

% By Alex Shegay

function R_values = getR(n)

% n = number of points in the sample

% R_values = 1x10 array of R values for 1-10 outliers

a=[0.4094 0.4393 0.4565 0.4680 0.477 0.4842 0.4905 0.4973 0.5046];

b=[0.991 0.6069 0.3725 0.2036 0.0701 -0.0401 -0.1358 -0.2242 -0.3079];

R_values = a.*log(n)+b;

JackyMarcUpdated the file with the bug fix for n > 65. Thanks Andy and other for the feedback.

AndyHello group,

This code is missing a line that enables it to assume n=60 when n>60. While it may be inappropriate to make this assumption, at least the code will run to completion by adding this line. Here is the original code:

------------------------

% find row index to use in table for this sample

n_ind = [];

if n < 61 && n > 2

n_ind = find(peirce_r(:,1) == n);

else

if n >= 61

n = 60;

warning('n > 60; using r values for n==60');

end

if n < 2

error('this function does not support samples this small... please rethink what you are doing');

return;

end

end

--------------------------

add this line after the warning statement: n_ind = find(peirce_r(:,1) == n); and you end up with this:

------------------------------

% find row index to use in table for this sample

n_ind = [];

if n < 61 && n > 2

n_ind = find(peirce_r(:,1) == n);

else

if n >= 61

n = 60;

warning('n > 60; using r values for n==60');

n_ind = find(peirce_r(:,1) == n); % this line is missing from downloaded version - schauer added 140915

end

if n < 2

error('this function does not support samples this small... please rethink what you are doing');

return;

end

end

---------------------------------

which is lines 42 through 56.

andy

Jonathan ListerMost tables for Peirce's R-value only go up to N = 60. It would be great if you could implement the method without the use of the table.

David ReedHi Marc,

Just thought you should know that your code fails to define n_ind if the original data is over 60.

Other than that its a great function.

MarC .Here is some code that uses the fxn. Copy and paste-able.

% generate random data:

pop_mean = 20;

pop_sd = 2;

% 1) to see how method works on random samples drawn from a normal distribution, use

% this dataset

sample_data1 = pop_mean + pop_sd*randn(1, 25);

% 2) to see how method works on random data that has built-in tendency to

% include outliers, use this method

sample_data2 = pop_mean + pop_sd*randn(1, 20);

% add in 1-4 obs of potentially outlying data

funny_data1 = pop_mean + 2*pop_sd + 3*pop_sd*rand(1, 2 + round(rand(1)));

funny_data2 = pop_mean + 4*pop_sd*randn(1,4);

sample_data2 = [sample_data2 funny_data2];

% 3) Ross example data; contains 2 points which will be rejected by Peirce's

% method

Ross_data = [101.2, 90.0, 99.0, 102.0, 103.0, 100.2, 89.0, 98.1, 101.5, 102.0];

% pick which dataset should be used

%mydata = Ross_data;

mydata = sample_data2;

% shuffle data, for the fun of it

mydata = mydata(randperm(length(mydata)));

% apply peirce criterion

[cleaned_data outlier_data] = apply_peirce(mydata);

% prep data for display

% make filter -- to help with display purposes -- (might want to make this

% part of apply_peirce.m)

pass_peirce = zeros(size(mydata));

for j=1:length(cleaned_data)

pass_peirce = pass_peirce | mydata == cleaned_data(j);

end

obs_index = 1:length(mydata);

% plot data (kept == blue circles) and outliers (red x's)

figure(1);

plot(obs_index, mydata, '.k');

hold on;

plot(obs_index(pass_peirce), mydata(pass_peirce), 'ob', 'MarkerSize', 10);

plot(obs_index(~pass_peirce), mydata(~pass_peirce), 'xr', 'MarkerSize', 15);

hold off;

set(gca, 'XLim', [0 length(mydata)+1]);

set(gca, 'YLim', [min(mydata)-1 max(mydata)+1]);

xlabel('Observation Index');

ylabel('Variable of Interest');

title('Sample Data as Categorized by Peirce Criterion')

legend('datapoint', 'included', 'outlier', 'Location', 'SouthOutside')