Exclude data from fit
outliers = excludedata(xdata,ydata,MethodName,MethodValue)
outliers = excludedata(xdata,ydata,MethodName,MethodValue) identifies data to be excluded from a fit using the specified MethodName and MethodValue. outliers is a logical vector, with 1 marking predictors (xdata) to exclude and 0 marking predictors to include.
Supported MethodName and MethodValue pairs are given in the table below.
You can use the output outliers as an input to the fit function in the Exclude name-value pair argument. You can alternatively use the Exclude argument to specify excluded data as:
An expression describing a logical vector, e.g., x > 10.
A vector of integers indexing the points you want to exclude, e.g., [1 10 25].
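As a minimal sketch of the two alternatives above, the following passes a logical expression and then an index vector to the Exclude argument. The data x and y and the choice of the 'poly1' library model are illustrative assumptions, not part of the original example; fit requires Curve Fitting Toolbox.

```matlab
% Hypothetical data: a straight line with one corrupted point
x = (1:30)';            % fit expects column vectors
y = 2*x + 1;
y(25) = 500;            % artificial outlier at index 25

% Exclude by logical expression: drop every point with x > 20
f1 = fit(x,y,'poly1','Exclude',x > 20);

% Exclude by index vector: drop points 1, 10, and 25
f2 = fit(x,y,'poly1','Exclude',[1 10 25]);
```

In both calls the corrupted point is excluded, so the fitted slope should stay close to the slope of the uncorrupted line.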



MethodName    MethodValue
'box'         A four-element vector specifying the edges of a closed box in the xy-plane, outside of which data is to be excluded from a fit. The vector has the form [xmin xmax ymin ymax].
'domain'      A two-element vector specifying the endpoints of a closed interval on the x-axis, outside of which data is to be excluded from a fit. The vector has the form [xmin xmax].
'indices'     A vector of indices specifying the data points to be excluded.
'range'       A two-element vector specifying the endpoints of a closed interval on the y-axis, outside of which data is to be excluded from a fit. The vector has the form [ymin ymax].
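The four methods can be read as plain logical masks. The sketch below builds such masks by hand, assuming the closed-interval semantics described above (points on a boundary are kept). It does not call excludedata, and the sample data and limits are hypothetical.

```matlab
% Hypothetical sample data
xdata = [-3 -1.5 0 0.5 2.5];
ydata = [ 0  0.5 0 2.0 0.0];

% Like 'domain',[-2 2]: exclude points whose x lies outside [-2, 2]
domainMask = xdata < -2 | xdata > 2;

% Like 'range',[-1 1]: exclude points whose y lies outside [-1, 1]
rangeMask = ydata < -1 | ydata > 1;

% Like 'box',[-1 1 -1 1]: exclude points outside the box in x or in y
boxMask = xdata < -1 | xdata > 1 | ydata < -1 | ydata > 1;

% Like 'indices',[1 4]: exclude the listed points
indexMask = false(size(xdata));
indexMask([1 4]) = true;
```

Each mask has the same size as xdata, with true marking a point to exclude, which is the convention the output outliers follows.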
Load the vote counts and county names for the state of Florida from the 2000 U.S. presidential election:
load flvote2k
Use the vote counts for the two major party candidates, Bush and Gore, as predictors for the vote counts for third-party candidate Buchanan, and plot the scatters:
plot(bush,buchanan,'rs')
hold on
plot(gore,buchanan,'bo')
legend('Bush data','Gore data')
Assume a model where a fixed proportion of Bush or Gore voters choose to vote for Buchanan:
f = fittype({'x'})
f =
     Linear model:
     f(a,x) = a*x
Exclude the data from absentee voters, who did not use the controversial "butterfly" ballot:
absentee = find(strcmp(counties,'Absentee Ballots'));
nobutterfly = excludedata(bush,buchanan,...
                          'indices',absentee);
Perform a bisquare weights robust fit of the model to the two data sets, excluding absentee voters:
bushfit = fit(bush,buchanan,f,...
              'Exclude',nobutterfly,'Robust','on');
gorefit = fit(gore,buchanan,f,...
              'Exclude',nobutterfly,'Robust','on');
Robust fits give outliers a low weight, so large residuals from a robust fit can be used to identify the outliers:
figure
plot(bushfit,bush,buchanan,'rs','residuals')
hold on
plot(gorefit,gore,buchanan,'bo','residuals')
The residuals in the plot above can be computed as follows:
bushres = buchanan - feval(bushfit,bush);
goreres = buchanan - feval(gorefit,gore);
Large residuals can be identified as those outside the range [-500 500]:
bushoutliers = excludedata(bush,bushres,...
                           'range',[-500 500]);
goreoutliers = excludedata(gore,goreres,...
                           'range',[-500 500]);
The outliers for the two data sets correspond to the following counties:
counties(bushoutliers)
ans =
    'Miami-Dade'
    'Palm Beach'
counties(goreoutliers)
ans =
    'Broward'
    'Miami-Dade'
    'Palm Beach'
Miami-Dade and Broward counties correspond to the largest predictor values. Palm Beach county, the only county in the state to use the "butterfly" ballot, corresponds to the largest residual values.
You can combine data exclusion rules using logical operators.
For example, to exclude data inside the box [-1 1 -1 1] or outside the domain [-2 2], use:
outliers1 = excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outliers2 = excludedata(xdata,ydata,'domain',[-2 2]);
outliers = ~outliers1|outliers2;
You can visualize the combined exclusion rule using random data:
xdata = -3 + 6*rand(1,1e4);
ydata = -3 + 6*rand(1,1e4);
plot(xdata(~outliers),ydata(~outliers),'.')
axis([-3 3 -3 3])
axis square
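To check a combined rule of this kind without random data, one option is to evaluate it on a small deterministic grid. The sketch below is an illustration only: it writes the box and domain conditions as explicit comparisons instead of calling excludedata, assumes closed intervals, and uses a hypothetical 5-by-5 grid.

```matlab
% Hypothetical 5-by-5 grid with coordinates -3, -1.5, 0, 1.5, 3
[xg,yg] = meshgrid(-3:1.5:3);
xdata = xg(:)';
ydata = yg(:)';

% Points outside the box with x in [-1, 1] and y in [-1, 1]
outsideBox = xdata < -1 | xdata > 1 | ydata < -1 | ydata > 1;

% Points outside the domain x in [-2, 2]
outsideDomain = xdata < -2 | xdata > 2;

% Exclude points inside the box OR outside the domain
outliers = ~outsideBox | outsideDomain;

numExcluded = nnz(outliers);    % number of excluded grid points
numKept     = nnz(~outliers);   % number of points that remain
```

On this grid only the origin lies inside the box, and the two columns at x = -3 and x = 3 lie outside the domain, so the mask excludes 11 of the 25 points.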