Curve Fitting Toolbox™
outliers = excludedata(xdata,ydata,MethodName,MethodValue)
outliers = excludedata(xdata,ydata,MethodName,MethodValue) identifies data to be excluded from a fit using the specified MethodName and MethodValue. outliers is a logical vector, with 1 marking predictors (xdata) to exclude and 0 marking predictors to include. Supported MethodName and MethodValue pairs are given in the table below.
MethodName | MethodValue |
|---|---|
'box' | A four-element vector specifying the edges of a closed box in the xy-plane, outside of which data is to be excluded from a fit. The vector has the form [xmin xmax ymin ymax]. |
'domain' | A two-element vector specifying the endpoints of a closed interval on the x-axis, outside of which data is to be excluded from a fit. The vector has the form [xmin xmax]. |
'indices' | A vector of indices specifying the data points to be excluded. |
'range' | A two-element vector specifying the endpoints of a closed interval on the y-axis, outside of which data is to be excluded from a fit. The vector has the form [ymin ymax]. |
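
To make the table concrete, the following minimal sketch builds the equivalent logical masks with plain comparisons instead of calling excludedata. The sample vectors and the mask variable names are hypothetical, and the comparisons assume the closed-interval behavior described in the table (points lying exactly on an edge are kept). It illustrates the semantics only and is not the toolbox implementation.

```matlab
% Hypothetical sample data (not from the original page)
xdata = [-3 -1 0  1 3];
ydata = [ 0  2 0 -2 0];

% 'domain',[-2 2]: exclude points whose x lies outside the closed interval
domainMask = xdata < -2 | xdata > 2;

% 'range',[-1 1]: exclude points whose y lies outside the closed interval
rangeMask = ydata < -1 | ydata > 1;

% 'box',[-2 2 -1 1]: exclude points outside the closed box in the xy-plane
boxMask = domainMask | rangeMask;

% 'indices',[1 3]: exclude the listed points directly
indexMask = false(size(xdata));
indexMask([1 3]) = true;
```

In each mask, 1 marks a point to exclude and 0 marks a point to keep, matching the convention of the outliers vector.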
You can combine data exclusion rules using logical operators. For example, to exclude data inside the box [-1 1 -1 1] or outside the domain [-2 2], use:
outliers1 = excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outliers2 = excludedata(xdata,ydata,'domain',[-2 2]);
outliers = ~outliers1|outliers2;
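You can visualize the combined exclusion rule using random data. Generate the data, apply the two rules to it, and then plot the points that remain:
xdata = -3 + 6*rand(1,1e4);
ydata = -3 + 6*rand(1,1e4);
outliers1 = excludedata(xdata,ydata,'box',[-1 1 -1 1]);
outliers2 = excludedata(xdata,ydata,'domain',[-2 2]);
outliers = ~outliers1|outliers2;
plot(xdata(~outliers),ydata(~outliers),'.')
axis([-3 3 -3 3])
axis square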
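
As a quick check of the combined rule, the sketch below reproduces it with plain comparisons on a few hand-picked points. The points and variable names are hypothetical, the code does not call excludedata, and it assumes the closed-box and closed-interval behavior given in the table.

```matlab
% Hypothetical points chosen to hit each case
px = [0  1.5  2.5  -1  -2.5];
py = [0  0    0     1   2  ];

% Stand-in for excludedata(px,py,'box',[-1 1 -1 1]): true outside the box
outsideBox = px < -1 | px > 1 | py < -1 | py > 1;

% Stand-in for excludedata(px,py,'domain',[-2 2]): true outside the domain
outsideDomain = px < -2 | px > 2;

% Exclude points inside the box OR outside the domain
excluded = ~outsideBox | outsideDomain;
kept = ~excluded;
```

Only the point (1.5, 0) is kept here: it lies outside the box but still within the domain, which is the region left visible in the plot.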

Load the vote counts and county names for the state of Florida from the 2000 U.S. presidential election:
load flvote2k
Use the vote counts for the two major-party candidates, Bush and Gore, as predictors for the vote counts for third-party candidate Buchanan, and plot both data sets as scatter plots:
plot(bush,buchanan,'rs')
hold on
plot(gore,buchanan,'bo')
legend('Bush data','Gore data')

Assume a model where a fixed proportion of Bush or Gore voters choose to vote for Buchanan:
f = fittype({'x'})
f =
Linear model:
f(a,x) = a*x
Exclude the data from absentee voters, who did not use the controversial "butterfly" ballot:
absentee = find(strcmp(counties,'Absentee Ballots'));
nobutterfly = excludedata(bush,buchanan,...
'indices',absentee);
Perform a bisquare weights robust fit of the model to the two data sets, excluding absentee voters:
bushfit = fit(bush,buchanan,f,...
'Exclude',nobutterfly,'Robust','on');
gorefit = fit(gore,buchanan,f,...
'Exclude',nobutterfly,'Robust','on');
Robust fits give outliers a low weight, so large residuals from a robust fit can be used to identify the outliers:
figure
plot(bushfit,bush,buchanan,'rs','residuals')
hold on
plot(gorefit,gore,buchanan,'bo','residuals')

The residuals in the plot above can be computed as follows:
bushres = buchanan - feval(bushfit,bush);
goreres = buchanan - feval(gorefit,gore);
Large residuals can be identified as those outside the range [-500 500]:
bushoutliers = excludedata(bush,bushres,...
'range',[-500 500]);
goreoutliers = excludedata(gore,goreres,...
'range',[-500 500]);
The outliers for the two data sets correspond to the following counties:
counties(bushoutliers)
ans =
'Miami-Dade'
'Palm Beach'
counties(goreoutliers)
ans =
'Broward'
'Miami-Dade'
'Palm Beach'
Miami-Dade and Broward counties correspond to the largest predictor values. Palm Beach county, the only county in the state to use the "butterfly" ballot, corresponds to the largest residual values.
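
The residual screening used in this example can be sketched without the toolbox. The block below uses made-up data and labels rather than the flvote2k data, and an assumed coefficient a in place of the robust fit result, so it shows only the mechanics: compute residuals for the model y = a*x, flag those outside [-500 500], and use the logical mask to look up the matching labels.

```matlab
% Hypothetical predictor/response values and labels (not the flvote2k data)
x = [100; 200; 300; 400];
y = [ 25;  50; 900; 100];
names = {'A'; 'B'; 'C'; 'D'};

a = 0.25;          % assumed coefficient of the model y = a*x
res = y - a*x;     % same form as buchanan - feval(bushfit,bush)

% Stand-in for excludedata(x,res,'range',[-500 500]):
% true where the residual falls outside the closed interval
flagged = res < -500 | res > 500;

% Logical indexing picks out the labels of the flagged points
flaggedNames = names(flagged);
```

Here only the third point has a residual outside the range, so flaggedNames contains the single label 'C'.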
© 1984-2008 The MathWorks, Inc.