Code covered by the BSD License  

Tests to Identify Outliers in Data Series



This document includes several statistical tests to identify outliers in data series.


File Information
Description

There are several definitions of outliers. One of the more widely accepted comes from Barnett and Lewis, who define an outlier as “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. In practice, however, identifying outliers is far from clear-cut: a suspicious observation may simply be a low-probability value from the same distribution, or a perfectly valid extreme value from its tails.

One way to minimize the effect of outliers is to use robust statistics, which avoids the dilemma of removing or modifying observations that appear suspicious. When robust statistics are not practical for the problem at hand, it is important to investigate and record the causes of the possible outliers, and to remove only the data points clearly identified as outliers.
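To illustrate why robust statistics reduce the effect of outliers, the following minimal Python sketch compares the mean and standard deviation with the median and the median absolute deviation (MAD) on a small sample. The data values and function names are made up for illustration and are not part of the submission.

```python
import statistics


def classical_summary(values):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)


def robust_summary(values):
    """Return (median, unscaled median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


# Hypothetical measurements; the last value of `contaminated` is suspicious.
clean = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7]
contaminated = clean + [55.0]

for label, sample in (("clean", clean), ("contaminated", contaminated)):
    mean, sd = classical_summary(sample)
    med, mad = robust_summary(sample)
    print(f"{label:>12}: mean={mean:.3f} sd={sd:.3f} median={med:.3f} MAD={mad:.3f}")
```

In this sample a single extreme value moves the mean by several units and inflates the standard deviation, while the median and MAD barely change.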

Situations where the causes of the outliers are only partially identified require sound judgment and a realistic assessment of the practical implications of retaining them. Because their causes are not clearly determined, such observations should still be used in the data analysis. Depending on time and computing power constraints, it is often possible to make an informal assessment of the impact of the outliers by carrying out the analysis with and without the suspicious observations.
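The informal impact assessment described above can be sketched in Python as follows. The helper name `compare_with_without`, its arguments, and the sample data are hypothetical; the "analysis" here is just a single statistic, whereas a real study would rerun its full analysis on both data sets.

```python
import statistics


def compare_with_without(values, suspect_indices, statistic=statistics.mean):
    """Evaluate `statistic` with and without the suspicious observations.

    Returns (with_suspects, without_suspects, absolute_difference).
    """
    suspects = set(suspect_indices)
    kept = [v for i, v in enumerate(values) if i not in suspects]
    if not kept:
        raise ValueError("no observations left after removing the suspects")
    with_all = statistic(values)
    without = statistic(kept)
    return with_all, without, abs(with_all - without)


# Hypothetical data: the observation at index 7 is the suspicious one.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 55.0]
with_all, without, change = compare_with_without(data, [7])
print(f"mean with suspects:    {with_all:.3f}")
print(f"mean without suspects: {without:.3f}")
print(f"absolute change:       {change:.3f}")
```

A large change signals that the conclusions depend heavily on the suspicious observations, which is exactly when their causes deserve closer investigation.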

This document presents techniques for flagging suspicious observations that require further analysis, as well as tests for deciding whether particular observations are outliers. It would nevertheless be dangerous to accept the result of any test or technique blindly, without the judgment of an expert, because real data may violate the assumptions underlying the methods.

The following tests have been implemented:

• Z-scores
• Modified Z-scores
• Boxplot
• Adjusted Boxplot
• Generalized ESD Procedure
• Grubbs test
• Exponential Smoothing
• Kimber test for exponential distribution
• Moving Window Filtering Algorithm
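As a rough illustration of the first three rules in the list, here is a minimal Python sketch (not the submission's MATLAB code). It assumes commonly used conventions that the description does not confirm: a cutoff of 3 for Z-scores, the constant 0.6745 and a cutoff of 3.5 for modified Z-scores, and fences at 1.5 times the interquartile range for the boxplot rule. Function names and data are hypothetical.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Indices whose |z| = |x - mean| / sample sd exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def modified_z_score_outliers(values, threshold=3.5):
    """Indices whose |0.6745 * (x - median) / MAD| exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # The score is undefined when more than half the values are equal.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


def boxplot_outliers(values, k=1.5):
    """Indices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # The quartile convention is a choice; "inclusive" is assumed here.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical data with one suspicious value at index 7.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 55.0]
print("Z-score:         ", z_score_outliers(data))
print("Modified Z-score:", modified_z_score_outliers(data))
print("Boxplot:         ", boxplot_outliers(data))
```

On this sample the plain Z-score rule flags nothing: with n observations no Z-score based on the sample standard deviation can exceed (n - 1)/sqrt(n), which is about 2.47 for n = 8. The median-based and quartile-based rules both flag index 7, which shows why the choice of technique still calls for judgment.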

Test files are also included to check that the program works correctly on your platform.

I hope it will help.

Best wishes,

Francisco Alcaraz

MATLAB release MATLAB 7.8 (R2009a)