There are several definitions of an outlier. One of the more widely accepted comes from Barnett and Lewis, who define an outlier as “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data”. However, identifying outliers in a data set is far from straightforward, because suspicious observations may be, for example, low-probability values drawn from the same distribution or perfectly valid extreme values (from the tails).
One way to minimize the effect of outliers is to use robust statistics, which avoids the dilemma of whether to remove or modify observations that appear suspicious. When robust statistics are not practical for the problem in question, it is important to investigate and record the causes of the possible outliers, and to remove only the data points clearly identified as outliers.
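As a rough illustration of why robust statistics help, the sketch below compares the mean and standard deviation with the median and the scaled median absolute deviation (MAD) on a small made-up sample. The function name `robust_summary` and the data are assumptions for illustration only and are not part of the submitted program; the factor 1.4826 is the usual constant that makes the MAD comparable to the standard deviation for normally distributed data.

```python
import statistics

def robust_summary(values):
    """Return (median, scaled MAD), robust analogues of (mean, standard deviation)."""
    med = statistics.median(values)
    # Median of the absolute deviations from the median
    mad = statistics.median(abs(v - med) for v in values)
    # 1.4826 scales the MAD to match the standard deviation under normality
    return med, 1.4826 * mad

# Made-up sample (assumption): 55.0 is the suspicious observation
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]

mean, sd = statistics.mean(data), statistics.stdev(data)
med, scaled_mad = robust_summary(data)

print(f"mean={mean:.2f}  sd={sd:.2f}")
print(f"median={med:.2f}  scaled MAD={scaled_mad:.2f}")
```

The single extreme value pulls the mean and standard deviation far from the bulk of the data, while the median and MAD barely move.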
Situations where the causes of the outliers are only partially identified require sound judgment and a realistic assessment of the practical implications of retaining them. If their causes are not clearly determined, they should still be used in the data analysis. Depending on time and computing power constraints, it is often possible to make an informal assessment of the impact of the outliers by carrying out the analysis with and without the suspicious observations.
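The with-and-without comparison can be sketched as follows. This is only a minimal example under stated assumptions: the "analysis" is reduced to a mean and standard deviation, and the helper name `compare_with_without`, the sample data, and the suspect index are all hypothetical.

```python
import statistics

def compare_with_without(values, suspect_indices):
    """Run the same summary on the full data and on the data minus the suspects."""
    suspects = set(suspect_indices)
    kept = [v for i, v in enumerate(values) if i not in suspects]
    return {
        "with": (statistics.mean(values), statistics.stdev(values)),
        "without": (statistics.mean(kept), statistics.stdev(kept)),
    }

# Made-up sample (assumption): the last point is the suspicious one
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]
result = compare_with_without(data, suspect_indices=[5])

for label, (mean, sd) in result.items():
    print(f"{label:8s} mean={mean:.2f} sd={sd:.2f}")
```

If the two results lead to the same practical conclusion, the decision about the suspicious points matters little; if they differ substantially, the points deserve closer investigation before any are removed.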
This document shows different techniques for identifying suspicious observations that require further analysis, as well as tests to determine whether some observations are outliers. Nevertheless, it would be dangerous to blindly accept the result of a test or technique without expert judgment, because real data may violate the assumptions underlying these methods.
The following tests have been implemented:
• Z-scores
• Modified Z-scores
• Boxplot
• Adjusted Boxplot
• Generalized ESD Procedure
• Grubbs test
• Exponential Smoothing
• Kimber test for exponential distribution
• Moving Window Filtering Algorithm
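To give a flavour of the simplest techniques in the list above, here is a minimal Python sketch of Z-scores, modified Z-scores, and the standard boxplot rule. It is not the submitted implementation: the function names and sample data are hypothetical, and the cut-offs (3 for Z-scores, 3.5 for modified Z-scores, 1.5 times the interquartile range for the boxplot) are common conventions assumed here. Quartile definitions also vary between packages, so the fences may differ slightly from other software.

```python
import statistics

def z_scores(values):
    """Classical Z-scores: distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def modified_z_scores(values):
    """Modified Z-scores based on the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]

def boxplot_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up sample (assumption): 55.0 is the suspicious observation
data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]

z_flags = [v for v, z in zip(data, z_scores(data)) if abs(z) > 3]
mz_flags = [v for v, m in zip(data, modified_z_scores(data)) if abs(m) > 3.5]
low, high = boxplot_fences(data)
box_flags = [v for v in data if v < low or v > high]

print("Z-score flags:", z_flags)
print("Modified Z-score flags:", mz_flags)
print("Boxplot flags:", box_flags)
```

In this small sample the plain Z-score does not flag the extreme value, because that value inflates the standard deviation used to judge it, whereas the median-based modified Z-score and the boxplot rule both flag it. This is one reason the results of any single technique should be reviewed rather than accepted blindly.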
Test files are also included to check that the program works correctly on your platform.
I hope it will help.
Best wishes,
Francisco Alcaraz