How to detect and remove outliers when the data are not normally distributed
Many people say that z-score and mapstd standardization are good ways to detect outliers. But the z-score is only useful when the data are normally distributed, and I found that my data do not follow a normal distribution. What should I do?
(1) Should I transform my data (Box-Cox or Johnson transformation) toward a normal distribution and then use the z-score to detect outliers?
(2) After transforming the data and removing the outliers, should I use the transformed data or the original data (with the outliers removed from both) as the input to the neural network? I found that when I feed the transformed data (Johnson transformation) into the neural network, it performs worse than with the original data. Why is that?
Can anybody help? Thanks a lot.
Accepted Answer
More Answers (1)
Greg Heath
on 18 Nov 2015
2 votes
Regardless of the distribution, I find that a combination of zscore with plots of original and transformed data is sufficient for me to detect outliers. Whether points are deleted or replaced by a reduced value depends on how I interpret the plots.
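As a rough illustration of the z-score screening step described above, here is a minimal Python sketch using only the standard library. The function names, the sample data, and the cutoff of 3 standard deviations are assumptions made for this example and are not from the answer itself. The sample standard deviation (n-1 denominator) is used, which matches the default of MATLAB's zscore. The flagged points are only candidates; as noted above, the plots decide what to do with them.

```python
import statistics


def zscores(values):
    """Return the z-score of each value, using the sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # n-1 denominator
    if std == 0:
        # All values identical: no point deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def flag_outliers(values, threshold=3.0):
    """Return indices whose |z-score| exceeds the (assumed) threshold."""
    return [i for i, z in enumerate(zscores(values)) if abs(z) > threshold]


# Hypothetical measurements: fifteen readings near 10 and one suspicious value.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1,
        9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 25.0]
suspects = flag_outliers(data)
print(suspects)  # candidate indices to inspect on a plot before deleting anything
```

One caveat of this sketch: a single extreme point inflates the standard deviation, so in a small sample its own z-score cannot get very large, and a fixed cutoff of 3 may miss it. That is one more reason to look at the plots rather than rely on the number alone.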
If you have doubts you can always make multiple models based on original and modified data.
Hope this helps.
Thank you for formally accepting my answer
Greg
2 Comments
J1
on 19 Nov 2015
Greg Heath
on 19 Nov 2015
Outliers are usually isolated points that are the result of bad measurements or bad transcriptions. Therefore they should be removed. However, if you plot the data, very often you can guess the approximate true value of the measurement. Then you have the option of replacing the outlier with the approximation.
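The replacement option can be sketched in Python as well. This is only one possible way to approximate the "true" value and is an assumption of this example, not the method described above: it presumes the data are an ordered series and replaces each flagged point with the average of the nearest unflagged neighbours on either side. The function name and sample values are hypothetical.

```python
def replace_outliers(values, outlier_indices):
    """Return a copy of values with flagged points replaced by a neighbour average.

    Assumption for this sketch: values form an ordered series, so the nearest
    unflagged neighbours are a reasonable guess at the true measurement.
    """
    flagged = set(outlier_indices)
    result = list(values)
    for i in sorted(flagged):
        # Nearest unflagged neighbour to the left, if any.
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if j not in flagged), None)
        # Nearest unflagged neighbour to the right, if any.
        right = next((values[j] for j in range(i + 1, len(values))
                      if j not in flagged), None)
        if left is not None and right is not None:
            result[i] = (left + right) / 2
        elif left is not None:
            result[i] = left
        elif right is not None:
            result[i] = right
        # If every point is flagged, the value is left unchanged.
    return result


# Hypothetical series with one bad reading at index 2.
series = [1.0, 2.0, 50.0, 4.0, 5.0]
cleaned = replace_outliers(series, [2])
print(cleaned)
```

For data with no natural ordering, a neighbour average has no meaning, and deleting the point or substituting a value read off the plot is the safer choice.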