Could you please explain the difference between noise and outliers in data mining?
I have read about it on the internet, but I keep confusing the two.
I guess noise causes outliers, but not all outliers are caused by noise :)
Noise is anything that is not the "true" signal. A noisy value may still be close to the true signal. An outlier is a value that is very different from the other values. The vast majority of the time outliers are noise, but sometimes a data point that is true signal can be an outlier. For example, suppose I measured the IQ of everyone at my local high school plus Stephen Hawking. Stephen would be an outlier even though I measured his IQ accurately. But if I measured Stephen's IQ as 90, that would be noise, since his real IQ is much higher than that.
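To make the distinction concrete, here is a minimal Python sketch of the IQ example. All numbers are made up for illustration: the `school` scores and `TRUE_SCORE` are hypothetical placeholders, not real measurements, and the 3-standard-deviation cutoff in the hypothetical helper `describe` is just one common rule of thumb. The point is that noise is judged against the true value, while an outlier is judged against the other values.

```python
from statistics import mean, stdev

# Hypothetical scores for the high school students (illustrative only).
school = [95, 100, 105, 98, 102, 110, 97, 103]
# Hypothetical "true" score for the extra person (a placeholder, not a real figure).
TRUE_SCORE = 160

def describe(measured, true_value, others, factor=3.0):
    """Return (measurement error, is_outlier) for one measured value."""
    error = measured - true_value  # noise: distance from the true value
    # outlier: distance from the other values, in units of their spread
    is_outlier = abs(measured - mean(others)) > factor * stdev(others)
    return error, is_outlier

accurate = describe(160, TRUE_SCORE, school)  # measured correctly
botched = describe(90, TRUE_SCORE, school)    # measured badly

print(accurate)  # no error, but far from everyone else
print(botched)   # large error, yet it blends in with the others
```

Under these assumptions the accurate measurement is an outlier without being noise, and the botched one is noise without being an outlier.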
A common rule of thumb: when a value deviates from the others by more than 3 (some use 5) times the standard deviation of the remaining noise, it is called an outlier. An example:
x = [1,2,1,1,1,2,1,1,1,2,21,1,2,2]
This is a measurement of the number of people you find in public telephone booths. There is bound to be some noise, but one value (21) is obviously an outlier. When you compute statistics for the measurement, there are good reasons to remove the outlier before calculating the mean and standard deviation, since a single extreme value distorts both.
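Here is a rough Python sketch of that idea applied to the x above. It is only one way to do it: the helper name `find_outliers`, the default `factor=3.0`, and the choice to compare each value against the mean and standard deviation of all the other values are assumptions for illustration, not a standard recipe.

```python
from statistics import mean, stdev

x = [1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 21, 1, 2, 2]

def find_outliers(values, factor=3.0):
    """Return the indices of values lying more than `factor` standard
    deviations away from the mean of all the *other* values."""
    flagged = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]  # leave the candidate out
        spread = stdev(others)
        if spread > 0 and abs(v - mean(others)) > factor * spread:
            flagged.append(i)
    return flagged

outlier_idx = find_outliers(x)
cleaned = [v for i, v in enumerate(x) if i not in outlier_idx]

print("outlier indices:", outlier_idx)
print("with outlier:    mean =", mean(x), " std =", stdev(x))
print("without outlier: mean =", mean(cleaned), " std =", stdev(cleaned))
```

Leaving the candidate out of its own mean and standard deviation matters here: the 21 inflates the standard deviation of the full data set so much that it would hide itself from a stricter cutoff such as 5.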