How to calculate the number of outliers from a data set according to the required maximum deviation of the remaining values

Dear friends,
I'm looking for a simple program to do the following.
Given a dataset of N elements: {x1, x2, ..., xN}. The range (max - min) of this dataset is larger than D. I'm allowed to remove K elements from the dataset so that the remaining N-K elements satisfy max - min < D. The problem is how to calculate the minimum possible value of K.
I currently have a program that sorts the dataset and then uses a "while" loop to remove elements one by one until K is found. But this method is far too slow when the dataset is large, for example when N is several million.
Does anyone have a better solution? This is more of a mathematical problem to solve.
Thanks.
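
One possible approach, sketched here under stated assumptions rather than taken from the thread: once the values are sorted, any set of kept elements with minimum a and maximum b can be extended to include every element between a and b without changing its range, so the best set to keep is a contiguous run of the sorted values. The minimum K is then N minus the length of the longest run whose spread is strictly below D, which a two-pointer sliding window finds in a single pass after the sort. The sketch is written in Python for illustration; the function name min_removals and its parameters values and d are hypothetical, and the same loop should carry over to MATLAB with 1-based indices.

```python
def min_removals(values, d):
    """Return the smallest K such that dropping K values leaves max - min < d.

    Hypothetical helper for illustration; assumes d > 0 and no NaN values.
    """
    if d <= 0:
        raise ValueError("d must be positive")
    xs = sorted(values)      # O(N log N); the kept values form a contiguous run
    best = 0                 # length of the longest valid window seen so far
    lo = 0                   # left edge of the current window
    for hi, x in enumerate(xs):      # hi is the right edge (inclusive)
        # Move the left edge right until the spread is strictly below d.
        # This terminates because x - xs[hi] == 0 < d.
        while x - xs[lo] >= d:
            lo += 1
        best = max(best, hi - lo + 1)
    return len(xs) - best


# Example: the longest run with spread below 0.5 is [0.1, 0.2, 0.25].
k = min_removals([0.1, 0.2, 0.25, 0.9, 5.0], 0.5)
print(k)  # 2
```

Each pointer only moves forward, so the pass over the sorted values is linear and the sort dominates the cost. Tied values need no special handling, because equal elements have zero spread and always fit in the same window.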
  6 Comments
Cris LaPierre on 12 May 2021
You could explore options interactively using the Remove Outliers task in a live script. See here for more info. Once you find the appropriate settings, you can convert the task to code and reuse that in your script (or just keep the task).
Jeff Miller on 13 May 2021
You don't need to sort; just keep track of the current min and max after each exclusion, which might speed things up a bit. I think the gain will depend on whether K is small relative to N.
Can there be ties between different x elements?


Answers (0)
