Calculate Sn (like median absolute deviation) using matlab.

Question

Witold on 18 Apr 2013

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/72597-calculate-sn-like-median-absolute-deviation-using-matlab

Edited: pete on 25 Apr 2016

Hi, I'm trying to compute Sn value based on : "Alternatives to the Median Absolute Deviation Peter J. Rousseeuw and Christophe Croux" paperwork.

According to this article:

Sn = c*median_i{median_j|xi - xj|}

For each i we compute the median of {|xi-xj|; j = 1,...,n}. This yields n numbers, the median of which gives our final estimate Sn. C is constant value.

I have Y matrix of size 10000 x 2000 (i = 2000, j = 10000) Because of the size, the simplest loop method:

for i = 1 : size(Y,2)
    for j = 1 : size(Y,1)
        a(j) = median( abs( Y(:,i)-Y(j,i) ) );
    end
    Sn(i) = c * median(a);
end

is time consuming, so is no good at all. I'm matlab newbie and I don't know how to use _repmat _ function which will be - as I guess - very helpful. Can I ask you for help, how to write this so that the computation time will be comparable to _mad _ function?

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Andrei Bobrov on 18 Apr 2013

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/72597-calculate-sn-like-median-absolute-deviation-using-matlab#answer_82735

Edited: Andrei Bobrov on 18 Apr 2013

s = [3 1130;
     4 1527;
     3  907;
     2  878;
     4  995];
k = permute(abs( bsxfun(@minus,s,permute(s,[3 2 1])) ),[1 3 2]);
out = squeeze(median(median(k)));

ADD

out = zeros(1,size(s,2));
for jj = 1:size(s,2)
    out(jj) = median(median(abs(bsxfun(@minus,s(:,jj),s(:,jj)'))));
end

ADD2 use function mad from Statistics Toolbox

out = mad(s,1);

2 Comments
Show NoneHide None

Witold on 18 Apr 2013

Thank you very much, that's what I was thinking about, unfortunately I got the error message:

Error using bsxfun
Out of memory. Type HELP MEMORY for your options.

memory available: win 8 x64 / 8GB/ r1012b

Maximum possible array:      6163 MB (6.462e+09 bytes) *
Memory available for all arrays:      6163 MB (6.462e+09 bytes) *
Memory used by MATLAB:      1009 MB (1.058e+09 bytes)
Physical Memory (RAM):      8157 MB (8.553e+09 bytes)
*  Limited by System Memory (physical + swap file) available.

should I give up? or mayby try to split columns of the matrix

Andrei Bobrov on 18 Apr 2013

see ADD

Sign in to comment.

Answer 2

Image Analyst on 18 Apr 2013

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/72597-calculate-sn-like-median-absolute-deviation-using-matlab#answer_82715

I think you're totally misinterpreting the formula. You don't look at the difference of all possible pairs of points - that would take forever and isn't what the formula says. Look at Wikipedia: http://en.wikipedia.org/wiki/Median_absolute_deviation For each element, you'd have a million differences. And there are a million elements so you'd have a billion values. No wonder it takes forever! The i and j DO NOT REFER TO ROW AND COLUMN. They refer to element, or "linear index" in MATLAB lingo.

I think what they mean is to take the mean over all elements (pixels) and then take the median of the differences, which is totally different. So

% Get the median value of all elements (pixels).
medianOfWholeMatrix = median(Y(:));
% Find the difference between each element and the overall median.
differenceImage = double(Y) - medianOfWholeMatrix;
% Now take the median of that:
madValue = median(differenceImage (:));

This should be substantially faster, not to mention correct.

3 Comments
Show 1 older commentHide 1 older comment

Image Analyst on 18 Apr 2013

Edited: Image Analyst on 18 Apr 2013

Their adaption of MAD might do something different. You could do the same thing on a column-by-column basis. If you want to do that, you could just extract a column at a time and put it in a loop

for col = 1 : size(Y, 2)
  % Get the median value of all elements (pixels) in one column.
  medianOfWholeColumn = median(Y(:, col));
  % Find the difference between each element and the overall median.
  differenceImage = double(Y(:, col)) - medianOfWholeColumn ;
  % Now take the median of that:
  madValue(col) = median(differenceImage (:));
end
Sn = c * madValue;

Witold on 18 Apr 2013

I'm afraid my interpretation is correct, I founded an example of Sn algorithm: http://www.maplesoft.com/support/help/AddOns/view.aspx?path=Statistics/RousseeuwCrouxSn

consider M 5x2 matrix:

M = [3 1130;
1527;
907;
878;
995]

for each column we calculate x(:,i) - x(i,j) and its median, for 1st column it is:

abs(x(:,1) - x(1,1)) = [0;1;0;1;1] -> med(..) = [1]
abs(x(:,1) - x(2,1)) = [1;0;1;2;0] -> med(..) = [1]
abs(x(:,1) - x(3,1)) = [0;1;0;1;1] -> med(..) = [1]
abs(x(:,1) - x(4,1)) = [1;2;1;0;2] -> med(..) = [1]
abs(x(:,1) - x(5,1)) = [1;0;1;2;0] -> med(..) = [1]

now we calculate median of those medians which is 1, and we move to second column - using my simplest loops we get 117 which is correct answer.

I agree that their adaption of MAD is something different, I was just wondering if there is any method to calculate it faster... it takes up to 5 minutes to compute it for 1000 x ~2000 matrix, which is ok if worse comes to worst, but much over hours(?) for size 10k /: Anyway, thank you for help!

Sign in to comment.

Answer 3

pete on 19 Apr 2016

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/72597-calculate-sn-like-median-absolute-deviation-using-matlab#answer_218535

My suggestion - seems to work:

function Sn = RousseeuwCrouxSn(X)
% Compute the measure of scale 'Sn', from Rousseeuw & Croux (1993)
%
%   A robust alternative to MADn for statistical outlier identification.
%   Unlike MADn, Sn does not make an assumption of symmetry, so in
%   principle should be more robust to skewed distributions.
%
%   The outputs of this function have been validated against equivalent
%   function in Maple(tm).
%
% Example:          X = [1 5 2 2 7 4 1 5]
%                   Sn = RousseeuwCrouxSn(X) % should give 3.015
%
% Requires:         none
%
% See also:         mad.m
%
% Author(s):        Pete R Jones <petejonze@gmail.com>
% 
% Version History:  19/04/2016  PJ  Initial version
%                                               
%
% Copyright 2016 : P R Jones
% *********************************************************************
% 
    % get number of elements
    n = length(X);
      % Set c: bias correction factor for finite sample size
      if n < 10
          cc = [NaN 0.743 1.851 0.954 1.351 0.993 1.198 1.005 1.131];
          c = cc(n);
      elseif mod(n,2)==0 % is odd
          c = n/(n-.9);
      else % is even
          c = 1;
      end
      % compute median difference for each element
      tmp = nan(n,1);
      for i = 1:n
          tmp(i) = median(abs(X(i) - X([1:i-1 i+1:end])));
      end
      % compute median of median differences, and apply finite sample correction
      Sn = c * median(tmp);
  end

2 Comments
Show NoneHide None

Image Analyst on 19 Apr 2016

I don't know what MADn is, but I see nothing in the definition of MAD that requires a symmetric distribution. In fact it's claim to fame is that it's better than linear moments like standard deviation for non-symmetrical distributions with outliers, where as stdev will be influenced by outliers.

pete on 25 Apr 2016

Edited: pete on 25 Apr 2016

Thank you for your comment.

Regarding MADn: This is just MAD scaled by a constant (typically 1.4826). Apologies for not specifying.

Regarding symmetry, here is a quote from Rousseeuw & Croux's 1993 paper:

"the MAD takes a symmetric view on dispersion, because one first estimates a central value (the median) and then attaches equal importance to positive and negative deviations from it. Actually, the MAD corresponds to finding the symmetric interval (around the median) that contains 50% of the data (or 50% of the probability), which does not seem to be a natural approach at asymmetric distributions… In fact, Huber (1981, p. 114) presented the MAD as the symmetrized version of the interquartile range. This implicit reliance on symmetry is at odds with the general theory of M-estimators, in which the symmetry assumption is not needed. Of course, there is nothing to stop us from using the MAD at highly skewed distributions, but it may be rather inefficient and artificial to do so."

If you disagree I'd be interested to hear your thoughts, but I think they put it quite well.

And yes, I agree that MAD is in general very strong, and certainly better than stdev, for the reason you state.

Sign in to comment.

Calculate Sn (like median absolute deviation) using matlab.

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments
Show NoneHide None

More Answers (2)

3 Comments
Show 1 older commentHide 1 older comment

2 Comments
Show NoneHide None

See Also

Categories

Tags

Community Treasure Hunt

Calculate Sn (like median absolute deviation) using matlab.

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments Show NoneHide None

More Answers (2)

3 Comments Show 1 older commentHide 1 older comment

2 Comments Show NoneHide None

See Also

Categories

Tags

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None

3 Comments
Show 1 older commentHide 1 older comment

2 Comments
Show NoneHide None