Asked by Witold
on 18 Apr 2013

Hi, I'm trying to compute Sn value based on : "Alternatives to the Median Absolute Deviation Peter J. Rousseeuw and Christophe Croux" paperwork.

According to this article:

Sn = c*median_i{median_j|xi - xj|}

For each i we compute the median of {|xi-xj|; j = 1,...,n}. This yields n numbers, the median of which gives our final estimate Sn. C is constant value.

I have Y matrix of size 10000 x 2000 (i = 2000, j = 10000) Because of the size, the simplest loop method:

for i = 1 : size(Y,2) for j = 1 : size(Y,1) a(j) = median( abs( Y(:,i)-Y(j,i) ) ); end Sn(i) = c * median(a); end

is time consuming, so is no good at all. I'm matlab newbie and I don't know how to use _repmat _ function which will be - as I guess - very helpful. Can I ask you for help, how to write this so that the computation time will be comparable to _mad _ function?

*No products are associated with this question.*

Answer by Andrei Bobrov
on 18 Apr 2013

Edited by Andrei Bobrov
on 18 Apr 2013

Accepted answer

s = [3 1130; 4 1527; 3 907; 2 878; 4 995];

k = permute(abs( bsxfun(@minus,s,permute(s,[3 2 1])) ),[1 3 2]); out = squeeze(median(median(k)));

ADD

out = zeros(1,size(s,2)); for jj = 1:size(s,2) out(jj) = median(median(abs(bsxfun(@minus,s(:,jj),s(:,jj)')))); end

ADD2 use function `mad` from `Statistics Toolbox`

out = mad(s,1);

Witold
on 18 Apr 2013

Thank you very much, that's what I was thinking about, unfortunately I got the error message:

Error using bsxfun Out of memory. Type HELP MEMORY for your options.

memory available: win 8 x64 / 8GB/ r1012b

Maximum possible array: 6163 MB (6.462e+09 bytes) * Memory available for all arrays: 6163 MB (6.462e+09 bytes) * Memory used by MATLAB: 1009 MB (1.058e+09 bytes) Physical Memory (RAM): 8157 MB (8.553e+09 bytes)

* Limited by System Memory (physical + swap file) available.

should I give up? or mayby try to split columns of the matrix

Answer by Image Analyst
on 18 Apr 2013

I think you're totally misinterpreting the formula. You don't look at the difference of all possible pairs of points - that would take forever and isn't what the formula says. Look at Wikipedia: http://en.wikipedia.org/wiki/Median_absolute_deviation For each element, you'd have a million differences. And there are a million elements so you'd have a billion values. No wonder it takes forever! The i and j DO NOT REFER TO ROW AND COLUMN. They refer to element, or "linear index" in MATLAB lingo.

I think what they mean is to take the mean over all elements (pixels) and then take the median of the differences, which is totally different. So

% Get the median value of all elements (pixels). medianOfWholeMatrix = median(Y(:)); % Find the difference between each element and the overall median. differenceImage = double(Y) - medianOfWholeMatrix; % Now take the median of that: madValue = median(differenceImage (:));

This should be substantially faster, not to mention correct.

Image Analyst
on 18 Apr 2013

Their adaption of MAD might do something different. You could do the same thing on a column-by-column basis. If you want to do that, you could just extract a column at a time and put it in a loop

for col = 1 : size(Y, 2) % Get the median value of all elements (pixels) in one column. medianOfWholeColumn = median(Y(:, col)); % Find the difference between each element and the overall median. differenceImage = double(Y(:, col)) - medianOfWholeColumn ; % Now take the median of that: madValue(col) = median(differenceImage (:)); end Sn = c * madValue;

Witold
on 18 Apr 2013

I'm afraid my interpretation is correct, I founded an example of Sn algorithm: http://www.maplesoft.com/support/help/AddOns/view.aspx?path=Statistics/RousseeuwCrouxSn

consider M 5x2 matrix:

M = [3 1130; 4 1527; 3 907; 2 878; 4 995]

for each column we calculate `x(:,i) - x(i,j)` and its median, for 1st column it is:

abs(x(:,1) - x(1,1)) = [0;1;0;1;1] -> med(..) = [1] abs(x(:,1) - x(2,1)) = [1;0;1;2;0] -> med(..) = [1] abs(x(:,1) - x(3,1)) = [0;1;0;1;1] -> med(..) = [1] abs(x(:,1) - x(4,1)) = [1;2;1;0;2] -> med(..) = [1] abs(x(:,1) - x(5,1)) = [1;0;1;2;0] -> med(..) = [1]

now we calculate median of those medians which is 1, and we move to second column - using my simplest loops we get 117 which is correct answer.

I agree that their adaption of MAD is something different, I was just wondering if there is any method to calculate it faster... it takes up to 5 minutes to compute it for 1000 x ~2000 matrix, which is ok if worse comes to worst, but much over hours(?) for size 10k /: Anyway, thank you for help!

Opportunities for recent engineering grads.

## 0 Comments