Thanks for the advice! Somehow when I tested it with a bunch of 0s and 1s the -NaN thing seemed to work, dunno why. I've changed it at least temporarily to be -Inf. I'll look into modifying how the mean/std is done at some point, since this is clearly a placeholder.

Pasco: Do you have an alternative way to do that? I'd be very interested to know, because I couldn't find anything good that still vectorizes the code.

The key is that I make an upper-triangular matrix there, and I need the mean and standard deviation of the upper triangular half. However, if I leave zeros in the bottom half, they are included in the mean. So the only way I saw to do it is to use nanmean and nanstd and set those zeros to NaN instead of 0.

The problem, of course, is that on the rare occasions where you have zeros in your original matrix, those will end up being set to NaN during the triangular matrix step.

Using logical indexing on the yy matrix doesn't preserve the matrix shape, and thus cannot be used. This seems like the fastest way, though I will admit that if you have -NaN in your data for some reason, it will cause problems (but honestly, wouldn't it cause problems anyway?)

The vectorizing ideas are nice, but I think there are two problems with the method

1. When calculating R the actual data should be used (i.e. mean and std of the data) but after you do

[ ys, is ] = sort( abs( y - ybar ));

you construct yy from ys and calculate R using nanmean and nanstd from yy, which is being applied to the magnitude of the difference between the original data and the mean rather than the data itself. You probably need something 'ys = y(is)' before constructing yy

2. GESD rejects at outliers all of the points up to the last R_i > lambda_i so if you have Ri > lambda_i being [1 1 0 1 0 0] you need to reject 4 points, not 3

Lastly, the messing around with -Inf is really ugly. You should use tril to get an array corresponding to the lower entries which you can then insert the NaNs from. For example,

Thanks for the advice! Somehow when I tested it with a bunch of 0s and 1s the -NaN thing seemed to work, dunno why. I've changed it at least temporarily to be -Inf. I'll look into modifying how the mean/std is done at some point, since this is clearly a placeholder.

I didn't study your program so that I can propose alternatives but my comment goes to the fact that

1- There is no such a thing as a "-NaN". NaNs are not positive or negative. They are just ... NaN

2- yy(yy == -NaN) = 0; will never work because by definition all numbers are different from a NaN, including the NaN itself (just try (NaN == NaN)). But you can get what you want with

yy(isnan(yy)) = 0;

As regarding the usage of NaN instead of zeros, well perhaps you can

1- count them and remove them temporarily.
2- compute the means and stand deviation taking into account the true number of non-zero values (that's for sure what nanmean is doing when it ignores the NaNs)

Pasco: Do you have an alternative way to do that? I'd be very interested to know, because I couldn't find anything good that still vectorizes the code.

The key is that I make an upper-triangular matrix there, and I need the mean and standard deviation of the upper triangular half. However, if I leave zeros in the bottom half, they are included in the mean. So the only way I saw to do it is to use nanmean and nanstd and set those zeros to NaN instead of 0.

The problem, of course, is that on the rare occasions where you have zeros in your original matrix, those will end up being set to NaN during the triangular matrix step.

Using logical indexing on the yy matrix doesn't preserve the matrix shape, and thus cannot be used. This seems like the fastest way, though I will admit that if you have -NaN in your data for some reason, it will cause problems (but honestly, wouldn't it cause problems anyway?)

The vectorizing ideas are nice, but I think there are two problems with the method
1. When calculating R the actual data should be used (i.e. mean and std of the data) but after you do
[ ys, is ] = sort( abs( y - ybar ));
you construct yy from ys and calculate R using nanmean and nanstd from yy, which is being applied to the magnitude of the difference between the original data and the mean rather than the data itself. You probably need something 'ys = y(is)' before constructing yy
2. GESD rejects at outliers all of the points up to the last R_i > lambda_i so if you have Ri > lambda_i being [1 1 0 1 0 0] you need to reject 4 points, not 3
Lastly, the messing around with -Inf is really ugly. You should use tril to get an array corresponding to the lower entries which you can then insert the NaNs from. For example,
yy = rot90(triu(repmat(ys, 1, k), k-n), 2)' + tril(nan(size(yy)),-1);

Pasco:
Thanks for the advice! Somehow when I tested it with a bunch of 0s and 1s the -NaN thing seemed to work, dunno why. I've changed it at least temporarily to be -Inf. I'll look into modifying how the mean/std is done at some point, since this is clearly a placeholder.

Paul,
I didn't study your program so that I can propose alternatives but my comment goes to the fact that
1- There is no such a thing as a "-NaN". NaNs are not positive or negative. They are just ... NaN
2- yy(yy == -NaN) = 0; will never work because by definition all numbers are different from a NaN, including the NaN itself (just try (NaN == NaN)). But you can get what you want with
yy(isnan(yy)) = 0;
As regarding the usage of NaN instead of zeros, well perhaps you can
1- count them and remove them temporarily.
2- compute the means and stand deviation taking into account the true number of non-zero values (that's for sure what nanmean is doing when it ignores the NaNs)

Pasco: Do you have an alternative way to do that? I'd be very interested to know, because I couldn't find anything good that still vectorizes the code.
The key is that I make an upper-triangular matrix there, and I need the mean and standard deviation of the upper triangular half. However, if I leave zeros in the bottom half, they are included in the mean. So the only way I saw to do it is to use nanmean and nanstd and set those zeros to NaN instead of 0.
The problem, of course, is that on the rare occasions where you have zeros in your original matrix, those will end up being set to NaN during the triangular matrix step.
Using logical indexing on the yy matrix doesn't preserve the matrix shape, and thus cannot be used. This seems like the fastest way, though I will admit that if you have -NaN in your data for some reason, it will cause problems (but honestly, wouldn't it cause problems anyway?)

Comment only