Code covered by the BSD License  

Rating: 4.7 (4 ratings) | Downloads (last 30 days): 10 | File size: 2.27 KB | File ID: #15590

FILLNANS

by Ian Howat

 

15 Jul 2007 (Updated 18 Jul 2007)

FILLNANS replaces all NaNs in array using inverse-distance weighting between non-NaN values.

File Information
Description

FILLNANS replaces all NaNs in array using inverse-distance weighting.
Y = FILLNANS(X) replaces all NaNs in the vector or array X by inverse-distance weighted interpolation:
       Y = sum(X/D^3)/sum(1/D^3)
where D is the distance (in pixels) from the NaN node to all non-NaN values X. Values farther from a known non-NaN value will tend toward the average of all the values.
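
As a rough illustration of the weighting formula above, the following sketch fills a single NaN in a short vector by hand. The data values and variable names are made up for the example; it is not taken from the FILLNANS source.

```matlab
% Worked example of Y = sum(X/D^p)/sum(1/D^p) for one NaN node.
X = [2 NaN 8 4];                  % illustrative data with one missing value
p = 3;                            % power used in the formula above

known = ~isnan(X);                % logical mask of the non-NaN entries
iNaN  = find(~known);             % index of the node to fill
D     = abs(find(known) - iNaN);  % pixel distances to each known value
W     = 1 ./ D.^p;                % inverse-distance weights

Y = X;
Y(iNaN) = sum(W .* X(known)) / sum(W);   % weighted average of known values
```

Here the two adjacent values (2 and 8) each get weight 1 and the value two pixels away (4) gets weight 1/8, so the filled value is 10.5/2.125, or about 4.94.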

Y = FILLNANS(...,'power',p) uses a power of p in the weighting function. The higher the value of p, the stronger the weighting.

Y = FILLNANS(...,'radius',d) uses only values less than d pixels away for the weighted average.
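
The effect of the two options can be sketched on a single node. This is an illustration under assumed values, not the FILLNANS implementation: `idw`, `vals`, `dist` and `r` are hypothetical names, and returning NaN when nothing lies inside the radius is an assumption based on the behavior described in the comments on this page.

```matlab
% Illustrative only: how 'power' and 'radius' change one filled value.
vals = [10 20 30];                % known values (hypothetical)
dist = [1 2 5];                   % their pixel distances from the NaN node

% Inverse-distance weighted average with power p
idw = @(v, d, p) sum(v ./ d.^p) / sum(1 ./ d.^p);

y3 = idw(vals, dist, 3);          % default power
y6 = idw(vals, dist, 6);          % higher power: the nearest value dominates more

r = 3;                            % radius cut-off in pixels
near = dist < r;                  % keep only values closer than r
if any(near)
    yr = idw(vals(near), dist(near), 3);
else
    yr = NaN;                     % nothing in range: leave the node unfilled
end
```

With these numbers, raising the power from 3 to 6 pulls the result closer to the nearest value (10), and the radius of 3 drops the value at distance 5 from the average.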

NOTE: Use in conjunction with INVDISTGRID to grid and interpolate x,y,z data.

See also INPAINT_NANS

MATLAB release MATLAB 7.2 (R2006a)
Comments and Ratings (9)
16 Jul 2007 John D'Errico

>> Af = fillnans(A);
??? Undefined function or method 'vsum' for input arguments of type 'double'.

Error in ==> fillnans at 34
    D = vsum(rn(k)-r,cn(k)-c);

16 Jul 2007 Ian Howat

This error comes from a call to a non-standard toolbox function. I've uploaded a corrected version. (Make sure the revision is noted on this page before re-downloading.)

16 Jul 2007 John D'Errico

I did like the help in fillnans. There is no error checking, but relatively little to check. ;-) It has an H1 line, although part of it spills over into a second line. The code is quite simple.

I did some testing to compare fillnans with my own inpaint_nans tool. The first test has a moderately sparse sampling, with fully 98% of it a NaN.

[X,Y] = meshgrid(0:.02:2);
Z0 = sin((X+Y)*3).*cos((X-Y)*5);  % complete surface, kept for comparison
Z = Z0;
B = rand(size(X))>.02;
Z(B) = NaN;

tic,Zip = inpaint_nans(Z);toc
% Elapsed time is 8.266630 seconds.

tic,Zf = fillnans(Z);toc
% Elapsed time is 59.740188 seconds.

figure
surf(Z0)
title 'Original surface'

figure
surf(Zip)
title 'Inpaint_nans surface'

figure
surf(Zf)
title 'Fillnans surface'

There is a significant difference in speed between the two, as you should see. You should also note the spiky-ness of the fillnans result. This is a characteristic of an inverse distance interpolation. Changing the value of p will have significant influence on your results. For example:

Zf = fillnans(Z,6);
surf(Zf)
title 'Fillnans surface, p == 6'

This was a bit less spiky.

Let's see what happens for more densely populated arrays.
This time I'll make 40% of it NaNs.

[X,Y] = meshgrid(0:.02:2);
Z0 = sin((X+Y)*3).*cos((X-Y)*5);  % complete surface, kept for comparison
Z = Z0;
B = rand(size(X))>.6;
Z(B) = NaN;

tic,Zip = inpaint_nans(Z);toc
% Elapsed time is 0.821078 seconds.

tic,Zf = fillnans(Z);toc
% Elapsed time is 47.494882 seconds.

figure
surf(Zip)
title 'Inpaint_nans surface'

figure
surf(Zf)
title 'Fillnans surface'

Inpaint_nans was much faster here (about 0.8 seconds versus 47 seconds) and gave a noticeably smoother surface. How accurate is fillnans? For the problem above, where Z0 is a well-behaved, smooth function, inpaint_nans replicates the surface well: its error standard deviation is smaller than that of fillnans by almost a factor of 200.

std(Z0(:) - Zip(:))
ans =
    0.0007134

std(Z0(:) - Zf(:))
ans =
      0.13365

Can the speed of fillnans be improved? Sadly, the use of waitbar is itself a real time eater.

>> tic,Zf = fillnans(Z);toc
Elapsed time is 47.233541 seconds.

Next, I modified the code to call waitbar only 100 times over the entire main loop. This took just an extra rem call in an if statement. The waitbar is still updated often enough to see it move, but not so often that it becomes a time hog.

>> tic,Zf = fillnans(Z);toc
Elapsed time is 18.823048 seconds.

It turns out that the calls to waitbar actually wasted roughly 60% of the entire time running. I'll bet the author makes this enhancement quickly.
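
The throttling idea described above can be sketched as follows. This is a minimal illustration, not the modified fillnans code: `n`, `step` and `nUpdates` are hypothetical names, and a counter stands in for the waitbar call so the snippet runs without a figure window.

```matlab
% Sketch: refresh a progress display only about 100 times over n iterations.
n = 5000;                         % number of loop iterations (illustrative)
step = max(1, floor(n/100));      % iterations between display updates
nUpdates = 0;                     % counts how often the display would refresh

for k = 1:n
    % ... per-node interpolation work would go here ...
    if rem(k, step) == 0
        nUpdates = nUpdates + 1;  % in real code: waitbar(k/n, h)
    end
end
```

The `max(1, ...)` guard keeps `step` from becoming zero when the loop has fewer than 100 iterations.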

One behavior of fillnans that is useful is based on its being a convex combination of the data. So at any location, the predicted value must be bounded by the min and max of the data itself. (Method 4 of inpaint_nans should have a similar behavior, for those who must minimize extrapolation at all costs.)
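
The bounding property follows from the weights being positive and summing to one after normalization. A small numerical check, using made-up values and distances rather than anything from the FILLNANS source:

```matlab
% Check that an inverse-distance estimate is a convex combination.
vals = [3 7 -2 5];                % known values (hypothetical)
dist = [1.5 2 4 0.5];             % distances from the node being filled
p = 3;

w = 1 ./ dist.^p;                 % raw inverse-distance weights, all positive
w = w / sum(w);                   % normalize so the weights sum to 1

y = sum(w .* vals);               % estimate as a weighted average
isBounded = (y >= min(vals)) && (y <= max(vals));
```

Because every weight lies between 0 and 1, `y` cannot fall outside the range of `vals`, whatever distances are used.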

Another property of fillnans that may be more useful is based on its underlying algorithm. While fillnans is moderately slow, the time required for its solution is O(K*L), where K is the number of NaN elements to be interpolated, and L is the number of non-NaN elements. Since inpaint_nans is based on the solution of a sparse system of linear equations, that system may grow huge, making the solution of a truly huge problem impossible in the RAM available. However, fillnans may still succeed eventually, as long as you have the RAM to make a copy of the original array.
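
To make the O(K*L) cost concrete, here is a minimal sketch of an inverse-distance fill written as a plain loop. It is an assumption about the general shape of such an algorithm, not the author's code; the array `A` is made up, and `hypot` is used for the Euclidean pixel distance.

```matlab
% Minimal inverse-distance fill: K loop iterations, each touching L values.
A = [1 NaN 3; NaN NaN 6; 7 8 NaN];     % small illustrative array
p = 3;

[rn, cn] = find(isnan(A));             % row/column of the K NaN nodes
[r, c]   = find(~isnan(A));            % row/column of the L known nodes
v        = A(~isnan(A));               % known values, same order as r and c

F = A;                                 % one copy of the original array
for k = 1:numel(rn)
    D = hypot(rn(k) - r, cn(k) - c);   % distances to all L known nodes
    W = 1 ./ D.^p;                     % inverse-distance weights
    F(rn(k), cn(k)) = sum(W .* v) / sum(W);
end
```

Each iteration needs only vectors of length L, so the working memory stays small even when K*L is large; the price is the running time of the loop.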

So for those who may read my review, your own rating of this tool will be a function of the problems that you pose to it. If your problems are simply too large for inpaint_nans, and you are not too worried about a lesser accuracy, then fillnans may be worth a 5 rating for you. For those of you who have smaller problems, who need more accuracy/smoothness, then your rating will be lower. I've chosen to rate it as a 4.

18 Jul 2007 Ian Howat

I thank John for his thorough and informative review. I would like to respond to his point about the relative "accuracy" of FILLNANS and INPAINT_NANS. In the example John used, he interpolated between gaps in a smooth sinusoidal function and tested the accuracy by cross-validation of the interpolated points against the missing function values. Since INPAINT_NANS uses linear least-squares, it will do the best job of interpolating smooth gradients. However, as an Earth Scientist, I deal a lot with interpolating data with messy (i.e. highly peaked) variograms that do not follow smooth functions with even gradients. Ideally I use kriging to interpolate these data so I can get some sense of the spatial variance and therefore some idea of the confidence of my interpolation. However, kriging takes too long on big datasets and doesn't do well with edges of arrays, whereas inverse-distance interpolation is simple and keeps a tight bound on the interpolated values.

To gain some insight into the relative accuracy of the methods, I applied and compared FILLNANS and INPAINT_NANS on some "real" data, in this case gridded altimetry measurements over the coast of Greenland. This is the same data that's in the screen shot.
For each method I set 200 random data values in the grid to NaN, applied the fill methods, and then compared the std. deviations between the original data values and the interpolated values. I did this 10 times per method, and used every INPAINT_NANS method and the "best" radius/power settings for FILLNANS.
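
A hold-out test of this kind might be set up roughly as below. This is only a sketch under stated assumptions: the grid `A0` is a synthetic stand-in for the altimetry data, the hold-out size is reduced to suit the small grid, and the fill step is a placeholder (mean of the remaining values) that would be swapped for a call to FILLNANS or INPAINT_NANS.

```matlab
% Hold-out comparison sketch with a placeholder fill method.
[Xg, Yg] = meshgrid(0:0.1:2);
A0 = sin(3*(Xg + Yg));                 % stand-in "truth" grid (hypothetical)
nHold   = 50;                          % values withheld per trial
nTrials = 10;                          % repetitions
s = zeros(nTrials, 1);

for t = 1:nTrials
    idx = randperm(numel(A0), nHold);  % random hold-out nodes
    A = A0;
    A(idx) = NaN;                      % withhold them
    % Placeholder fill: replace with fillnans(A,...) or inpaint_nans(A)
    Af = A;
    Af(isnan(A)) = mean(A(~isnan(A)));
    s(t) = std(A0(idx) - Af(idx));     % spread of the hold-out errors
end

avgStd   = mean(s);                    % average over all trials
rangeStd = [min(s) max(s)];            % best and worst trial
```

Reporting both the average and the range, as in the results that follow, shows how sensitive each method is to which values happen to be withheld.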

For FILLNANS(A,'power',6,'radius',20) the average standard deviation for all the tests was 78.7083 m with a range of 59.2604 m to 105.1010 m. It took 2.790790 seconds to complete each interpolation.
For INPAINT_NANS(A) (the default appeared to give the best result) the average standard deviation was 68.5119 m with a range of 46.2669 m to 112.6178 m. INPAINT_NANS took 4.948645 seconds for each interpolation.
So we see that INPAINT_NANS still provides, on average, a more "accurate" cross-validation result than FILLNANS, but this accuracy is more sensitive to which data are left out for validation, and it is not more accurate than FILLNANS in every case. FILLNANS is 1.7732 times faster for this example (17391 NaNs, 1698 non-NaNs), which differs from John's result and likely reflects the efficiency changes he and Urs recommended. Finally, INPAINT_NANS produces wild peaks where it extrapolates to the edges, while FILLNANS relaxes toward the mean, which is more realistic in this case, or returns NaN where a node is too far from any value (probably the most realistic result).
In conclusion, these results support John's point that these methods are suited to different tasks.
For big arrays with sparse data on the edges and strongly peaked variograms, or where you want added control through the radius cut-off function and tighter bounding of values, I would go with FILLNANS. Otherwise, INPAINT_NANS.
Probably the best thing to do is to use both and do a comparison like this. - Ian

19 Jul 2007 John D'Errico

My thanks to the author for his continued modifications. Along the way, he has considerably improved this code. With those enhancements, perhaps it is now time to revise my own rating.

19 May 2008 Janusz Janiczek  
05 Jun 2011 Alejandra Botero

Thanks a lot!!

Updates
16 Jul 2007

Replaced missing function call. This fixes the error:
>> X = fillnans(A);
??? Undefined function or method 'vsum' for input arguments of type 'double'.

Error in ==> fillnans at 34
    D = vsum(rn(k)-r,cn(k)-c);

17 Jul 2007

Added radius option and 'option',value varargin parser.

Added increment expression to waitbar to reduce the number of times it's called.
Provided by John D'Errico.

18 Jul 2007

Adopted several code efficiency revisions made by Urs, including removing the waitbar.

Tag Activity for this File
Tag Applied By Date/Time
approximation Ian Howat 22 Oct 2008 09:19:23
interpolation Ian Howat 22 Oct 2008 09:19:23
inverse distance Ian Howat 22 Oct 2008 09:19:23
gridding Ian Howat 22 Oct 2008 09:19:23
mathematics Ian Howat 22 Oct 2008 09:19:23
