interpolating missing data

LS on 1 Dec 2011
Hi all,
I'm trying to estimate model parameters in MATLAB using data I collected in the lab, but I didn't measure all of the variables every day (so for some days I only have data for one variable). The data look like this (time; variable 1; variable 2; variable 3):
1 2330000 5.92275000000000e-06 36.2000000000000
2 52900000 2.79773000000000e-07 35.2000000000000
3 357000000 6.69468000000000e-08 26.1000000000000
4 389000000 1.19846000000000e-07 3.38000000000000
5 668000000 7.43263000000000e-08 0.350000000000000
6 1100000000.00000 4.52455000000000e-08 0.230000000000000
7 1530000000.00000 3.24575000000000e-08 0.340000000000000
8 1250000000.00000 3.96000000000000e-08 0.500000000000000
9 1490000000.00000 3.33154000000000e-08 0.360000000000000
10 1850000000.00000 NaN NaN
12 2050000000.00000 2.42585000000000e-08 0.270000000000000
14 2290000000.00000 NaN NaN
17 2120000000.00000 NaN NaN
19 5090000000.00000 9.79568000000000e-09 0.140000000000000
I've found a way to deal with this by replacing the NaNs with 0s, but I really don't want to do that in this case, since it would distort the parameter estimation. I read something about interpolating the missing data using interp1, but I haven't been able to get that to work. Any help would be much appreciated. Thank you!

Accepted Answer

Sven on 1 Dec 2011
Let's start with your data.
data = [1 2330000 5.92275000000000e-06 36.2000000000000
2 52900000 2.79773000000000e-07 35.2000000000000
3 357000000 6.69468000000000e-08 26.1000000000000
4 389000000 1.19846000000000e-07 3.38000000000000
5 668000000 7.43263000000000e-08 0.350000000000000
6 1100000000.00000 4.52455000000000e-08 0.230000000000000
7 1530000000.00000 3.24575000000000e-08 0.340000000000000
8 1250000000.00000 3.96000000000000e-08 0.500000000000000
9 1490000000.00000 3.33154000000000e-08 0.360000000000000
10 1850000000.00000 NaN NaN
12 2050000000.00000 2.42585000000000e-08 0.270000000000000
14 2290000000.00000 NaN NaN
17 2120000000.00000 NaN NaN
19 5090000000.00000 NaN 0.140000000000000]
Now here's how you can use interp1, looped over each data column. I've updated it to also handle NaN values at the ends, which pure interpolation can't fill:
fullData = data;
for c = 2:size(data,2)
    % Pass 1: linearly interpolate interior NaNs (NaNs at the ends stay NaN)
    nanRows = isnan(fullData(:,c));
    fullData(nanRows,c) = interp1(data(~nanRows,1), data(~nanRows,c), data(nanRows,1));
    % Pass 2: fill any NaNs still left at the ends with the nearest known value
    nanRows = isnan(fullData(:,c));
    fullData(nanRows,c) = interp1(fullData(~nanRows,1), fullData(~nanRows,c), fullData(nanRows,1), 'nearest', 'extrap');
end
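As a quick sanity check, here is a sketch of the same two-pass idea run on a tiny made-up matrix (the toy values and the name toyFilled are my own for illustration, not your lab data). It has one interior gap and one gap at the end, so you can see both passes do their job:

```matlab
% Toy data: column 1 is time, column 2 has an interior NaN and a trailing NaN
toyData = [1 10
           2 NaN
           4 40
           5 NaN];

toyFilled = toyData;
for c = 2:size(toyData,2)
    % Pass 1: linear interpolation fills the interior gap at t = 2
    nanRows = isnan(toyFilled(:,c));
    toyFilled(nanRows,c) = interp1(toyData(~nanRows,1), toyData(~nanRows,c), toyData(nanRows,1));
    % Pass 2: nearest-neighbour extrapolation fills the trailing gap at t = 5
    nanRows = isnan(toyFilled(:,c));
    toyFilled(nanRows,c) = interp1(toyFilled(~nanRows,1), toyFilled(~nanRows,c), toyFilled(nanRows,1), 'nearest', 'extrap');
end

disp(toyFilled)   % second column becomes 10, 20, 40, 40
```

The value at t = 2 sits one third of the way from t = 1 to t = 4, so it gets 10 + (1/3)*30 = 20, and the trailing value simply copies its nearest known neighbour, 40.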
  2 Comments
LS on 1 Dec 2011
This is great, thank you! I have one more question, though. I deleted the last value in the first column (so now there's a NaN there) and tried using this code to fill in that value as well, but that NaN wasn't replaced, even though the code replaces all the other NaNs. Is there a problem with interpolating the final value in a series?
Sven on 2 Dec 2011
Yes, this is a small annoyance I have with interp1. Note the difference between _interpolation_ and _extrapolation_. For the former, you need a known value above *and* below your query point. I assume that what you really want to do is:
1. Interpolate *linearly* for any _internal_ NaNs.
2. Set those NaN values on the outside to their nearest non-NaN neighbour's value.
My two most-used modes for *interp1* are 'linear' or 'nearest'. There's also an 'extrap' option to extrapolate. But since the above points one and two use different _forms_ of interpolation/extrapolation, you can't do this in one line.
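To make the difference between those modes concrete, here is a minimal sketch on three made-up sample points (x, v and the query points are assumptions chosen for illustration only):

```matlab
x = [1 2 3];      % sample locations
v = [10 20 30];   % sample values

% Default linear interpolation: query points outside [1, 3] come back as NaN
a = interp1(x, v, [0 2.5 4]);                      % [NaN 25 NaN]

% Linear with 'extrap': the straight line is continued past both ends
b = interp1(x, v, [0 2.5 4], 'linear', 'extrap');  % [0 25 40]

% Nearest with 'extrap': outside points take the closest end value
c = interp1(x, v, [0 4], 'nearest', 'extrap');     % [10 30]
```

The first call is why a NaN at the very end of a column survives a plain interp1 call: there is no known value beyond it to interpolate towards.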
What I do is run two interp1 commands: one to linearly interpolate, and one to 'nearestly' extrapolate. I've updated the answer accordingly.
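For anyone reading this on a newer release: as far as I know, fillmissing (added in R2016b, so it was not available when this was written) can combine both steps through its 'SamplePoints' and 'EndValues' options. A hedged sketch on made-up toy values, not the lab data above:

```matlab
% Toy data (illustrative only): unevenly spaced times, one interior and one trailing NaN
t = [1; 2; 4; 5];
y = [10; NaN; 40; NaN];

% Linear interpolation against t for interior gaps,
% nearest known value for gaps at either end
yFilled = fillmissing(y, 'linear', 'SamplePoints', t, 'EndValues', 'nearest');

disp([t yFilled])   % second column should read 10, 20, 40, 40
```

For a matrix like the one in the question, passing data(:,2:end) as the first argument and data(:,1) as the sample points should fill every column in a single call, since fillmissing works down each column by default.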
