# Why does smoothdata give bad results at the beginning and end of a dataset?

When I use smoothdata to process my data set, I am getting bad results at the beginning and end.

At the beginning, the smoothed curve is clearly much higher than what it should be. At the end, it is subtly lower.

It almost feels like the smoothdata function is "reflecting" the dataset to compensate for missing data.

Am I missing some sort of configuration for smoothdata or am I using entirely the wrong function?

##### 1 Comment

Jonas
on 8 Aug 2022

### Accepted Answer

John D'Errico
on 9 Aug 2022

Think about it. This is really classic behavior, completely expected. Imagine the smoothing routine as a mathematical implementation of a thin, moderately flexible beam, that is forced to pass roughly through the data. But the beam will not be TOO flexible, as you don't want it to chase every bump in the curve. Make sense?

The problem is that near the ends of your data, curvature is difficult to deal with: the smoother cannot distinguish between curvature there and noise. Some smoothers predict better near the ends than others.

I'll create some simple data to try to explain what happens.

x = linspace(0,2*pi,200);

y1 = cos(x) + randn(size(x))/10;

y1smooth = smoothdata(y1);

plot(x,y1,'ro',x,y1smooth,'b-')

Do you see that where the smooth misses the data the most, is in the regions where the curve exhibits high curvature? Think of a smoothing tool as a low pass filter, it tries to filter out any high frequency stuff, leaving behind only the low frequency stuff. Now, I am sure I could have made smoothdata do a better job here, were I not to simply use it with the defaults, but it exhibits what I want you to see.

A smoothing tool tries to kill off any signal with high curvature, because that is often a symptom of noise. The default method in smoothdata is a moving mean filter, but one that clearly uses a fairly wide moving window. This is a good scheme for very noisy data. (Different filters have different characteristics. For example, moving median filters are great when you have noisy data that is compromised with outliers.)

But your data has a decent signal-to-noise ratio. So you either need to use a narrower window in the moving filter or, better yet, use a different method. Personally, I like Savitzky-Golay smoothers for problems with a reasonably strong signal.

y1smoothSG = smoothdata(y1,'sgolay','degree',3);

plot(x,y1,'ro',x,y1smoothSG,'b-')

Never use too high an order in Savitzky-Golay. 3 should be an OK compromise here.
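To put rough numbers on the two options mentioned above (a narrower window, or a different method), here is a minimal sketch. It assumes the same cosine test data as before. The fixed seed, the 15-sample window, and the `rmsErr` helper are my own illustrative choices, not anything from the original post. The second output of smoothdata reports the window length it picked by default.

```matlab
rng(0);                               % fixed seed so the run is repeatable (my assumption)
x = linspace(0,2*pi,200);
ytrue = cos(x);                       % noise-free signal, kept for comparison
y1 = ytrue + randn(size(x))/10;

% Default call; the second output is the window length smoothdata selected
[ydefault, win] = smoothdata(y1);

% Moving mean with an explicit, narrower window (15 samples is arbitrary)
ynarrow = smoothdata(y1,'movmean',15);

% Cubic Savitzky-Golay smoother with its default window
ysg = smoothdata(y1,'sgolay','Degree',3);

% Hypothetical helper: RMS distance from the noise-free curve
rmsErr = @(ys) sqrt(mean((ys - ytrue).^2));

fprintf('default window length: %g samples\n', win);
fprintf('RMS error  default: %.4f  narrow: %.4f  sgolay: %.4f\n', ...
    rmsErr(ydefault), rmsErr(ynarrow), rmsErr(ysg));
```

The numbers depend on the noise realization, so treat them as a guide for choosing a window or method on your own data, not as a ranking.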

### More Answers (1)

William Rose
on 9 Aug 2022

Edited: William Rose
on 9 Aug 2022

[edit: change "on one side" part of my description to "inside" and "outside", which is more clear, I hope]

You are not using the wrong function.

x=1:250;

y=12.2+4.5*sin(x*2*pi/1200)+rand(size(x))/10;

plot(x,y,'-b')

Looks kind of like your data. Try smoothing with default options.

y1=smoothdata(y);

hold on; plot(x,y1,'-r')

This has the same problems as your example, only worse. Here's what's happening, I think:

smoothdata by default uses a flat moving average window centered over each data point. Near the edges, the window is no longer centered: it is truncated on the outside but not on the inside. At the very edges, the window extends only inside, and not at all outside, the data point being smoothed. So the average near an edge is pulled toward the interior values, which is what you see in your example and in the plot above. Try other method options to see if you like one of the others better.
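The truncation effect is easiest to see on noise-free data. The sketch below is my own illustration, not code from the thread: it builds a truncated moving average by hand on a rising ramp and compares it with `movmean`, whose default `'Endpoints'` setting is `'shrink'`. I am assuming the moving mean inside smoothdata treats the ends the same way. The ramp and the window length `k = 7` are arbitrary.

```matlab
y = 1:20;                  % noise-free rising ramp
k = 7;                     % odd window length, chosen only for illustration
h = (k-1)/2;               % half-width of the centered window
n = numel(y);

ymanual = zeros(1,n);
for i = 1:n
    lo = max(1, i-h);      % window is cut off at the left end of the data
    hi = min(n, i+h);      % ... and at the right end
    ymanual(i) = mean(y(lo:hi));
end

ybuiltin = movmean(y, k);  % default 'Endpoints' behavior is 'shrink'

bias = ymanual - y;        % zero wherever the full window fits
disp(bias(1:4))            % positive at the start of a rising ramp
disp(bias(end-3:end))      % negative at the end
fprintf('max difference from movmean: %g\n', max(abs(ymanual - ybuiltin)));
```

On this ramp the first three smoothed points come out too high (by 1.5, 1 and 0.5) and the last three too low by the same amounts, while every interior point is reproduced exactly. That is the same "too high at the start, too low at the end" pattern as in the question, just without noise.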

##### 1 Comment

William Rose
on 9 Aug 2022

Edited: William Rose
on 9 Aug 2022

Let's try the other methods. You could also try adjusting the smoothing factor or width, etc.

x=1:250;

y=12.2+4.5*sin(x*2*pi/1200)+rand(size(x))/10;

y1=smoothdata(y,'movmean'); y2=smoothdata(y,'movmedian');

y3=smoothdata(y,'gaussian'); y4=smoothdata(y,'lowess');

y5=smoothdata(y,'loess'); y6=smoothdata(y,'rlowess');

y7=smoothdata(y,'rloess'); y8=smoothdata(y,'sgolay');

plot(x,y,'k.',x,y1,'-r',x,y2,'-g',x,y3,'-b',x,y4,'-c',...
    x,y5,'-m',x,y6,'--c',x,y7,'--m',x,y8,'-y');

legend('raw','movmean','movmedian','gaussian','lowess','loess',...
    'rlowess','rloess','sgolay','Location','southeast')

xlim([0 70]); ylim([12 13.6])

At the left edge, shown above, the loess and lowess methods, their robust versions, and S-G all do much better than movmean, movmedian, and gaussian. At the right edge (not shown), loess, rloess, and sgolay are the best, at least in this example.
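If you would rather compare the edges numerically than by eye, something like the following sketch could work. It assumes the same synthetic signal as above. The seed, the 20-sample edge region, and the names `yref`, `methodNames`, `errLeft` and `errRight` are my own choices for illustration. Since `rand` is uniform on (0,1), the noise term has mean 0.05, so I compare against the noise-free curve shifted up by that amount.

```matlab
rng(1);                                      % fixed seed (my assumption)
x = 1:250;
ytrue = 12.2 + 4.5*sin(x*2*pi/1200);
y = ytrue + rand(size(x))/10;
yref = ytrue + 0.05;                         % noise-free curve plus mean of the noise

methodNames = {'movmean','movmedian','gaussian','lowess', ...
               'loess','rlowess','rloess','sgolay'};
nEdge = 20;                                  % edge region size, arbitrary choice
errLeft  = zeros(1, numel(methodNames));
errRight = zeros(1, numel(methodNames));

for m = 1:numel(methodNames)
    ys = smoothdata(y, methodNames{m});      % default window for each method
    % Worst-case deviation from the reference within each edge region
    errLeft(m)  = max(abs(ys(1:nEdge) - yref(1:nEdge)));
    errRight(m) = max(abs(ys(end-nEdge+1:end) - yref(end-nEdge+1:end)));
    fprintf('%-10s left: %.3f   right: %.3f\n', ...
        methodNames{m}, errLeft(m), errRight(m));
end
```

Smaller numbers mean the method tracks the underlying curve better near that edge. The results will shift with the noise, the window width, and the shape of your own data, so rerun it on the real dataset before settling on a method.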
