MATLAB Answers

0

What is the best way to smooth and compute the derivatives of noisy data?

Asked by Abdulaziz Gheit on 17 Mar 2019 at 15:41
Latest activity Commented on by Abdulaziz Gheit on 18 Mar 2019 at 15:45
Hello all,
I have a set of experimental data (temperature vs time), and have no idea of the function which satisfies the data behaviour. As a part of analysis, I need to estimate the first derivative using central differences and obtain the maximum gradient.
Because my data is too noisy, I need to filter it before taking the derivative. To keep the precision of data and minimize any distortion, I tried to remove the outliers from my data using a Savitzky–Golay filter. It shows great results, but my data is not quite smoothed as it can be seen in a picture of Savitzky–Golay filter. To compute the exact inflection point further curve fitting is required, as it can be seen in the first derivative picture.
Any suggestions or better approaches , would be highly appreciated as I’m new to Matlab and signal processing. I heard about the cubic spline for better smoothness, but I have no idea about it and not sure if it can solve my problem.
Many thanks in advance

  2 Comments

Your derivative looks like something you'd see with an S-curve - flat at the beginning, steep in the middle, then becoming flatter again - not like what your actual data looks like (something like a quadratic). Please attach your actual data with the paper clip icon, and say what you want to do with the derivative once you have it so we know why it needs to be smoother than it is now.
Sorry about any misunderstanding.
It's just part of my data to present it more clear. here is an excel file contains the original and the smoothed data.

Sign in to comment.

1 Answer

Answer by John D'Errico
on 17 Mar 2019 at 16:21
Edited by John D'Errico
on 17 Mar 2019 at 16:54
 Accepted Answer

The "best" way? That will be completely dependent on your data, on what you know about your data, and on what you are willing to assume about the process that generated the data.
Looking at your data, since I don't have your data, nor do I know anything about the data or the process, I can only guess. Thus, can I assume that what looks like noise really is noise? I had one group of clients who were not willing to make that assumption. Any smoothing at all was a bad thing, because there was potentially information in the high frequencty stuff that they needed to know about. Luckily for them, the noise was not overwhelming.
In general though, numerical differentiation is a noise amplifying process. So you need to kill off the noise to whatever extent possible. You can do so by use of a spline fit (IF you use the right spline model. An interpolating spline would be generally a bad idea for you), or you could use a regression model of some sort, IF you have a viable model for this process.
Again, I don't have your data. So, at best, I can only make some data up for an example. So, let me see, this should look at least a little like what you have shown:
x = linspace(0.5,2.5,100);
y = 19 + exp((x - 0.5)/2)/10 + randn(size(x))/100;
plot(x,y,'.')
untitled.jpg
The picture you show has a fair amount of data, yet it is not that complex of a relationship. As such, a low order polynomial model might be sufficient. Then you could just differentiate the polynomial. For example...
P3 = polyfit(x,y,3)
plot(x,y,'b.',x,polyval(P3,x),'r-')
untitled.jpg
P3d = polyder(P3)
plot(x,polyval(P3d,x),'-')
untitled.jpg
A problem with such an approach is it does not build information that you may have about the underlying functional relationship. That is, do you know that temperature will ALWAYS increase with time? If so, then the first derivative must always be positive, yet a polynomial fit gives you no such assurance. In fact, you might notice that the true underlying function behind my data has an everywhere increasing derivative, yet a cubic polynomial model had a negative second derivative. Worse, what if you use too high an order for a polynomial? That is easy to do.
P7 = polyfit(x,y,7);
plot(x,y,'b.',x,polyval(P7,x),'r-')
untitled.jpg
The same fact applies to a Savitsky-Golay filter. A Savitsky-Golay derivative estimation on noisy data will need to have a long window width, and low order implicit polynomial model. However, remember that Savitsky-Golay does not produce a differentiable or continuous function/derivative estimate. So, using my own movingslope code, that builds polynomial model on a moving window, then returns the derivative, we see:
plot(x,movingslope(y,20,1),'.')
untitled.jpg
Anyway, if we look at the bottom end of the curve, the 7th degree polynomial started to get a little squirrely. So, I might try to use a regression spline, constrained to have the expected shape. Lets see,
slm = slmengine(x,y,'plot','on','increasing','on','concaveup','on','jerk','positive');
So here, a cubic spline, constrained to have first, second, and third derivatives entirely positive over the domain.
untitled.jpg
Better yet, of course, is to use a nonlinear model, IF you know a good choice of model in advance.
But really, what matters is as I said in the beginning. What do you know? Because once you can provide information to the modeling process, you will GREATLY improve the derivative estimation process. Tools like Savitsky-Golay assume almost nothing about the system. A polynomial model assumes relatively little, but somewhat more.

  5 Comments

The negative slope at the beginning of the curve is a technical problem in our system. In deed, the temperature should stay constant all the time before it start raising up as an exponential correlation function of time.
As always, all data has problems. ;-) Sometimes I just hate the real world. Seriously, this is not that bad, as data sets go.
It looks like your data always has an initial transient. So, if we plot one of your curves, just the first 200 points, we see both the transient, as well as a rough picture of the noise.
plot(x(1:200),y(1:200,1),'.')
xlabel 'Time'
ylabel 'Temp'
The problem is that as I said, differentiation is a noise amplification process. First, I'll compare the results of a simple call to gradient, to a Savitsky-Golay style of filter, varying the window length, and the local polynomial order. I've just used my own movingslope utility (which can be found on the file exchange.)
plot(gradient(y(:,1),0.0125),'.')
hold on
plot(movingslope(y(:,1),10,1,0.0125),'r.')
plot(movingslope(y(:,1),100,3,0.0125),'g.')
legend('Gradient','SG: 10 points, linear','SG: 100 points, cubic')
As you can see, the tiny noise in your data was amplified heavily. Also, even a tool like Savitsky-Golay does not produce a smooth derivative estimate, because it makes no presumption of the underlying function being a smooth, differentiable curve. The same thing would apply to a running median filter.
Since all 6 curves had the same x, I packed them all into one array. I can now estimate the noise standard deviation for each curve, as:
sqrt(estimatenoise(y))
ans =
0.016051 0.015394 0.01592 0.016541 0.016655 0.017764
It seems pretty consistent, roughly 0.0165 or so on average. So a band of roughly +/-0.03. ESTIMATENOISE is another of my submissions that can be found on the File Exchange. (I'll return and add links to these codes at the end of this response.)
One simple solution, if you have the curvefitting toolbox (I think SPAPS is in there) is to use asimple smoothing spline. It makes no presumption of monotonicity.
spl = spaps(x,y(:,1),0.03);
fnplt(spl)
hold on
plot(x,y(:,1),'.')
(I've zoomed the curve to show only the first part of the data.)
As you can see, it misses that initial transient, but the remainder of the curve does quite well.
And, here, a plot of the derivative of that spline, again, using tools in the curvefitting toolbox.
fnplt(fnder(spl))
I don't dislike that result at all, and it was fairly simple to arrive at. I might decide to fit only the data after the initial transient, so just delete the data before a time of 0.25 or so, UNLESS that portion of the curve is important to you. If it is, then you need to chase after it.
Now, could I gain as good a result using other tools? Well, yes. I'd do this:
slm = slmengine(x,y(:,1),'plot','on','knots',[0 .25 .5 ,1:2:25],'increasing',[0.5 25],'rightslope',0);
The plot produced by SLMENGINE has the ability to plot the derivative curve. See that SLM did have the ability to handle the initial transient.
Could I have used a polynomial model on this? NO!!!! Absolutely not. You will never get a good result from a polynomial model of this data.
Ok, finally, could I have used a nonlinear regression fit? Well, unless you have asufficiently good nonlinear model that would fit EXACTLY that shape of curve, it is unlikely that you will be as happy as you would with one of the spline options I have discussed. A nonlinear function will almost always miss just a bit in one part or another of that curve. There will be significant lack of fit. And unless you put something in that model to specifically handle that initial transient, it will have problems there too.
So I would use either a smoothing spline, or my own SLM toolbox. Either will probably make you happy.
Links:
SLM:
movingslope:
estimatenoise:
Thank you very much for your help.
You described the problem perfectly. In fact, the initial transient (negative slope) is a technical problem in my system. I don’t care about it, but at the same time I don’t to modify my data. In ideal world, the temperature should be in steady state at all the time before starts raising up continuously to reaches the steady state again. Yes, in the Savitsky-Golay filter, I varied the window length and the polynomial order and got just the same results like yours. With 40 points and 5thpolynomial order Savitsky-Golay works really well but it does not produce a smooth derivative as you said. I applied many filters to my data in order to normalize it without any distortion, but I couldn’t.
I am new to matlab ^_^. I download it as tool box to my pc but I don’t know how to run it. It’d be appreciated if you could let me know how to run it. Is it like cftool?

Sign in to comment.