version 1.14 (668 KB) by
John D'Errico

Least squares spline modeling using shape primitives

If you could only download one curve fitting tool to your laptop on a desert island, this should be it.

For many years I have recommended that people use least squares splines for their curve fits, with a caveat. Splines offer tremendous flexibility to build a curve in any shape or form. They can nicely fit almost any set of data you will throw at them. This same flexibility is their downfall at times too. Like polynomial models, splines can be too flexible if you are not careful. The trick is to bring your knowledge of the system under study to the problem.

As a scientist, engineer, data analyst, etc., you often have knowledge of a process that you wish to model. Sometimes that knowledge comes from physical principles, sometimes it arises from experience, and sometimes the knowledge just comes from looking at a plot of the data. Regardless of the source, we often want to build in this prior knowledge of a process into our modeling efforts. This is perhaps the biggest reason why nonlinear regression tools are used, and I'll argue, the worst reason. If you are fitting a sigmoid function to your data only because it happens to be monotone and your data appear to have that property, then you have made the wrong choice of modeling tool. (If you are fitting a sigmoid because this is known to be the proper model for your process, then go ahead and fit the sigmoid.)

I'll argue the proper tool when you merely need a monotonic curve fit is a least squares spline, but a spline that is properly constrained to have the fundamental shape you know to be there. This is a very Bayesian approach to modeling, and a very useful one in my experience.

The SLM tools provided here give you an easy to use interface to build an infinite number of curve types from data. SLM stands for Shape Language Modeling. The idea is to provide a prescription for a curve fit using a set of shape primitives. If your curve is monotone, then build that information into the model, so you can estimate the monotone curve that best fits your data. What you will find is that once you employ the proper set of constraints, you will wonder why you ever used nonlinear regression in the past!!!

For example, the screenshot for this file was generated for the following data:

x = (sort(rand(1,100)) - 0.5)*pi;

y = sin(x).^5 + randn(size(x))/10;

slm = slmengine(x,y,'plot','on','knots',10,'increasing','on', ...

'leftslope',0,'rightslope',0)

slm =

form: 'slm'

degree: 3

knots: [10x1 double]

coef: [10x2 double]

prescription: [1x1 struct]

x: [100x1 double]

y: [100x1 double]

You can evaluate the spline or its derivatives using slmeval.

slmeval(1.3,slm)

ans =

0.79491

You plot these splines using plotslm.

plotslm(slm)

The plotslm function is nice because it is a simple gui, allowing you to plot the curve, residuals, its derivatives or the integral. You can also evaluate various parameters of the spline, such as the maximum function value over an interval, the minimum or maximum slope, etc.

slmpar(slm,'maxslope')

ans =

1.5481

You provide all this information to slmengine using a property/value pair interface. slmset mediates this interaction, so you can use it to create the set of properties that will be used. The default set of properties and their values are given by slmset. Everything about the shape, slopes, curvature, values, etc., about your function can be controlled by a simple command. SLMENGINE also offers the ability to generate splines of various orders, as well as free knot splines.

For a complete set of examples of the SLM tools in action, see the included published tutorial with this submission. There is also a small treatise included on the concept of Shape Language Modeling for curve fitting.

The SLM toolkit will be considerably improved at some time in the future. I will add a graphical interface. As well, if I have missed any natural shape primitives, please let me know. While I have tried to be very inclusive, surely there is something I've missed. If I can add your favorite to the list above I will try to do so.

Finally, the SLM tools require the optimization toolbox to solve the various estimation problems.

John D'Errico (2021). SLM - Shape Language Modeling (https://www.mathworks.com/matlabcentral/fileexchange/24443-slm-shape-language-modeling), MATLAB Central File Exchange. Retrieved .

Created with
R2007b

Compatible with any release

- AI, Data Science, and Statistics > Curve Fitting > Linear and Nonlinear Regression >
- Science, Engineering and Industry > Sciences > Business, Economics, and Finance > Financial Instruments > Yield Curves >
- AI, Data Science, and Statistics > Curve Fitting > Splines >
- AI, Data Science, and Statistics > Statistics and Machine Learning > Descriptive Statistics and Visualization > Statistical Visualization > Box Plots >

**Inspired:**
noFrillsDevelopment, NLSIG-COVID19Lab

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

Thao NguyenJeff DozierWorks for me in R2020b. SLM is probably the most useful toolbox I've gotten from the File Exchange.

Ehsan FarnoAdd on does not work for R2020b.

Marco Lo MonacoIs the SLM engine capable of automatic detection of the best knots position, without having the user providing a set?

Kassandra CostaHi John,

Thanks for this toolbox! It accomplishes what I was looking for my data analysis. One question I had was if there is a way to incorporate uncertainties in the y values of the dataset? I would like to find the breakpoints that are statistically significant given relatively large errors on the y values (±20%). Thank you for your help.

Yunliang QiSaraHello,

I have a dataset of degradation of different systems. Normally a system degrades when its health reaches 80% of its initial value.

However, many of these systems have not reached 80% due to time limitations.

My question is How can I use your toolbox for extrapolation of my data until they reach 80%.

I appreciate any help

Edgar GuevaraExcelent toolbox, really simple to use and plenty examples are provided

Mohamed AbubakrDear John,

It is an amazing tool and I like it.

I am using it to fit a piecewise linear model with predefined knots (from the clustering algorithm) with one dependent and one independent variable.

And I am very interested in how your tool enforces position continuity (C0) at the knots. I mean what kind of constrain or formulation did you imposed on the least square error optimization problem. I have noticed that you have used the Hermit function, but I am not if this relevant to the continuity enforcement at the nodes.

Thanks in advance and keep up the amazing work.

Max TorbensonAn RJgreat tool! yet, i did not manage to perform a piecewise linear fit with constraints on the slopes. let's say i want to fit 3 segments to timeseries data and constrain the slope of the middle segment to be smaller than zero, the first and the last slope are unconstrained. the breakpoints are to be determined by SLM. how can this be done in SLM? thank you!

Slimane GrineSeb SpornJorge SilvaLan Li YuSharan MagaviYuhan QiZakarya MoteaJoe SolomonBrilliant, extremely useful curve-fitting toolbox, possessing many useful capabilities absent from Matlab's Curve Fitting Toolbox. I'd love to see this extended to 3D surface fitting.

dfsfsddfsHarry GreenAbsolutely excellent tool and easy to use. Would be nice to have options to obtain confidence intervals around the optimal knot location for a model with free interior knots though

Diana-Patricia DanciuZhiHDear John,

thanks for this useful tool, but I still cannot obtain a smooth envelope curve for my oscillation data by using these codes below.

[up,up_x,up_y] = slmengine(x,y,'env','sup','plot','off','knots',60);

[dw,dw_x,dw_y] = slmengine(x,y,'env','inf','plot','off','knots',50);

figure, plot(x,[y;up_y;dw_y]);

The problem is detailed at: https://ww2.mathworks.cn/matlabcentral/answers/413510-plot-a-smooth-envelope-for-oscillation-curves.

Would you please help me?

Arezoo GeramiPourG LsMohammad MousaviraadsundarHi John, is it possible to use this function for a set of points in 3D?

Magnus Fagernes IvarsenXueying LuThanks for this useful tool, but I have a question: When I use slm2pp and try to recover the function form of pp (and plot it), they do not look the same as the original data (very different). Do you have any idea why it is the case and how could I get a pp function form that fit the data?

Eric LandahlJotonThank you John for a great tool which have helped me a lot!

I would like to plot a lot of figures using plotslm in a loop and save the plots to a specific folder (in png format). I cannot get that to work using Matlabs saveas function though.I also want to hide the figure plot as i only want to view them in the folder, not as they pop up during the loop.

Matlabs command set(0,'DefaultFigureVisible','off'); doesn´t seem to affect plotslm

Is there any way to do that?

Ted LanghorstsalMert TurkolMert TurkolDear John,

First of all, I'd like to say that this is indeed a very powerful and useful tool. Thank you a lot for the share. In my research, I sometimes encounter replicate values recorded as displacements in experiments, and I know within what bounds the displacement values should lie. I tried testing SLM to fit cubic splines whose values would be staying within the measurement precision bounds at the training input points using the 'errorbar' property of the tool. There have been some cases that fail unless I set the 'errorbar' to its default value "[]", but when I used slmeval to check if the fitted function stayed within my expected measurement error bounds, it indeed did so. This seemed weird to me since there is a function that can be fit within the error bounds using the default option yet it fails to do so when I explicitly define the said bounds using the "errorbar" property even though all other properties are exactly the same. I tried changing the "regularization" property, as well as setting it to "smoothest", but to no avail.

My "errorbar" property vector defined with the 'errbar' variable looks something like this: [0; 0.000226717384185; 0.000226717384185; 0]. The value of "0.000226717384185" here corresponds to what I call the as tick-precision. This should correspond to the case when the left-end and right-end point values are set to exact y-values defined as training input while the interior points are expected to lie within +- tick-precision bounds. Below I will provide a recipe of the constraints I am setting with the property pair values with "caseTimes" being the training input x-values while "caseDisps" are the training y-values. Assume that the number of evenly spaced knots are set to a value like "totPts-totReps" which usually corresponds to 3 or 4 depending on the case I evaluate. The expected behavior of the fitted function should be a concave-down curve:

slm = slmengine(caseTimes, caseDisps, ...

'knots', (totPts-totReps), ...

'errorbar', errbar, ...

'concavedown', 'on', ...

'endconditions', 'notaknot');

This fails to fit the desired function giving the following warning: No feasible solution was found by LSQLIN. This may reflect an inconsistent prescription set.

Now, I set the "errorbar" property to its default (no error-bar defined), and set the "leftvalue" - "rightvalue" properties like below:

slm = slmengine(caseTimes, caseDisps, ...

'knots', (totPts-totReps), ...

'errorbar', [], ...

'leftvalue',caseDisps(1), 'rightvalue',caseDisps(end), ...

'concavedown', 'on', ...

'endconditions', 'notaknot');

The above set of properties work fine, validated with the statement below to check for function values staying within the error-bounds defined with my "tickPrecision" variable:

>> all(abs(slmeval(caseTimes, slm) - caseDisps)) < tickPrecision

ans =

1

I should repeat that I played with the "regularization" parameter while keeping setting the "errorbar" property to vector I had defined earlier in this post, but couldn't get the working result I did with the default [] value. As you can see every other property value pair is exactly the same. I cannot understand why an explicitly set "errorbar" property does not work even though it corresponds to a working case's set of properties. Can you please help in sorting out what might be going wrong?

Thank you for sparing your time to read!

Gert KrugerThanks a lot!

HiWaveSuzie OmanAlso, what is the way to constrain derivative or normal values to be true when fitting?

Suzie OmanIs there a 3d version of this?

John D'Errico@Carrington Metts -

You can just log your x data before calling the tools. That will transform the problem, as if you used a log x axis.

John D'Errico@Kevin Cheng:

Theory? Its just a linear regression spline at heart. To that is added a small penalty that regularizes the spline using the second derivative squared. This part comes from basic smoothing spline theory. You can change the regularization in a few simple ways, but nothing out of the ordinary there.

Finally, I allow the user to apply any of a large set of constraints, that are formulated as prescriptions about the fundamental shape of the curve. This aspect is an innovation that I developed many years ago. All of these prescriptions are ones that can be converted by the spline code into constraints on the spline parameters. Thus monotonicity, curvature constraints, forcing the spline to pass through a point, etc. All are simply written as linear constraints on the spline parameters. The difference is that the user need never formulate the actual equations needed, and that is what makes SLM valuable, since those equations can be quite complicated to write. I've done the hard work for you.

I cannot offer any paper that was written on the ideas here because I never wrote one.

Kevin ChengThanks, it really helps a lot. But, what's the theory behind this program?

RBCarrington MettsThank you for the excellent toolkit! Just one question- is it possible to change the axes of the data before setting the knots and the splines? I'm trying to fit a linear piecewise function to a set of data with loglog axes, but I can't see how to use the slmengine to do so.

Thanks,

Carrington

Scott OttersonHi John,

Thanks for the answer. Yes, I'm already using monotonicity constraints, but in the real data, there are no (knowable) constraints on the 2nd derivative. So, as it stands, I have occasional numerical errors, and frequent very slow convergence.

This is why I asked about penalized splines. The unconstrained penalized spline algorithms are numerically robust, fast to train, and easy to code. There are also monotonicity-constrained versions (the Pya paper I mentioned), but they don't have a way to constrain the values at the endpoints.

I thought I might be able to modify the SLM toolbox to do a Pya-with-endpoint-constraints, but I can't quite understand how the SLM code works.

So, is it possible for you to add penalized, constrained splines? Or alternatively, is there document somewhere describing the inner workings of the existing code?

Thanks,

Scott

JackHi.

First of all, thank you for this practical toolbox. How can I specify minimum distance between knots in free mode? I have some restrictions in my problem. I should have minimum three data points between the knots.

Thanks.

hamed aminitypo in slmengine.m, lines 1012-1017?

---------------------------------------------------------

% left hand side maximum

if ~isempty(prescription.LeftMaxValue)

M = zeros(1,nc);

M(1) = 1;

Mineq = [Mineq;M];

rhsineq = [rhsineq;prescription.LeftMinValue];

end

----------------------------------------------------------

should be:

---------------------------------------------------------

% left hand side maximum

if ~isempty(prescription.LeftMaxValue)

M = zeros(1,nc);

M(1) = 1;

Mineq = [Mineq;M];

rhsineq = [rhsineq;prescription.LeftMaxValue];

end

----------------------------------------------------------

John D'ErricoLooks like I'm not getting notified when comments are made. Sorry.

@Emanuele - No, general inequality constraints cannot be supported. That would be quite difficult in general, since the constraint you describe would correspond to an inequality at every point on the curve. As such, it would be impossible to implement, even using fmincon. At most, one could implement a necessary constraint, that it must be satisfied at some sample set of points. But even if you enforce that requirement at each of some large set of sample points, it may still fail at some location in between.

Monotonicity is indeed a constraint of the general class you describe. But there I was able to use a result from Fritsch & Carlson, allowing me to create sufficient conditions on the spline in an easily written form.

@Scott - a good approach if possible is to apply a monotonicity constraint, or perhaps a global constraint on the second derivative in the form of concavity. These constraints have a huge regularizing impact on the shape of your curve, excluding the nasty stuff that will happen otherwise.

Mallik GurramGeoff StanleyScott OttersonI've had good luck with the SLM toolbox, except for one type of problem. Most of the curve I'm fitting is smooth and nearly flat there are sections with a high derivative -- and the location of the high derivatives is unpredictable. So, I need a lot of knots everywhere.

Under these conditions, I sometimes get ill-conditioning during SLM fitting, and automatic knot selection doesn't work.

On the other hand, if I fit the curve using roughness-penalized splines, I can use a lot of knots (say 100) and ill-conditioning never occurs. But then I don't have SLM's fixed endpoint and monotonicity constraints.

Basic, and easy to implement unconstrained penalized spline algorithms are here:

Eilers, P. H. C. & Marx, B. D. Splines, knots, and penalties Wiley Interdisciplinary Reviews: Computational Statistics, John Wiley & Sons, Inc., 2010, 2, 637-653

and here:

Kagerer, K. A short introduction to splines in least squares regression analysis University of Regensburg, Faculty of Business, Economics and Management Information Systems, 2013

And, an algorithm with montontonicity but not end point constraints is here:

Pya, N. & Wood, S. N. Shape constrained additive models Statistics and Computing, Springer, 2015, 25, 543-559

I considered modifying SLM to implement the Pya algorithm, then adding endpoint constraints. But I've looked at the SLM code and don't quite understand the existing algorithms.

So, is it feasible to add a roughness penalized fitting algorithm to SLM?

Emanuele CattarinuzziDear John,

lately I used SLM to fit some noisy data. I found it very helpful, hence, thanks a lot for your contribution.

I was wondering something about the prescriptions the user can give: one can prescribe constraints on y and its derivatives (yp, ypp, yppp) individually; assuming one has some knowledge on the sign of the derivative of the curvature, one may want to prescribe an inequality involving a non-linear combination of these derivatives, e.g.,

yppp - (3*yp*ypp^2)/(1 + yp^2) > (or <) 0

Such constraints cannot be formalized in the constraint matrix M assembled by slmengine (am I wrong?).

Do you think that setting such non-linear inequality constraint could be addressed by using nonlinconstr() in combination with fmincon()?

Since slmengine already uses fmincon() when the 'interiorknots' are set to be 'free', I was wondering whether non-linear inequality constraints could be approached similarly.

- Emanuele

Longchuan Lione of the best toolboxes I have seen here!

John D'ErricorowJoe - I'm sorry, but there are various ways to store any spline. You apparently expect some other form.

slm = slmengine(0:10,rand(1,11));

bspl = fn2fm(slm2pp(slm),'B-');

bspl

bspl =

form: 'B-'

knots: [0 0 0 0 2 4 6 8 10 10 10 10]

coefs: [0.27526 0.069372 0.71806 0.8488 -0.063009 0.7445 -0.04073 0.71297]

number: 8

order: 4

dim: 1

The code I gave you, which is what you requested, is a code that converts a spline into a b-spline form, using code provided by the Mathworks.

THEY chose how to represent the b-spline.

If you want help on that matter, you need to take it up with them.

doc fn2fm

So if you expected some other form for the coefficients, as I said, talk to TMW.

rowJoeDear John,

thank you very much for your post on 9th May 2016.

In the meantime I followed your post without any success. If I am working with B-Splines I would expect an basis expansion system (described by Ramsay here: http://www.psych.mcgill.ca/misc/fda/ex-basis-a1.html)

If I define an basis expansion system with n functions I would expect an n times n coefficient matrix.

In your example (with the B-Spline form) I get an 1x8-matrix containing the coefficients.

Don't get me wrong. Your tool is absolutly fantastic. That is why I try to import the results (created with your tool) in Ramsay's functional data analysis toolbox (downloadable for R and Matlab under www.functionaldata.org for free).

I would like to try the following:

- Fitting the data as b-spline with your toolbox. So I calculated the coefficient matrix (in your example: bspl.coefs).

- Import the coefficient matrix in the fda tool.

- Analyze the curves with the fda toolbox.

Maybe you or one of the other users has some ideas to fix this issues.

I would be very happy about that.

- Jonas

P.S.: If someone need an example I posted I minimal case here: http://de.mathworks.com/matlabcentral/answers/293307-optimize-b-spline-fitting-while-working-with-functional-data-analysis-fda-and-slm-toolbox

Julia HoweGreat tool!

I recently updated from 2014 to 2015b and now running any SLM Tool causes MATLAB to crash.

Has anyone else had this problem or know why this is happening? I thought perhaps transferring the files from my old computer may have had something to do with it - but even re-downloading doesn't seem to help.

-Julia

John D'Errico@Jorge -

Without seeing your data, I can only guess what is happening.

You have specified two pieces of information, thus increasing, and concavedown. So the curve must be increasing, probably to an upper asymptote.

If there are linear segments, that suggests you have regions where the data seems to indicate differently, but the curve is constrained to behave as you indicate. So in those segments, the segment reduces to a degenerate cubic polynomial, i.e., a linear segment.

That is, all segments are cubic in your problem. But some segments MAY be degenerately cubic. A linear segment is exactly that.

Perhaps the data is fairly noisy. Or perhaps you have more knots then the data adequately supports, especially in context of the number of knots.

One solution would be to use fewer knots. Of course, that gives less flexibility to the curve, so less freedom to fit what is probably noisy data.

A second solution is to specify the regularization parameter, 'regularization'. This will allow you to essentially make the curve a smoother one. The default for this parameter is 0.0001. So try a somewhat larger value. (Too large of a value here will cause the curve to approach a straight line fit through your data.)

Another option (that will be a bit slower, since it will utilize the cross-validation option) is to specify the 'regularization' property as 'crossvalidation'. This will automatically choose a degree of regularization that seems indicated by your data. As I said, this will be a more computationally intensive solve, but it should generally result in a more pleasing fit.

John D'ErricoThey change these things underfoot. :)

So in my most recent version I fixed that problem, but also added options called 'FminconAlgorithm' and 'LsqlinAlgorithm'. I guess I need to post that release.

Scott OttersonThis works well on my test problems, for example:

slm = slmengine(x,y,'knots',10,'increasing','on','leftminvalue', 0, 'rightmaxvalue', 1);

But there's a Matlab Optimization Toolbox warning:

Warning: Options LargeScale = 'off' and Algorithm = 'trust-region-reflective' conflict. Ignoring Algorithm and

running active-set algorithm. These settings will error in a future release.

> In lsqlin (line 297)

In slmengine>solve_slm_system (line 2490)

In slmengine>slmengine_cubic (line 2332)

In slmengine (line 255)

The warning went away and things still work if I set options.Algorithm='active-set' just after LargeScale is turned off in slmengine.m line 2486.

Will this break anything else? I'm running Matlab R2015b.

Jorge YannieHello John,

I have a displacement vs time data and I am trying to fit an "increasing" and "concavedown" curve to my data set (decelerating hyperbolic type of behavior). The resulting curve is a combination of linear segments and cubic splines. I know that my derivatives should decrease with time and not be constant as given by the linear segments.

Is there a way to shutdown the linear fitting and force the slm to fit only splines?

Thank you very much!

John D'Errico@rowJoe

You asked for short.

slm = slmengine(0:10,rand(1,11));

bspl = fn2fm(slm2pp(slm),'B-');

That should be a b-spline form.

rowJoePhantastic work John. I would like to use your toolbox for functional data clustering with b-splines. Is there a chance to get the basis expansions from the (b?) splines?

It would be great if you provide a short answer for my question.

John D'ErricoLeandro - just in case you DID wish to compute a higher order Hermite interpolant. While slmeval does not currently handle higher order forms than a cubic, you can use tools like ppval or fnval to do so.

Here is an example where I create a higher order Hermite function. First, I'll create the known function values and first and SECOND derivatives of a simple function.

x = (0:.25:1)';

y = sin(x);

yp = cos(x);

ypp = -sin(x);

Next, I'll create a SLM form in a lazy way, just using slmengine. That is just to create a SLM struct as a container though. I'll modify it afterwards.

slm = slmengine(x,y,'knots',x);

Now, insert the known derivative information.

slm.degree = 5;

slm.coef = [y,ypp,ypp];

and convert to a pp form.

pp = slm2pp(slm);

This is a quintic Hermite interpolant for the function sin(x) over those knots, in a pp form spline. You can now use any tools from MATLAB on the result, like fnval, fnplt, fnder, ppval, etc.

slm2pp is able to convert any of piecewise constant, linear, cubic, quintic or heptic hermite forms into their corresponding pp forms.

Again, there was NO true smoothing done here. So if your information is noisy, don't expect to be terribly happy. The result will be as noisy as your information.

John D'Errico@Leandro - One issue is you ask about smoothing noisy data, when you have up to second derivatives. The result of a Hermite (quintic) in that case does not smooth the data, it would at most provide a "smooth" interpolant between those points. And if your function values and up to second derivatives are themselves noisy, then the resulting interpolant would potentially be quite nasty looking.

So, perhaps you are asking about a least squares spline that would in some way smooth through the data, using that information about the first and second derivatives to provide input, but allowing them to be smoothed. This could be doable, although I've never seen anyone ask that question. An issue is how to resolve the penalty on any errors in function values versus the derivatives.

Regardless, you explicitly stated the idea of a higher order interpolant, but in the same sentence you stated a desire to do smoothing. An interpolant never does any smoothing. All it can do is to provide a smooth interpolant between the points. An interpolant always recovers essentially the exact values of the data at the supplied points, so smoothness is only an issue between the points, and across the knots.

Anyway, if your goal was as simple as a higher order Hermite interpolant, one could do this by converting the segments from the higher order Hermite interpolant into a pp form. Then ppval or fnval would suffice to work with the resulting spline for purposes of interpolation. But the result would NOT be a terribly smooth one if the information provided was noisy. So I don't see this as solving your problem, although I could help you in this case.

LEANDRO CryzIt is a really useful and powerful tool. I was just wondering if I could use a 5th order Hermite interpolant to smooth a noisy data which I know the first and second derivatives. The smooth spline of matlab fit tool does not allow the user specify the derivatives, although one can choose the smoothness level, i.e how much the curve will be close to a cubic interpolation spline or to a least square straight line. It would be nice if this option were available here.

LEANDRO CryzM. F.John D'ErricoI'm not sure what you mean by "any of these constraints". You can do what is allowed. I've tried to offer as many simple prescriptions as possible, but of course there will always be something I did not think of.

If the derivative is always increasing, then the derivative of the derivative is always positive. I.e., the second derivative is positive. You get that by setting 'concaveup','on'.

GiannaGreat tool - can you apply any of these constraints when plotting the derivatives? (i.e. can I say my derivative must always be increasing)

Kelly KearneyTruly excellent tool, very useful.

One small bug: If you pass data with NaNs to slmeval, and your slm prescription includes linear or cubic extrapolation, it will throw an error (since NaNs fall through the <,> checks and are not assigned to non-zero bins on lines 201-202). Minor issue easily solved by preprocessing data, but it took me a few minutes to track down the cause of error.

Matt JVery nice. The .rtf document needs some editing, though. Explanations of some of the parameters are missing: xy,xyp,xypp, etc...

LeoJorge YannieThanks for such a great tool! I really love it!

John D'ErricoJukka,

No, that is not right. Mreg is ONLY the regularizer part.

There are several different matrices you would need to use for this.

Mfit is the design matrix. Mreg is the regularizer. Meq contains the equality constraints, Mineq the inequality constraints.

In general, there will be equality constraints for a cubic problem, there MAY be inequalities. In case you are using something like monotonicity, then you have inequalities in the problem.

If you have any active inequalities, they should arguably be considered as equality constraints.

For any equality + active inequalities, they would need to be treated apart from the design and regularizer parts of the problem. Essentially one needs to reduce the problem by those constraints. Think of it as eliminating variables from the problem, in advance of the solution.

I may need to consider adding this as an option, as it will get a bit messy.

jukkaJohn, thanks for the quick reply. By effective degrees of freedom I meant the trace of the hat matrix of the spline. I need to find the hat matrix to be able to analyze the bias-variance trade-off of the spline fit. I have tried to find the SLM equivalent for the hat matrix, and my best guess thus far is the upper-left portion of the Mreg matrix. Would this be right?

John D'Errico@Jukka - Effective degrees of freedom is a bit messy.

Assuming there are no inequality constraints, then one subtracts the number of equality constraints from the number of parameters to be estimated. The number of parameters to be estimated will depend on the order of the spline. For a cubic spline with N knots, this is 2*N parameters. Assuming a c2 cubic with the default end conditions, there are N-2 equality constraints, plus any others you have specified. Shifts, scalings and normalizations are irrelevant - they have no impact at all, except that they MAY be important to improve numerical performance.

The difficult part is if there are ACTIVE inequality constraints. Inequality constraints arise from things like monotonicity, etc. For each such active inequality constraint, one should arguably subtract an effective degree of freedom.

@Maria - You need to download AND install this toolbox in order to use it.That will require you to add the toolbox directory to your search path.

jukkaJohn, how does one calculate the effective degrees of freedom of the SLM spline fit? I've tried, but the spline setup inside the slmengine.m routine is a bit hard to follow after all the slope restrictions, scalings, and normalizations.

Maria AbizeidDear John,

I trying to use your code for determination of inflection point of ITC data, however It’s not working on my matlab R2015a. I have the optimization app installed. The following is the error message:

slm=slmengine(x,y,'degree', 1, 'plot', 'on')

Undefined function or variable ‘slmengine'

Please, inform me if there is anything I can do to solve the problem.

Regards,

Maria

ansi512Dear John,

first at all thanks for that contribution! Well done, absolutely helpful!

Is there a way of constaining the maximal and minimal value of the second derivative?

I try to handle gravity currents front position data and limiting the maximal acceleration to g is one of the exact contrains I have...

Thanks in advance, Anselm

AnaelBrilliant work! It combines everything you need to make a spline into one powerful tool. Forget Matlab's spaps,csapi,csape,etc. and their weird parameters conventions.

One negative point: the robust and free knots options are not working for me. Maybe it's me... i'll update if this changes.

AnaelYuan JunHi John,

I have another question.

I set every data point as a knot as follows:

slmengine(x, y, 'knots', -1,'plot', 'on', 'verbosity', 1);

It seems that the regularization doesn't work. No matter how I tune the regularization parameter, no smoothness effect occurs.

But, the smoothing spline function in Matlab also treats every data point as a knot, and the smoothness effect is obvious.

So, I wonder about the differences bewteen the SLM and the smoothing spline in Matlab.

Looking forward to your answer.

Thanks very much!

Best regards,

Yuan Jun.

John D'ErricoSorry. The robust code is not yet implemented. I had started to write it, but was only half done. I thought it was disabled as an option, but I apparently uploaded the toolbox for another fix while I was in the process of writing.

Yuan JunDear John,

Thanks for sharing your code.

When I use your code for fitting, some errors came up.

This is my code:

slm = slmengine(x, y, 'robust', 'on', 'plot', 'on', 'verbosity', 1);

The errors info:

Error in slmengine>solve_slm_system (line 2428)

if strcmpi(prescription.Robust,'on')

Output argument "lambda" (and maybe others) not assigned during call to

"C:\Users\YuanJun\Desktop\SLMtool\slmengine.m>solve_slm_system".

Error in slmengine>robustfit_slm_system (line 2398)

[coef,lambda] =

solve_slm_system(RP,Mdes,rhs,Mreg,rhsreg,Meq,rhseq,Mineq,rhsineq,prescription);

Error in slmengine>solve_slm_system (line 2429)

[coef,lambda] =

robustfit_slm_system(RP,Mdes,rhs,Mreg,rhsreg,Meq,rhseq,Mineq,rhsineq,prescription);

Error in slmengine>slmengine_cubic (line 2332)

coef = solve_slm_system(finalRP,Mdes,rhs,Mreg,rhsreg, ...

Error in slmengine (line 255)

slm = slmengine_cubic(x,yhat,prescriptionhat);

I don't know what happened.

Any solution to address the issue?

Thanks again!

Regards,

Jun Yuan

Sergei P.John D'ErricoXavier - My guess is you either do not have the optimization toolbox, or it is not installed properly. SLM does require that toolbox.

A quick test (running 2015a) shows your test to work nicely on my machine. So I don't see any reason why it would fail for you.

Xavier SiraultDear John,

I am interested in using SLM but I encounter the following error (using Matlab 2014b and 2015a):

>> x = -5:.1:5;

y = exp(-x.^2/2)/sqrt(2*pi) + randn(size(x))/100;

slm = slmengine(x,y,'plot','on','knots',15,'integral',1,'deg',3,'minval',0)

Error using optimset (line 148)

No default options available for the function 'lsqlin'.

Any solution to address the issue?

Regards

Xavier

Mike AlonzoHi John,

Question about the robustfit_slm_system function: I do not understand how it returns coefs estimated using robust fitting. It appears that the coefs returned are actually non-robust from the solve_slm_system function (called after prescription.Robust is set to 'off'). It's very possible that I'm misunderstanding the code, so apologies in advance if that's the case.

My overall goal here was to see if I could adjust the robust fit to a quantile fit but I might play with the Weights setting instead.

Thanks for your thoughts,

Mike

John D'ErricoOh, drat. Looks like it is time for me to revisit the options on those calls. 2015a is now out too, so I'll fix it asap. If you are using free knots, then fmincon is used to adjust the knots, but lsqlin is used to estimate the spline coefficients. So both tools are actually in play there.

Mike AlonzoHi John,

Great tool. I'm using it to classify land cover change over 30 years in Landsat imagery. I have mostly been using this in 2013b and then a little in 2014a.

When I started trying it in 2014b, I got warnings related to fmincon or lsqlin (not sure which exactly) saying that the "Large Scale" and "Trust reflective region" options are in conflict and the "active set" algorithm is used instead. Also, it said that in future versions this warning will instead be an error.

I'm far from an optimization expert so I wanted to check to see if this could yield different results or hurt performance. I'm using free interior knots on millions of pixels so performance is critical for me.

Any thoughts or recommendations are much appreciated.

Thanks,

Mike Alonzo

UCSB Geography

Mike AlonzoJohn D'ErricoNo. I'm sorry, but SLM probably never will work in more than one dimension, at least as I will offer this tool.

That is not to say it would not be possible, by someone to do, some day. But the huge variety of constraints I allow are simply not easily transformed into meaningful concepts in higher dimensions. And the essential philosophy behind SLM is to estimate a curve in a way that allows the user to simply encode their knowledge about a system into the curve estimation.

Wu YingDoes it work when we have more than 1 (say 3, 4 or more) input variables?

John D'ErricoEshwar - I considered offering a 5th order or higher fit when I wrote the code. I did not do so however, for valid reasons, at least what I considered valid ones.

- I've only rarely ever needed more than 3rd order. In one such case we were modeling paper paths through a copier, and the path needed to be smoother than a cubic spline could offer.

- Higher orders than a cubic are a serious problem for many of the most useful constraints one may want to apply. Monotonicity for example, is done using a set of necessary constraints based on an inequality from a Fritsch and Carlson paper. Higher orders than cubic however will not allow such a nice solution, so we would have problems ensuring true monotonicity. The curvature constraints would also be more difficult to satisfy.

So in the end, I chose not to implement higher orders than cubic. Sorry.

eshwar kannanHi John,

What can I do, if I wanted to make the order more than '3'? I could find the order only until '3' from you tool.

Regards,

Raja

J. PVHello John,

Thanks for this great tool.

I do have a question, I am not sure if it has been already answered. I would like pre-define the knots to be the points where the derivative of the curve changes sign. Is there an option for that? I am using the piecewise linear SLM fit, I can send you an example of my data if needed.

Thanks,

Judith

John D'ErricoMustafa - I'm glad you like it.

The initial knot placement is as you supply it, or if you supplied nothing, then the default is used. By default, the knots chosen are equally spaced from min(x) to max(x), with 6 knots in total, so there would be 4 interior knots to vary. The optimizer then varies only those interior knots.

There is no overt randomness in the optimization or the initial values though. For example:

x = rand(100);

y = exp(x);

clear functions

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 1.369497 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.176209 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.127645 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.123504 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.121428 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.123522 seconds.

tic,slm = slmengine(x,y,'knots',6,'interiorknots','free');toc

Elapsed time is 0.126947 seconds.

The difference in time for the subsequent calls is small, but non-zero. That variability is purely due to your CPU as it is constantly running various other things in the background.

ALWAYS do such time tests multiple times, and ignore the first couple of calls from your sample, as the first time a function is called, MATLAB takes extra time to cache the JIT parsed code. Above, see that I did a clear functions first to clear the function cache. The first call was quite slow, then after the second call, the time needed was far less, and quite consistent.

John D'ErricoI'm confused. If you found this page, you should be able to click on the "Download Submission" button. Once you have done that, unzip the file, and add the resulting directory to your search path using either addpath & savepath or pathtools.

yunt tianNice ,but how can i download it?

MustafaDear John

This is an amazing tool , thanks a lot, and definitely gonna be referenced in my journal. I have one question regarding piecewise linear fitting using free , 'interierknots' : I could not find how it initiate the first values of the knots to be optimized , and I am trying to figure this out in order to implement it on a microcontroller board , I would much appreciated if you could tell me, how the initial guesses were chosen based on the fmincon function so I can evaluate the complexity and how much time it will take on a printed board. Also when does the fitting stops, I noticed that the time it takes to reach to some certain fit varies every time ( for the same options ) which means the initial points differs , right ?

Regards,

John D'ErricoOh, and yes, piecewise linear segments are there. set the degree 'property' to either 1 or 'linear'.

John D'ErricoHi Cyril,

If you mean the ability for SLM to choose a knot placement for the piecewise linear case, it is in there. You can set the 'interiorknots' property to be 'free'.

If you want the end points themselves to be free, I don't see the need, as long as those knots contain the data and are placed as far out as you need them.

cyrilHi John

should be "xev = (0:.001:1)';" in the tutorial 2nd cell

Would there be a way to determine automatically the length of bins in piecewise linear fit? they could have any length, but ends hen the data change significantly, any idea, interesting work else

John D'ErricoJoseph,

The point is that the knots MUST contain the data. It will throw an error if that is not true, telling you something to that effect. In fact, I don't know where you got the idea that the knots should lie wholly inside the range of the data, as that is not something I have shown in any example in the tutorials. Note that the default for 'knots' is a set of 6 equally spaced points that COMPLETELY span the data.

If the knots were to fall entirely inside the data, with data points that lie outside, then slmengine would be forced to extrapolate, something which I am strongly opposed to doing with a spline, since extrapolation of a cubic polynomial segment will give virtually random crap.

If however, I force you to contain the data inside knots of the spline, then you can control the shape of the spline over the entire knot range. This is how the tool works. It does so for very good reasons. Note that if you DO wish to extrapolate, then since you can control the shape of the spline out as far as the knots go, then you can force the tool to extrapolate intelligently, at least within the bounds of the knots.

JosephHi John,

Greatly appreciate your fast response.

Turning C2 off at specific points is probably what I needed, but I am not complaining that it isnt there. Also, I didn't realize the concave up/down can be applied to more than one region, but other aspects of my simulations don't keep the concavity the same. I do appreciate those suggestions.

I kinda knew I would have to overcome my lazyness and start using custom placed knots for each of intervals to capture the inflections and avoid overfitting certain regions.

However I am a little confused by lines 176-178 of slmengine.m.

if (knots(1)>min(x)) || (knots(end)<max(x))

error(... 'Knots do not contain the data. Data range: ',num2str([min(x),max(x)])])

Suppose my data is just x = 1:10 with the knots = [2, 5, 7] (i.e. knots contained in the data range). That would cause line 176 to evaluate to true and then throw an error.

(2 > 1) || (7 < 10) => true || true => true

(If I am not mistaken or confused; but probably just confused) Shouldn't that knot placement for the data set be acceptable because all knots are interior to the data set?

Or is it that the range of the knots must include the data range as subset?

Sorry if this is a basic question, but I don't understand what the significance of having knots placed outside the range of data would be.

John D'ErricoHi Joseph,

I had thought about allowing a user to turn off C2 continuity at specific knots, but it seemed a bit klugy in terms of the interface.

If you have a decent idea of where those breaks occur, then you could add a spare knot or two in those areas. If you are trying to do it semi-automatically, then you might need to do it in two passes, using the first fit to find roughly where those second derivative breaks live. So the first pass would employ lots of knots in the fit.

You can also specify a region or set of regions where the curve will be concave down or up. Make sure those regions don't overlap, else the fit will be too strongly constrained to be useful, and make sure there are a couple of knots between each pair of consecutive regions.

Finally, remember that derivative estimation is an ill-posed process. So it tends to be a noise amplification process.

JosephDear John,

I very much appreciate your SLM tool (plus all your other submissions and useful tips).

I have a question regarding fitting a monotonic increasing data (gas effusion data) that will have inflections (due to increasing or holding the temperature during simulation with time that has an exponential scaling on effusion rate). I am able to get a good fit to the raw data but I am more interested in derivative of the spline because that gives me information about the effusion rate.

I expect the derivative to always be positive (due to monotonicity) and there to be sharp peaks in the derivative due to the inflections in the raw data at temperature changes. Essentially, exponentially increasing then exponentially decreasing, followed by exponentially increasing and exponentially decreasing, etc.

If I was to fit the original data with separate splines defined over intervals of constant temperature behavior I would have jump discontinuities in the first derivatives at the boundary of the temperature intervals which is unacceptable.

To give a pseudo-example to what I expect from the first derivative:

d = [exp(0:0.1:1) exp(0.9:-0.1:0.5) exp(0.4:0.1:1.5) exp(1.4:-.1:1)];

plot(d);

I am fine with the peaks being smoothed a bit due to the smoothness conditions of the spline fit of the original data. I did try changing the C2 to 'off' but then each interval of the derivative was not sufficiently smooth.

Any suggestions on what prescriptions to use when I know my first derivative should be smooth in an interval but have a sharp peak?

John D'ErricoHi Peter,

Yes, I'm afraid that you did misunderstand the intent, although I can understand your confusion.

SLMEVAL does not look at the prescription you have provided to know how to extrapolate. In fact, it looks only at the spline itself. (The prescription field is returned to you as a field of the model for several reasons. For example, you can use that prescription field to fit a similar spline to other sets of data, passing the prescription itself into slmengine. It also represents a form of documentation for what you had done to build the spline itself.)

From the help for slmeval, I quote:

"As opposed to extrapolation as I am, slmeval will not

extrapolate. If you wanted to extrapolate, then you

should have built the spline differently."

The point is, if you provide a point that lies outside of the support of the spline, it uses the value of the spline at the end point knot for the value it returns. I suppose you can view this as constant extrapolation.

My point about needing to build the spline "differently" is that you need to supply knots that extend over the region where you will expect to extrapolate. When you do this, now you can tell slmengine how to build the spline over those regions, especially if there is no data out there. Your knowledge as the user is of paramount importance where no data exists to fill a void.

For example, since you are building a purely linear spline (as opposed to a piecewise linear spline), there is no need to use even the default set of 6 equally spaced knots. So you might have done this:

X = linspace(5, 10, 100);

Y = 0.5 + 2*X + 0.001*X.^2;

slm = slmengine( X, Y,'plot','on','knots',[-50,50],'degree','linear');

slmeval( -10, slm )

ans =

-19.704

See that now slmeval can evaluate the function at any point in the desired region.

Or you might have specified the end conditions for the spline. The natural end conditions indicate a spline that has zero second derivatives at each end of the spline. Since this is a two knot cubic spline, that forces the spline to be linear over the entire range, although it is still explicitly cubic. Again, as long as the knots go all the way to where you will need it evaluated, slmeval has no problem.

slm = slmengine( X, Y,'plot','on','knots',[-50,50],'endc','natural');

slmeval( -10, slm )

ans =

-19.704

Even here though, SLMEVAL willl not extrapolate beyond the knots of the spline, except as a constant. So if I try to force it to do so, SLMEVAL will refuse to cooperate:

slmeval( -100:20:100, slm )

ans =

-100.3 -100.3 -100.3 -80.154 -39.854 0.44588 40.746 81.046 101.2 101.2 101.2

No warnings of this behavior are generated, although I suppose I could have built that into SLM too. So the next time I update SLM, I might consider adding an "extrapolation" option. The options available to the user might then arguably be:

{'error', 'warning', 'constant', 'linear', 'cubic'}

Thus 'error' would produce an error whenever any extrapolation is done. 'warning' would issue a warning message, but then extrapolate as a constant. 'constant' is what is currently done. 'linear' would extrapolate linearly form the end knots, etc.

I suppose the most logical default would be 'warning' to tell the user something strange is being done, although for consistency, 'constant' seems right.

A final note, deep in my past, I once wrote a tool that would allow you to extrapolate an existing spline, based on ideas not unlike those in the SLM tools. That is, given a spline, it would attach new knots to that spline, and a shape for the spline that was consistent with any goals that the user supplied. So could specify a new spline that maintained the shape at the end of the old curve, smoothly extrapolating out to a specific point, AND such that the spline was monotonic over that region, or linear over that region, etc. I included ways to specify that the spline could not go above a maximum, below a minimum, have a given slope at the ends, etc. Essentially anything you wanted to do, it allowed you do to it over the extrapolated region. I suppose one day I'll write a tool like that to work with SLM.

Peter CottonTutorial gave me the impression that setting a LinearRegion would allow for linear extrapolation outside the knot points. Did I mistake the intent?

>> X = linspace(5, 10, 100);

Y = 0.5 + 2*X + 0.001*X.^2;

slm = slmengine( X, Y,'plot','on','LinearRegion',[-50,50]);

slmeval( -10, slm )

ans =

10.5209

>> slm.prescription.LinearRegion

ans =

-50 50

FabJohn D'ErricoLeonid - Yes, in the past I have had a few problems with errors in variables, easily enough solvable when the problem is purely linear. It gets a bit nasty when you have splines though. Sorry, but it is not something that SLM can solve. John

Leonid PeshkinThank you very much for putting together and sharing Shape Language Modeling !

I hope you could answer one question for me. When you fit a spline into a given data (or part of it) using MLS penalty

the data is divided by X coordinate and penalty is calculated in Y coordinate.

But for many datasets, X and Y are symmetric and what I'd see as a natural penalty in these cases would be

the Minimal Least Squares of a distance from each point to the curve/spline, calculated as length of orthogonal projection of a point onto that curve. Is there a way to use slmengine to find optimal fit this way ?

John D'ErricoMarkus - your question refers to the free knot solver in SLM. This uses fmincon because the problem is a (partly) nonlinear one in that case. The knot placement parameters enter nonlinearly here, although some of the parameters are still solved using a direct linear solver.

However the link that you show is a link for an LP solver. I did not see a nonlinear code there, although there are references to the ability to solve quadratic problems, thus least squares. (I admittedly know nothing about CPLEX.) So CPLEX could not be used to replace a call to fmincon as far as I could see.

When SLM is called for a fixed knot problem, I believe that CPLEX could probably be used to replace either of the calls to LSQLIN or LSE, but that does not appear to be your problem.

If anyone can offer a more intelligent response, feel free to help.

John

MarkusWow that is really a great tool.

Is there a possibility to change the used optimizer from fmincom to CPLEX Solver? I hope to have an increased speed using that one. Somehow I have no idea how to deal with line "intknots = fmincon(@free_knot_obj,intknots,A,b, ...

[],[],[],[],[],fminconoptions,x,y,prescrip);" and especially with the free_knot_obj function because CPLEX expects to have a matrix concerning http://pic.dhe.ibm.com/infocenter/cosinfoc/v12r2/index.jsp?topic=%2Filog.odms.cplex.help%2FContent%2FOptimization%2FDocumentation%2FCPLEX%2F_pubskel%2FCPLEX1131.html at that point. Is there someone who can give me a hint??

John D'ErricoThe knots are only equally divided IF you only tell it the number of knots. If you provide a vector of knots, then they are as you indicate in the vector. (Read the help!)

As far as optimal locations, the option is provided to optimize the interior knots. Look at the option for 'interiorknots'. If you specify 'free', then it will use an optimizer on the knot locations. Again, you will learn this by reading the help. (Note that this is only an optimization. Any optimizer can fail if it is provided with poor starting values, so if you have a good idea of where the knots might belong, it will help to provide some direction with the starting choice of knots.)

The advice to read the help is quite important to follow, since there are so many options in this tool. You might be surprised at what you find in there. SLM_tutorial.html is worth reading.

Deepesh upadrashtaHi John,

Very good tool.

I want to know how can we get optimal piecewise linear functions using this tool. Right now, I think Xrange is equally divided based number of knots. For the given number knots, how to get the optimal locations of X which give best fit to data?

If this tool doesn't have the capability, can you suggest any other option.

Thanks,

Deepesh.

ZhengJohn D'ErricoMaria - a good question. While the Hermite form is MOST commonly a cubic, the generic idea extends to other odd orders too, thus 1st, 5th, even 7th order.

A piecewise linear Hermite will be a continuous function, given by the fact that the Hermite form shares a function value across knot boundaries. It must perforce be continuous.

When we step up two orders to a cubic Hermite, the cubic segments share both a function value and the first derivative across knot boundaries. Again, the result is perforce both continuous and differentiable. So we can build an entire family of functions that are everywhere differentiable merely by specifying the value of the function and its derivative at each knot of the spline. (Artful choice of those numbers is what SLM achieves to fit your data while including your goals in the fitting process.)

In theory, I could have allowed the user to choose a 5th degree Hermite (or any odd degree) too. Here we would specify the values of the function and its first and second derivatives at each knot. There would be problems that would then arise, since monotonicity is no longer so easily constrained for such a problem. (For anyone that wants me to do so, sorry, but there is little gain that would arise from a higher order spline than what I have offered, certainly compared to the effort that it would entail on my part to write it. Again, things like monotonicity would become a bit more difficult to constrain. If you need additional flexibility in a spline, then use more knots.)

The point is, a general Hermite form can exist for any odd order, although few authors choose to do so. In this code, I do also offer a piecewise constant form, which personally I don't think terribly valuable, but it was easy enough to include even though it is not a classical Hermite form. Anyway, I'm sure some of my users have used it.

Should I have written this code to use a B-form? Arguably so, since then I could have offered quadratic splines too without much effort. But I've always liked the general Hermite form, since I can look at the coefficients and easily visualize the shape of the spline directly from the function values and derivatives. As well, it was quite easy to write the code for the various constraints. Oh well, my code, my subtle bias.

I hope this rambling answer helps.

MariaExcellent toolbox, thank you for sharing.

I have a question about the piecewise linear fit. In smlset it specifies:

'degree' = 'linear' --> Use a piecewise linear Hermite

Can a piecewise linear fit be Hermite? I searched for the definition of Hermite fit and it always comes up as cubic. Any ideas would be helpful!

SvenWhen I grow up I want to make intuitive and unbelievably useful tools like John D'Errico.

Zhexuan ZhangI haven't figured out how to do this. Can anyone help me out?

What I want is how to make a 3-piecewise linear curve fitting with free knots, with one of the 3 pieces having fixed slope.

For example, I would like to have linear, constant, linear curve, where the two intersections are unknown. Is there a built in option that I can use? Thank you in advance.

Zhexuan ZhangSantiagoRojasJohn D'ErricoA long time ago, I wrote a tool that would take an existing spline, and rather than simply evaluate the first or last segments of the spline for extrapolation, I added a new segment that had all desired properties, like monotonicity, concavity, endpoint slope or value constraints, etc. Of course the new segments were fully consistent with the old end points of the spline and the shape at that point. If necessary to meet the specified constraints, I added several segments. The interface for this code was similar to that for SLM, with many possible property/value pairs for any possible shape.

I'll claim that this tool fully met with the SLM philosophy, in that it encouraged the user to explicitly specify information about the shape of the extrapolated curve. While I'd like to provide such a tool, more important in my opinion is to write a GUI wrapper for SLM.

To a large extent you can do that form of extrapolation already, when you first build the curve. Simply specify knots that go out as far as you need the curve to go. The knots need not always be tight up to the end points of the data. (That is the default for SLM, but you can choose your own knots.) This allows you to directly apply any pertinent shape information about that extrapolated region.

neutrino4242Dear John,

thanks for your rapid answer. Your reference to Mark Twain gave me a new inside view about his mathematical abilities :-). You are right, it's a bit philosophical question and of course, extrapolation may results in unexpected results. But my background is numerical/physical motivated. I'm interest in of deconvolution of a given time series. The frequency response = susceptibility in fourier space is known. It's is well known, extending the data set = padding is mandatory to avoid boundary effects like ringing. In case of image deconvolution, used as deblurring see e.g. R. Liu, "REDUCING BOUNDARY ARTIFACTS IN IMAGE DECONVOLUTION" + google. The suggestion in the book Numerical Recipes, Chapter 13.1.1 is zero padding. This is fine is the data set starts and ends with zeros but fails in all other cases. Zero padding leads to strong ringing at the beginning and end of the deconvoluted time series, independently of the padding length. The reason is the discontinuity in the data set before deconvolution. A better idea is the padding with constants to avoid the discontinuity. Next better idea is a extrapolation in the (unphysical) padded region where is no jump in first and second derivative. If i perform the deconvolution with such extrapolation, the ringing artifacts disappears. Of course, after deconvolution, only the time span without the padding regions in front and the end of the data set has a physical interpretation. I hope it explains my physical/numerical motivation of extrapolation with the first and last spline.

BTW: i) I'm a German and my surname is John :-).

ii) I use the slmengine mostly to obtain the numerical derivative of noisy data. From my point of view, the slmengine has many advantages in control of the necessary smoothing of the noisy data set, e.g. concaveup or integral. It's not possible to implement such (physical motivated) features in more sophisticated algorithms like higher order methods or Savitzky-Golay-filters.

John D'ErricoHi Neutrino,

You might call it an inconsistency. I choose to call it a strongly held difference of opinion. Really, it all comes down to my philosophy about extrapolation as opposed to that embodied in PPVAL.

I don't let you extrapolate a spline outside of its support using SLMEVAL. Extrapolation does foolish things, just when you least want it to happen. Perhaps my favorite mathematical quote (that hardly anybody else ever seems to know about) is by Mark Twain, from Life on the Mississippi.

“In the space of one hundred and seventy six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”

The point is, extrapolation does nasty things, so SLMEVAL does NOT allow you to do anything but extrapolate as a constant function beyond the support of the spline. PPVAL does do so. That is a problem of PPVAL, IMHO.

So what should you do if you truly do NEED to extrapolate? You should have done a better job fitting the spline! Use the capabilities of the SLM tool to fit a spline that goes out as far as you want it to go. Now you have total control over the shape, and you should be monitoring the results to make sure that you get something intelligent, instead of something virtually random as you would get from PPVAL.

To put it another way, if you don't know what the curve should be doing out there beyond the data, or will not choose to deal with controlling the extrapolated shape, then you should not be extrapolating your curve out into those nether regions. Again, this is my opinion, but it seems a very logical one, and it is one that is fully consistent with the philosophies of SLM.

neutrino4242Dear John, excellent tool, I use it extensively in my daily work with noisy data. A reference in the next paper will be given.

May be, i found a tiny inconsistency regarding the extrapolation of data set. Have a look on this sample code:

% aim: find a good extrapolation of noisy data, green line in final plot

clear all;close all;

x=0:0.01:1.5*pi;

y=sin(x)+rand(size(x))*0.1;

xinterp=-2:0.01:3*pi;

% matlab interp1 is not designed for noisy data, fails

yinterp=interp1(x,y,xinterp,'pchip','extrap');

figure(1);plot(xinterp,yinterp);

% first try with slm, only the first and last data point are padded in extrapolation

yslm=slmengine(x,y,'endconditions','estimate','plot','on');

yslmeval=slmeval(xinterp,yslm);

figure(1);hold on;plot(xinterp,yslmeval,'-r');ylim([-3 3]);

% second try, this works, calulation of the polynomial in the extrapolated regions

ypp=slmengine(x,y,'endconditions','estimate','plot','on','result','pp');

yppeval=ppval(xinterp,ypp);

figure(1);hold on;plot(xinterp,yppeval,'-g');ylim([-3 3]);

erickThank you, John. According to your suggestion, I have fitted my data again, but unfortunately, the results is worse than seperately fitting the two parts of the curve. I made a simple simulation, but I still got the similar result. Would you please take a few minutes to have a look at my simulation data, I will send it to you by email. Thank you.

I found the code written by Meyer on this web site "http://www.stat.colostate.edu/~meyer/srrs.htm", but it was written in R code, not matlab. I think I should learn R code these days. Meyer also did not give out how to fit the data with half convex and half concave, it seems still a long way for me to get out. Would you please give me some suggestion? Thank you.

I am looking forward to your new reply, Thank you again.

John D'ErricoNo. SLM does NOT use truncated power functions! These are generally a numerically poor way to implement a spline, even only a cubic spline.

SLM uses a Hermite formulation, which personally, I've always preferred as they are easy to look at and understand the shape just by looking at the parameters since they are parameterized by function values and derivatives at the knots. Its just my personal preference though, since a spline by any other basis is still a spline.

As far as monotonicity or concavity applying only over a restricted range, you can do so using SLM directly. Read the help for slmset. It says (in part):

'increasing' - controls monotonicity of the function

= 'off' --> No part of the spline is constrained to be an

increasing function.

= 'on' --> the function will be increasing over its entire domain.

= vector of length 2 --> denotes the start and end points of a

region of the curve over which it is monotone increasing.

= array of size nx2 --> each row of which denotes the start

and end points of a region of the curve over which it is

monotone increasing.

So if you wish a constraint to apply only over a portion of the curve, you can specify the interval. This same approach applies to the concavity constraints. If that constraint would indicate the curve be monotonic over only part of a knot interval, I do stretch it to require monotonicity over a complete knot interval. So you cannot stop midway between a pair of knots. In that case, simply add a knot.

erickJohn, would you please tell me if it is possible to fit such kind of curve as below: monotonic 'increasing', 'concaveup' in its first half, 'concavedown' in its second half.

To fit the above curve use slm tool, I splited the curve into halves. Then I use slm tool twice to seperately fit the two parts of the curve. The result is not so good.

Can slm tool solve this problem? I am looking forword to you early reply.

erickJohn, I think I have got the answers to my questions.

First, I thnik the basis spline you used in SLM toolis the truncated power functions, this kind of function is effective for low degree of spline(e.g. degree 2 or 3). In Ramsay's paper, he suggested that I-spline would be a more suitable set of basis splines that can be combined linearly to yield any other spline associated with knot sequence. I wonder if it is possible to add the application by I-spline in slm tool in its latter versions.

Second, Is "slm.stats.finalRP" the smoothing parameter for regression splines? According to the example in "slm_tutorial.m", I can get many "RP", and the last one is about 8.9.

I think you must be busy these days. Wish you good luck all the time. Thank you.

erickJohn, I think I have misunderstand "cubic spline". "Cubic spline" only means the spline has degree 3. I can choose different spline basis function to form cubic spline, such as M-spline, B-spline or I-spline. In Ramsay's paper, he gave out some examples of monotonic cubic spline fitting by linear combination of I-splines with degree 3.

John, Would you tell me what is the spline basis function you choosed in SLM tool if I use cubic spline of drgree 3 for curve fitting?

Looking forward to your early reply, Thank you again.

erickThank you, John. I got many answers from your comment.

I have a little confused about "A cubic spline means you have a curve that is composed of piecewise segments". In Ramsay's paper, he give out an example (see figure 1 of Ramsay's paper) of combining of six I-splines to form the final spline. Can I regard the "piecewise segments" as "I-spline" or "other-spline" basis functions in SLM tool?

Thank you for your new update, would you please give some explaination on "slm.stats.finalRP "? It will be best if you can give an example on "slm.stats.finalRP " in "slm_tutorial.m".

Thank you again.

John D'ErricoErick - sorry, but I don't have Meyer's paper to know what their terminology is. Degree 3 is a cubic spline in SLM.

I do now recall the Ramsey paper as a useful one, but most of my old collection of papers was left behind when I retired from Kodak.

If you set 30 knots, then the result is ONE cubic spline. A cubic spline means you have a curve that is composed of piecewise segments, thus 30 pieces, that are each connected at the knots, so they are adequately smooth across those tie points.

The current version of SLM did not return the final parameter generated from cross validation, but I've modified it so that it will do so. slm.stats.finalRP will now contain that value. I'll upload the new version in the morning. It was due for an update anyway.

erickThank you, John. Thank you, Peter.

I have read the paper written by J.O. Ramsay, and after comparison I feel more about the excellence of Slm tool.

John, I have a few questions left about slm tools, would you please give me some idea?

Firstly, if I choose degree 3 in "slmengine", does it mean that I choose the cubic spline for curve fitting? I read the paper by Meyer, he said that I-spline was the integration of M-spline, and C-spline is the integration of I spline. Is C-spline another name of cubic spline?

Secondly, if I set 30 knots and degree 3 for my data, is the output curve by Slm tool composed by the combination of 30 cubic splines? And are the positions of these cubic splines decided by the position of knots?

Last question, I found that your Slm tool provide "cross validation" to help smooth the curve. Can I get the optimal smooth parameter by set "cross validation" in "slmengine"?

Looking forward to your early reply, Thank you again.

Peter Simon@erick: A good entry to the literature on regression splines might be J.O. Ramsay, "Monotone regression splines in action", which is heavily cited by later works. A copy can be obtained for free from http://tinyurl.com/cqekokb

John D'ErricoErick - sorry, but I'm not doing active research in this area, and I don't have access to any new papers. I recall a paper by Wright and Wegman about constrained regression splines, but that dates back over 30 years. The work I did in developing SLM was mainly my own, based on my consulting activities, though I never published any papers on the topic.

erickThank you, John. I think I get some answer from your comment. Fritsch and Carlson gave out the theory for interpolating, while SLM is a tool for regression. My data is noisy, so I need a tool for regression not for interpolating. Thank you again.

Another question, Can you give me some references about SLM tools on the theory of “monotone piecewise cubic regression”, I found a paper written by meyer, that is "Inference using shape-restricted regression splines". But Meyer said little about "monotone piecewise cubic regression", would you please recommend me some references, Thank you.

John D'ErricoErick - Yes, that paper is where I (long ago) read about the region of monotonicity for a cubic. I use a multi-sided (6 or 7 sides, I can't recall the exact number at the moment) polygonal approximation to the region that contains all monotonic cubic segments.

So is the theory used in SLM exactly that set forth by Fritsch and Carlson? Not exactly, because they describe an interpolating spline, and because I use a different, slightly more inclusive region than they implemented.

As far as knot placement goes, this is traditionally the biggest problem of this class os spline. A knot belongs where the third derivative changes, but that is quite difficult to identify. The idea behind SLM is typically to use generally more knots than you really need, but then to apply artful constraints on the shape of the curve, based on your knowledge of the process. This is where SLM excels, in allowing you to bring virtually any information that you may have about the relationship into the modeling process.

erickReally excellent work. Thank you.

I have two questions about monotonic data fitting. First one:I find a paper "Monotone Piecewise Cubic Interpolation(SIAM J. Numer. Anal., 17(2), 238–246

)". If I use your SLM tool for monotonic data fitting, is the theory under your slm tool similar with that in this paper.

Seond question: I don't know how many knots should I choose for data fitting, my curve is a smoothed step curve, If I choose too many knots, the result is wrong. But if I choose not enough knots, the result is not good. I just choose equal intervals to set knots position.

Looking forward to your early reply, Thank you again.

erickReally excellent work. Thank you.

I have two questions about monotonic data fitting. First one:I find a paper "Monotone Piecewise Cubic Interpolation(SIAM J. Numer. Anal., 17(2), 238–246

)". If I use your SLM tool for monotonic data fitting, is the theory under your slm tool similar with that in this paper.

Seond question: I don't know how many knots should I choose for data fitting, my curve is a smoothed step curve, If I choose too many knots, the result is wrong. But if I choose not enough knots, the result is not good. I just choose equal intervals to set knots position.

Looking forward to your early reply, Thank you again.

C JSantoshAssuming, derivative singularity means discontinuity in the derivative function, I was wondering if there is a way a spline can be made to fit a step function reasonably.

SantoshI have a question regarding an example in your tutorial.

What is the meaning of derivative singularity? It may be a naive question but I have not seen a definition of this anywhere.

SantoshMatlab2010Jeff DozierThis package is perhaps the most useful that I have downloaded from Mathworks. Before discovering it, I had worked on the problem myself for some specific cases (fitting a monotonic function, fitting one with a simple peak), with just limited success. Thank you John.

John D'ErricoTim - yes, by default SLM does yield a C2 (twice continuously differentiable) approximation. You can change that of course as you desire.

And, yes, the entire idea behind SLM is it gives you a good fit that allows you to build your own information into a problem, while not requiring the user to provide an explicit model.

TimI find this whole idea of nonparametric "fitting" very intriguing, as I have some data for which I do not have a function to fit to the data. I have position vs. time data points, and I need to get the acceleration vs. time. I have found that different fit functions certainly have different derivative behaviors, so will SLM give a reliable fit that is twice differentiable? You indicate it gives the fit plus derivatives.

If this could work in a way I could have confidence in, it would be a godsend!

John D'ErricoMarc - the problem is interesting, but difficult to solve where cubic polynomials are involved using the class of optimizer that is employed. I'll start by talking about how a simpler constraint is achieved. For example, suppose you define a bound constraint on the value of the function at the left end of the curve, so perhaps you provide a prescription for LeftMinValue.

This is equivalent to setting a single inequality constraint on the parameters of the spline. More importantly, I can write that inequality constraint in a linear form, so those parameters enter linearly into the equation. Since the SLM tool will use a linear solver (in this case, LSQLIN) to solve the problem, this is necessary. LSQLIN is a fast, efficient optimizer, and it returns your answer quickly and consistently. As importantly, you should appreciate that LSQLIN has no need for a starting value like many other optimizers.

Monotonicity is a more difficult class of constraint to define. The trick is to return to how PCHIP works. PCHIP is based on the work of Fritsch and Carlson, where they showed how to set bounds on the set of parameters of a cubic polynomial, such that it will be monotone over a fixed interval. As it turns out, the set of parameters reduces to a region with a convex but curved boundary in the relevant parameter space. I approximate that curved boundary with a slightly smaller polygonal boundary. The nice thing is a polygonal domain can be defined using the linear inequality constraints that LSQLIN allows. Effectively, SLM searches over a slightly smaller set of possible splines that ALWAYS satisfy the required monotonicity constraints. This is what I mean when I say that SLM uses a sufficient set of constraints for that case. There MAY be other splines that would fit the data slightly better, but I cannot find them. The difference is slight and subtle here, but in general not that important. Effectively, the sufficient constraint makes the spline slightly less flexible. So if you really desperately need a more flexible spline, you just use an extra knot or two. No problem.

Global constraints on MinSlope or MaxSlope are reduced by a transformation into the monotonicity problem, so they are also easy.

Ok, so now what happens when you want to enforce a general MaxValue or MinValue prescription? This is the only case where I had to employ these necessary but not truly sufficient constraints. There is no solution (that I know of, and I have looked) for the minimum value of a cubic polynomial over an interval that allows me to enforce that constraint using linear algebra and linear inequality constraints. So whereas I was able to solve the monotonicity problem using a call to LSQLIN, I must employ what are at best necessary constraints for a global max or min value. Essentially, I sample the function at a set of fixed points on each knot interval, constraining the function from exceeding that desired value. As the points are reasonably close, then the function will not exceed your global bound by much, but this is as much as I can realistically do.

The alternative would be to force SLM to use a different solver, perhaps FMINCON. FMINCON puts the problem into an entirely different class, a place that in general you do not wish to go. FMINCON will be MUCH slower in convergence properties. It will require starting values for the parameters. And sometimes FMINCON will not converge to a solution that you like. You don't want to use FMINCON unless that is absolutely necessary here. LSQLIN is a better solver for these problems, but the use of LSQLIN forces me to use purely necessary constraints. They can allow the bound to be exceeded by a small amount. Such is life.

Having explained (with much hand waving, but I hope I got the point across) why I cannot solve your problem exactly in SLM, you are not completely lost. Very often such global bound constraints are a reflection of something a bit deeper. For example, if you know absolutely that your function can never go negative, it may be because you are really working in the wrong domain. That is, if Y may never be negative (OR zero), then why not work in the domain log(Y)?

Effectively, I am suggesting one might build the model log(Y) = f(X), or Y = exp(f(X)). This is simple to do. You merely log your Y variable before passing it into slmengine. Some constraints may need to be adjusted to stay in context. Because this is a nice transformation of the problem, monotonicity is not even affected. If the curve was monotonic before, it still is. And a MinValue constraint at zero need not even be provided. That global minimum value is built into your transformation. Of course, if you had a point with an exactly zero Y value this will fail, but otherwise, it should work nicely.

The point is that very often such a global bound constraint on value is a reflection of a problem with domain. It is a sign that you really are working in the wrong domain anyway. The transformation just brings you into the correct place to work.

Is such a transformation always the best solution? Not for all problems. Remember that when you transform Y, you also muck with the error structure. Logging Y effectively means that you are now assuming a lognormal error structure, so your errors are now proportional. Data points near zero will now assert more influence on the solution. A tiny error in a point very near zero will be massive once you log things. That may or may not be appropriate.

Now, you mention a case where you have both a global MinValue AND a global MaxValue. A logical transformation here is based on a sigmoid function of some ilk. For example, suppose your problem is known to lie in the bounds -1 and 1. Simply transform the problem such that

Y = erf(f(X))

You would then pass into slmengine the variables X and erfinv(Y). Similar transformations work for a true sigmoid, of the general form 1./(1+exp(u)), or for the relation atan(u). Any monotone bounded transformation that has an analytic inverse will work in theory.

Again, these transformations may wreck complete havoc on your error structure. One solution is to employ weights that are based on the local derivative of the transformation you have employed. This is easy enough to implement by passing in the appropriate vector of weights.

The only other potential alternative is to provide a MinValue or MaxValue that is conservative by a bit. This gives the necessary constraint some room before the function exceeds you real bound. The problem is how much headroom to give it, and that is an unanswerable problem. (Sorry.)

So, if you have managed to read all of this without falling soundly asleep, I'll apologize that I cannot provide a perfect solution, but there are some ideas that may work for you. - John

MarcReally good code, thank you John.

just one question, I want to have a constrained spline between a max and min values but with 'maxvalue' and 'minvalue' you say (and I try it and it happens): "This constraint is only a necessary constraint. It is not sufficient. In some circumstances the spline may pass slightly above this maximum value". Is there a way to avoid that?

And why is not exactly? is it a fomulation problem or a numerical? because if is numerical i think i can work with that.

Thanks

Andrew StampsJohn,

Thanks for this excellent tool. I wish I had discovered it sooner. I have been writing custom quadratic programs to fit polynomials with various constraints (monotonicity, concavity, endpoint slopes, etc.) for years, but it seemed like every time required a slight variation on what I had done previously. This tool captures all of those things I was doing (and many more).

John D'ErricoMatt- that would not be an uncertainty on the prediction, but on the parameters themselves, and in that case on a knot position. In order to do that you might want to try a bootstrap estimator.

http://en.wikipedia.org/wiki/Bootstrapping_(statistics)

MattHi John. This is an excellent tool! I have a question. The .rtf accompanying the file lists 'prescription.PredictionUncertainty' as an option. The file itself, however, does not. Is there any way to tease out the uncertainty/confidence interval for the 'x' value chosen for a free knot? The verbose output lists 'Range of prediction errors', but this appears to be in 'y'. I'm using a linear 3-knot fit.

John D'ErricoActually, 'rmse' is not a property, although you can do what you desire. I wrote that doc before the code itself, and then missed changing the doc to reflect a minor change in design. I simply folded that ability into the 'regularization' property. IF a negative value (-r) is supplied for that property, then a spline is fit using r as a goal for the RMSE. The help correctly tells you this, but then the help is long and it could easily be missed.

Philipp RauchJohn,

thank you for providing this great tool. I use it to fit deformed polymer filaments and it does a great job. I tried to use the 'rmse' parameter described in your documentation. However, there seems to be no such property in the code. Has it been removed (due to frequent sky downfalling events) or do I have to set the rmse property in a different way than other properties? E.g. setting the 'endconditions' property works fine.

Thanks a lot,

Philipp

John D'ErricoPer - SPLINEFIT and SLM are different tools completely, with somewhat different design goals.

The basic philosophy of SLM is that you as a user have knowledge about a system to be modeled, and that you will get a far better model as a result by including that knowledge. SLM provides a tool to efficiently and intelligently bring that knowledge into the modeling process. As a spline, SLM is a regularized one, so that the knot locations in SLM are less important than they are for SPLINEFIT. Even if you have more knots than data points, SLM will STILL yield a nice looking result if at all possible. For example, try this with both tools:

x = rand(5,1);

y = sin(x);

pp = splinefit(x,y,10);

fnplt(pp) % fnplt is in the curvefitting toolbox

slm = slmengine(x,y,'plot','on','knots',10);

As well, the number of constraints you may apply with SPLINEFIT is quite limited, whereas the use of regularization allows that to not be a problem. Again, this effectively prevents you from even trying to build a truly monotonic cubic spline.

SPLINEFIT offers only a very limited set of constraints on function values and derivatives at specified locations. This makes true monotonicity impossible to assure for a cubic (or higher order) spline. The wide variety of constraints in SLM and natural interface to those constraints make it (IMHO) far more useful.

On the other end, splinefit offers higher order splines for those who truly need a high order of differentiability. Of course, it is also true that estimation of high order derivatives is nearly impossible from data with any noise in it, so you are probably kidding yourself if you try to do so. SPLINEFIT also allows a robust fitting option, a useful idea I have already been encouraged to offer in my next release.

Another thing that SPLINEFIT offers is the ability to solve multiple problems in one call, whereas I think an external loop is an adequate solution to that. Anyway, since every problem is different, I recommend that you build a model and decide if you are happy before going on.

In the end, both tools are useful, and offer the ability to fit a spline to your data. Personally, I think SLM is by far my favorite, but then you cannot blame me for being biased in that direction.

Per NordlöwHow does SLM compare to the FEX splinefit package?

John D'ErricoHi Matthew,

SLM has some neat abilities. I've tried to put everything in it that anyone will need, although I'll never succeed completely at that goal. You can essentially get your wish here, but you need to explore the options of SLM. Try this example:

x = rand(50,1)*6 - 3;

y = erf(x);

slm = slmengine(x,y,'plot','on','knots',[-3 -1.5 1.5 3],'increasing','on');

With 3 segments, a purely cubic spline does not do terribly well. Suppose we provide some information that the function should be linear over the first and last segments? The 'linearregion' property does that. Be careful at the knots so that it does not get confused and try to force the central region to also be linear.

slm = slmengine(x,y,'plot','on','knots',[-3 -1.5 1.5 3], 'increasing','on', 'linearregion',[-3 -1.6;1.6 3]);

Why did this fail? Because SLM by default is a C2 (twice continuously differentiable) function. So the second derivatives are continuous. That is too much of a constraint at those interior knots here with only 3 segments. By relaxing the C2 continuity, we can do considerably better.

slm = slmengine(x,y,'plot','on','knots', [-3 -1.5 1.5 3], 'increasing','on', 'linearregion',[-3 -1.6;1.6 3], 'c2','off');

If you explored the coefficients of the resulting segments, you would find that indeed the first and last segments were linear, although they are represented as cubics. The first two coefficients are zero (to within floating point trash.) In fact here the first segment reduced to a constant.

pp = slm2pp(slm);

pp.coefs

ans =

-5.8146e-16 1.3602e-15 -5.8398e-16 -0.99183

-0.14716 0.6624 -4.2825e-16 -0.99183

2.7139e-16 -6.6613e-16 0.0011762 0.99655

Throw in a spare knot at zero, and the curve fit starts to look much more like an erf.

slm = slmengine(x,y,'plot','on','knots',[-3 -1.5 0 1.5 3], 'increasing','on', 'linearregion',[-3 -1.6;1.6 3], 'c2','off');

Admittedly, just throwing plenty of knots at it works nicely too.

slm = slmengine(x,y,'plot','on','knots',[-3:.25:3], 'increasing','on');

Matthew ForemanJohn,

This tool is great. I have been using it to perform piecewise regression on movement data from motion capture cameras.

I have one question. I have searched the forums high and low (I'm sure it exists somewhere and I'm just missing it) and have been unable to find a simple way to vary the degree of the splines for a single fit.

For example, I would like to use 4 knots to produce three different segments, and be able to vary the degree of the segments such that the beginning is linear, the middle is cubic, and the end is linear.

Thanks in advance for your help! And again, this tool is great!

Matt

SHEIKHA brilliant work!

Thanks John for re-posting SLM.

John D'ErricoChris - Sorry. It was a typo in the code, repaired and re-submitted, to appear this morning. It only would have manifested when certain combinations of constraints were used, so I missed it in the testing.

Chris van der TogtHi John,

i get an error when using

slmengine in the following way;

slmengine(x,y,'degree',1,'interiorknots', 'free','knots', 3, 'leftslope', 0, 'rightminslope', 0);

??? Error using ==> lsqlin at 202

The number of rows in A must be equal to the length of b.

Error in ==> slmengine>solve_slm_system at 2333

[coef,junk,junk,exitflag,junk,lambda] = ...

when I look at the call to lsqlin, Mineq has three values and rhsineq only two.

Mineq = [0 0.5 -0.5];

rhsineq = [0 0];

Is this a bug, or am I calling slmengine incorrectly.

regards,

John D'ErricoDF: I'm sorry, but not at this time. Some constraints (such as monotonicity) become more difficult to encode for higher order splines. While they can be formulated in a necessary form, it does not truly constrain the curve to have the desired properties.

DFCan SLM use splines of any order? I would like to use it with 4 and 5th order splines.

Thank you!

Britnee crawfordJohn,

I just sent you an email regarding a question about fitting a piecewise constant curve to an approximately normal distribution. But, I thought I would post the question here as well. I need the pw const curve to have 13 pieces, but I want the knots to be freely spaced, but I noticed in your comments that pw constant curves have trouble with this. Is there any way around it?

If not, is there a way to make the last interior knot to be at the start of the flat part of the distribution (the right tail) and then the other pieces are fit to the central part of the curve where the curve is changing rapidly.

Thank you again for your great contributions! I have used your tools in several parts of my dissertation.

Thank you,

Britnee

John D'ErricoRon - Thanks for finding the weights bug. I'll post a new version with a fix. As far as references go, I don't really know of any. The basic philosophy is a somewhat Bayesian one, wherein one uses knowledge of the physical system to be modeled to yield a viable model consistent with your prior information. In practice, I have found it to work splendidly, and have used similar schemes for over 20 years. I've given many talks on that modeling philosophy, but no published papers. I'll argue that what makes SLM work as well as it does is the use of a simple scheme to encode your knowledge of the system into a set of well defined parameters describing the shape of the curve.

Ron AbileahThese routines are very useful. Would like to read more on the mathematical foundation of these routines. Can you recommend reading beyond the basic least square splines references?

Also, I made the following correction concerning handling of weights...

% were there any NaN or inf elements in the data?

k = isnan(x) | isnan(y) | isinf(x) | isinf(y);

if any(k)

% drop them from the analysis

x(k) = [];

y(k) = [];

prescription.Weights(k) = []; % ALSO drop corresponding weights, May 2011

n = length(x);

end

% if weights or errorbars were set, verify the sizes of these

% parameters, compared to the number of data points.

PeteAside from all the awfully clever (advanced) stuff, also a really handy way to find the inflection point in a piecewise linear regression. Neat =)

see: http://www.mathworks.com/matlabcentral/newsreader/view_thread/298989

John D'ErricoIt sounds like you want to allow the knots to vary, but only within set bounds? If so, I might suggest putting the call to slmengine inside a wrapper, an objective function for fmincon. Use it to vary the knot placement, with slmengine now seeing a fixed list of knots.

In fact, this is all slmengine does to vary the knots when you specify free interior knots, so one might argue that I could have offered this as yet another option for slm. The problem is, the interface would be a bit messy, coming up with a way to allow the user to specify bounds on some or all of the knots.

If you read through the code, you will see in the beginning how I call fmincon in that case. Note that I constrain the knots to be distinct.

Ask again if I am mistaken about your goal here.

Kevin stephensBrilliant package, it does almost everything I need as it is, but just wondering how could I insert a radius at the knots in a free piecewise linear hermite? Obviously I could do it post optimisation but there would be a significant error for a larger radius.

Royi AvitalJohn, Thank you for your response.

I will cite as you advised me to.

I'll try to have a look at De Boor's book.

I just wanted the solid math behind the tool.

Hopefully I'll get from there.

One day, If you do write an article or notes about it I'd be happy to read and learn.

Thank you.

John D'ErricoI wish I had written a paper on this topic years ago when I wrote the first version of this tool. I did not do so then, although I have given a few talks on the underlying modeling philosophy of these tools.

The basic idea is simply that of a least squares spline, augmented by a smoothness penalty like a smoothing spline. The smoothness penalty solves the problem of arbitrary (poor) knot selection in many cases.

General least squares splines are covered in depth in the literature, as are smoothing splines. For these fundamental ideas, de Boor is of course the classic reference, a book worth reading for any user of splines.

What the SLM tools add though is something that I've never really seen written about in the literature. This is the idea that intelligently chosen constraints on the curve shape can act as a strong, useful regularizer on your result. They allow you to build your own knowledge about a system into the model, using a simple vocabulary to describe the desired shape of that curve.

I wrote these tools after some years of seeing people using strange nonlinear regression models to fit curves, just because they needed a curve with a given shape. So the user picks some nonlinear sigmoidal form, a Gaussian, etc., for no better reason than that it fits some fundamental desired shape. And of course, nonlinear regression has its own problems, like poor starting values, lack of convergence, etc. Worse, the user finds that the curve shape they chose is not really the correct shape, so they end up with significant lack of fit. Once a viable tool becomes available to fit those curves, I find that far fewer people end up using nonlinear regression models for the wrong reasons.

There are a couple of files that discuss some of these ideas "slm_tutorial.html" and "shape prescriptive modeling.rtf" in the zip file, but that is all I can offer. We also suggested a citation format for tools from the file exchange, if you do need an explicit reference. You can find our recommendations here:

http://matlabwiki.mathworks.com/Citing_Files_from_the_File_Exchange

Royi AvitalIs there any article which could be used as reference to this kind of fitting?

I'd like to use this tool in my project and would like to back it up with some background info.

Thanks for this amazing tool.

Vitaly KoissinDear John, thank you very much in advance!

- If by chance you'll have time to apply this 2nd derivative constraint until the next week, this would be great :-) Then I'll include the better approximation in a coming conference paper.

- you are right; I was just confused by several plateaus observed in the 1st derivative. But as it is already emphasized in the help-file, this means the monotonicity in the sense "=>0' (or "<=0"), not just ">" or "<".

- OK, the total arc length constraint is not vitally important while I have a good number of points to interpolate (and for me it is not a problem to take many points from the test data).

John D'Errico- I should be able to add a constraint on the trend of the curvature. Really, what I'll do is allow the user to constrain the sign of the third derivative. In combination with constraints on the sign of the second derivative, this will allow you to create a function with increasing or decreasing curvature as desired.

- Constraints on monotonic behavior for the first derivative (increasing or decreasing) are already present, in the form of the concaveup or concavedown properties. I'll add a comment in the help to emphasize this fact.

- A constraint on the total arc length of the curve is difficult, since this is a nonlinear constraint. This would force the use of a nonlinear optimizer like fmincon when that constraint is applied.

Vitaly KoissinThank you very much for this tool! I use it to fit the test data for the bent shape of a textile stripe loaded by its own weight, and SLM the best fitting I used for this!

However :-) I would be happy yet more if you extend the files to include the following properties:

1) monotonic increase/decrease of the curvarure (in my case the condition of decreasing curvarure is very important, since I take the 2nd derivative to calculate the bending stiffness of the stripe). Now I need to make a double fiitting: first for the bent shape and then for the slope; otherwise the curvature has too large jumps. But even after this the result is still not perfect :-(

2) controlled length of the curve (similarly to the integral option). This is not so important in my case but could improve the fitting, since this condition has a sound physical sense in my problem).

Property of the monotonic increase/decrease of the 1st derivative could also be useful IMHO.

Best regards,

Vitaly

mathworks2011very powerful.

John D'ErricoNo, the attached e-mail address is correct. I don't recall seeing an e-mail from you, but I do get large amounts of e-mail, so your mail may have been lost.

If by ground truth, you intend to force the curve through a given set of points, this is in there already. You force the curve by setting the 'xy' property. Each row of the supplied array defines one point, as an (x,y) pair. Beware that it may be impossible to solve a problem if you specify several points in a single knot interval.

Since slmengine always returns a spline model, it makes sense that any easy operation mode is best achieved by creating a wrapper for slmengine, that then calls slmeval.

Royi AvitalThis is a great tool.

Yet I'm missing 2 options:

1. "Easy Operation Mode" - insert 3 vectors - x, f(x), x'. The result is the Extrapolated f(x').

2. Setting "Ground Truth" on some data points.

I tried emailing you on something regarding it.

Might be the email address in your profile is wrong.

Thanks for sharing this great tool.

Royi AvitalPete shererAre you planning to extend this excellent tool to more than one input parameters. Something similar to MARS or simpler would provide users with great data surfacing fitting tool.

Andre Guy TranquilleJohn D'ErricoMy thanks to Mark Shore for catching a problem with the curvature constraints that only appeared in some circumstances. The repair has been submitted.

John D'ErricoKay - While I would like to help you (and the many others who have requested confidence intervals) and provide confidence intervals for this tool, they would not be valid much of the time. Whenever any inequality constraints are involved, the standard methods for confidence intervals for a linear regression become inappropriate. And many of the most useful aspects of this tool involve inequality constraints. Worse, things become nastier yet if free knots are estimated.

While I could, in theory, use more sophisticated techniques to provide confidence intervals, these techniques would become quite time consuming.

John

Kaycan the program determine 95% cofidence interval of the estimated parameters?

Mark ShoreThanks John, that's just what I was looking for. Didn't think of checking the FAQs.

To add to my previous comments, the command-line arguments for the SLM functions are terse but logical and intuitive. I needed very accurately interpolated fits to densely measured noisy data with multiple inflection points, and was able to get highly satisfactory results on my first day using these tools.

John D'ErricoCiting files on the FEX seems to come up often enough. We came up with a couple of ideas, and put them on this link:

http://matlabwiki.mathworks.com/Citing_Files_from_the_File_Exchange

Mark ShoreVery impressive indeed. Fast, very flexible and easy to use. And - not a small point - the results respect the data... After just a couple of hours' trial, this has convinced me to purchase the optimization toolbox.

A quick question to John or other FEX veterans, what's the generally accepted way to reference FEX submissions?

Johannes KorsaweIt's really great work. I found it very simple to use and very powerful in the results. Thanks again, John!

KayHow best can one determine the uncertainity or error in the fitted line?

Xavier XavierJohn D'ErricoDave - you found a bug in slmeval. I've just submitted the fix. Thanks for telling me about it.

David HeslopThank you John this is wonderful package, but I think there may be an issue with slmeval when using the inverse of a spline which is monotonically decreasing.

As a simple example, when I work with data with a positive relationship everything is okay:

%first working with a simple positive relationship

X=sort(randn(20,1));

Y=X+10;

slm = slmengine(X,Y,'plot','on');

%find the Y value for X=0.5

Yhat=slmeval(0.5,slm,0);

%and work in the opposite direction

Xhat=slmeval(Yhat,slm,-1)

Xhat =

0.5000

But when I try them same thing for a negative relationship a NaN is returned:

%now work with a negative relationship

X=sort(randn(20,1));

Y=-X-10;

slm = slmengine(X,Y,'plot','on');

%find the Y value for X=0.5

Yhat=slmeval(0.5,slm,0)

%and work in the opposite direction

Xhat=slmeval(Yhat,slm,-1)

Xhat =

NaN

Of course this is just a simply example and the problem I'm working involves more complex relationships. Maybe I've misunderstood the use of slmeval, but any advice you could offer would be great,

thanks, Dave

Juho JalavaThank you John, I'm very, very pleased with the results!

Raymond ChengThanks for your sharing.

JamesJohn D'ErricoThe new version just got uploaded to repair the 'active-set' problems incurred with the newer optimization toolbox releases.

Chris van der TogtVery nice John, I will cite you in our papers.

I also got this warning:

Warning: Options LargeScale = 'off' and Algorithm = 'trust-region-reflective' conflict.

Ignoring Algorithm and running active-set method. To run trust-region-reflective, set

LargeScale = 'on'. To run active-set without this warning, use Algorithm = 'active-set'.

and added this line after line numer 158 in your slmengine.m file;

fminconoptions.Algorithm = 'active-set';

now I don't get the warning anymore.

John D'ErricoThe warning message that fmincon returns is new, due apparently to a change in the optimization toolbox. It is only a warning though, that does not hurt the operation of the code itself.

I'll fix the problem.

Didi CvetHi

I think that this kind of toolbox is great idea and I am doing some tests over this file. I have very specific data points and while doing my experiments I've tried to use 'knots', 'free' option but I get this warning message:

Warning: Options LargeScale = 'off' and Algorithm = 'trust-region-reflective' conflict.

Ignoring Algorithm and running active-set method. To run trust-region-reflective, set

LargeScale = 'on'. To run active-set without this warning, use Algorithm = 'active-set'.

I've tried to add this row

>>fminconoptions.Algorithm = 'Active-set';

in your slmengine.m code but it doesn't work.

Thanks for shearing this!

Best wishes

Dijana

John D'ErricoFabian - This is more difficult to solve. Ordinarily, one would simply fit x(t) and y(t) independently. However, your constraint is on the term

sqrt((dx/dt)^2 + (dy/dt)^2)

Do you need a cubic result? If so, then it is more complex yet, since any overall slope type of constraint is bad enough to formulate for a cubic.

I might use a brute force approach. Use a pair of independent models, x(t), y(t). You can put a global constraint on dx/dt and dy/dt on these models, limiting the maximum and minimum slopes attained.

Now, go back, and test the actual velocity when the two curves are united into a parametric path in the (x,y) plane. If sqrt((dx/dt)^2 + (dy/dt)^2) never exceeds your velocity limit, then you are done. Otherwise, you will now need to use fmincon to perturb the parameters of the splines, while minimizing the global sum of squares of errors to (x(t), y(t)).

The constraints for this will clearly be nonlinear. You might set one constraint at every point, but this will not constrain the true maximum velocity attained. So you might sample each curve at perhaps 1000 points, returning 1000 nonlinear constraints along the curve. This will give you a necessary condition on the velocity, but it need not be truly sufficient. Thus it might exceed the aim max velocity by a tiny amount.

Finally, the optimization over the spline parameter space will also have other linear constraints on those parameters. I suppose one could (if you were adventuresome) go into the SLM code to extract (and return) the actual equations used to estimate the model as it is sent to lsqlin. Fmincon would need to employ those constraints too.

HTH,

John

Fabian KloostermanDear John,

I wonder if it is possible with your functions to fit a spline to a set of (x,y) coordinates over time (each data point also has an associated weight). I want to constrain the velocity of the fitted spline trajectory, which means I can't just fit (x,t) and (y,t) separately. If this is not possible with your SLM tools, do you have any suggestions how to approach this problem?

Thanks, Fabian

Jeem79 OlsenI just used your package to fit my stress-strain curves obtained from image acquisition and processing, and it works perfectly. Thanks for a great job! However, is it possible to extract only the fitted curve? I didn't see an obvious option for that in your documentation.

Jeem79

JoshuaJohn D'ErricoPete - A goal to be finished as soon as I can is to write a gui that will wrap SLM inside. Note that the computational tool in SLM is called SLMENGINE. This was purposeful, since my expectation is that most users would use the gui form when I am able to offer it. So I have definitely been planning for a gui wrapper.

Even at that though, SLM would not have allowed you to just draw a curve through your data freehand, and then return the coefficients of that curve. A freehand drawn curve may be arbitrarily complex, or not even a single-valued function. SLM is best when you fit the curve, then apply your own special knowledge of a system to be modeled, in the form of constraints on the overall shape of the underlying functional form. But a drawn curve to follow is too broad of a constraint to follow. So SLM might not be capable in general of returning such a functional form in that eventuality.

As well, this becomes a unique problem. Is the problem then to find the curve that fits the original data, and has the general shape of the freehand drawn curve? This involves two separate problems of approximation. It seems one must use a tool to smooth and approximate the drawn curve. Then take that same tool and try to fit the drawn curve through the data? This task would take some serious amount of effort, and would never be possible to do so where it would be transparent to the user. Worse, suppose the curve in the end had some lack of fit to the data (as perceived by the user?) Where did the lack of fit arise? Is it lack of fit to the freehand drawn curve? Or is it lack of fit to the data itself?

Pete shererWould it be possible to add the GUI such that users can simply drag/adjust the fitted line in the way they want. The scripts then return the functional form and coefficients of the adjusted line. Maybe as an additional script or something.

Peter SimonA superb package. Well done! It is perfect for fitting a curve to in-orbit test (IOT) antenna pattern measured data points, used in validating the performance of our satellite antennas. Thanks very much.

Peter Simon

Space Systems/Loral

Antenna Subsystems Operations

weeS B