Code covered by the BSD License  

Highlights from
SLM - Shape Language Modeling

5.0

5.0 | 21 ratings Rate this file 67 Downloads (last 30 days) File Size: 326.37 KB File ID: #24443
image thumbnail

SLM - Shape Language Modeling

by John D'Errico

 

15 Jun 2009 (Updated 20 Dec 2010)

Least squares spline modeling using shape primitives

| Watch this File

File Information
Description

If you could only download one curve fitting tool to your laptop on a desert island, this should be it.

For many years I have recommended that people use least squares splines for their curve fits, with a caveat. Splines offer tremendous flexibility to build a curve in any shape or form. They can nicely fit almost any set of data you will throw at them. This same flexibility is their downfall at times too. Like polynomial models, splines can be too flexible if you are not careful. The trick is to bring your knowledge of the system under study to the problem.

As a scientist, engineer, data analyst, etc., you often have knowledge of a process that you wish to model. Sometimes that knowledge comes from physical principles, sometimes it arises from experience, and sometimes the knowledge just comes from looking at a plot of the data. Regardless of the source, we often want to build in this prior knowledge of a process into our modeling efforts. This is perhaps the biggest reason why nonlinear regression tools are used, and I'll argue, the worst reason. If you are fitting a sigmoid function to your data only because it happens to be monotone and your data appear to have that property, then you have made the wrong choice of modeling tool. (If you are fitting a sigmoid because this is known to be the proper model for your process, then go ahead and fit the sigmoid.)

I'll argue the proper tool when you merely need a monotonic curve fit is a least squares spline, but a spline that is properly constrained to have the fundamental shape you know to be there. This is a very Bayesian approach to modeling, and a very useful one in my experience.

The SLM tools provided here give you an easy to use interface to build an infinite number of curve types from data. SLM stands for Shape Language Modeling. The idea is to provide a prescription for a curve fit using a set of shape primitives. If your curve is monotone, then build that information into the model, so you can estimate the monotone curve that best fits your data. What you will find is that once you employ the proper set of constraints, you will wonder why you ever used nonlinear regression in the past!!!

For example, the screenshot for this file was generated for the following data:

x = (sort(rand(1,100)) - 0.5)*pi;
y = sin(x).^5 + randn(size(x))/10;

slm = slmengine(x,y,'plot','on','knots',10,'increasing','on', ...
'leftslope',0,'rightslope',0)
slm =
            form: 'slm'
          degree: 3
           knots: [10x1 double]
            coef: [10x2 double]
    prescription: [1x1 struct]
               x: [100x1 double]
               y: [100x1 double]

You can evaluate the spline or its derivatives using slmeval.

slmeval(1.3,slm)
ans =
      0.79491

You plot these splines using plotslm.

plotslm(slm)

The plotslm function is nice because it is a simple gui, allowing you to plot the curve, residuals, its derivatives or the integral. You can also evaluate various parameters of the spline, such as the maximum function value over an interval, the minimum or maximum slope, etc.

slmpar(slm,'maxslope')
ans =
       1.5481

You provide all this information to slmengine using a property/value pair interface. slmset mediates this interaction, so you can use it to create the set of properties that will be used. The default set of properties and their values are given by slmset. Everything about the shape, slopes, curvature, values, etc., about your function can be controlled by a simple command. SLMENGINE also offers the ability to generate splines of various orders, as well as free knot splines.

For a complete set of examples of the SLM tools in action, see the included published tutorial with this submission. There is also a small treatise included on the concept of Shape Language Modeling for curve fitting.

The SLM toolkit will be considerably improved over the next few months. I expect to add a graphical interface, as well as at least one helper application. As well, if I have missed any natural shape primitives, please let me know. While I have tried to be very inclusive, surely there is something I've missed. If I can add your favorite to the list above I will do so.

Finally, the SLM tools require the optimization toolbox to solve the various estimation problems.

Required Products Optimization Toolbox
MATLAB release MATLAB 7.5 (R2007b)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (45)
17 Jun 2009 S B  
19 Jun 2009 wee  
30 Jul 2009 Peter Simon

A superb package. Well done! It is perfect for fitting a curve to in-orbit test (IOT) antenna pattern measured data points, used in validating the performance of our satellite antennas. Thanks very much.

Peter Simon
Space Systems/Loral
Antenna Subsystems Operations

15 Sep 2009 Pete sherer

Would it be possible to add the GUI such that users can simply drag/adjust the fitted line in the way they want. The scripts then return the functional form and coefficients of the adjusted line. Maybe as an additional script or something.

16 Sep 2009 John D'Errico

Pete - A goal to be finished as soon as I can is to write a gui that will wrap SLM inside. Note that the computational tool in SLM is called SLMENGINE. This was purposeful, since my expectation is that most users would use the gui form when I am able to offer it. So I have definitely been planning for a gui wrapper.

Even at that though, SLM would not have allowed you to just draw a curve through your data freehand, and then return the coefficients of that curve. A freehand drawn curve may be arbitrarily complex, or not even a single-valued function. SLM is best when you fit the curve, then apply your own special knowledge of a system to be modeled, in the form of constraints on the overall shape of the underlying functional form. But a drawn curve to follow is too broad of a constraint to follow. So SLM might not be capable in general of returning such a functional form in that eventuality.

As well, this becomes a unique problem. Is the problem then to find the curve that fits the original data, and has the general shape of the freehand drawn curve? This involves two separate problems of approximation. It seems one must use a tool to smooth and approximate the drawn curve. Then take that same tool and try to fit the drawn curve through the data? This task would take some serious amount of effort, and would never be possible to do so where it would be transparent to the user. Worse, suppose the curve in the end had some lack of fit to the data (as perceived by the user?) Where did the lack of fit arise? Is it lack of fit to the freehand drawn curve? Or is it lack of fit to the data itself?

17 Sep 2009 Joshua  
01 Oct 2009 Jeem79 Olsen

I just used your package to fit my stress-strain curves obtained from image acquisition and processing, and it works perfectly. Thanks for a great job! However, is it possible to extract only the fitted curve? I didn't see an obvious option for that in your documentation.

Jeem79

03 Oct 2009 Fabian Kloosterman

Dear John,

I wonder if it is possible with your functions to fit a spline to a set of (x,y) coordinates over time (each data point also has an associated weight). I want to constrain the velocity of the fitted spline trajectory, which means I can't just fit (x,t) and (y,t) separately. If this is not possible with your SLM tools, do you have any suggestions how to approach this problem?

Thanks, Fabian

04 Oct 2009 John D'Errico

Fabian - This is more difficult to solve. Ordinarily, one would simply fit x(t) and y(t) independently. However, your constraint is on the term

sqrt((dx/dt)^2 + (dy/dt)^2)

Do you need a cubic result? If so, then it is more complex yet, since any overall slope type of constraint is bad enough to formulate for a cubic.

I might use a brute force approach. Use a pair of independent models, x(t), y(t). You can put a global constraint on dx/dt and dy/dt on these models, limiting the maximum and minimum slopes attained.

Now, go back, and test the actual velocity when the two curves are united into a parametric path in the (x,y) plane. If sqrt((dx/dt)^2 + (dy/dt)^2) never exceeds your velocity limit, then you are done. Otherwise, you will now need to use fmincon to perturb the parameters of the splines, while minimizing the global sum of squares of errors to (x(t), y(t)).

The constraints for this will clearly be nonlinear. You might set one constraint at every point, but this will not constrain the true maximum velocity attained. So you might sample each curve at perhaps 1000 points, returning 1000 nonlinear constraints along the curve. This will give you a necessary condition on the velocity, but it need not be truly sufficient. Thus it might exceed the aim max velocity by a tiny amount.

Finally, the optimization over the spline parameter space will also have other linear constraints on those parameters. I suppose one could (if you were adventuresome) go into the SLM code to extract (and return) the actual equations used to estimate the model as it is sent to lsqlin. Fmincon would need to employ those constraints too.

HTH,
John

16 Oct 2009 Didi Cvet

Hi
I think that this kind of toolbox is great idea and I am doing some tests over this file. I have very specific data points and while doing my experiments I've tried to use 'knots', 'free' option but I get this warning message:

Warning: Options LargeScale = 'off' and Algorithm = 'trust-region-reflective' conflict.
Ignoring Algorithm and running active-set method. To run trust-region-reflective, set
LargeScale = 'on'. To run active-set without this warning, use Algorithm = 'active-set'.

I've tried to add this row
>>fminconoptions.Algorithm = 'Active-set';
in your slmengine.m code but it doesn't work.

Thanks for shearing this!
Best wishes
Dijana

16 Oct 2009 John D'Errico

The warning message that fmincon returns is new, due apparently to a change in the optimization toolbox. It is only a warning though, that does not hurt the operation of the code itself.

I'll fix the problem.

20 Oct 2009 Chris

Very nice John, I will cite you in our papers.

I also got this warning:
Warning: Options LargeScale = 'off' and Algorithm = 'trust-region-reflective' conflict.
Ignoring Algorithm and running active-set method. To run trust-region-reflective, set
LargeScale = 'on'. To run active-set without this warning, use Algorithm = 'active-set'.

and added this line after line numer 158 in your slmengine.m file;
fminconoptions.Algorithm = 'active-set';

now I don't get the warning anymore.

29 Oct 2009 John D'Errico

The new version just got uploaded to repair the 'active-set' problems incurred with the newer optimization toolbox releases.

02 Nov 2009 James  
21 Nov 2009 Raymond Cheng

Thanks for your sharing.

03 Dec 2009 Juho Jalava

Thank you John, I'm very, very pleased with the results!

10 Dec 2009 David Heslop

Thank you John this is wonderful package, but I think there may be an issue with slmeval when using the inverse of a spline which is monotonically decreasing.

As a simple example, when I work with data with a positive relationship everything is okay:

%first working with a simple positive relationship
X=sort(randn(20,1));
Y=X+10;
slm = slmengine(X,Y,'plot','on');

%find the Y value for X=0.5
Yhat=slmeval(0.5,slm,0);

%and work in the opposite direction
Xhat=slmeval(Yhat,slm,-1)

Xhat =

    0.5000

But when I try them same thing for a negative relationship a NaN is returned:

%now work with a negative relationship
X=sort(randn(20,1));
Y=-X-10;
slm = slmengine(X,Y,'plot','on');

%find the Y value for X=0.5
Yhat=slmeval(0.5,slm,0)

%and work in the opposite direction
Xhat=slmeval(Yhat,slm,-1)

Xhat =

   NaN

Of course this is just a simply example and the problem I'm working involves more complex relationships. Maybe I've misunderstood the use of slmeval, but any advice you could offer would be great,
thanks, Dave

10 Dec 2009 John D'Errico

Dave - you found a bug in slmeval. I've just submitted the fix. Thanks for telling me about it.

13 Dec 2009 Xavier Xavier  
21 Dec 2009 Kay

How best can one determine the uncertainity or error in the fitted line?

21 Dec 2009 Johannes Korsawe

It's really great work. I found it very simple to use and very powerful in the results. Thanks again, John!

08 Jan 2010 Mark Shore

Very impressive indeed. Fast, very flexible and easy to use. And - not a small point - the results respect the data... After just a couple of hours' trial, this has convinced me to purchase the optimization toolbox.

A quick question to John or other FEX veterans, what's the generally accepted way to reference FEX submissions?

11 Jan 2010 John D'Errico

Citing files on the FEX seems to come up often enough. We came up with a couple of ideas, and put them on this link:

http://matlabwiki.mathworks.com/Citing_Files_from_the_File_Exchange

11 Jan 2010 Mark Shore

Thanks John, that's just what I was looking for. Didn't think of checking the FAQs.

To add to my previous comments, the command-line arguments for the SLM functions are terse but logical and intuitive. I needed very accurately interpolated fits to densely measured noisy data with multiple inflection points, and was able to get highly satisfactory results on my first day using these tools.

31 Jan 2010 Kay

can the program determine 95% cofidence interval of the estimated parameters?

02 Feb 2010 John D'Errico

Kay - While I would like to help you (and the many others who have requested confidence intervals) and provide confidence intervals for this tool, they would not be valid much of the time. Whenever any inequality constraints are involved, the standard methods for confidence intervals for a linear regression become inappropriate. And many of the most useful aspects of this tool involve inequality constraints. Worse, things become nastier yet if free knots are estimated.

While I could, in theory, use more sophisticated techniques to provide confidence intervals, these techniques would become quite time consuming.

John

13 Feb 2010 John D'Errico

My thanks to Mark Shore for catching a problem with the curvature constraints that only appeared in some circumstances. The repair has been submitted.

08 Mar 2010 Andre Guy Tranquille  
22 Mar 2010 Pete sherer

Are you planning to extend this excellent tool to more than one input parameters. Something similar to MARS or simpler would provide users with great data surfacing fitting tool.

16 Apr 2010 Royi Avital  
16 Apr 2010 Royi Avital

This is a great tool.
Yet I'm missing 2 options:
1. "Easy Operation Mode" - insert 3 vectors - x, f(x), x'. The result is the Extrapolated f(x').
2. Setting "Ground Truth" on some data points.

I tried emailing you on something regarding it.
Might be the email address in your profile is wrong.

Thanks for sharing this great tool.

16 Apr 2010 John D'Errico

No, the attached e-mail address is correct. I don't recall seeing an e-mail from you, but I do get large amounts of e-mail, so your mail may have been lost.

If by ground truth, you intend to force the curve through a given set of points, this is in there already. You force the curve by setting the 'xy' property. Each row of the supplied array defines one point, as an (x,y) pair. Beware that it may be impossible to solve a problem if you specify several points in a single knot interval.

Since slmengine always returns a spline model, it makes sense that any easy operation mode is best achieved by creating a wrapper for slmengine, that then calls slmeval.

02 Jun 2010 mathworks2011

very powerful.

19 Jul 2010 Vitaly Koissin

Thank you very much for this tool! I use it to fit the test data for the bent shape of a textile stripe loaded by its own weight, and SLM the best fitting I used for this!

However :-) I would be happy yet more if you extend the files to include the following properties:

1) monotonic increase/decrease of the curvarure (in my case the condition of decreasing curvarure is very important, since I take the 2nd derivative to calculate the bending stiffness of the stripe). Now I need to make a double fiitting: first for the bent shape and then for the slope; otherwise the curvature has too large jumps. But even after this the result is still not perfect :-(

2) controlled length of the curve (similarly to the integral option). This is not so important in my case but could improve the fitting, since this condition has a sound physical sense in my problem).

Property of the monotonic increase/decrease of the 1st derivative could also be useful IMHO.

Best regards,
Vitaly

20 Jul 2010 John D'Errico

- I should be able to add a constraint on the trend of the curvature. Really, what I'll do is allow the user to constrain the sign of the third derivative. In combination with constraints on the sign of the second derivative, this will allow you to create a function with increasing or decreasing curvature as desired.

- Constraints on monotonic behavior for the first derivative (increasing or decreasing) are already present, in the form of the concaveup or concavedown properties. I'll add a comment in the help to emphasize this fact.

- A constraint on the total arc length of the curve is difficult, since this is a nonlinear constraint. This would force the use of a nonlinear optimizer like fmincon when that constraint is applied.

21 Jul 2010 Vitaly Koissin

Dear John, thank you very much in advance!

- If by chance you'll have time to apply this 2nd derivative constraint until the next week, this would be great :-) Then I'll include the better approximation in a coming conference paper.

- you are right; I was just confused by several plateaus observed in the 1st derivative. But as it is already emphasized in the help-file, this means the monotonicity in the sense "=>0' (or "<=0"), not just ">" or "<".

- OK, the total arc length constraint is not vitally important while I have a good number of points to interpolate (and for me it is not a problem to take many points from the test data).

30 Jul 2010 Royi Avital

Is there any article which could be used as reference to this kind of fitting?
I'd like to use this tool in my project and would like to back it up with some background info.

Thanks for this amazing tool.

30 Jul 2010 John D'Errico

I wish I had written a paper on this topic years ago when I wrote the first version of this tool. I did not do so then, although I have given a few talks on the underlying modeling philosophy of these tools.

The basic idea is simply that of a least squares spline, augmented by a smoothness penalty like a smoothing spline. The smoothness penalty solves the problem of arbitrary (poor) knot selection in many cases.

General least squares splines are covered in depth in the literature, as are smoothing splines. For these fundamental ideas, de Boor is of course the classic reference, a book worth reading for any user of splines.

What the SLM tools add though is something that I've never really seen written about in the literature. This is the idea that intelligently chosen constraints on the curve shape can act as a strong, useful regularizer on your result. They allow you to build your own knowledge about a system into the model, using a simple vocabulary to describe the desired shape of that curve.

I wrote these tools after some years of seeing people using strange nonlinear regression models to fit curves, just because they needed a curve with a given shape. So the user picks some nonlinear sigmoidal form, a Gaussian, etc., for no better reason than that it fits some fundamental desired shape. And of course, nonlinear regression has its own problems, like poor starting values, lack of convergence, etc. Worse, the user finds that the curve shape they chose is not really the correct shape, so they end up with significant lack of fit. Once a viable tool becomes available to fit those curves, I find that far fewer people end up using nonlinear regression models for the wrong reasons.

There are a couple of files that discuss some of these ideas "slm_tutorial.html" and "shape prescriptive modeling.rtf" in the zip file, but that is all I can offer. We also suggested a citation format for tools from the file exchange, if you do need an explicit reference. You can find our recommendations here:

http://matlabwiki.mathworks.com/Citing_Files_from_the_File_Exchange

30 Jul 2010 Royi Avital

John, Thank you for your response.
I will cite as you advised me to.

I'll try to have a look at De Boor's book.
I just wanted the solid math behind the tool.
Hopefully I'll get from there.

One day, If you do write an article or notes about it I'd be happy to read and learn.

Thank you.

04 Dec 2010 Kevin

Brilliant package, it does almost everything I need as it is, but just wondering how could I insert a radius at the knots in a free piecewise linear hermite? Obviously I could do it post optimisation but there would be a significant error for a larger radius.

04 Dec 2010 John D'Errico

It sounds like you want to allow the knots to vary, but only within set bounds? If so, I might suggest putting the call to slmengine inside a wrapper, an objective function for fmincon. Use it to vary the knot placement, with slmengine now seeing a fixed list of knots.

In fact, this is all slmengine does to vary the knots when you specify free interior knots, so one might argue that I could have offered this as yet another option for slm. The problem is, the interface would be a bit messy, coming up with a way to allow the user to specify bounds on some or all of the knots.

If you read through the code, you will see in the beginning how I call fmincon in that case. Note that I constrain the knots to be distinct.

Ask again if I am mistaken about your goal here.

04 Jan 2011 Pete

Aside from all the awfully clever (advanced) stuff, also a really handy way to find the inflection point in a piecewise linear regression. Neat =)

see: http://www.mathworks.com/matlabcentral/newsreader/view_thread/298989

31 May 2011 Ron Abileah

These routines are very useful. Would like to read more on the mathematical foundation of these routines. Can you recommend reading beyond the basic least square splines references?

Also, I made the following correction concerning handling of weights...

% were there any NaN or inf elements in the data?
k = isnan(x) | isnan(y) | isinf(x) | isinf(y);
if any(k)
  % drop them from the analysis
  x(k) = [];
  y(k) = [];
  prescription.Weights(k) = []; % ALSO drop corresponding weights, May 2011
  n = length(x);
end

% if weights or errorbars were set, verify the sizes of these
% parameters, compared to the number of data points.

01 Jun 2011 John D'Errico

Ron - Thanks for finding the weights bug. I'll post a new version with a fix. As far as references go, I don't really know of any. The basic philosophy is a somewhat Bayesian one, wherein one uses knowledge of the physical system to be modeled to yield a viable model consistent with your prior information. In practice, I have found it to work splendidly, and have used similar schemes for over 20 years. I've given many talks on that modeling philosophy, but no published papers. I'll argue that what makes SLM work as well as it does is the use of a simple scheme to encode your knowledge of the system into a set of well defined parameters describing the shape of the curve.

10 Aug 2011 Britnee crawford

John,

I just sent you an email regarding a question about fitting a piecewise constant curve to an approximately normal distribution. But, I thought I would post the question here as well. I need the pw const curve to have 13 pieces, but I want the knots to be freely spaced, but I noticed in your comments that pw constant curves have trouble with this. Is there any way around it?

If not, is there a way to make the last interior knot to be at the start of the flat part of the distribution (the right tail) and then the other pieces are fit to the central part of the curve where the curve is changing rapidly.

Thank you again for your great contributions! I have used your tools in several parts of my dissertation.

Thank you,
Britnee

Please login to add a comment or rating.
Updates
20 Dec 2010

added one new property, also fixed a bug when lse was used with certain constraints.

Contact us at files@mathworks.com