MATLAB Examples

The following script is a tutorial on the methodology of what I call "Shape Prescriptive Modeling". Its a tool for modeling a function of a single variable

`  y = f(x) + error`

The actual model will be in the form of a least squares spline. Shape prescriptive modeling is really a way of thinking about a model - it has a lot of very Bayesian underpinnings. I'll get to that in a bit. First, lets talk about modeling, who builds models, and why we choose a specific model form.

Why do we, as scientists and engineers, fit a curve to data? There are really two basic reasons why, if we ignore the common "My boss told me to do this." The first such reason is for purely predictive value. We may wish to reduce some set of data to a simply accessible functional relationship for future use. The process of modeling may allow us to smooth the data, producing a smooth curve where only distinct data points fell before. As well, this smooth curve is now a platform for interpolation. Note that if we have smoothed the data, removing unwanted noise that was present in the data, then the interpolant is more an approximant. I.e., one will not interpolate the data, but a smoother form of it.

The second common goal of curve fitting is often to predict/estimate some parameter of the relationship, such as a minimum or maximum value where the curve approaches an asymptote, or perhaps a maximum slope or a rate parameter.

If we are to fit a model to data, what kind of models do we use, and how does the model class reflect our goals in the modeling process? To answer this question, I'll first look at the common modeling tools we might see. First and most primeval is polyfit. Polyfit allows us to fit a polynomial model to our data. Such a model tells us little about our process beyond perhaps a simple slope. These models are really of value only for very simple relationships, where we are willing to invest little in the modeling process. Many of the individuals who use polynomial modeling tools do so for the wrong reasons. They use higher order polynomial fits to get a better approximation to their data, not realizing the problems inherent with those higher order polynomials.

At the other end of the spectrum, one sees individuals using nonlinear regression to fit a large variety of curves to data. Exponential terms, gaussian-like modes, sigmoid shapes, as well as many more. Did these model forms result from valid mechanistic arguments? I.e., has the analyst developed a real model of the process under study, where all that is left is to estimate the parameters of that model from data?

Sometimes it is so, but far more often the analyst individual knows something about the basic underlying process, and the model chosen has the correct basic functional shape. I'll argue that truly valid mechanistic models are nearly as rare as hen's teeth.

There is a subtle variation on the mechanistic model. I call it the metaphoric or metaphorical model. Here we use a mathematical model of one process that we do understand, used as a model for a process that we do not really understand. My favorite examples of such metaphorical models are cubic splines, where a mathematical model of a thin flexible beam is used to predict many other processes. A second good example is the use of epidemiological models as used to predict sales of some product. There are many other examples of course.

An advantage of a metaphorical model is it may help us to understand/ predict/even extrapolate future behavior based on our knowledge of the metaphor. Many of the nonlinear regression models that we see built in Matlab are of the shape variety. I.e., we see a bell shaped curve, so we fit a gaussian to it. We see a function with a lower and upper constant asymptote, and we fit some variety of logistic function, or an erf variant to it. If the fit is inadequate for our purposes, we try another functional form. I'll argue that this group of individuals is often using nonlinear regression for the wrong reason. They simply want some curve that behaves the way they know it should. Examples of this might be a photographic film modelor, who knows the film darkens as more photons land upon it, or a biologist, who knows that the count of bacteria in a petri dish will grow with time. (All of these examples have limits, but a good scientist/engineer will know those limits.)

The fact is, we often understand something about a process we are trying to model. What we may not have is the ability to build that knowledge and understanding into our model. Even if we know the underlying functional relationship is monotone increasing/decreasing, how do we estimate a model that has that behavior built into it? This is where shape prescriptive modeling shines. It is really a very Bayesian concept. It is the process of quantifying our basic knowledge of a process in mathematical terms, and then estimating a predictive model of the process that has that knowledge built into it. Our knowledge becomes a prescription for the final shape of the process. Sometimes we know much about a process, in which case we have a detailed prescription. Other times we have a very sketchy prescription.

The slm tools in this toolkit are designed to help the user to embed their knowledge into a model using a langauage of shape. One builds a prescription for the desired shape, then combines it with their data to build a model. These steps can be done distinctly, where an explicit prescription is created, then applied with a modeling engine to data, or the prescription can be built in a fluid manner using a gui. Here a point and click interface can provide the information necessary.

Some final notes before we begin actual experimentation: This toolkit assumes possession of the optimization toolbox.

Also, all examples will use the commandline engine for modeling, instead of the gui interface. One good reason is that at the time of writing this tutorial, I had not finished writing the gui. A better reason is that anything that can be done via the gui is also possible to do using the computational engine, and cell mode favors this approach.

Finally, ...

## Some data

Lets start out by creating some data. A simple functional form that we understand the shape of.

```n = 100; x1 = sort(rand(n,1)); y1 = exp(x1) + randn(n,1)/20; plot(x1,y1,'ro') ```

## Polyfit?

First, lets build a simple model using polyfit.

```P = polyfit(x1,y1,6); xev = 0:.001:1; yev = polyval(P,xev); ```

and plot the results on top of the data

```plot(xev,yev,'g-',x1,y1,'ro') ```

Are you happy with the resulting curve? Perhaps. Much depends upon our goals in building a model for this process. How good of a fit do we really need, how much noise is there in the data? Does the model behave as we believe it should? Does this model have lack of fit? Actually, it looks pretty good to me, at least given the amount of noise in the data. But see what happens when I use a high order polynomial model.

This one was just too much, with condition problems.

```P = polyfit(x1,y1,15); xev = 0:.001:1; yev = polyval(P,xev); ```
```Warning: Polynomial is badly conditioned. Add points with distinct X values, reduce the degree of the polynomial, or try centering and scaling as described in HELP POLYFIT. ```

and plot the results on top of the data

```plot(xev,yev,'g-',x1,y1,'ro') ```

## Less data, more noise, with polyfit

Lets try again, only this time we'll use less data, with a much higher level of noise.

```close n = 30; x2 = sort(rand(n,1)); y2 = exp(x2) + randn(n,1)/3; plot(x2,y2,'ro') P = polyfit(x2,y2,6); xev = linspace(min(x2),max(x2),100); yhat = polyval(P,xev); hold on plot(xev,yhat,'g-') hold off ```

Its a clear example of overfitting this noisy data.

## A piecewise linear SLM fit

Now lets try using the slm tools. After all, its why we are working through this tutorial.

```slm = slmengine(x2,y2,'degree',1,'plot','on') ```
```slm = form: 'slm' degree: 1 knots: [6x1 double] coef: [6x1 double] prescription: [1x1 struct] x: [30x1 double] y: [30x1 double] ```

note that the result from slmengine is a structure. It contains 7 fields, and has much the same information as the 'pp' form does.

'form' is (in this case) 'slm'. This means that the function this structure contains is a slm model. We could have had it return a pp form instead.

'degree' is the polynomial degree of the piecewise segments. 1 indicates a piecewise linear function, continuous across the breakpoints (or knots.) Note that the splines toolbox tends to use the term 'order', one more than the degree.

'knots' contains the set of knots or breakpoints. The default is to use 6 equally spaced knots, from min to max of the data.

'coef' is the set of coefficients of the spline model. I've used a Hermite form here. Its an easy one to understand, as well as to look at the coefficients and visualize the shape of the curve. The field coef contains the value of the function at the corresponding knot.

'prescription' is the default prescription. A bayesian might call it an uninformative prior. Its the list of all settings for every optional parameter one can set for a model. Since we specified nothing but the data in our call to slmengine, we get all defaults.

Finally, there are fields for x and y. The model thus contains the data used to build it. This serves a documentation purpose but it also allows plotslm to work well later.

## Shape prescriptions

We can look at the parameters in slm.prescription, but now is as good a tiem as any to see what slmset does for us. Here are the default set of properties.

```prescription = slmset ```
```prescription = C2: '' ConcaveDown: 'off' ConcaveUp: 'off' ConstantRegion: [] Decreasing: 'off' Degree: 3 EndConditions: 'estimate' Envelope: 'off' ErrorBar: [] Increasing: 'off' Integral: [] InteriorKnots: 'fixed' Knots: 6 LeftMaxSlope: [] LeftMaxValue: [] LeftMinSlope: [] LeftMinValue: [] LeftSlope: [] LeftValue: [] LinearRegion: [] MaxSlope: [] MaxValue: [] MinSlope: [] MinValue: [] NegativeInflection: [] Order: [] Plot: 'off' PositiveInflection: [] Regularization: 0.0001 Result: 'slm' RightMaxSlope: [] RightMaxValue: [] RightMinSlope: [] RightMinValue: [] RightSlope: [] RightValue: [] Scaling: 'on' SimplePeak: [] SimpleValley: [] Verbosity: 0 Weights: [] XY: [] XYP: [] XYPP: [] ```

slmset is the tool that allows us to specify a variety of pieces of information to the engine.

## Use of slmset

Help slmset will list all the options. It works on a property/value pair interface. There are a wide variety of properties that one can set, enough to satisfy almost any type of information a user can bring to a modeling problem. Here are the options one can set:

```help slmset ```
``` slmset: defines the shape prescription for a model usage 1: prescription=slmset(prop1,val1,pro2,val2,...) usage 2: prescription=slmset(prescription,prop1,val1,pro2,val2,...) A set of property/value pairs are parsed into a structure to use as the prescription for slmengine and slmfit. arguments: prop, val - property/value pairs (see below for the complete list) There is no limit on the upper number of property/value pairs allowed. Property names may be shortened, as long as they are unambiguous. Case is ignored. prescription - shape prescription structure, defining that which the user is willing to say about the model. The fields of the prescription structure reflect the defaults, plus any modifications specified. This structure will control the behaviours built a curve fit by slmengine or the defaults for slmfit. Property/value pairs: Properties are character strings, chosen to be mnemonic of their purpose. Any property name may be shortened, as long as the shortened string is unambiguous. Thus since no other property starts with the letter k, any of these alternatives are acceptable shortenings for the 'knots' property: 'knot', 'kno', 'kn' or 'k'. In the event that a given property is assigned more than once in the list of property/value pairs, only the last value in the list is assigned. Property names & admissable values: 'concavedown' - controls curvature of the function = 'off' --> No part of the spline is constrained to be a concave down function (i.e., a negative second derivative.) = 'on' --> f''(x) >= 0 over the entire domain of the spline. = vector of length 2 denoting the start and end points of a region of the spline over which the second derivative is negative. = array of size nx2, each row of which denotes the start and end points of a region of the spline over which the second derivative is negative. DEFAULT VALUE: 'off' Comment: curvature properties do not apply to piecewise constant functions. 'concaveup' - controls curvature of the function = 'off' --> No part of the spline is constrained to be a concave up function (i.e., a positive second derivative.) = 'on' --> f''(x) >= 0 over the entire domain of the spline. = vector of length 2 denoting the start and end points of a region of the spline over which the second derivative is positive. = array of size nx2, each row of which denotes the start and end points of a region of the spline over which the second derivative is positive. DEFAULT VALUE: 'off' Comment: curvature properties do not apply to piecewise constant functions. 'constantregion' - defines a region over which the curve is forced be constant, although the exact level is not defined. = [] --> No region of the spline is forced to be a constant function. = vector of length 2 denoting the start and end points of a region of the spline over which it is a constant function. = array of size nx2, each row of which denotes the start and end points of a region of the spline over which it is a constant function. DEFAULT VALUE: [] Comments: A segment which is forced to be constant over only part of a knot interval must necessarily be constant over that entire interval, since the curve is composed of a single polynomial segment in a knot interval. 'decreasing' - controls monotonicity of the function = 'off' --> No part of the spline is constrained to be a decreasing function. = 'on' --> the function will be decreasing over its entire domain. = vector of length 2 --> denotes the start and end points of a region of the curve over which it is monotone decreasing. = array of size nx2 --> each row of which denotes the start and end points of a region of the curve over which it is monotone decreasing. DEFAULT VALUE: 'off' Comments: in actuality this property should be named 'non-increasing', since a constant function is admissible. In addition, it is a sufficient constraint for monotonicity. It is not a necessary constraint. There may exist another spline which has a slightly lower sum of squares and is also monotone. 'degree' - controls the degree of the piecewise Hermite function = 'constant' --> Use a piecewise constant "Hermite" = 'linear' --> Use a piecewise linear Hermite = 'cubic' --> Use a piecewise cubic Hermite As a concession to memory, valid synonyms for 'constant' 'linear', and 'cubic' are respectively the integers 0, 1, & 3 DEFAULT: 'cubic' Comment: Some properties are inappropriate for all degrees of function. E.g., it would be silly to specify a specific value for the left hand end point slope of a piecewise constant function. All information supplied will be used to whatever extent possible. The "order" of a form, as used by the spline toolbox, will be one more than the degree. 'order' is also a valid property in these tools, but it results only in setting the degree field, where degree = order - 1. 'endconditions' - controls the end conditions applied to the spline = 'natural' --> The "natural" spline conditions will be applied. I.e., f''(x) = 0 at end end of the spline. = 'notaknot' --> Not-a-knot end conditions applied. = 'periodic' --> Periodic end conditions applied. = 'estimate' --> end conditions are estimated from the data. DEFAULT VALUE: 'estimate Comment: Except for periodicity, end conditions are not relevant to any degree model below cubic. Periodic end conditions mean that the function values are the same at each end of the curve for linear and cubic fits. For cubic fits, periodicity means that the function has also first and second derivative continuity across the wrapped boundary. For piecewise constant fits, end conditions do not apply, and are ignored. 'envelope' - allows the user to solve for an envelope of the data = 'off' --> the curve will be a simple least squares spline = 'supremum' --> comute a model such that all residuals (yhat - y) are positive. In effect, the curve will be a "supremum" (least upper bound) function. = 'infimum' --> comute a model such that all residuals (yhat - y) are negative. In effect, the curve will be a "infimum" (greatest lower bound) function. DEFAULT VALUE: 'off' Comment: this option should be rarely used, but its a cute idea when it does come up. 'errorbar' - defines a set of lower and upper bounds for the function value of the spline at each data point. If there are n data points, then if the corresponding value is: = [] --> No errorbar constraints will be imposed = a scalar E --> error bars will be set at [Y-E,Y+E] = a vector E --> error bars will be set at [Y-E,Y+E] = an nx2 array --> error bars will be set at [Y-E(:,1),Y+E(:,2)] DEFAULT VALUE: [] Comment: It is possible that depending on the choice of knots, it will be impossible to satisfy some sets of error bars. It may be best to simply use every single data point as a knot. Of course, replicate data points with non-overlappng error bars will always cause a failure. 'increasing' - controls monotonicity of the function = 'off' --> No part of the spline is constrained to be an increasing function. = 'on' --> the function will be increasing over its entire domain. = vector of length 2 --> denotes the start and end points of a region of the curve over which it is monotone increasing. = array of size nx2 --> each row of which denotes the start and end points of a region of the curve over which it is monotone increasing. DEFAULT VALUE: 'off' Comments: in actuality this property should be named 'non-decreasing', since a constant function is admissible. In addition, it is a sufficient constraint for monotonicity. It is not a necessary constraint. There may exist another spline which has a slightly lower sum of squares and is also monotone. 'integral' - known aim value for the integral of the curve over its domain. DEFAULT VALUE: [] 'interiorknots' - allows for free knot placement (of the interior knots) = 'fixed' spaced knots. In this case, the first data point will be the first knot. = 'free' --> uses fmincon to optimize the interior knot placement to minimize the overall rmse. DEFAULT VALUE: 'fixed' Comment: The initial values for the knot placement are taken from the knots property. Comment: Since the free knot placement is done by an optimizer (fmincon), it is not allowed to both choose a set of free knots and set the rmse of the fit. Comment: The first and last knots are not adjusted by the optimization, so there must be at least 3 knots. Comment: Piecewise constant functions sometimes have difficulty with free knots. 'knots' - controls the number of knots used, or the number of equally spaced knots. = A scalar integer which denotes the number of equally spaced knots. In this case, the first data point will be the first knot. = A vector containing the list of knots themselves. The knots must be distinct. No replicate knots allowed. DEFAULT VALUE: 6 'leftmaxslope' - controls the maximum slope allowed at the left hand end of the curve. = [] --> No explicit value provided for the maximum slope of the spline at its left hand end point. = A numeric scalar --> the slope of the function will be constrained to not rise above this value at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'leftmaxvalue' - controls the maximum valued allowed at the left hand end of the curve. = [] --> No explicit value provided for the maximum value of the spline at its left hand end point. = A numeric scalar --> the function will be constrained to not rise above this value at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'leftminslope' - controls the minimum slope allowed at the left hand end of the curve. = [] --> No explicit value provided for the minimum slope of the spline at its left hand end point. = A numeric scalar --> the slope of the function will be constrained to not fall below this value at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'leftminvalue' - controls the minimum valued allowed at the left hand end of the curve. = [] --> No explicit value provided for the minimum value of the spline at its left hand end point. = A numeric scalar --> the function will be constrained to not fall below this value at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'leftslope' - controls the function slope at the left hand endpoint. = [] --> No explicit value provided for the slope of the curve at its left hand end point. = A numeric scalar --> the function will be assigned this slope at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'leftvalue' - controls the function value at its left hand endpoint. = [] --> No explicit value provided for the value of the curve at its left hand end point. = A numeric scalar --> the function will be assigned this value at its left hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'linearregion' - defines a region over which the curve is forced be linear, although the exact level is not defined. = [] --> No region of the spline is forced to be a linear function. = vector of length 2 denoting the start and end points of a region of the spline over which it is a linear function. = array of size nx2, each row of which denotes the start and end points of a region of the spline over which it is a linear function. DEFAULT VALUE: [] Comment 1: A segment which is forced to be linear over only part of a knot interval must necessarily be constant over that entire interval, since the curve is composed of a single polynomial segment in a knot interval. Comment 2: A linear region may extend across knots, in which case the slope will take on the same value across knot boundaries. 'maxslope' - controls the globally maximum slope of the function = [] --> No explicit value provided for the globally maximum slope of the curve. = A numeric scalar --> the globally maximum slope of the spline DEFAULT VALUE: [] Comment: This is a sufficient constraint for the maximum slope of the spline. It is not a necessary constraint. There may exist another spline which has a slightly lower sum of squares and also has the same maximum slope. 'maxvalue' - controls the globally maximum value of the curve = [] --> No explicit value provided for the globally maximum value of the spline. = A numeric scalar --> the function must lie no higher than this maximum value. DEFAULT VALUE: [] Comment 1: This constraint is only a necessary constraint. It is not sufficient. In some circumstances the spline may pass slightly above this maximum value Comment 2: The location of the global minimizer is unspecified. 'minslope' - controls the globally minimum slope of the function = [] --> No explicit value provided for the globally minimum slope of the curve. = A numeric scalar --> the globally minimum slope of the spline DEFAULT VALUE: [] Comment: This is a sufficient constraint for the minimum slope of the spline. It is not a necessary constraint. There may exist another spline which has a slightly lower sum of squares and also has the same minimum slope. 'minvalue' - controls the globally minimum value of the curve = [] --> No explicit value provided for the globally minimum value of the spline. = A numeric scalar --> the function will lie no lower than this minimum value. DEFAULT VALUE: [] Comment 1: This constraint is only a necessary constraint. It is not sufficient. In some circumstances the spline may pass slightly below this minimum value Comment 2: The location of the global minimizer is unspecified. 'negativeinflection' - controls the existence and placement of a inflection point of the final curve. The second derivative of the function will pass through zero at this point. = [] --> no inflection point constraint employed = a numeric scalar --> the function will have a point of inflection at the supplied x location DEFAULT VALUE: [] Comment: NegativeInflection is really just a composite property of a curve. A negative inflection point at x == a is equivalent to a ConcaveUP function for x<=a, and a ConcaveDown function for x>=a. As such, NegativeInflection will override any other curvature properties one has previously set. Comment: Inflection points only apply to linear or cubic models 'order' - An implicit synonym for 'degree', controls the degree of the piecewise Hermite function. Order MUST be an numeric integer, from the set [1, 2, 4]. = 1 --> Use a piecewise constant "Hermite" = 2 --> Use a piecewise linear Hermite = 4 --> Use a piecewise cubic Hermite DEFAULT: 4 Setting the order to some value has the effect of setting the degree of the Hermite spline fit to one less than the order. Comment: Some properties are inappropriate for all degrees of function. E.g., it would be silly to specify a specific value for the left hand end point slope of a piecewise constant function. All information supplied will be used to whatever extent possible. 'plot' - controls whether a final plot is generated of the curve = 'off' --> No plot = 'on' --> plot the curve and data using slmplot DEFAULT: 'off' 'positiveinflection' - controls the existence and placement of a inflection point of the final curve. The second derivative of the function will pass through zero at this point. = [] --> no inflection point constraint employed = a numeric scalar --> the function will have a point of inflection at the supplied x location DEFAULT VALUE: [] Comment: PositiveInflection is really just a composite property of a curve. An inflection point at x == a is equivalent to a ConcaveDown function for x<=a, and a ConcaveUp function for x>=a. As such, PositiveInflection will override any other curvature properties one has previously set. Comment: Inflection points only apply to linear or cubic models 'regularization' - = [] --> Uses the default regularization parameter of 0.0001. = A Non-negative scalar value --> explicitly defines the weight given to smoothness of the resulting curve. = 'crossvalidation' --> Use cross validation method to choose the regularization parameter. = A NEGATIVE scalar value --> attempts to choose a regularization parameter which has as its rmse the absolute value of the supplied value. = 'smoothest' --> Finds the smoothest curve that satisfies the supplied prescription. This is not really a least squares model, since the goal is purely to maximize the smoothness. This option would very often be used in conjunction with errorbars. DEFAULT VALUE: 0.0001 Comment: Smaller values will yield less smoothing, larger values more smoothing. In most cases this parameter should be left alone. It is used to prevent numerical singularities in the linear algebra, as well as help in the case of extrapolation and intrapolation. Smoothness of the resulting spline can be far better controlled by changing the number of knots and their placement. Specifically, the regularization parameter is a scale factor applied to the integral of the squared second derivative of the spline. Comment: It is possible that no value for the regularization parameter will yield the given rmse. In this case the sky will fall down. Comment: Since the cross validation option and matching a given rmse are both optimizations, it is not allowed to use these options together with the interiorknots estimation. Comment: Both the cross validation and rmse options can both be quite slow. Remember that they are optimizations. 'result' - controls the output structure style = 'pp' --> Returns a pp struct, use ppval to evaluate = 'slm' --> Returns a slm struct in a Hermite form. Evaluate using slmeval. DEFAULT VALUE: 'slm' 'rightmaxslope' - controls the maximum slope allowed at the right hand end of the curve. = [] --> No explicit value provided for the maximum slope of the spline at its right hand end point. = A numeric scalar --> the slope of the function will be constrained to not rise above this value at its right hand end point (i.e., the last knot.) DEFAULT VALUE: [] 'rightmaxvalue' - controls the maximum valued allowed at the right hand end of the curve. = [] --> No explicit value provided for the maximum value of the spline at its right hand end point. = A numeric scalar --> the function will be constrained to not rise above this value at its right hand end point (i.e., the first knot.) DEFAULT VALUE: [] 'rightminslope' - controls the minimum slope allowed at the right hand end of the curve. = [] --> No explicit value provided for the minimum slope of the spline at its right hand end point. = A numeric scalar --> the slope of the function will be constrained to not fall below this value at its right hand end point (i.e., the last knot.) DEFAULT VALUE: [] 'rightminvalue' - controls the minimum valued allowed at the right hand end of the curve. = [] --> No explicit value provided for the minimum value of the spline at its right hand end point. = A numeric scalar --> the function will be constrained to not fall below this value at its right hand end point (i.e., the last knot.) DEFAULT VALUE: [] 'rightslope' - controls the function slope at the right hand endpoint. = [] --> No explicit value provided for the slope of the curve at its right hand end point. = A numeric scalar --> the function will be assigned this slope at its right hand end point (i.e., the last knot.) DEFAULT VALUE: [] 'rightvalue' - controls the function value at its right hand endpoint. = [] --> No explicit value provided for the value of the curve at its right hand end point. = A numeric scalar --> the function will be assigned this value at its right hand end point (i.e., the last knot.) DEFAULT VALUE: [] 'scaling' - controls data scaling to avoid numerical problems = 'on' --> data is shifted and scaled so as to minimize any numerical problems that may result in the solution. = 'off' --> No scaling is done. DEFAULT VALUE: 'on' Comments: There is no scaling that will positively eliminate all problems. All scaling is undone in the final spline coefficients. 'simplepeak' - controls the existence and placement of a single maximizer of the final curve = [] --> no peak placement constraint employed = a numeric scalar --> the function will attain its maximum at the supplied x location DEFAULT VALUE: [] Comment: SimplePeak is really just a composite property of a curve. A peak at x == a is equivalent to a monotone increasing function for x<=a, and a monotone decreasing function for x>=a. As such, simplepeak will override any other monotonicity properties one has previously set. 'simplevalley' - controls the existence and placement of a single minimizer of the final curve = [] --> no valley placement constraint employed = a numeric scalar --> the function will attain its minimum at the supplied x location DEFAULT VALUE: [] Comment: SimpleValley is really just a composite property of a curve. A valley at x == a is equivalent to a monotone decreasing function for x<=a, and a monotone increasing function for x>=a. As such, simplevalley will override any other monotonicity properties one has previously set. 'verbosity' - controls commandline feedback to the user = 0 --> No response at the commandline = 1 --> Fit statistics reported = 2 --> Debug level output DEFAULT: 0 'weights' - defines a weight vector = [] --> all data points are assigned equal (unit) weight. = vector of the same length as length(x), denotes relative weights for each data point. If supplied, the length of this vector of weights must be the same as the number of data points. DEFAULT VALUE: [] 'xy' - Forces the curve through an individual point or set of points = [] --> no points are forced = a 1x2 vector --> an x-y pair that the curve must pass through with no error. = an nx2 array --> each row corresponds to a single point that the curve passes through. DEFAULT: [] Comment 1: The curve will pass through the desired point to within computational error, IF it is possible to do so. Comment 2: Multiple points that are inconsistent with each other, or inconsistent with other parameters that are set will cause failure of the least squares. 'xyp' - Forces the curve to have a specified slope at an individual point or set of points = [] --> no points have their slope enforced = a 1x2 vector --> an x-yprime pair that the curve must satisfy through with no error. = an nx2 array --> each row corresponds to an x-yprime pair that the curve must satisfy. DEFAULT: [] Comment 1: The curve will satisfy the desired slope to within computational error, IF it is possible to do so. Comment 2: Multiple points that are inconsistent with each other, or inconsistent with other parameters that are set will cause failure of the least squares. Comment 3: Setting the slope at some point of a zero'th degree function will be ignored. 'xypp' - Forces the curve to have a specified second derivative at an individual point or set of points = [] --> no points have their 2nd derivative enforced = a 1x2 vector --> an x-y'' pair that the curve must satisfy through with no error. = an nx2 array --> each row corresponds to an x-y'' pair that the curve must satisfy. DEFAULT: [] Comment: The curve will satisfy the desired 2nd derivative to within computational error, IF it is possible to do so. Comment: Multiple points that are inconsistent with each other, or inconsistent with other parameters that are set will cause failure of the least squares. Comment: Setting the 2nd derivative at some point of a zero'th degree or linear function will be ignored. Comment: This property only applies to cubic models. ```

## Evaluating a model

One more thing before we return to fitting data. We can use the models that we build in a predictive mode.

```slmeval(.5,slm) ```
```ans = 1.62123073715032 ```

slmeval can also differentiate the function (although piecewise constant models would have a zero derivative everywhere, with a singularity at every knot.) The third argument controls the order ofthe derivative to be generated.

```slmeval(.5,slm,1) ```
```ans = 2.64627908440699 ```

and even compute an inverse at any point.

```slmeval(1.75,slm,-1) ```
```ans = 0.548660499797033 ```

We could plot the function as fit uing slmeval,

```yhat = slmeval(xev,slm,0); plot(x2,y2,'ro',xev,yhat,'b-') ```

## Plotting a model using plotslm

Or we could have plotted it using a nice utility that I've provided. Plotslm plots the curve, the data that it was built from, and the knots/breaks of the spline model. Plotslm also allows you to plot the residuals of the model to the data, or the derivatives of the curve. Finally, you can control many features of this plot using the plot menu.

```plotslm(slm) ```

## slmengine can plot too

Or we could have just told our engine to plot after it finished estimating its curve fit. This time we will fit it as a cubic spline.

```slm = slmengine(x2,y2,'degree',3,'knots',4,'plot','on'); ```

## Computations on a spline model, min max, integral, etc.

plotslm can plot the function derivatives or integral, and slmeval can evaluate the function, its derivatives or the inverse function at specific points. However, it is occasionally useful to be able to find the minimum (or maximum) value of a spline over some interval, or find the minimum or maximum value of the slope, or compute the integral of the curve between a pair of points.

The slmpar function does these things, both for a slm form as well as a pp form.

```x = -5:.1:5; y = exp(-x.^2/2)/sqrt(2*pi) + randn(size(x))/100; slm = slmengine(x,y,'plot','on','knots',15,'integral',1,'deg',3,'minval',0) ```
```slm = form: 'slm' degree: 3 knots: [15x1 double] coef: [15x2 double] prescription: [1x1 struct] x: [101x1 double] y: [101x1 double] ```

Of course, here the integral was forced to 1, so the integral should be correct.

```res = slmpar(slm,'in') ```
```res = 1 ```

The maximum value should occur very near zero.

```[res,loc] = slmpar(slm,'maxfun') % The next few examples show slmpar at work on a pp form, as created by spline: x = linspace(-pi,pi); y = sin(x); pp = spline(x,y); ```
```res = 0.397683127849647 loc = -0.0153556703326923 ```

The overall integral should be zero, or as close as floating point computations can give us for the integral of a spline.

```res = slmpar(pp,'integral') ```
```res = -2.00360561475321e-16 ```

The integral from 0 to pi should be 2.

```res = slmpar(pp,'integral',[0,pi]) ```
```res = 1.99999995586484 ```

The maximum of a sine wave over its period is 1, and should occur at x = pi/2.

```[res,loc] = slmpar(pp,'maxfun') ```
```res = 0.99999997621329 loc = 1.57079432824228 ```

The minimum of a sine wave over [-1,1] should occur at the lower end point.

```[res,loc] = slmpar(pp,'minfun',[-1,1]) ```
```res = -0.841470963664934 loc = -1 ```

A nice thing about plotslm is it overlays the knot positions on top of the curve. The knots are at the vertical green dotted lines.

## Curvature constraints

Having seen the fit above, one might wonder why one would use a shape prescriptive model at all. It appears to be little better than that 6'th order polynomial model we fit before using polyfit.

What do we know about the underlying functional relationship? We want to think in terms of fundamental shape primitives.

I'd suggest that the function is known to be positively curved. Its was a simple exponential after all. The second derivatve should never be negative.

```slm = slmengine(x2,y2,'plot','on','concaveup','on'); ```

## Monotonicity constraints

Having seen the fit above, one might wonder why one would use a shape prescriptive model at all. It appears to be little better than that 6'th order polynomial model we fit before using polyfit.

What do we know about the underlying functional relationship? We want to think in terms of fundamental shape primitives.

I'd suggest that the function is known to be positively curved. Its was a simple exponential after all. The second derivatve should never be negative.

```slm = slmengine(x2,y2,'plot','on','increasing','on'); ```

## Both curvature and monotonicity constraints

Having seen the fit above, one might wonder why one would use a shape prescriptive model at all. It appears to be little better than that 6'th order polynomial model we fit before using polyfit.

What do we know about the underlying functional relationship? We want to think in terms of fundamental shape primitives.

I'd suggest that the function is known to be positively curved. It was a simple exponential after all. The second derivatve should never be negative.

```slm = slmengine(x2,y2,'plot','on','concaveup','on','increasing','on'); ```

## Minimum slope or maximum slope constraints

Try an erf function, with just a bit of noise. But then, show that the curve can be constrained to have an overall minimum and maximum slope.

```n = 100; x = 4*rand(n,1) - 2; y = erf(x) + randn(size(x))/100; ```

First, with no slope constraints at all

```slm = slmengine(x,y,'plot','on','knots',15); ```

Plot the first derivative. See that it should strictly be bounded by in the interval [0,1].

```xev = linspace(-2,2,1000); d = slmeval(xev,slm,1); plot(xev,d,'-') grid on ```

Redo the fit, but with just a touch of a constraint on the slopes.

```slm = slmengine(x,y,'plot','on','knots',15,'minslope',0.1,'maxslope',0.9); d = slmeval(xev,slm,1); figure plot(xev,d,'-') grid on ```

## Extrapolation in the fit, non-uniform knot spacings

We might also decide to fit the curve over a wider knot range, extrapolating here beyond the data.

```slm = slmengine(x2,y2,'plot','on','concaveup','on','increasing','on','knots',[-1,0:.2:1,2]); ```

## Piecewise constant models

Or even to use a lower order model. This version of my toolkit only allows piecewise constant or piecewise linear besides the cubic models.

Also, for brevity, we can always shorten any property names, as long as the name remains unambiguous. Capitalization is ignored.

```slm = slmengine(x1,y1,'plot','on','kn',0:.1:1,'deg',0); ```

Note that I also used more knots in this last fit. I also returned to our original set of data with 100 points.

## Fixed specific values

There are other shape parameters we could have set in our prescription. Kowing that the underlying relationship is an exponential, we might remember that at x == 0, that y == 1. Specific values like this are often known. If our data was from an MTF curve, we might choose to assume a modulation of 100% at a frequency of 0. Likewise, population growth data might have a known population at time == 0.

```slm = slmengine(x1,y1,'plot','on','concaveup','on','knots',0:.2:1,'leftvalue',1); ```

## Flawed information, "Garbage in, garbage out"

What if our information is flawed? The old saying always applies.

```slm = slmengine(x1,y1,'plot','on','concaveup','on','knots',0:.2:1,'leftvalue',2); ```

Or suppose we specify a bit too small a value for the maximum slope. The result is lack of fit.

```slm = slmengine(x1,y1,'plot','on','maxslope',1.5,'knots',0:.1:1); ```

Garbage in, garbage out. If we specify a monotone decreasing function on increasing data, then we have wasted a lot of machine cycles to compute the mean of our data.

```slm = slmengine(x1,y1,'plot','on','decreasing','on','knots',0:.2:1,'degree',1); ```

We could have done the same with somewhat less work...

```slm = slmengine(x1,y1,'plot','on','degree',0,'knots',2); ```

The really lazy would just use mean(y1). Perhaps efficient is a better word.

lets look at a function with slightly more interesting behaviour. still keep it simple enough that we know what it should look like.

```close all n = 500; x = linspace(0,1,n)'; y = sin(x*pi) + randn(n,1)/5; slm = slmengine(x,y,'plot','on','knots',10); ```

If I remember correctly, sin(x) will have a negative second derivative over that interval. If I choose to apply this information to our curve, the fit gets quite good.

```slm = slmengine(x,y,'plot','on','knots',10,'concavedown','on'); ```

A simple way of specifying a curve shape with a single maximum value at some point is just this:

```slm = slmengine(x,y,'plot','on','knots',0:.1:1,'simplepeak',.5); ```

all it does is to automatically specify increasing and decreasing regions of the curve.

lets change our data again now. Make it a fairly noisy sine wave.

```clear close all n = 250; x = linspace(0,1,n)'; y = sin(2*x*pi) + randn(n,1)/2; ```

this time we might choose to assume periodicity.

```slm = slmengine(x,y,'plot','on','knots',0:.1:1,'endconditions','periodic'); ```

That last curve had a lot of noise in it, I might not be surpised if it was a little bumpy. We also know that somewhere in the middle of the curve, this function goes through an inflection point. I'll just pick somehere to put that inflection point. Note that the curvature would be negative to the left, and positive to the right, so set a positive inflection point.

```slm = slmengine(x,y,'plot','on','kn',0:.1:1,'endc','per','positiveinflection',.5); ```

There are certainly many ways to define a prescription for the shape of a curve. We have only covered a few of them here.

Lets return to slmset though. Note that the two lines below are the same as the direct call to slmengine.

```prescription = slmset('plot','on','kn',0:.1:1,'endc','per','positiveinflection',.5); slm = slmengine(x,y,prescription); ```

The model as it is returned includes a field with the shape prescription that defined that model. It also includes the original data in named fields. Each of these fields is useful for documentation purposes.

```slm ```
```slm = form: 'slm' degree: 3 knots: [11x1 double] coef: [11x2 double] prescription: [1x1 struct] x: [250x1 double] y: [250x1 double] ```

We can also modify an existing prescription.

```prescription = slmset(prescription,'maxvalue',1,'minvalue',-1); slm = slmengine(x,y,prescription); ```

## Too many knots?

Sometimes we have little data, but use too many knots. A traditional spline model will often fail in this case, with a singular matrix warning. Slmengine will not fail, although you may not always be happy with the result. Remember, when there is little information brought to a problem by the data, a reasonable fit may still be obtained if you can add enough information from your own kowledge of the process. In this particular example, I'm tryin only to show that the use of too many kots did not result in a singularity.

```clear close all n = 10; x = rand(n,1); y = sin(2*x*pi) + randn(n,1)/10; slm = slmengine(x,y,'plot','on','knots',0:.05:1); ```

Fewer knots may be better, even though we did survive having way too many knots.

```slm = slmengine(x,y,'plot','on','knots',[0 1]); ```

And while it may be a little slow, one can use cross validation to help smooth your curve. Don't try it with too many knots or too many points.

```slm = slmengine(x,y,'plot','on','knots',0:.05:1,'regularization','cross'); ```

Another option, if you had a decent estimate of the error standard deviation, is to supply that value. slmengine will build a fit with the desired residual error.

```slm = slmengine(x,y,'plot','on','knots',0:.05:1,'regularization',-0.10); ```

## Use in practice

for some more realistic data, try this one out. The function is known to be increasing, and everywhere non-negative. There is a little curvature at the bottom that is probably real. At the top end, we expect the curve to be moderately well behaved so extrapolation to x = 255 is linear.

```load scann slm = slmengine(R,L,'plot','on','knots', ... [0 5 10 15 20 30 50 80 120 160 200 255],'leftminvalue',0,'increasing','on','deg',3,'linearregion',[200,255]) ```
```slm = form: 'slm' degree: 3 knots: [12x1 double] coef: [12x2 double] prescription: [1x1 struct] x: [42x1 double] y: [42x1 double] ```

I often see people using splines wrongly. For example, given a relationship with a derivative singularity, how best to build a spline model?

```x = linspace(0,1,20); truefun = @(x) nthroot(x,5); y = truefun(x); xfine = linspace(0,1,1000); yfine = truefun(xfine); plot(x,y,'-o') title 'A singularity at x = 0' xlabel X ylabel Y ```

Obviously, fitting a spline to this data, in the form y(x), we should expect a problem. Even with no noise present in the system.

```slm = slmengine(x,y,'plot','on','knots',10) hold on plot(xfine,yfine,'b--') hold off ```
```slm = form: 'slm' degree: 3 knots: [10x1 double] coef: [10x2 double] prescription: [1x1 struct] x: [20x1 double] y: [20x1 double] ```

We might force the spline to be monotone, or even choose a non-uniform knot spacing, but the derivative singularity at x == 0 is too much for polynomial segments to handle well.

```slm = slmengine(x,y,'plot','on','knots',[0 .01 .02 .05 .1 .3 .6 1],'incr','on','concavedown','on') hold on plot(xfine,yfine,'b--') hold off ```
```slm = form: 'slm' degree: 3 knots: [8x1 double] coef: [8x2 double] prescription: [1x1 struct] x: [20x1 double] y: [20x1 double] ```

A better solution is to recognize the essential singularity in the problem. Here, swap the dependent and independent variables. What was once a singularity is no longer so. Se that the spline fit and the original function now overlay each other almost perfectly here.

```slm = slmengine(y,x,'plot','on','knots',10,'incr','on','concaveup','on','deg',3) hold on plot(yfine,xfine,'b--') hold off ```
```slm = form: 'slm' degree: 3 knots: [10x1 double] coef: [10x2 double] prescription: [1x1 struct] x: [20x1 double] y: [20x1 double] ```

Of course, prediction will be more difficult, since the resulting model must be interpolated in an inverse form. slmevel can do so.

`slmeval(0.16807,slm,-1)`