Daniel Shub
on 7 Mar 2012

A more general answer might be:

TMW (The MathWorks) generally does not announce the release date prior to the release. According to Wikipedia (http://en.wikipedia.org/wiki/MATLAB), the release schedule has been:

R2006a 3/1 R2006b 9/1

R2007a 3/1 R2007b 9/1

R2008a 3/1 R2008b 10/9

R2009a 3/6 R2009b 9/4

R2010a 3/5 R2010b 9/3

R2011a 4/8 R2011b 9/1

R2012a 3/1

So I guess from this history you can expect it in the first few days of March or September, but if it hasn't arrived by the end of that first week, it could be a while.

Royi Avital
on 7 Mar 2012

One of the most disappointing releases. Nothing really got improved :-).

Thomas
on 7 Mar 2012

Richard Willey
on 7 Mar 2012

Statistics Toolbox includes a lot of impressive new functionality for regression analysis.

I'm attaching code for a blog post that will come out in a couple of weeks.

%% New regression capabilities

%

% The 12a release of Statistics Toolbox includes new functions for

%

% * Linear regression

% * Nonlinear regression

% * Logistic regression (and other types of generalized linear models)

%

% These regression techniques aren’t new to Statistics Toolbox. What is

% new is that MathWorks added a wide set of support functions that

% simplify common analysis tasks like plotting, outlier detection,

% generating predictions, performing stepwise regression, applying robust

% regression...

%

% We'll start with a simple example using linear regression.

%% Create a dataset

%

% I'm going to generate a basic dataset in which the relationship between X

% and Y is modeled by a straight line (Y = mX + B) and add in some normally

% distributed noise. Next, I'll generate a scatter plot showing the

% relationship between X and Y

clear all

clc

rng(1998);

X = linspace(1,100,50);

X = X';

Y = 7*X + 50 + 30*randn(50,1);

New_X = 100 * rand(10,1);

scatter(X,Y, '.')

%% Use linear regression to model Y as a function of X

%

% Looking at the data, we can see a clear linear relationship between X

% and Y. I'm going to use the new LinearModel function to model

% Y as a function of X. The output from this function will be stored as an

% object named "myFit" which is displayed on the screen.

myFit = LinearModel.fit(X,Y)

The first line shows the linear regression model. When you perform a regression, you need to specify a model that describes the relationship between your variables. By default, LinearModel assumes that you want to model the relationship as a straight line with an intercept term. The expression "y ~ 1 + x1" describes this model. Formally, this expression translates as "Y is modeled as a linear function which includes an intercept and a variable". Once again, note that we are representing a model of the form Y = mX + B...

The next block of text includes estimates for the coefficients, along with basic information regarding the reliability of those estimates.

Finally, we have basic information about the goodness-of-fit including the R-square, the adjusted R-square and the Root Mean Squared Error.
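
To see where these numbers come from, here is a minimal sketch in core MATLAB that computes the same kind of estimates by hand. It is an illustration, not a description of how LinearModel is implemented: the names A, b, res, R2, adjR2 and RMSE are made up for this example, and the data is regenerated with the same seed as above so that the script stands on its own.

```matlab
% Minimal sketch: fit Y = m*X + B by ordinary least squares in core MATLAB.
% A, b, res, R2, adjR2 and RMSE are illustrative names, not LinearModel API.
rng(1998);                               % same seed as the example above
X = linspace(1,100,50)';
Y = 7*X + 50 + 30*randn(50,1);

A = [ones(size(X)), X];                  % columns: intercept ("1"), predictor ("x1")
b = A \ Y;                               % b(1) estimates B, b(2) estimates m

res = Y - A*b;                           % residuals
n   = numel(Y);                          % number of observations
p   = size(A,2);                         % number of estimated coefficients

SSE   = sum(res.^2);                     % error sum of squares
SST   = sum((Y - mean(Y)).^2);           % total sum of squares
R2    = 1 - SSE/SST;                     % R-square
adjR2 = 1 - (SSE/(n-p)) / (SST/(n-1));   % adjusted R-square
RMSE  = sqrt(SSE/(n-p));                 % root mean squared error

fprintf('Intercept %.2f, slope %.2f\n', b(1), b(2));
fprintf('R2 %.4f, adjusted R2 %.4f, RMSE %.2f\n', R2, adjR2, RMSE);
```

The column of ones in A plays the role of the "1" in "y ~ 1 + x1", and X plays the role of "x1".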

%% Use myFit for analysis

%

% Earlier, I mentioned that the new regression functions include a wide

% variety of support functions that automate different analysis tasks.

% Let's look at some of them.

%

% First, let's generate a plot that we can use to evaluate the quality of

% the resulting fit. We'll do so by applying the standard MATLAB plot

% command to "myFit".

plot(myFit)

Notice that this simple command creates a plot with a wealth of information including

- A scatter plot of the original dataset
- A line showing our fit
- Confidence intervals for the fit

MATLAB has also automatically labelled our axes and added a legend.

%% Look for patterns in the residuals

%

% Alternatively, let's assume that we wanted to see whether there was any

% pattern to the residuals. (A noticeable pattern to the residuals might

% suggest that our model is too simple and that it failed to capture a real

% world trend in the data set. This technique can also be used to check and

% see whether the noise component is constant across the dataset).

%

% Here, I'll pass "myFit" to the new "plotResiduals" method and tell

% plotResiduals to plot the residuals versus the fitted values.

figure

plotResiduals(myFit, 'fitted')

My plot looks like random noise - which in this case is a very good thing.

%% Look for autocorrelation in the residuals

%

% Autocorrelation in my data set could also throw off the quality

% of my fit. The following command will modify the residual plot by

% plotting residuals versus lagged residuals.

figure

plotResiduals(myFit, 'lagged')

Here, once again, the lack of any noticeable pattern in the residuals suggests a good fit. If the points fell along a line or formed a cigar-shaped pattern, this would suggest autocorrelation.
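
The visual check can be backed up with a single number. The sketch below is an assumption-laden illustration in core MATLAB, not something plotResiduals reports: it refits the line by least squares and computes the correlation between each residual and the one before it. The names res and r1 are made up for this example.

```matlab
% Minimal sketch: lag-1 correlation of the residuals, core MATLAB only.
% res and r1 are illustrative names, not part of the LinearModel API.
rng(1998);                               % same seed as the example above
X = linspace(1,100,50)';
Y = 7*X + 50 + 30*randn(50,1);

A   = [ones(size(X)), X];                % intercept and slope columns
res = Y - A*(A \ Y);                     % residuals of the straight-line fit

% Pair each residual with its predecessor and correlate the two columns.
C  = corrcoef([res(1:end-1), res(2:end)]);
r1 = C(1,2);                             % lag-1 correlation, between -1 and 1

fprintf('Lag-1 residual correlation: %.3f\n', r1);
```

A value close to zero is consistent with the shapeless cloud in the lagged plot, while a value near 1 or -1 would correspond to the line or cigar shape described above.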

%% Look for outliers

%

% Suppose that I wanted to check for outliers... We also have a plot for

% that.

figure

plotDiagnostics(myFit, 'cookd')

Cook's Distance is a metric that is commonly used to see whether a dataset contains any outliers. For any given data point, Cook's Distance is calculated by performing a brand new regression that excludes that data point. Cook's distance measures how much the shape of the curve changes between the two fits. If the curve moves by a large amount, that data point has a great deal of influence on the model and might very well be an outlier.

- The red crosses show the Cook's Distance for each point in the data set.
- The horizontal line shows "Three times the average Cook's Distance for all the points in the data set". Data points whose Cook's Distance is greater than three times the mean are often considered possible outliers.

In this example, none of our data points look as if they are outliers.
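
The leave-one-out description above can be sketched directly in core MATLAB. This is an illustration of the definition, not how plotDiagnostics computes it. The names cookD, threshold and flagged are made up for this example. Scaling by the number of coefficients times the mean squared error is the standard textbook normalization, which I am assuming the toolbox also uses.

```matlab
% Minimal sketch: Cook's distance by refitting without each point in turn.
% cookD, threshold and flagged are illustrative names.
rng(1998);                               % same seed as the example above
X = linspace(1,100,50)';
Y = 7*X + 50 + 30*randn(50,1);

A      = [ones(size(X)), X];             % design matrix: intercept, slope
[n, p] = size(A);
b      = A \ Y;                          % fit using all points
yhat   = A*b;
mse    = sum((Y - yhat).^2) / (n - p);   % mean squared error of the full fit

cookD = zeros(n,1);
for i = 1:n
    keep    = true(n,1);
    keep(i) = false;                     % drop point i
    bi      = A(keep,:) \ Y(keep);       % refit without it
    % How far do all fitted values move when point i is left out?
    cookD(i) = sum((yhat - A*bi).^2) / (p * mse);
end

threshold = 3 * mean(cookD);             % the horizontal line in the plot
flagged   = find(cookD > threshold);     % indices of possible outliers
```

Refitting n times is wasteful for large data sets, but it mirrors the explanation most directly.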

%% Use the resulting model for prediction

%

% Last, but not least, let's assume that we wanted to use our model for

% prediction. This is as easy as applying the "predict" method.

Predictions = predict(myFit, New_X)
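
For a straight-line model, a prediction is just the fitted line evaluated at the new X values. The sketch below shows that with a hand-rolled least squares fit in core MATLAB. The names b and manualPredictions are made up for this example, and the fit is my own stand-in rather than the myFit object itself.

```matlab
% Minimal sketch: predictions from a least squares line, core MATLAB only.
% b and manualPredictions are illustrative names.
rng(1998);                               % same seed as the example above
X = linspace(1,100,50)';
Y = 7*X + 50 + 30*randn(50,1);
New_X = 100 * rand(10,1);                % ten new predictor values

b = [ones(size(X)), X] \ Y;              % b(1) = intercept, b(2) = slope

% Evaluate intercept + slope * New_X for every new point.
manualPredictions = [ones(size(New_X)), New_X] * b;
```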

%% Discover the full set of methods available with regression objects

%

% I hope that you agree that all these built in plots and analysis routines

% represent a significant improvement in usability. However, if you're

% anything like me, your immediate reaction is going to be "Great, you've

% built a lot of nice stuff, however, how do you expect me to find out

% about this?"

%

% What I'd like to do now is show you a couple of simple tricks that you

% can use to discover all the new capabilities that we've added. The first

% trick is to recognize that "myFit" is an object and that objects have

% methods associated with them. All of the commands that we've used so far

% like "plot", "plotResiduals", and "predict" are methods for the

% LinearModel object.

%

% Any time that I'm working with one of the built in objects that ship with

% MATLAB my first step is to inspect the full set of methods that ship with

% that object. This is as easy as typing methods(myFit) at the command

% line. I can use this to immediately discover all the built in

% capabilities that ship with the object. If one of those options catches

% my eye, I can use the help system to get more information.

methods(myFit)

%% Discover the full set of information included in the regression object

%

% Here's another really useful trick to learn about the new regression

% objects. You can use the MATLAB variable editor to walk through the

% object and see all the information that is available.

%

% You should have an object named "myFit" in the MATLAB workspace. Double

% clicking on the object will open the object in the Variable Editor.

%% Formulas

%

% At the start of this blog there was some brief introduction to

% "formulas". I'd like to conclude this talk by providing a bit more

% information about formulas. Regression analysis requires the ability to

% specify a model that describes the relationship between your predictors

% and your response variables.

%

% Let's change our initial example such that we're working with a high

% order polynomial rather than a straight line. I'm also going to change

% this from a curve fitting problem to a surface fitting problem.

X1 = 100 * randn(100,1);

X2 = 100 * rand(100,1);

X = [X1, X2];

Y = 3*X1.^2 + 5*X1.*X2 + 7* X2.^2 + 9*X1 + 11*X2 + 30 + 100*randn(100,1);

myFit2 = LinearModel.fit(X,Y)

Let's take a look at the output from this example. We can see, almost immediately, that something has gone wrong with our fit.

- The R^2 value is pretty bad
- The regression coefficients are nowhere near the ones we specified when we created the dataset

If we look at the line that describes the linear regression model we can see what went wrong. By default, LinearModel is fitting a plane to the dataset. (In our initial example, we had a single predictor, so LinearModel defaulted to a line. Here we have two predictors, so LinearModel is defaulting to a plane.) However, we "know" that the true relationship between X and Y should be modelled with a high order polynomial. We need to pass this additional piece of information to "LinearModel".

Modeling a high order polynomial (Option 1)

Here are a couple different ways that I can use LinearModel to model a high order polynomial. The first option is to write out the formula by hand.

myFit2 = LinearModel.fit(X,Y, 'y ~ 1 + x1^2 + x2^2 + x1:x2 + x1 + x2')

%% Modeling a high order polynomial (Option 2)

%

% Alternatively, I can simply use the string "poly22" to indicate a

% second order polynomial for both X1 and X2 and automatically generate all

% the appropriate terms and cross terms.

myFit2 = LinearModel.fit(X, Y, 'poly22')
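
Both options describe the same set of terms: an intercept, X1, X2, their squares and their product. As a rough sketch of what that means numerically, the core MATLAB code below builds one column per term and solves the least squares problem with the backslash operator. The column order and the names A and b are my own choices for illustration. They are not taken from LinearModel.

```matlab
% Minimal sketch: second order polynomial surface fit by least squares.
% A and b are illustrative names; the column order is an arbitrary choice.
rng(1998);                               % arbitrary seed for repeatability
X1 = 100 * randn(100,1);
X2 = 100 * rand(100,1);
Y  = 3*X1.^2 + 5*X1.*X2 + 7*X2.^2 + 9*X1 + 11*X2 + 30 + 100*randn(100,1);

% One column per term: 1, x1, x2, x1^2, x1*x2, x2^2
A = [ones(100,1), X1, X2, X1.^2, X1.*X2, X2.^2];
b = A \ Y;

% The quadratic and cross-term estimates should land near 3, 5 and 7.
fprintf('x1^2: %.3f  x1*x2: %.3f  x2^2: %.3f\n', b(4), b(5), b(6));
```

Once the squared and cross terms are in the model, the estimates for those terms should come out close to the values used to generate the data, which is the improvement the fits above are meant to show.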

%% Nonlinear regression: Generate our data

%

% Let's consider a nonlinear regression example. This time around,

% we'll work with a sine curve. The equation for a sine curve is governed by

% four key parameters.

%

% # The frequency

% # The amplitude

% # The vertical shift

% # The phase shift

%

% We'll start by generating a dataset

X = linspace(0, 6*pi, 90);

X = X';

Y = 10 + 3*(sin(1*X + 5)) + .2*randn(90,1);

%% Nonlinear regression: Generate a fit

%

% Next we'll use the NonLinearModel function to perform a nonlinear

% regression. Here we need to

%

% # Specify a formula that describes the relationship between X and Y

% # Provide some reasonable starting conditions for the optimization

% solvers

myFit3 = NonLinearModel.fit(X,Y, 'y ~ b0 + b1*sin(b2*x + b3)', [11, 2.5, 1.1, 5.5])

%% Nonlinear regression: Work with the resulting model

%

% Once again, the output from the regression analysis is an object which we

% can use for analysis. For example:

%

figure

scatter(X,Y)

hold on

plot(X, myFit3.Fitted, 'r')

%% Nonlinear regression: Alternative ways to specify the regression model

%

% One last point to be aware of: The syntax that I have been using to

% specify regression models is based on "Wilkinson's notation". This is a

% standard syntax that is commonly used in Statistics. If you prefer, you

% also have the option to specify your model using anonymous functions.

%

% For example, that command could have been written as

myFit4 = NonLinearModel.fit(X,Y, @(b,x)(b(1) + b(2)*sin(b(3)*x + b(4))), [11, 2.5, 1.1, 5.5])
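
The anonymous function form also makes it easy to see what a nonlinear fit is doing: searching for the coefficient vector b that minimizes the sum of squared errors. The sketch below does that search with fminsearch from core MATLAB. This is a swapped-in, derivative-free optimizer used purely for illustration, not the solver NonLinearModel relies on, and the names model, sse, b0 and bFit are made up for this example.

```matlab
% Minimal sketch: nonlinear least squares via fminsearch (Nelder-Mead).
% model, sse, b0 and bFit are illustrative names. fminsearch is a stand-in
% optimizer, not the one used by NonLinearModel.
rng(1998);                               % arbitrary seed for repeatability
X = linspace(0, 6*pi, 90)';
Y = 10 + 3*sin(1*X + 5) + 0.2*randn(90,1);

model = @(b,x) b(1) + b(2)*sin(b(3)*x + b(4));   % same form as the formula
sse   = @(b) sum((Y - model(b,X)).^2);           % sum of squared errors

b0   = [11, 2.5, 1.1, 5.5];              % same starting values as above
bFit = fminsearch(sse, b0);              % search for a lower-error b

fprintf('SSE at start: %.2f, SSE after search: %.2f\n', sse(b0), sse(bFit));
```

As with any local search, the result depends on the starting values, which is why the example above asks for reasonable starting conditions.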

%% Conclusion

%

% I've been working with Statistics Toolbox for close to five years now.

% Other than the introduction of dataset arrays a few years back, nothing

% has gotten me nearly as excited as the release of these new regression

% capabilities. So, it seemed fitting to conclude with an example that

% combines dataset arrays and the new regression objects.

%

% Tell me what you think is going on in the following example. (Extra

% credit if you can work in the expression "dummy variable")

%

load carsmall

ds = dataset(MPG,Weight);

ds.Year = ordinal(Model_Year);

mdl = LinearModel.fit(ds,'MPG ~ Year + Weight^2')
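
One possible reading of this example, offered as a hedged sketch rather than the author's own answer: because Year is categorical, it cannot enter the model as a single numeric column, so each year other than a reference year gets its own 0/1 "dummy variable". The core MATLAB code below builds such columns for a small set of hypothetical values (this is toy data, not carsmall). Treating the first level as the reference, and reading Weight^2 as including the lower-order Weight term, are assumptions on my part about how the toolbox behaves.

```matlab
% Minimal sketch: dummy variable coding for a categorical predictor.
% The data is a hypothetical toy set, not the carsmall dataset.
year   = [70; 70; 76; 76; 82; 82];               % hypothetical model years
weight = [3500; 3200; 3000; 2800; 2500; 2300];   % hypothetical weights

levels = unique(year);                   % distinct years: 70, 76, 82
D = zeros(numel(year), numel(levels)-1); % one column per non-reference year
for k = 2:numel(levels)
    D(:,k-1) = (year == levels(k));      % 1 if the car is from that year
end

% Assumed design matrix: intercept, year dummies, Weight, Weight^2.
A = [ones(size(year)), D, weight, weight.^2];
disp(D)
```

With this coding, the coefficient on each dummy column would measure the shift in MPG for that year relative to the reference year, holding weight fixed.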
