Skip to Main Content Skip to Search
Product Documentation

Modeling Data

Overview

Parametric models translate an understanding of data relationships into analytic tools with predictive power. Polynomial and sinusoidal models are simple choices for the up and down trends in the traffic data.

Polynomial Regression

Use the polyfit function to estimate coefficients of polynomial models, then use the polyval function to evaluate the model at arbitrary values of the predictor.

The following code fits the traffic data at the third intersection with a polynomial model of degree six:

load count.dat
c3 = count(:,3); % Data at intersection 3 
tdata = (1:24)'; 
p_coeffs = polyfit(tdata,c3,6);

figure 
plot(c3,'o-') 
hold on 
tfit = (1:0.01:24)'; 
yfit = polyval(p_coeffs,tfit); 
plot(tfit,yfit,'r-','LineWidth',2)
legend('Data','Polynomial Fit','Location','NW')

The model has the advantage of being simple while following the up-and-down trend. The accuracy of its predictive power, however, is questionable, especially at the ends of the data.

General Linear Regression

Assuming that the data are periodic with a 12-hour period and a peak around hour 7, it is reasonable to fit a sinusoidal model of the form:

The coefficients a and b appear linearly. Use the MATLAB mldivide (backslash) operator to fit general linear models:

load count.dat
c3 = count(:,3); % Data at intersection 3 
tdata = (1:24)'; 
X = [ones(size(tdata)) cos((2*pi/12)*(tdata-7))];
s_coeffs = X\c3;

figure
plot(c3,'o-')
hold on
tfit = (1:0.01:24)';
yfit = [ones(size(tfit)) cos((2*pi/12)*(tfit-7))]*s_coeffs; 
plot(tfit,yfit,'r-','LineWidth',2)
legend('Data','Sinusoidal Fit','Location','NW')

Use the lscov function to compute statistics on the fit, such as estimated standard errors of the coefficients and the mean squared error:

[s_coeffs,stdx,mse] = lscov(X,c3)
s_coeffs =
   65.5833
   73.2819
stdx =
    8.9185
   12.6127
mse =
  1.9090e+003

Check the assumption of a 12-hour period in the data with a periodogram, computed using the fft function:

Fs = 1; % Sample frequency (per hour)
n = length(c3); % Window length
Y = fft(c3); % DFT of data
f = (0:n-1)*(Fs/n); % Frequency range
P = Y.*conj(Y)/n; % Power of the DFT

figure
plot(f,P)
xlabel('Frequency')
ylabel('Power')

predicted_f = 1/12
predicted_f =
    0.0833

The peak near 0.0833 supports the assumption, although it occurs at a slightly higher frequency. The model can be adjusted accordingly.

See Regression Analysis and Fourier Transforms in the MATLAB Data Analysis and Mathematics documentation for more information on data modeling.

  


Recommended Products

Includes the most popular MATLAB recorded presentations with Q&A sessions led by MATLAB experts.

 © 1984-2012- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS