Summarizing Data

Overview

Many MATLAB functions enable you to summarize the overall location, scale, and shape of a data sample.

One of the advantages of working in MATLAB is that functions operate on entire arrays of data, not just on single scalar values. The functions are said to be vectorized. Vectorization allows for both efficient problem formulation, using array-based data, and efficient computation, using vectorized statistical functions.
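As a small illustration of what vectorization means in practice, the sketch below computes column means twice: once with an explicit loop and once with a single call to mean. The matrix A is made-up data chosen for easy arithmetic. It is not part of the traffic data used in the rest of this section.

```matlab
% Hypothetical 4-by-3 data matrix: rows are observations, columns are variables
A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];

% Loop formulation: process one column at a time
loop_means = zeros(1,size(A,2));
for k = 1:size(A,2)
    loop_means(k) = sum(A(:,k))/size(A,1);
end

% Vectorized formulation: one call operates on the whole array
vec_means = mean(A);    % [5.5 6.5 7.5]

% The two results agree to within round-off
same = max(abs(loop_means - vec_means)) < 1e-12;
```

Both formulations give the same answer. The vectorized call is shorter and leaves the iteration to the built-in function.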

Measures of Location

Summarize the location of a data sample by finding a "typical" value. Common measures of location or "central tendency" are computed by the functions mean, median, and mode:

load count.dat
x1 = mean(count)
x1 =
   32.0000   46.5417   65.5833

x2 = median(count)
x2 =
   23.5000   36.0000   39.0000

x3 = mode(count)
x3 =
    11     9     9

Like all MATLAB statistical functions, the functions above summarize data across observations (rows) while preserving variables (columns). Each function computes the location of the data at all three intersections in a single call.
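The column-wise behavior is only the default. These functions also accept a dimension argument that selects the direction of the summary. A minimal sketch, using a small made-up matrix X rather than the traffic data:

```matlab
% Hypothetical data: 3 observations (rows) of 2 variables (columns)
X = [1 10; 2 20; 6 60];

col_means   = mean(X);      % default, down the columns: [3 30]
row_means   = mean(X,2);    % dim = 2, across each row:  [5.5; 11; 33]
col_medians = median(X,1);  % dim = 1 is the same as the default: [2 20]
```

Summarizing along dimension 2 gives one value per observation instead of one value per variable.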

Measures of Scale

There are many ways to measure the scale or "dispersion" of a data sample. The MATLAB functions max, min, std, and var compute some common measures:

dx1 = max(count)-min(count)
dx1 =
   107   136   250

dx2 = std(count)
dx2 =
   25.3703   41.4057   68.0281

dx3 = var(count)
dx3 =
  1.0e+003 *
    0.6437    1.7144    4.6278

Like all MATLAB statistical functions, the functions above summarize data across observations (rows) while preserving variables (columns). Each function computes the scale of the data at all three intersections in a single call.
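The three measures are related: the variance is the square of the standard deviation, and both normalize by N-1 by default, where N is the number of observations. The sketch below checks this on a small made-up matrix X and also shows the optional second argument of std that switches the normalization to N.

```matlab
% Hypothetical data: rows are observations, columns are variables
X = [2 10; 4 20; 6 60];

r  = max(X) - min(X);   % range of each column: [4 50]
s  = std(X);            % sample standard deviation (normalizes by N-1)
v  = var(X);            % sample variance (normalizes by N-1)
s1 = std(X,1);          % second argument 1: normalize by N instead

% Variance is the square of the standard deviation, up to round-off
agree = max(abs(v - s.^2)) < 1e-9;
```

The same relationship holds for the traffic data: squaring each entry of dx2 reproduces dx3 to the displayed precision.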

Shape of a Distribution

The shape of a distribution is harder to summarize than its location or scale. The MATLAB hist function plots a histogram that provides a visual summary:

figure
hist(count)
legend('Intersection 1',...
       'Intersection 2',...
       'Intersection 3')
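When called with output arguments, hist returns the bin counts and bin centers instead of drawing a plot. The following sketch uses a made-up sample v to show that the default is 10 equally spaced bins and that the counts account for every data value.

```matlab
% Hypothetical sample, for illustration only
v = [1 2 2 3 3 3 4 4 5 10];

% With outputs, hist returns counts and bin centers and draws nothing
[n,centers] = hist(v);

nbins = numel(centers);   % 10 bins by default
total = sum(n);           % equals numel(v): each value falls in one bin
```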

Parametric models give analytic summaries of distribution shapes. Exponential distributions, with parameter mu given by the data mean, are a good choice for the traffic data:

c1 = count(:,1); % Data at intersection 1
[bin_counts,bin_locations] = hist(c1);
bin_width = bin_locations(2) - bin_locations(1);
hist_area = (bin_width)*(sum(bin_counts));

figure
hist(c1)
hold on

mu1 = mean(c1);
exp_pdf = @(t)(1/mu1)*exp(-t/mu1); % Integrates to 1
t = 0:150;
y = exp_pdf(t);
plot(t,(hist_area)*y,'r','LineWidth',2)
legend('Distribution','Exponential Fit')
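The density is multiplied by hist_area because the histogram bars have total area bin_width*sum(bin_counts), while the density itself integrates to 1. Scaling puts the two on the same footing. A rough numerical check of the second fact, assuming a mean of 32 (the value reported for intersection 1 above) and using trapz on a fine grid:

```matlab
mu = 32;                          % mean at intersection 1, from the results above
exp_pdf = @(t)(1/mu)*exp(-t/mu);  % exponential density with mean mu

t = 0:0.01:1000;                  % fine grid; the tail beyond 1000 is negligible
area = trapz(t,exp_pdf(t));       % numerical integral, close to 1

% The plotted range 0:150 covers most, but not all, of that area
t150 = 0:0.01:150;
area150 = trapz(t150,exp_pdf(t150));   % close to 1 - exp(-150/mu)
```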

Methods for fitting general parametric models to data distributions are beyond the scope of this Getting Started guide. Statistics Toolbox software provides functions for computing maximum likelihood estimates of distribution parameters.
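For the exponential model used above, the maximum likelihood estimate has a closed form: it is the sample mean, which is why mu1 = mean(c1) serves as the parameter. The sketch below checks this numerically on made-up data by minimizing the negative log-likelihood with fminsearch. The sample x, the variable names, and the log parameterization are illustrative choices. This is not how Statistics Toolbox computes its estimates.

```matlab
% Made-up sample, for illustration only
x = [3 7 1 12 5 2 9 4];
n = numel(x);

% Negative log-likelihood of an exponential model with mean exp(p).
% Working with p = log(mu) keeps mu positive during the search.
nll = @(p) n*p + sum(x)*exp(-p);

p_hat  = fminsearch(nll,0);   % numerical minimizer, starting from mu = 1
mu_hat = exp(p_hat);          % numerical maximum likelihood estimate

mu_closed = mean(x);          % closed-form estimate for this model
```

The two estimates agree to within the default tolerance of fminsearch. For models without a closed-form solution, a numerical search of this kind is the usual approach.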

See Descriptive Statistics in the MATLAB Data Analysis documentation for more information on summarizing data samples.

  



© 1984-2009 The MathWorks, Inc.