image thumbnail

Demo files for MathWorks Statistics Webinar (Oct 14, 2008)

by

 

22 Oct 2008 (Updated )

Demo and presentation files for the Oct 14, 2008 Webinar on statistics.

Solving Data Management and Analysis Challenges using MATLAB® and Statistics Toolbox™

Solving Data Management and Analysis Challenges using MATLAB® and Statistics Toolbox™

Demo file for the Data Management and Statistics Webinar. This demo requires the Statistics Toolbox and was created using MATLAB 7.7 (R2008b).

In this demo, we will see how we can take a set of data describing performance and characteristics of various cars, and organize, extract, and visualize useful information for further analysis.

Contents

Automobile Data

Now let's begin. We'll work with this MAT file which contains some automobile data.

clear;
load carbig
whos
  Name                Size            Bytes  Class     Attributes

  Acceleration      406x1              3248  double              
  Cylinders         406x1              3248  double              
  Displacement      406x1              3248  double              
  Horsepower        406x1              3248  double              
  MPG               406x1              3248  double              
  Model             406x36            29232  char                
  Model_Year        406x1              3248  double              
  Origin            406x7              5684  char                
  Weight            406x1              3248  double              
  cyl4              406x5              4060  char                
  org               406x7              5684  char                
  when              406x5              4060  char                

This data set contains information regarding 406 different cars. There are different variables for each piece of information, and each row corresponds to the same car.

Dataset Object

Dataset objects allow you to organize information in a tabular format, and have structures very much like that of matrices. Each row represents the observations, or "cars" in this case, and the columns represent the variables, with the appropriate header names.

clc;
cars = dataset(Acceleration, Cylinders, Displacement, Horsepower, ...
  MPG, Model, Model_Year, Origin, Weight)
cars = 
    Acceleration    Cylinders    Displacement    Horsepower
      12            8             307            130       
    11.5            8             350            165       
      11            8             318            150       
      12            8             304            150       
    10.5            8             302            140       
      10            8             429            198       
       9            8             454            220       
     8.5            8             440            215       
      10            8             455            225       
     8.5            8             390            190       
    17.5            4             133            115       
    11.5            8             350            165       
      11            8             351            153       
    10.5            8             383            175       
      11            8             360            175       
      10            8             383            170       
       8            8             340            160       
       8            8             302            140       
     9.5            8             400            150       
      10            8             455            225       
      15            4             113             95       
    15.5            6             198             95       
    15.5            6             199             97       
      16            6             200             85       
    14.5            4              97             88       
    20.5            4              97             46       
    17.5            4             110             87       
    14.5            4             107             90       
    17.5            4             104             95       
    12.5            4             121            113       
      15            6             199             90       
      14            8             360            215       
      15            8             307            200       
    13.5            8             318            210       
    18.5            8             304            193       
    14.5            4              97             88       
    15.5            4             140             90       
      14            4             113             95       
      19            4              98            NaN       
      20            4              97             48       
      13            6             232            100       
    15.5            6             225            105       
    15.5            6             250            100       
    15.5            6             250             88       
    15.5            6             232            100       
      12            8             350            165       
    11.5            8             400            175       
    13.5            8             351            153       
      13            8             318            150       
    11.5            8             383            180       
      12            8             400            170       
      12            8             400            175       
    13.5            6             258            110       
      19            4             140             72       
      15            6             250            100       
    14.5            6             250             88       
      14            4             122             86       
      14            4             116             90       
    19.5            4              79             70       
    14.5            4              88             76       
      19            4              71             65       
      18            4              72             69       
      19            4              97             60       
    20.5            4              91             70       
    15.5            4             113             95       
      17            4            97.5             80       
    23.5            4              97             54       
    19.5            4             140             90       
    16.5            4             122             86       
      12            8             350            165       
      12            8             400            175       
    13.5            8             318            150       
      13            8             351            153       
    11.5            8             304            150       
      11            8             429            208       
    13.5            8             350            155       
    13.5            8             350            160       
    12.5            8             400            190       
    13.5            3              70             97       
    12.5            8             304            150       
      14            8             307            130       
      16            8             302            140       
      14            8             318            150       
    14.5            4             121            112       
      18            4             121             76       
    19.5            4             120             87       
      18            4              96             69       
      16            4             122             86       
      17            4              97             92       
    14.5            4             120             97       
      15            4              98             80       
    16.5            4              97             88       
      13            8             350            175       
    11.5            8             304            150       
      13            8             350            145       
    14.5            8             302            137       
    12.5            8             318            150       
    11.5            8             429            198       
      12            8             400            150       
      13            8             351            158       
    14.5            8             318            150       
      11            8             440            215       
      11            8             455            225       
      11            8             360            175       
    16.5            6             225            105       
      18            6             250            100       
      16            6             232            100       
    16.5            6             250             88       
      16            6             198             95       
      21            4              97             46       
      14            8             400            150       
    12.5            8             400            167       
      13            8             360            170       
    12.5            8             350            180       
      15            6             232            100       
      19            4              97             88       
    19.5            4             140             72       
    16.5            4             108             94       
    13.5            3              70             90       
    18.5            4             122             85       
      14            6             155            107       
    15.5            4              98             90       
      13            8             350            145       
     9.5            8             400            230       
    19.5            4              68             49       
    15.5            4             116             75       
      14            4             114             91       
    15.5            4             121            112       
      11            8             318            150       
      14            4             121            110       
    13.5            6             156            122       
      11            8             350            180       
    16.5            6             198             95       
      17            6             200            NaN       
      16            6             232            100       
      17            6             250            100       
      19            4              79             67       
    16.5            4             122             80       
      21            4              71             65       
      17            4             140             75       
      17            6             250            100       
      18            6             258            110       
    16.5            6             225            105       
      14            8             302            140       
    14.5            8             350            150       
    13.5            8             318            150       
      16            8             302            140       
    15.5            8             304            150       
    16.5            4              98             83       
    15.5            4              79             67       
    14.5            4              97             78       
    16.5            4              76             52       
      19            4              83             61       
    14.5            4              90             75       
    15.5            4              90             75       
      14            4             116             75       
      15            4             120             97       
    15.5            4             108             93       
      16            4              79             67       
      16            6             225             95       
      16            6             250            105       
      21            6             250             72       
    19.5            6             250             72       
    11.5            8             400            170       
      14            8             350            145       
    14.5            8             318            150       
    13.5            8             351            148       
      21            6             231            110       
    18.5            6             250            105       
      19            6             258            110       
      19            6             225             95       
      15            6             231            110       
    13.5            8             262            110       
      12            8             302            129       
      16            4              97             75       
      17            4             140             83       
      16            6             232            100       
    18.5            4             140             78       
    13.5            4             134             96       
    16.5            4              90             71       
      17            4             119             97       
    14.5            6             171             97       
      14            4              90             70       
      17            6             232             90       
      15            4             115             95       
      17            4             120             88       
    14.5            4             121             98       
    13.5            4             121            115       
    17.5            4              91             53       
    15.5            4             107             86       
    16.9            4             116             81       
    14.9            4             140             92       
    17.7            4              98             79       
    15.3            4             101             83       
      13            8             305            140       
      13            8             318            150       
    13.9            8             304            120       
    12.8            8             351            152       
    15.4            6             225            100       
    14.5            6             250            105       
    17.6            6             200             81       
    17.6            6             232             90       
    22.2            4              85             52       
    22.1            4              98             60       
    14.2            4              90             70       
    17.4            4              91             53       
    17.7            6             225            100       
      21            6             250             78       
    16.2            6             250            110       
    17.8            6             258             95       
    12.2            4              97             71       
      17            4              85             70       
    16.4            4              97             75       
    13.6            4             140             72       
    15.7            4             130            102       
    13.2            8             318            150       
    21.9            4             120             88       
    15.5            6             156            108       
    16.7            6             168            120       
    12.1            8             350            180       
      12            8             350            145       
      15            8             302            130       
      14            8             318            150       
    18.5            4              98             68       
    14.8            4             111             80       
    18.6            4              79             58       
    15.5            4             122             96       
    16.8            4              85             70       
    12.5            8             305            145       
      19            8             260            110       
    13.7            8             318            145       
    14.9            8             302            130       
    16.4            6             250            110       
    16.9            6             231            105       
    17.7            6             225            100       
      19            6             250             98       
    11.1            8             400            180       
    11.4            8             350            170       
    12.2            8             400            190       
    14.5            8             351            149       
    14.5            4              97             78       
      16            4             151             88       
    18.2            4              97             75       
    15.8            4             140             89       
      17            4              98             63       
    15.9            4              98             83       
    16.4            4              97             67       
    14.1            4              97             78       
    14.5            6             146             97       
    12.8            4             121            110       
    13.5            3              80            110       
    21.5            4              90             48       
    14.4            4              98             66       
    19.4            4              78             52       
    18.6            4              85             70       
    16.4            4              91             60       
    15.5            8             260            110       
    13.2            8             318            140       
    12.8            8             302            139       
    19.2            6             231            105       
    18.2            6             200             95       
    15.8            6             200             85       
    15.4            4             140             88       
    17.2            6             225            100       
    17.2            6             232             90       
    15.8            6             231            105       
    16.7            6             200             85       
    18.7            6             225            110       
    15.1            6             258            120       
    13.2            8             305            145       
    13.4            6             231            165       
    11.2            8             302            139       
    13.7            8             318            140       
    16.5            4              98             68       
    14.2            4             134             95       
    14.7            4             119             97       
    14.5            4             105             75       
    14.8            4             134             95       
    16.7            4             156            105       
    17.6            4             151             85       
    14.9            4             119             97       
    15.9            5             131            103       
    13.6            6             163            125       
    15.7            4             121            115       
    15.8            6             163            133       
    14.9            4              89             71       
    16.6            4              98             68       
    15.4            6             231            115       
    18.2            6             200             85       
    17.3            4             140             88       
    18.2            6             232             90       
    16.6            6             225            110       
    15.4            8             305            130       
    13.4            8             302            129       
    13.2            8             351            138       
    15.2            8             318            135       
    14.9            8             350            155       
    14.3            8             351            142       
      15            8             267            125       
      13            8             360            150       
      14            4              89             71       
    15.2            4              86             65       
    14.4            4              98             80       
      15            4             121             80       
    20.1            5             183             77       
    17.4            8             350            125       
    24.8            4             141             71       
    22.2            8             260             90       
    13.2            4             105             70       
    14.9            4             105             70       
    19.2            4              85             65       
    14.7            4              91             69       
      16            4             151             90       
    11.3            6             173            115       
    12.9            6             173            115       
    13.2            4             151             90       
    14.7            4              98             76       
    18.8            4              89             60       
    15.5            4              98             70       
    16.4            4              86             65       
    16.5            4             151             90       
    18.1            4             140             88       
    20.1            4             151             90       
    18.7            6             225             90       
    15.8            4              97             78       
    15.5            4             134             90       
    17.5            4             120             75       
      15            4             119             92       
    15.2            4             108             75       
    17.9            4              86             65       
    14.4            4             156            105       
    19.2            4              85             65       
    21.7            4              90             48       
    23.7            4              90             48       
    19.9            5             121             67       
    21.8            4             146             67       
    13.8            4              91             67       
    17.3            4              85            NaN       
      18            4              97             67       
    15.3            4              89             62       
    11.4            6             168            132       
    12.5            3              70            100       
    15.1            4             122             88       
    14.3            4             140            NaN       
      17            4             107             72       
    15.7            4             135             84       
    16.4            4             151             84       
    14.4            4             156             92       
    12.6            6             173            110       
    12.9            4             135             84       
    16.9            4              79             58       
    16.4            4              86             64       
    16.1            4              81             60       
    17.8            4              97             67       
    19.4            4              85             65       
    17.3            4              89             62       
      16            4              91             68       
    14.9            4             105             63       
    16.2            4              98             65       
    20.7            4              98             65       
    14.2            4             105             74       
    15.8            4             100            NaN       
    14.4            4             107             75       
    16.8            4             108             75       
    14.8            4             119            100       
    18.3            4             120             74       
    20.4            4             141             80       
    15.4            4             121            110       
    19.6            6             145             76       
    12.6            6             168            116       
    13.8            6             146            120       
    15.8            6             231            110       
      19            8             350            105       
    17.1            6             200             88       
    16.6            6             225             85       
    19.6            4             112             88       
    18.6            4             112             88       
      18            4             112             88       
    16.2            4             112             85       
      16            4             135             84       
      18            4             151             90       
    16.4            4             140             92       
    20.5            4             151            NaN       
    15.3            4             105             74       
    18.2            4              91             68       
    17.6            4              91             68       
    14.7            4             105             63       
    17.3            4              98             70       
    14.5            4             120             88       
    14.5            4             107             75       
    16.9            4             108             70       
      15            4              91             67       
    15.7            4              91             67       
    16.2            4              91             67       
    16.4            6             181            110       
      17            6             262             85       
    14.5            4             156             92       
    14.7            6             232            112       
    13.9            4             144             96       
      13            4             135             84       
    17.3            4             151             90       
    15.6            4             140             86       
    24.6            4              97             52       
    11.6            4             135             84       
    18.6            4             120             79       
    19.4            4             119             82       

    MPG     Model          Model_Year    Origin     Weight
      18    [1x36 char]    70            USA        3504  
      15    [1x36 char]    70            USA        3693  
      18    [1x36 char]    70            USA        3436  
      16    [1x36 char]    70            USA        3433  
      17    [1x36 char]    70            USA        3449  
      15    [1x36 char]    70            USA        4341  
      14    [1x36 char]    70            USA        4354  
      14    [1x36 char]    70            USA        4312  
      14    [1x36 char]    70            USA        4425  
      15    [1x36 char]    70            USA        3850  
     NaN    [1x36 char]    70            France     3090  
     NaN    [1x36 char]    70            USA        4142  
     NaN    [1x36 char]    70            USA        4034  
     NaN    [1x36 char]    70            USA        4166  
     NaN    [1x36 char]    70            USA        3850  
      15    [1x36 char]    70            USA        3563  
      14    [1x36 char]    70            USA        3609  
     NaN    [1x36 char]    70            USA        3353  
      15    [1x36 char]    70            USA        3761  
      14    [1x36 char]    70            USA        3086  
      24    [1x36 char]    70            Japan      2372  
      22    [1x36 char]    70            USA        2833  
      18    [1x36 char]    70            USA        2774  
      21    [1x36 char]    70            USA        2587  
      27    [1x36 char]    70            Japan      2130  
      26    [1x36 char]    70            Germany    1835  
      25    [1x36 char]    70            France     2672  
      24    [1x36 char]    70            Germany    2430  
      25    [1x36 char]    70            Sweden     2375  
      26    [1x36 char]    70            Germany    2234  
      21    [1x36 char]    70            USA        2648  
      10    [1x36 char]    70            USA        4615  
      10    [1x36 char]    70            USA        4376  
      11    [1x36 char]    70            USA        4382  
       9    [1x36 char]    70            USA        4732  
      27    [1x36 char]    71            Japan      2130  
      28    [1x36 char]    71            USA        2264  
      25    [1x36 char]    71            Japan      2228  
      25    [1x36 char]    71            USA        2046  
     NaN    [1x36 char]    71            Germany    1978  
      19    [1x36 char]    71            USA        2634  
      16    [1x36 char]    71            USA        3439  
      17    [1x36 char]    71            USA        3329  
      19    [1x36 char]    71            USA        3302  
      18    [1x36 char]    71            USA        3288  
      14    [1x36 char]    71            USA        4209  
      14    [1x36 char]    71            USA        4464  
      14    [1x36 char]    71            USA        4154  
      14    [1x36 char]    71            USA        4096  
      12    [1x36 char]    71            USA        4955  
      13    [1x36 char]    71            USA        4746  
      13    [1x36 char]    71            USA        5140  
      18    [1x36 char]    71            USA        2962  
      22    [1x36 char]    71            USA        2408  
      19    [1x36 char]    71            USA        3282  
      18    [1x36 char]    71            USA        3139  
      23    [1x36 char]    71            USA        2220  
      28    [1x36 char]    71            Germany    2123  
      30    [1x36 char]    71            France     2074  
      30    [1x36 char]    71            Italy      2065  
      31    [1x36 char]    71            Japan      1773  
      35    [1x36 char]    71            Japan      1613  
      27    [1x36 char]    71            Germany    1834  
      26    [1x36 char]    71            USA        1955  
      24    [1x36 char]    72            Japan      2278  
      25    [1x36 char]    72            USA        2126  
      23    [1x36 char]    72            Germany    2254  
      20    [1x36 char]    72            USA        2408  
      21    [1x36 char]    72            USA        2226  
      13    [1x36 char]    72            USA        4274  
      14    [1x36 char]    72            USA        4385  
      15    [1x36 char]    72            USA        4135  
      14    [1x36 char]    72            USA        4129  
      17    [1x36 char]    72            USA        3672  
      11    [1x36 char]    72            USA        4633  
      13    [1x36 char]    72            USA        4502  
      12    [1x36 char]    72            USA        4456  
      13    [1x36 char]    72            USA        4422  
      19    [1x36 char]    72            Japan      2330  
      15    [1x36 char]    72            USA        3892  
      13    [1x36 char]    72            USA        4098  
      13    [1x36 char]    72            USA        4294  
      14    [1x36 char]    72            USA        4077  
      18    [1x36 char]    72            Sweden     2933  
      22    [1x36 char]    72            Germany    2511  
      21    [1x36 char]    72            France     2979  
      26    [1x36 char]    72            France     2189  
      22    [1x36 char]    72            USA        2395  
      28    [1x36 char]    72            Japan      2288  
      23    [1x36 char]    72            Japan      2506  
      28    [1x36 char]    72            USA        2164  
      27    [1x36 char]    72            Japan      2100  
      13    [1x36 char]    73            USA        4100  
      14    [1x36 char]    73            USA        3672  
      13    [1x36 char]    73            USA        3988  
      14    [1x36 char]    73            USA        4042  
      15    [1x36 char]    73            USA        3777  
      12    [1x36 char]    73            USA        4952  
      13    [1x36 char]    73            USA        4464  
      13    [1x36 char]    73            USA        4363  
      14    [1x36 char]    73            USA        4237  
      13    [1x36 char]    73            USA        4735  
      12    [1x36 char]    73            USA        4951  
      13    [1x36 char]    73            USA        3821  
      18    [1x36 char]    73            USA        3121  
      16    [1x36 char]    73            USA        3278  
      18    [1x36 char]    73            USA        2945  
      18    [1x36 char]    73            USA        3021  
      23    [1x36 char]    73            USA        2904  
      26    [1x36 char]    73            Germany    1950  
      11    [1x36 char]    73            USA        4997  
      12    [1x36 char]    73            USA        4906  
      13    [1x36 char]    73            USA        4654  
      12    [1x36 char]    73            USA        4499  
      18    [1x36 char]    73            USA        2789  
      20    [1x36 char]    73            Japan      2279  
      21    [1x36 char]    73            USA        2401  
      22    [1x36 char]    73            Japan      2379  
      18    [1x36 char]    73            Japan      2124  
      19    [1x36 char]    73            USA        2310  
      21    [1x36 char]    73            USA        2472  
      26    [1x36 char]    73            Italy      2265  
      15    [1x36 char]    73            USA        4082  
      16    [1x36 char]    73            USA        4278  
      29    [1x36 char]    73            Italy      1867  
      24    [1x36 char]    73            Germany    2158  
      20    [1x36 char]    73            Germany    2582  
      19    [1x36 char]    73            Sweden     2868  
      15    [1x36 char]    73            USA        3399  
      24    [1x36 char]    73            Sweden     2660  
      20    [1x36 char]    73            Japan      2807  
      11    [1x36 char]    73            USA        3664  
      20    [1x36 char]    74            USA        3102  
      21    [1x36 char]    74            USA        2875  
      19    [1x36 char]    74            USA        2901  
      15    [1x36 char]    74            USA        3336  
      31    [1x36 char]    74            Japan      1950  
      26    [1x36 char]    74            USA        2451  
      32    [1x36 char]    74            Japan      1836  
      25    [1x36 char]    74            USA        2542  
      16    [1x36 char]    74            USA        3781  
      16    [1x36 char]    74            USA        3632  
      18    [1x36 char]    74            USA        3613  
      16    [1x36 char]    74            USA        4141  
      13    [1x36 char]    74            USA        4699  
      14    [1x36 char]    74            USA        4457  
      14    [1x36 char]    74            USA        4638  
      14    [1x36 char]    74            USA        4257  
      29    [1x36 char]    74            Germany    2219  
      26    [1x36 char]    74            Germany    1963  
      26    [1x36 char]    74            Germany    2300  
      31    [1x36 char]    74            Japan      1649  
      32    [1x36 char]    74            Japan      2003  
      28    [1x36 char]    74            USA        2125  
      24    [1x36 char]    74            Italy      2108  
      26    [1x36 char]    74            Italy      2246  
      24    [1x36 char]    74            Japan      2489  
      26    [1x36 char]    74            Japan      2391  
      31    [1x36 char]    74            Italy      2000  
      19    [1x36 char]    75            USA        3264  
      18    [1x36 char]    75            USA        3459  
      15    [1x36 char]    75            USA        3432  
      15    [1x36 char]    75            USA        3158  
      16    [1x36 char]    75            USA        4668  
      15    [1x36 char]    75            USA        4440  
      16    [1x36 char]    75            USA        4498  
      14    [1x36 char]    75            USA        4657  
      17    [1x36 char]    75            USA        3907  
      16    [1x36 char]    75            USA        3897  
      15    [1x36 char]    75            USA        3730  
      18    [1x36 char]    75            USA        3785  
      21    [1x36 char]    75            USA        3039  
      20    [1x36 char]    75            USA        3221  
      13    [1x36 char]    75            USA        3169  
      29    [1x36 char]    75            Japan      2171  
      23    [1x36 char]    75            USA        2639  
      20    [1x36 char]    75            USA        2914  
      23    [1x36 char]    75            USA        2592  
      24    [1x36 char]    75            Japan      2702  
      25    [1x36 char]    75            Germany    2223  
      24    [1x36 char]    75            Japan      2545  
      18    [1x36 char]    75            USA        2984  
      29    [1x36 char]    75            Germany    1937  
      19    [1x36 char]    75            USA        3211  
      23    [1x36 char]    75            Germany    2694  
      23    [1x36 char]    75            France     2957  
      22    [1x36 char]    75            Sweden     2945  
      25    [1x36 char]    75            Sweden     2671  
      33    [1x36 char]    75            Japan      1795  
      28    [1x36 char]    76            Italy      2464  
      25    [1x36 char]    76            Germany    2220  
      25    [1x36 char]    76            USA        2572  
      26    [1x36 char]    76            USA        2255  
      27    [1x36 char]    76            France     2202  
    17.5    [1x36 char]    76            USA        4215  
      16    [1x36 char]    76            USA        4190  
    15.5    [1x36 char]    76            USA        3962  
    14.5    [1x36 char]    76            USA        4215  
      22    [1x36 char]    76            USA        3233  
      22    [1x36 char]    76            USA        3353  
      24    [1x36 char]    76            USA        3012  
    22.5    [1x36 char]    76            USA        3085  
      29    [1x36 char]    76            USA        2035  
    24.5    [1x36 char]    76            USA        2164  
      29    [1x36 char]    76            Germany    1937  
      33    [1x36 char]    76            Japan      1795  
      20    [1x36 char]    76            USA        3651  
      18    [1x36 char]    76            USA        3574  
    18.5    [1x36 char]    76            USA        3645  
    17.5    [1x36 char]    76            USA        3193  
    29.5    [1x36 char]    76            Germany    1825  
      32    [1x36 char]    76            Japan      1990  
      28    [1x36 char]    76            Japan      2155  
    26.5    [1x36 char]    76            USA        2565  
      20    [1x36 char]    76            Sweden     3150  
      13    [1x36 char]    76            USA        3940  
      19    [1x36 char]    76            France     3270  
      19    [1x36 char]    76            Japan      2930  
    16.5    [1x36 char]    76            Germany    3820  
    16.5    [1x36 char]    76            USA        4380  
      13    [1x36 char]    76            USA        4055  
      13    [1x36 char]    76            USA        3870  
      13    [1x36 char]    76            USA        3755  
    31.5    [1x36 char]    77            Japan      2045  
      30    [1x36 char]    77            USA        2155  
      36    [1x36 char]    77            France     1825  
    25.5    [1x36 char]    77            USA        2300  
    33.5    [1x36 char]    77            Japan      1945  
    17.5    [1x36 char]    77            USA        3880  
      17    [1x36 char]    77            USA        4060  
    15.5    [1x36 char]    77            USA        4140  
      15    [1x36 char]    77            USA        4295  
    17.5    [1x36 char]    77            USA        3520  
    20.5    [1x36 char]    77            USA        3425  
      19    [1x36 char]    77            USA        3630  
    18.5    [1x36 char]    77            USA        3525  
      16    [1x36 char]    77            USA        4220  
    15.5    [1x36 char]    77            USA        4165  
    15.5    [1x36 char]    77            USA        4325  
      16    [1x36 char]    77            USA        4335  
      29    [1x36 char]    77            Germany    1940  
    24.5    [1x36 char]    77            USA        2740  
      26    [1x36 char]    77            Japan      2265  
    25.5    [1x36 char]    77            USA        2755  
    30.5    [1x36 char]    77            USA        2051  
    33.5    [1x36 char]    77            USA        2075  
      30    [1x36 char]    77            Japan      1985  
    30.5    [1x36 char]    77            Germany    2190  
      22    [1x36 char]    77            Japan      2815  
    21.5    [1x36 char]    77            Germany    2600  
    21.5    [1x36 char]    77            Japan      2720  
    43.1    [1x36 char]    78            Germany    1985  
    36.1    [1x36 char]    78            USA        1800  
    32.8    [1x36 char]    78            Japan      1985  
    39.4    [1x36 char]    78            Japan      2070  
    36.1    [1x36 char]    78            Japan      1800  
    19.9    [1x36 char]    78            USA        3365  
    19.4    [1x36 char]    78            USA        3735  
    20.2    [1x36 char]    78            USA        3570  
    19.2    [1x36 char]    78            USA        3535  
    20.5    [1x36 char]    78            USA        3155  
    20.2    [1x36 char]    78            USA        2965  
    25.1    [1x36 char]    78            USA        2720  
    20.5    [1x36 char]    78            USA        3430  
    19.4    [1x36 char]    78            USA        3210  
    20.6    [1x36 char]    78            USA        3380  
    20.8    [1x36 char]    78            USA        3070  
    18.6    [1x36 char]    78            USA        3620  
    18.1    [1x36 char]    78            USA        3410  
    19.2    [1x36 char]    78            USA        3425  
    17.7    [1x36 char]    78            USA        3445  
    18.1    [1x36 char]    78            USA        3205  
    17.5    [1x36 char]    78            USA        4080  
      30    [1x36 char]    78            USA        2155  
    27.5    [1x36 char]    78            Japan      2560  
    27.2    [1x36 char]    78            Japan      2300  
    30.9    [1x36 char]    78            USA        2230  
    21.1    [1x36 char]    78            Japan      2515  
    23.2    [1x36 char]    78            USA        2745  
    23.8    [1x36 char]    78            USA        2855  
    23.9    [1x36 char]    78            Japan      2405  
    20.3    [1x36 char]    78            Germany    2830  
      17    [1x36 char]    78            Sweden     3140  
    21.6    [1x36 char]    78            Sweden     2795  
    16.2    [1x36 char]    78            France     3410  
    31.5    [1x36 char]    78            Germany    1990  
    29.5    [1x36 char]    78            Japan      2135  
    21.5    [1x36 char]    79            USA        3245  
    19.8    [1x36 char]    79            USA        2990  
    22.3    [1x36 char]    79            USA        2890  
    20.2    [1x36 char]    79            USA        3265  
    20.6    [1x36 char]    79            USA        3360  
      17    [1x36 char]    79            USA        3840  
    17.6    [1x36 char]    79            USA        3725  
    16.5    [1x36 char]    79            USA        3955  
    18.2    [1x36 char]    79            USA        3830  
    16.9    [1x36 char]    79            USA        4360  
    15.5    [1x36 char]    79            USA        4054  
    19.2    [1x36 char]    79            USA        3605  
    18.5    [1x36 char]    79            USA        3940  
    31.9    [1x36 char]    79            Germany    1925  
    34.1    [1x36 char]    79            Japan      1975  
    35.7    [1x36 char]    79            USA        1915  
    27.4    [1x36 char]    79            USA        2670  
    25.4    [1x36 char]    79            Germany    3530  
      23    [1x36 char]    79            USA        3900  
    27.2    [1x36 char]    79            France     3190  
    23.9    [1x36 char]    79            USA        3420  
    34.2    [1x36 char]    79            USA        2200  
    34.5    [1x36 char]    79            USA        2150  
    31.8    [1x36 char]    79            Japan      2020  
    37.3    [1x36 char]    79            Italy      2130  
    28.4    [1x36 char]    79            USA        2670  
    28.8    [1x36 char]    79            USA        2595  
    26.8    [1x36 char]    79            USA        2700  
    33.5    [1x36 char]    79            USA        2556  
    41.5    [1x36 char]    80            Germany    2144  
    38.1    [1x36 char]    80            Japan      1968  
    32.1    [1x36 char]    80            USA        2120  
    37.2    [1x36 char]    80            Japan      2019  
      28    [1x36 char]    80            USA        2678  
    26.4    [1x36 char]    80            USA        2870  
    24.3    [1x36 char]    80            USA        3003  
    19.1    [1x36 char]    80            USA        3381  
    34.3    [1x36 char]    80            Germany    2188  
    29.8    [1x36 char]    80            Japan      2711  
    31.3    [1x36 char]    80            Japan      2542  
      37    [1x36 char]    80            Japan      2434  
    32.2    [1x36 char]    80            Japan      2265  
    46.6    [1x36 char]    80            Japan      2110  
    27.9    [1x36 char]    80            USA        2800  
    40.8    [1x36 char]    80            Japan      2110  
    44.3    [1x36 char]    80            Germany    2085  
    43.4    [1x36 char]    80            Germany    2335  
    36.4    [1x36 char]    80            Germany    2950  
      30    [1x36 char]    80            Germany    3250  
    44.6    [1x36 char]    80            Japan      1850  
    40.9    [1x36 char]    80            France     1835  
    33.8    [1x36 char]    80            Japan      2145  
    29.8    [1x36 char]    80            Germany    1845  
    32.7    [1x36 char]    80            Japan      2910  
    23.7    [1x36 char]    80            Japan      2420  
      35    [1x36 char]    80            England    2500  
    23.6    [1x36 char]    80            USA        2905  
    32.4    [1x36 char]    80            Japan      2290  
    27.2    [1x36 char]    81            USA        2490  
    26.6    [1x36 char]    81            USA        2635  
    25.8    [1x36 char]    81            USA        2620  
    23.5    [1x36 char]    81            USA        2725  
      30    [1x36 char]    81            USA        2385  
    39.1    [1x36 char]    81            Japan      1755  
      39    [1x36 char]    81            USA        1875  
    35.1    [1x36 char]    81            Japan      1760  
    32.3    [1x36 char]    81            Japan      2065  
      37    [1x36 char]    81            Japan      1975  
    37.7    [1x36 char]    81            Japan      2050  
    34.1    [1x36 char]    81            Japan      1985  
    34.7    [1x36 char]    81            USA        2215  
    34.4    [1x36 char]    81            USA        2045  
    29.9    [1x36 char]    81            USA        2380  
      33    [1x36 char]    81            Germany    2190  
    34.5    [1x36 char]    81            France     2320  
    33.7    [1x36 char]    81            Japan      2210  
    32.4    [1x36 char]    81            Japan      2350  
    32.9    [1x36 char]    81            Japan      2615  
    31.6    [1x36 char]    81            Japan      2635  
    28.1    [1x36 char]    81            France     3230  
     NaN    [1x36 char]    81            Sweden     2800  
    30.7    [1x36 char]    81            Sweden     3160  
    25.4    [1x36 char]    81            Japan      2900  
    24.2    [1x36 char]    81            Japan      2930  
    22.4    [1x36 char]    81            USA        3415  
    26.6    [1x36 char]    81            USA        3725  
    20.2    [1x36 char]    81            USA        3060  
    17.6    [1x36 char]    81            USA        3465  
      28    [1x36 char]    82            USA        2605  
      27    [1x36 char]    82            USA        2640  
      34    [1x36 char]    82            USA        2395  
      31    [1x36 char]    82            USA        2575  
      29    [1x36 char]    82            USA        2525  
      27    [1x36 char]    82            USA        2735  
      24    [1x36 char]    82            USA        2865  
      23    [1x36 char]    82            USA        3035  
      36    [1x36 char]    82            Germany    1980  
      37    [1x36 char]    82            Japan      2025  
      31    [1x36 char]    82            Japan      1970  
      38    [1x36 char]    82            USA        2125  
      36    [1x36 char]    82            USA        2125  
      36    [1x36 char]    82            Japan      2160  
      36    [1x36 char]    82            Japan      2205  
      34    [1x36 char]    82            Japan      2245  
      38    [1x36 char]    82            Japan      1965  
      32    [1x36 char]    82            Japan      1965  
      38    [1x36 char]    82            Japan      1995  
      25    [1x36 char]    82            USA        2945  
      38    [1x36 char]    82            USA        3015  
      26    [1x36 char]    82            USA        2585  
      22    [1x36 char]    82            USA        2835  
      32    [1x36 char]    82            Japan      2665  
      36    [1x36 char]    82            USA        2370  
      27    [1x36 char]    82            USA        2950  
      27    [1x36 char]    82            USA        2790  
      44    [1x36 char]    82            Germany    2130  
      32    [1x36 char]    82            USA        2295  
      28    [1x36 char]    82            USA        2625  
      31    [1x36 char]    82            USA        2720  

The summary function provides basic statistical information for each of the variables included in the dataset object. Notice that there are some missing values for Horsepower and MPG, denoted by NaNs.

clc
summary(cars);
Acceleration: [406x1 double]
     min      1st Q        median       3rd Q        max     
       8      13.7000      15.5000      17.2000      24.8000 
Cylinders: [406x1 double]
     min      1st Q      median      3rd Q      max 
       3          4           4          8        8 
Displacement: [406x1 double]
     min      1st Q      median      3rd Q      max 
      68        105         151        302      455 
Horsepower: [406x1 double]
  Columns 1 through 5
     min      1st Q        median      3rd Q      max 
      46      75.5000          95        130      230 
  Column 6
     NaNs 
        6 
MPG: [406x1 double]
  Columns 1 through 5
     min      1st Q        median      3rd Q      max     
       9      17.5000          23         29      46.6000 
  Column 6
     NaNs 
        8 
Model: [406x36 char]
Model_Year: [406x1 double]
     min      1st Q      median      3rd Q      max 
      70         73          76         79       82 
Origin: [406x7 char]
Weight: [406x1 double]
     min       1st Q      median           3rd Q      max  
     1613       2226      2.8225e+003       3620      5140 

We can index into a dataset object like a regular matrix.

clc
cars(1:10, :)
ans = 
    Acceleration    Cylinders    Displacement    Horsepower
      12            8            307             130       
    11.5            8            350             165       
      11            8            318             150       
      12            8            304             150       
    10.5            8            302             140       
      10            8            429             198       
       9            8            454             220       
     8.5            8            440             215       
      10            8            455             225       
     8.5            8            390             190       

    MPG    Model          Model_Year    Origin     Weight
    18     [1x36 char]    70            USA        3504  
    15     [1x36 char]    70            USA        3693  
    18     [1x36 char]    70            USA        3436  
    16     [1x36 char]    70            USA        3433  
    17     [1x36 char]    70            USA        3449  
    15     [1x36 char]    70            USA        4341  
    14     [1x36 char]    70            USA        4354  
    14     [1x36 char]    70            USA        4312  
    14     [1x36 char]    70            USA        4425  
    15     [1x36 char]    70            USA        3850  

We can access individual columns by referencing them by their names...

clc
cars(1:10, {'Origin', 'MPG', 'Weight'})
ans = 
    Origin     MPG    Weight
    USA        18     3504  
    USA        15     3693  
    USA        18     3436  
    USA        16     3433  
    USA        17     3449  
    USA        15     4341  
    USA        14     4354  
    USA        14     4312  
    USA        14     4425  
    USA        15     3850  

The dot-notation allows you to extract the whole content of a variable.

clc
cars.Horsepower(1:10)
ans =
   130
   165
   150
   150
   140
   198
   220
   215
   225
   190

Dataset objects store meta information.

get(cars)
       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {1x9 cell}

We can add dataset descriptions as well as units for the variables.

clc
cars = set(cars, 'Description', 'Performance and structural information of automobiles');
cars = set(cars, 'Units', {'m/s^2', '', 'mm', 'hp', 'mpg', '', '', '', 'kg'});

summary(cars)
Performance and structural information of automobiles

Acceleration: [406x1 double, Units = m/s^2]
     min      1st Q        median       3rd Q        max     
       8      13.7000      15.5000      17.2000      24.8000 
Cylinders: [406x1 double]
     min      1st Q      median      3rd Q      max 
       3          4           4          8        8 
Displacement: [406x1 double, Units = mm]
     min      1st Q      median      3rd Q      max 
      68        105         151        302      455 
Horsepower: [406x1 double, Units = hp]
  Columns 1 through 5
     min      1st Q        median      3rd Q      max 
      46      75.5000          95        130      230 
  Column 6
     NaNs 
        6 
MPG: [406x1 double, Units = mpg]
  Columns 1 through 5
     min      1st Q        median      3rd Q      max     
       9      17.5000          23         29      46.6000 
  Column 6
     NaNs 
        8 
Model: [406x36 char]
Model_Year: [406x1 double]
     min      1st Q      median      3rd Q      max 
      70         73          76         79       82 
Origin: [406x7 char]
Weight: [406x1 double, Units = kg]
     min       1st Q      median           3rd Q      max  
     1613       2226      2.8225e+003       3620      5140 

Categorical Arrays

Notice that some of the variables take on discrete values. For instance, the Cylinders, and Origin take on a unique set of values:

clc
disp('Cylinders:');
unique(cars(:, 'Cylinders'))

disp('Origin:');
unique(cars(:, 'Origin'))
Cylinders:
ans = 
    Cylinders
    3        
    4        
    5        
    6        
    8        
Origin:
ans = 
    Origin 
    England
    France 
    Germany
    Italy  
    Japan  
    Sweden 
    USA    

Categorical arrays provide significant memory savings. We will convert Cylinders to an ordinal array, which contains ordering information. The variable Origin will be converted to a nominal array, which does not store ordering.

clc
Cylinders_cat = ordinal(Cylinders);
Origin_cat = nominal(Origin);

whos Cylinders* Origin*
  Name                 Size            Bytes  Class      Attributes

  Cylinders          406x1              3248  double               
  Cylinders_cat      406x1              1178  ordinal              
  Origin             406x7              5684  char                 
  Origin_cat         406x1              1366  nominal              

Now, let's convert the variables of the dataset object.

cars.Cylinders = ordinal(cars.Cylinders);
cars.Origin = nominal(cars.Origin);

Filtering

Dataset objects can be easily filtered by criteria.

For example, we can create a logical array that has ONEs where the origin is Germany and ZEROs where it's not Germany.

germanyMask = cars.Origin == 'Germany'
germanyMask =
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     1
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     1
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     1
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     1
     0
     1
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     1
     0
     1
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     1
     1
     1
     1
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
     1
     0
     0
     0

Use mask to extract out all German cars.

clc
cars(germanyMask, :)
ans = 
    Acceleration    Cylinders    Displacement    Horsepower
    20.5            4             97              46       
    14.5            4            107              90       
    12.5            4            121             113       
      20            4             97              48       
      14            4            116              90       
      19            4             97              60       
    23.5            4             97              54       
      18            4            121              76       
      21            4             97              46       
    15.5            4            116              75       
      14            4            114              91       
    16.5            4             98              83       
    15.5            4             79              67       
    14.5            4             97              78       
    16.5            4             90              71       
      14            4             90              70       
      15            4            115              95       
    16.9            4            116              81       
    14.2            4             90              70       
    12.2            4             97              71       
    16.7            6            168             120       
    14.5            4             97              78       
    14.1            4             97              78       
    12.8            4            121             110       
    21.5            4             90              48       
    15.9            5            131             103       
    14.9            4             89              71       
      14            4             89              71       
    20.1            5            183              77       
    14.7            4             98              76       
    15.8            4             97              78       
    21.7            4             90              48       
    23.7            4             90              48       
    19.9            5            121              67       
    21.8            4            146              67       
    15.3            4             89              62       
    14.2            4            105              74       
    15.3            4            105              74       
    24.6            4             97              52       

    MPG     Model          Model_Year    Origin     Weight
      26    [1x36 char]    70            Germany    1835  
      24    [1x36 char]    70            Germany    2430  
      26    [1x36 char]    70            Germany    2234  
     NaN    [1x36 char]    71            Germany    1978  
      28    [1x36 char]    71            Germany    2123  
      27    [1x36 char]    71            Germany    1834  
      23    [1x36 char]    72            Germany    2254  
      22    [1x36 char]    72            Germany    2511  
      26    [1x36 char]    73            Germany    1950  
      24    [1x36 char]    73            Germany    2158  
      20    [1x36 char]    73            Germany    2582  
      29    [1x36 char]    74            Germany    2219  
      26    [1x36 char]    74            Germany    1963  
      26    [1x36 char]    74            Germany    2300  
      25    [1x36 char]    75            Germany    2223  
      29    [1x36 char]    75            Germany    1937  
      23    [1x36 char]    75            Germany    2694  
      25    [1x36 char]    76            Germany    2220  
      29    [1x36 char]    76            Germany    1937  
    29.5    [1x36 char]    76            Germany    1825  
    16.5    [1x36 char]    76            Germany    3820  
      29    [1x36 char]    77            Germany    1940  
    30.5    [1x36 char]    77            Germany    2190  
    21.5    [1x36 char]    77            Germany    2600  
    43.1    [1x36 char]    78            Germany    1985  
    20.3    [1x36 char]    78            Germany    2830  
    31.5    [1x36 char]    78            Germany    1990  
    31.9    [1x36 char]    79            Germany    1925  
    25.4    [1x36 char]    79            Germany    3530  
    41.5    [1x36 char]    80            Germany    2144  
    34.3    [1x36 char]    80            Germany    2188  
    44.3    [1x36 char]    80            Germany    2085  
    43.4    [1x36 char]    80            Germany    2335  
    36.4    [1x36 char]    80            Germany    2950  
      30    [1x36 char]    80            Germany    3250  
    29.8    [1x36 char]    80            Germany    1845  
      33    [1x36 char]    81            Germany    2190  
      36    [1x36 char]    82            Germany    1980  
      44    [1x36 char]    82            Germany    2130  

Scatter plot grouped by the year of the make.

gscatter(cars.MPG, cars.Weight, cars.Model_Year, '', 'xos');
xlabel('Miles per Gallon')
ylabel('Weight')

We notice a general trend, but the amount of data prevents us from getting useful information.

We can use filtering to refine the visualization. Let's extract out only the cars made in 1970, 1976, or 1982.

index = cars.Model_Year == 70 | cars.Model_Year == 76 | cars.Model_Year == 82;
filtered = cars(index,:);

We have a more meaningful scatter plot for this smaller subset.

gscatter(filtered.MPG, filtered.Weight, filtered.Model_Year, '', 'xos');
xlabel('Miles per Gallon')
ylabel('Weight')

Add interactive case names to the plot

gname(filtered.Model);

Concatenate and Join

We can combine datasets by either concatenating or joining.

Concatenate

We have a different set of data that corresponds to small cars. Let's combine this with the original dataset. First, we'll create a dataset object from this data.

% load carsmall
cs = load('carsmall.mat');

% create dataset and convert variables to categorical arrays
cars_s = dataset(cs);
cars_s.Origin = nominal(cars_s.Origin);
cars_s.Cylinders = ordinal(cars_s.Cylinders);

% add additional levels match the levels from the carbig dataset
cars_s.Cylinders = addlevels(cars_s.Cylinders, getlabels(cars.Cylinders));
cars_s.Cylinders = reorderlevels(cars_s.Cylinders, getlabels(cars.Cylinders));
Warning: Ignoring duplicate levels in NEWLEVELS. 

Concatenate using the matrix concatenation notation.

cars_all = [cars; cars_s];
% alternatively,
% cars_all = vertcat(cars, cars_s);

Join

Joining allows you to take the data in one dataset array and assign it to the rows of another dataset array, based on matching values in a common key variable.

clc
tabulate(cars_all.Origin);
    Value    Count   Percent
  England        1      0.20%
   France       18      3.56%
  Germany       48      9.49%
    Italy        9      1.78%
    Japan       94     18.58%
   Sweden       13      2.57%
      USA      323     63.83%

Create a new dataset that maps countries to continents.

clc
Newdata = dataset(...
  {nominal({'England';'France';'Germany';'Italy' ;'Japan';'Sweden';'USA'          }),'Origin'   }, ...
  {nominal({'Europe' ;'Europe';'Europe' ;'Europe';'Asia' ;'Europe';'North America'}),'Continent'})
Newdata = 
    Origin     Continent    
    England    Europe       
    France     Europe       
    Germany    Europe       
    Italy      Europe       
    Japan      Asia         
    Sweden     Europe       
    USA        North America

Join the two datasets to include Continent as a new variable.

cars_all = join(cars_all, Newdata);

clc
cars_all(1:10:100, :)
ans = 
    Acceleration    Cylinders    Displacement    Horsepower
      12            8            307             130       
    17.5            4            133             115       
      15            4            113              95       
      15            6            199              90       
      13            6            232             100       
      12            8            400             170       
      19            4             71              65       
      12            8            400             175       
      14            8            307             130       
      15            4             98              80       

    MPG    Model          Model_Year    Origin    Weight
     18    [1x36 char]    70            USA       3504  
    NaN    [1x36 char]    70            France    3090  
     24    [1x36 char]    70            Japan     2372  
     21    [1x36 char]    70            USA       2648  
     19    [1x36 char]    71            USA       2634  
     13    [1x36 char]    71            USA       4746  
     31    [1x36 char]    71            Japan     1773  
     14    [1x36 char]    72            USA       4385  
     13    [1x36 char]    72            USA       4098  
     28    [1x36 char]    72            USA       2164  

    Continent    
    North America
    Europe       
    Asia         
    North America
    North America
    North America
    Asia         
    North America
    North America
    North America

Dealing with Missing Data

Notice that we have some missing data in our MPG data

clc
cars(5:20, 'MPG')
ans = 
    MPG
     17
     15
     14
     14
     14
     15
    NaN
    NaN
    NaN
    NaN
    NaN
     15
     14
    NaN
     15
     14

One way to deal with missing data is to substitute for the missing value. In this case, we will create a regression model to represent the performance measures (MPG) as functions of possible predictor variables (acceleration, cylinders, horsepower, model year, and weight)

X = [ones(length(cars.MPG),1) cars.Acceleration, double(cars.Cylinders), ...
  cars.Displacement, cars.Horsepower, cars.Model_Year, cars.Weight];
Y = [cars.MPG];
[b,bint,r,rint,stats] = regress(Y, X);

Note that cars.Horsepower contains NaNs. The regress function performs listwise deletion on the independent variables

cars.regress = X * b;
fprintf('R-squared: %f\n', stats(1));
R-squared: 0.814178

Examine the residual.

residuals = cars.MPG - cars.regress;
stem(cars.regress, residuals);
xlabel('model'); ylabel('actual - model');

For cars with low or high MPG, the model seems to underestimate the MPG, while for cars in the middle, the model overestimates the true value.

gname(cars.Model)

Clean Up Our Model

We can potentially improve the model by adding dummy variables to handle diesels, automatic transmissions, and station wagons. In addition, we can filter out 3 and 5 cylinder engines, which are rotary engines.

Dummy variable is a binary variable that has a "1" where it satisfies the criteria and "0" everywhere else.

% Load in as a dataset object
ds = dataset('file','dummy.txt');

% Concatenate
carsall = [cars, ds];
carsall = set(carsall, 'Units', [get(cars, 'Units'), {'', '', '', ''}]);

% Filter out 3- and 5- cylinder engines
index = carsall.Cylinders == '4' | carsall.Cylinders == '6' | carsall.Cylinders == '8';
carsall = carsall(index,:);

Create a New Regression Model

Create a new regression model just by looking at 4, 6, and 8 cylinders and taking into account the car type (station wagon, diesel, automatic).

X = [ones(length(carsall.MPG),1), carsall.Acceleration, ...
  double(carsall.Cylinders), carsall.Displacement, carsall.Horsepower, ...
  carsall.Model_Year, carsall.Weight, carsall.SW, carsall.Diesel, ...
  carsall.Automatic];
Y = [carsall.MPG];
[b, bint, r, rint, stats] = regress(Y, X);

carsall.regress = X * b;

residuals2 = carsall.MPG - carsall.regress;
stem(carsall.regress, residuals2)
xlabel('model'); ylabel('actual - model');
gname(carsall.Model)

Multivariate Analysis of Variance

Multivariate analysis of variance to see how similar the cars from various countries are, in terms of MPG, Acceleration, Weight, and Displacement.

X = [carsall.MPG, carsall.Acceleration, carsall.Weight, carsall.Displacement];
[d, p, stats] = manova1(X, carsall.Origin);
manovacluster(stats)

We see that Japanese and German cars are quite similar, and they are very different from English and American cars

Let's add another dummy variable that distinguished Japanese and German cars. Then redo the regression.

carsall.dummy = (carsall.Origin == 'Germany' | carsall.Origin == 'Japan');

X = [ones(length(carsall.MPG),1), carsall.Acceleration, ...
  double(carsall.Cylinders), carsall.Displacement, carsall.Horsepower, ...
  carsall.Model_Year, carsall.Weight, carsall.SW, carsall.Diesel, ...
  carsall.Automatic carsall.dummy];
Y = [carsall.MPG];
[b, bint, r, rint, stats] = regress(Y, X);

carsall.regress = X * b;

% Inspect once again
residuals2 = carsall.MPG - carsall.regress;
stem(carsall.regress, residuals2)
xlabel('model'); ylabel('actual - model');
gname(carsall.Model)

Robust Regression

We can also perform robust regression to deal with the outliers that may exist in the dataset.

X2 = [carsall.Acceleration, double(carsall.Cylinders), ...
  carsall.Displacement, carsall.Horsepower, carsall.Model_Year, ...
  carsall.Weight, carsall.SW, carsall.Diesel, carsall.Automatic, carsall.dummy];
[robustbeta, stats] = robustfit(X2, Y)
X3 = [ones(length(carsall.MPG),1), X2];

carsall.regress2 = X3 * robustbeta;
robustbeta =
   -4.1872
   -0.2086
   -1.6006
    0.0126
   -0.0166
    0.6444
   -0.0048
    0.1167
   12.4719
   -2.6949
    1.7508
stats = 
        ols_s: 2.9910
     robust_s: 2.8012
        mad_s: 2.6344
            s: 2.8477
        resid: [399x1 double]
        rstud: [399x1 double]
           se: [11x1 double]
         covb: [11x11 double]
    coeffcorr: [11x11 double]
            t: [11x1 double]
            p: [11x1 double]
            w: [399x1 double]
            R: [11x11 double]
          dfe: 374
            h: [399x1 double]

Perform Regression Substitution

We have been looking at linear regressions so far, but we might be able to apply some nonlinear regressions to get a better predictive model.

Once we have a regression model, we can go ahead substitute the missing values with the model data.

carsall.mask = isnan(carsall.MPG);
carsall.MPG(carsall.mask) = carsall.regress2(carsall.mask);

carsall(5:20, 'MPG')
ans = 
    MPG   
        17
        15
        14
        14
        14
        15
    18.881
    12.256
    13.095
    12.597
    13.732
        15
        14
    16.498
        15
        14

Contact us