Code covered by the BSD License

# Demo files for MathWorks Statistics Webinar (Oct 14, 2008)

### Jiro Doke

22 Oct 2008

Demo and presentation files for the Oct 14, 2008 Webinar on statistics.

# Solving Data Management and Analysis Challenges using MATLAB® and Statistics Toolbox™

Demo file for the Data Management and Statistics Webinar. This demo requires the Statistics Toolbox and was created using MATLAB 7.7 (R2008b).

In this demo, we take a set of data describing the performance and characteristics of various cars, then organize, extract, and visualize useful information for further analysis.

## Automobile Data

Let's begin. We'll work with a MAT-file that contains automobile data.

```
clear;
whos
```
```
  Name                Size            Bytes  Class     Attributes

  Acceleration      406x1              3248  double
  Cylinders         406x1              3248  double
  Displacement      406x1              3248  double
  Horsepower        406x1              3248  double
  MPG               406x1              3248  double
  Model             406x36            29232  char
  Model_Year        406x1              3248  double
  Origin            406x7              5684  char
  Weight            406x1              3248  double
  cyl4              406x5              4060  char
  org               406x7              5684  char
  when              406x5              4060  char

```

This data set contains information on 406 cars. Each piece of information is stored in its own variable, and a given row index refers to the same car in every variable.
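To make the "same row, same car" layout concrete, here is a minimal sketch using three made-up observations in place of the real 406. The values and the model names (`'car a'`, etc.) are placeholders invented for illustration, not entries from the MAT-file.

```matlab
% Made-up stand-ins for three of the workspace variables (illustration only)
MPG    = [18; 15; 24];
Weight = [3504; 3693; 2372];
Model  = char('car a', 'car bb', 'car c');  % char() pads rows to equal width

% Row i of every variable describes the same car
i = 2;
fprintf('%s: MPG %g, weight %g\n', strtrim(Model(i, :)), MPG(i), Weight(i));

% All variables must therefore have the same number of rows
nRows = [size(MPG, 1), size(Weight, 1), size(Model, 1)];
assert(all(nRows == nRows(1)));
```

Because text such as `Model` is stored as a padded character matrix, each row has the same width, which is why `whos` reports it as 406x36 rather than as 406 separate strings.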

## Dataset Object

Dataset objects let you organize information in a tabular format, and they are indexed much like matrices. Each row represents one observation (a car, in this case), and each column represents a variable with its own header name.
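As a rough illustration of that matrix-like behavior, the sketch below builds a small dataset array from made-up values and indexes it by row, by variable name, and by logical condition. The variable `toy`, the variable `sub`, and all the numbers are hypothetical; the sketch assumes the Statistics Toolbox `dataset` class used throughout this demo.

```matlab
% Hypothetical toy data (values are made up for illustration only)
Weight = [3504; 2130; 2672];
MPG    = [18; 27; 25];
Origin = char('USA', 'Japan', 'France');   % padded character matrix

% Variable names in the dataset are taken from the workspace variable names
toy = dataset(Weight, MPG, Origin);

sz      = size(toy);      % [observations, variables]
mpgCol  = toy.MPG;        % one variable, returned as an ordinary column vector
secondR = toy(2, :);      % one observation, returned as a 1-row dataset

% Logical row indexing combined with selection of variables by name
sub = toy(toy.MPG > 20, {'Origin', 'Weight'});
```

Parenthesis indexing returns another dataset array, while dot indexing returns the underlying data, so `toy.MPG` can be passed directly to numeric functions.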

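```
clc;
% Combine the workspace variables into a single dataset array.
% Omitting the trailing semicolon displays the result.
cars = dataset(Acceleration, Cylinders, Displacement, Horsepower, ...
    MPG, Model, Model_Year, Origin, Weight)
```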
```
cars =
Acceleration    Cylinders    Displacement    Horsepower
12            8             307            130
11.5            8             350            165
11            8             318            150
12            8             304            150
10.5            8             302            140
10            8             429            198
9            8             454            220
8.5            8             440            215
10            8             455            225
8.5            8             390            190
17.5            4             133            115
11.5            8             350            165
11            8             351            153
10.5            8             383            175
11            8             360            175
10            8             383            170
8            8             340            160
8            8             302            140
9.5            8             400            150
10            8             455            225
15            4             113             95
15.5            6             198             95
15.5            6             199             97
16            6             200             85
14.5            4              97             88
20.5            4              97             46
17.5            4             110             87
14.5            4             107             90
17.5            4             104             95
12.5            4             121            113
15            6             199             90
14            8             360            215
15            8             307            200
13.5            8             318            210
18.5            8             304            193
14.5            4              97             88
15.5            4             140             90
14            4             113             95
19            4              98            NaN
20            4              97             48
13            6             232            100
15.5            6             225            105
15.5            6             250            100
15.5            6             250             88
15.5            6             232            100
12            8             350            165
11.5            8             400            175
13.5            8             351            153
13            8             318            150
11.5            8             383            180
12            8             400            170
12            8             400            175
13.5            6             258            110
19            4             140             72
15            6             250            100
14.5            6             250             88
14            4             122             86
14            4             116             90
19.5            4              79             70
14.5            4              88             76
19            4              71             65
18            4              72             69
19            4              97             60
20.5            4              91             70
15.5            4             113             95
17            4            97.5             80
23.5            4              97             54
19.5            4             140             90
16.5            4             122             86
12            8             350            165
12            8             400            175
13.5            8             318            150
13            8             351            153
11.5            8             304            150
11            8             429            208
13.5            8             350            155
13.5            8             350            160
12.5            8             400            190
13.5            3              70             97
12.5            8             304            150
14            8             307            130
16            8             302            140
14            8             318            150
14.5            4             121            112
18            4             121             76
19.5            4             120             87
18            4              96             69
16            4             122             86
17            4              97             92
14.5            4             120             97
15            4              98             80
16.5            4              97             88
13            8             350            175
11.5            8             304            150
13            8             350            145
14.5            8             302            137
12.5            8             318            150
11.5            8             429            198
12            8             400            150
13            8             351            158
14.5            8             318            150
11            8             440            215
11            8             455            225
11            8             360            175
16.5            6             225            105
18            6             250            100
16            6             232            100
16.5            6             250             88
16            6             198             95
21            4              97             46
14            8             400            150
12.5            8             400            167
13            8             360            170
12.5            8             350            180
15            6             232            100
19            4              97             88
19.5            4             140             72
16.5            4             108             94
13.5            3              70             90
18.5            4             122             85
14            6             155            107
15.5            4              98             90
13            8             350            145
9.5            8             400            230
19.5            4              68             49
15.5            4             116             75
14            4             114             91
15.5            4             121            112
11            8             318            150
14            4             121            110
13.5            6             156            122
11            8             350            180
16.5            6             198             95
17            6             200            NaN
16            6             232            100
17            6             250            100
19            4              79             67
16.5            4             122             80
21            4              71             65
17            4             140             75
17            6             250            100
18            6             258            110
16.5            6             225            105
14            8             302            140
14.5            8             350            150
13.5            8             318            150
16            8             302            140
15.5            8             304            150
16.5            4              98             83
15.5            4              79             67
14.5            4              97             78
16.5            4              76             52
19            4              83             61
14.5            4              90             75
15.5            4              90             75
14            4             116             75
15            4             120             97
15.5            4             108             93
16            4              79             67
16            6             225             95
16            6             250            105
21            6             250             72
19.5            6             250             72
11.5            8             400            170
14            8             350            145
14.5            8             318            150
13.5            8             351            148
21            6             231            110
18.5            6             250            105
19            6             258            110
19            6             225             95
15            6             231            110
13.5            8             262            110
12            8             302            129
16            4              97             75
17            4             140             83
16            6             232            100
18.5            4             140             78
13.5            4             134             96
16.5            4              90             71
17            4             119             97
14.5            6             171             97
14            4              90             70
17            6             232             90
15            4             115             95
17            4             120             88
14.5            4             121             98
13.5            4             121            115
17.5            4              91             53
15.5            4             107             86
16.9            4             116             81
14.9            4             140             92
17.7            4              98             79
15.3            4             101             83
13            8             305            140
13            8             318            150
13.9            8             304            120
12.8            8             351            152
15.4            6             225            100
14.5            6             250            105
17.6            6             200             81
17.6            6             232             90
22.2            4              85             52
22.1            4              98             60
14.2            4              90             70
17.4            4              91             53
17.7            6             225            100
21            6             250             78
16.2            6             250            110
17.8            6             258             95
12.2            4              97             71
17            4              85             70
16.4            4              97             75
13.6            4             140             72
15.7            4             130            102
13.2            8             318            150
21.9            4             120             88
15.5            6             156            108
16.7            6             168            120
12.1            8             350            180
12            8             350            145
15            8             302            130
14            8             318            150
18.5            4              98             68
14.8            4             111             80
18.6            4              79             58
15.5            4             122             96
16.8            4              85             70
12.5            8             305            145
19            8             260            110
13.7            8             318            145
14.9            8             302            130
16.4            6             250            110
16.9            6             231            105
17.7            6             225            100
19            6             250             98
11.1            8             400            180
11.4            8             350            170
12.2            8             400            190
14.5            8             351            149
14.5            4              97             78
16            4             151             88
18.2            4              97             75
15.8            4             140             89
17            4              98             63
15.9            4              98             83
16.4            4              97             67
14.1            4              97             78
14.5            6             146             97
12.8            4             121            110
13.5            3              80            110
21.5            4              90             48
14.4            4              98             66
19.4            4              78             52
18.6            4              85             70
16.4            4              91             60
15.5            8             260            110
13.2            8             318            140
12.8            8             302            139
19.2            6             231            105
18.2            6             200             95
15.8            6             200             85
15.4            4             140             88
17.2            6             225            100
17.2            6             232             90
15.8            6             231            105
16.7            6             200             85
18.7            6             225            110
15.1            6             258            120
13.2            8             305            145
13.4            6             231            165
11.2            8             302            139
13.7            8             318            140
16.5            4              98             68
14.2            4             134             95
14.7            4             119             97
14.5            4             105             75
14.8            4             134             95
16.7            4             156            105
17.6            4             151             85
14.9            4             119             97
15.9            5             131            103
13.6            6             163            125
15.7            4             121            115
15.8            6             163            133
14.9            4              89             71
16.6            4              98             68
15.4            6             231            115
18.2            6             200             85
17.3            4             140             88
18.2            6             232             90
16.6            6             225            110
15.4            8             305            130
13.4            8             302            129
13.2            8             351            138
15.2            8             318            135
14.9            8             350            155
14.3            8             351            142
15            8             267            125
13            8             360            150
14            4              89             71
15.2            4              86             65
14.4            4              98             80
15            4             121             80
20.1            5             183             77
17.4            8             350            125
24.8            4             141             71
22.2            8             260             90
13.2            4             105             70
14.9            4             105             70
19.2            4              85             65
14.7            4              91             69
16            4             151             90
11.3            6             173            115
12.9            6             173            115
13.2            4             151             90
14.7            4              98             76
18.8            4              89             60
15.5            4              98             70
16.4            4              86             65
16.5            4             151             90
18.1            4             140             88
20.1            4             151             90
18.7            6             225             90
15.8            4              97             78
15.5            4             134             90
17.5            4             120             75
15            4             119             92
15.2            4             108             75
17.9            4              86             65
14.4            4             156            105
19.2            4              85             65
21.7            4              90             48
23.7            4              90             48
19.9            5             121             67
21.8            4             146             67
13.8            4              91             67
17.3            4              85            NaN
18            4              97             67
15.3            4              89             62
11.4            6             168            132
12.5            3              70            100
15.1            4             122             88
14.3            4             140            NaN
17            4             107             72
15.7            4             135             84
16.4            4             151             84
14.4            4             156             92
12.6            6             173            110
12.9            4             135             84
16.9            4              79             58
16.4            4              86             64
16.1            4              81             60
17.8            4              97             67
19.4            4              85             65
17.3            4              89             62
16            4              91             68
14.9            4             105             63
16.2            4              98             65
20.7            4              98             65
14.2            4             105             74
15.8            4             100            NaN
14.4            4             107             75
16.8            4             108             75
14.8            4             119            100
18.3            4             120             74
20.4            4             141             80
15.4            4             121            110
19.6            6             145             76
12.6            6             168            116
13.8            6             146            120
15.8            6             231            110
19            8             350            105
17.1            6             200             88
16.6            6             225             85
19.6            4             112             88
18.6            4             112             88
18            4             112             88
16.2            4             112             85
16            4             135             84
18            4             151             90
16.4            4             140             92
20.5            4             151            NaN
15.3            4             105             74
18.2            4              91             68
17.6            4              91             68
14.7            4             105             63
17.3            4              98             70
14.5            4             120             88
14.5            4             107             75
16.9            4             108             70
15            4              91             67
15.7            4              91             67
16.2            4              91             67
16.4            6             181            110
17            6             262             85
14.5            4             156             92
14.7            6             232            112
13.9            4             144             96
13            4             135             84
17.3            4             151             90
15.6            4             140             86
24.6            4              97             52
11.6            4             135             84
18.6            4             120             79
19.4            4             119             82

MPG     Model          Model_Year    Origin     Weight
18    [1x36 char]    70            USA        3504
15    [1x36 char]    70            USA        3693
18    [1x36 char]    70            USA        3436
16    [1x36 char]    70            USA        3433
17    [1x36 char]    70            USA        3449
15    [1x36 char]    70            USA        4341
14    [1x36 char]    70            USA        4354
14    [1x36 char]    70            USA        4312
14    [1x36 char]    70            USA        4425
15    [1x36 char]    70            USA        3850
NaN    [1x36 char]    70            France     3090
NaN    [1x36 char]    70            USA        4142
NaN    [1x36 char]    70            USA        4034
NaN    [1x36 char]    70            USA        4166
NaN    [1x36 char]    70            USA        3850
15    [1x36 char]    70            USA        3563
14    [1x36 char]    70            USA        3609
NaN    [1x36 char]    70            USA        3353
15    [1x36 char]    70            USA        3761
14    [1x36 char]    70            USA        3086
24    [1x36 char]    70            Japan      2372
22    [1x36 char]    70            USA        2833
18    [1x36 char]    70            USA        2774
21    [1x36 char]    70            USA        2587
27    [1x36 char]    70            Japan      2130
26    [1x36 char]    70            Germany    1835
25    [1x36 char]    70            France     2672
24    [1x36 char]    70            Germany    2430
25    [1x36 char]    70            Sweden     2375
26    [1x36 char]    70            Germany    2234
21    [1x36 char]    70            USA        2648
10    [1x36 char]    70            USA        4615
10    [1x36 char]    70            USA        4376
11    [1x36 char]    70            USA        4382
9    [1x36 char]    70            USA        4732
27    [1x36 char]    71            Japan      2130
28    [1x36 char]    71            USA        2264
25    [1x36 char]    71            Japan      2228
25    [1x36 char]    71            USA        2046
NaN    [1x36 char]    71            Germany    1978
19    [1x36 char]    71            USA        2634
16    [1x36 char]    71            USA        3439
17    [1x36 char]    71            USA        3329
19    [1x36 char]    71            USA        3302
18    [1x36 char]    71            USA        3288
14    [1x36 char]    71            USA        4209
14    [1x36 char]    71            USA        4464
14    [1x36 char]    71            USA        4154
14    [1x36 char]    71            USA        4096
12    [1x36 char]    71            USA        4955
13    [1x36 char]    71            USA        4746
13    [1x36 char]    71            USA        5140
18    [1x36 char]    71            USA        2962
22    [1x36 char]    71            USA        2408
19    [1x36 char]    71            USA        3282
18    [1x36 char]    71            USA        3139
23    [1x36 char]    71            USA        2220
28    [1x36 char]    71            Germany    2123
30    [1x36 char]    71            France     2074
30    [1x36 char]    71            Italy      2065
31    [1x36 char]    71            Japan      1773
35    [1x36 char]    71            Japan      1613
27    [1x36 char]    71            Germany    1834
26    [1x36 char]    71            USA        1955
24    [1x36 char]    72            Japan      2278
25    [1x36 char]    72            USA        2126
23    [1x36 char]    72            Germany    2254
20    [1x36 char]    72            USA        2408
21    [1x36 char]    72            USA        2226
13    [1x36 char]    72            USA        4274
14    [1x36 char]    72            USA        4385
15    [1x36 char]    72            USA        4135
14    [1x36 char]    72            USA        4129
17    [1x36 char]    72            USA        3672
11    [1x36 char]    72            USA        4633
13    [1x36 char]    72            USA        4502
12    [1x36 char]    72            USA        4456
13    [1x36 char]    72            USA        4422
19    [1x36 char]    72            Japan      2330
15    [1x36 char]    72            USA        3892
13    [1x36 char]    72            USA        4098
13    [1x36 char]    72            USA        4294
14    [1x36 char]    72            USA        4077
18    [1x36 char]    72            Sweden     2933
22    [1x36 char]    72            Germany    2511
21    [1x36 char]    72            France     2979
26    [1x36 char]    72            France     2189
22    [1x36 char]    72            USA        2395
28    [1x36 char]    72            Japan      2288
23    [1x36 char]    72            Japan      2506
28    [1x36 char]    72            USA        2164
27    [1x36 char]    72            Japan      2100
13    [1x36 char]    73            USA        4100
14    [1x36 char]    73            USA        3672
13    [1x36 char]    73            USA        3988
14    [1x36 char]    73            USA        4042
15    [1x36 char]    73            USA        3777
12    [1x36 char]    73            USA        4952
13    [1x36 char]    73            USA        4464
13    [1x36 char]    73            USA        4363
14    [1x36 char]    73            USA        4237
13    [1x36 char]    73            USA        4735
12    [1x36 char]    73            USA        4951
13    [1x36 char]    73            USA        3821
18    [1x36 char]    73            USA        3121
16    [1x36 char]    73            USA        3278
18    [1x36 char]    73            USA        2945
18    [1x36 char]    73            USA        3021
23    [1x36 char]    73            USA        2904
26    [1x36 char]    73            Germany    1950
11    [1x36 char]    73            USA        4997
12    [1x36 char]    73            USA        4906
13    [1x36 char]    73            USA        4654
12    [1x36 char]    73            USA        4499
18    [1x36 char]    73            USA        2789
20    [1x36 char]    73            Japan      2279
21    [1x36 char]    73            USA        2401
22    [1x36 char]    73            Japan      2379
18    [1x36 char]    73            Japan      2124
19    [1x36 char]    73            USA        2310
21    [1x36 char]    73            USA        2472
26    [1x36 char]    73            Italy      2265
15    [1x36 char]    73            USA        4082
16    [1x36 char]    73            USA        4278
29    [1x36 char]    73            Italy      1867
24    [1x36 char]    73            Germany    2158
20    [1x36 char]    73            Germany    2582
19    [1x36 char]    73            Sweden     2868
15    [1x36 char]    73            USA        3399
24    [1x36 char]    73            Sweden     2660
20    [1x36 char]    73            Japan      2807
11    [1x36 char]    73            USA        3664
20    [1x36 char]    74            USA        3102
21    [1x36 char]    74            USA        2875
19    [1x36 char]    74            USA        2901
15    [1x36 char]    74            USA        3336
31    [1x36 char]    74            Japan      1950
26    [1x36 char]    74            USA        2451
32    [1x36 char]    74            Japan      1836
25    [1x36 char]    74            USA        2542
16    [1x36 char]    74            USA        3781
16    [1x36 char]    74            USA        3632
18    [1x36 char]    74            USA        3613
16    [1x36 char]    74            USA        4141
13    [1x36 char]    74            USA        4699
14    [1x36 char]    74            USA        4457
14    [1x36 char]    74            USA        4638
14    [1x36 char]    74            USA        4257
29    [1x36 char]    74            Germany    2219
26    [1x36 char]    74            Germany    1963
26    [1x36 char]    74            Germany    2300
31    [1x36 char]    74            Japan      1649
32    [1x36 char]    74            Japan      2003
28    [1x36 char]    74            USA        2125
24    [1x36 char]    74            Italy      2108
26    [1x36 char]    74            Italy      2246
24    [1x36 char]    74            Japan      2489
26    [1x36 char]    74            Japan      2391
31    [1x36 char]    74            Italy      2000
19    [1x36 char]    75            USA        3264
18    [1x36 char]    75            USA        3459
15    [1x36 char]    75            USA        3432
15    [1x36 char]    75            USA        3158
16    [1x36 char]    75            USA        4668
15    [1x36 char]    75            USA        4440
16    [1x36 char]    75            USA        4498
14    [1x36 char]    75            USA        4657
17    [1x36 char]    75            USA        3907
16    [1x36 char]    75            USA        3897
15    [1x36 char]    75            USA        3730
18    [1x36 char]    75            USA        3785
21    [1x36 char]    75            USA        3039
20    [1x36 char]    75            USA        3221
13    [1x36 char]    75            USA        3169
29    [1x36 char]    75            Japan      2171
23    [1x36 char]    75            USA        2639
20    [1x36 char]    75            USA        2914
23    [1x36 char]    75            USA        2592
24    [1x36 char]    75            Japan      2702
25    [1x36 char]    75            Germany    2223
24    [1x36 char]    75            Japan      2545
18    [1x36 char]    75            USA        2984
29    [1x36 char]    75            Germany    1937
19    [1x36 char]    75            USA        3211
23    [1x36 char]    75            Germany    2694
23    [1x36 char]    75            France     2957
22    [1x36 char]    75            Sweden     2945
25    [1x36 char]    75            Sweden     2671
33    [1x36 char]    75            Japan      1795
28    [1x36 char]    76            Italy      2464
25    [1x36 char]    76            Germany    2220
25    [1x36 char]    76            USA        2572
26    [1x36 char]    76            USA        2255
27    [1x36 char]    76            France     2202
17.5    [1x36 char]    76            USA        4215
16    [1x36 char]    76            USA        4190
15.5    [1x36 char]    76            USA        3962
14.5    [1x36 char]    76            USA        4215
22    [1x36 char]    76            USA        3233
22    [1x36 char]    76            USA        3353
24    [1x36 char]    76            USA        3012
22.5    [1x36 char]    76            USA        3085
29    [1x36 char]    76            USA        2035
24.5    [1x36 char]    76            USA        2164
29    [1x36 char]    76            Germany    1937
33    [1x36 char]    76            Japan      1795
20    [1x36 char]    76            USA        3651
18    [1x36 char]    76            USA        3574
18.5    [1x36 char]    76            USA        3645
17.5    [1x36 char]    76            USA        3193
29.5    [1x36 char]    76            Germany    1825
32    [1x36 char]    76            Japan      1990
28    [1x36 char]    76            Japan      2155
26.5    [1x36 char]    76            USA        2565
20    [1x36 char]    76            Sweden     3150
13    [1x36 char]    76            USA        3940
19    [1x36 char]    76            France     3270
19    [1x36 char]    76            Japan      2930
16.5    [1x36 char]    76            Germany    3820
16.5    [1x36 char]    76            USA        4380
13    [1x36 char]    76            USA        4055
13    [1x36 char]    76            USA        3870
13    [1x36 char]    76            USA        3755
31.5    [1x36 char]    77            Japan      2045
30    [1x36 char]    77            USA        2155
36    [1x36 char]    77            France     1825
25.5    [1x36 char]    77            USA        2300
33.5    [1x36 char]    77            Japan      1945
17.5    [1x36 char]    77            USA        3880
17    [1x36 char]    77            USA        4060
15.5    [1x36 char]    77            USA        4140
15    [1x36 char]    77            USA        4295
17.5    [1x36 char]    77            USA        3520
20.5    [1x36 char]    77            USA        3425
19    [1x36 char]    77            USA        3630
18.5    [1x36 char]    77            USA        3525
16    [1x36 char]    77            USA        4220
15.5    [1x36 char]    77            USA        4165
15.5    [1x36 char]    77            USA        4325
16    [1x36 char]    77            USA        4335
29    [1x36 char]    77            Germany    1940
24.5    [1x36 char]    77            USA        2740
26    [1x36 char]    77            Japan      2265
25.5    [1x36 char]    77            USA        2755
30.5    [1x36 char]    77            USA        2051
33.5    [1x36 char]    77            USA        2075
30    [1x36 char]    77            Japan      1985
30.5    [1x36 char]    77            Germany    2190
22    [1x36 char]    77            Japan      2815
21.5    [1x36 char]    77            Germany    2600
21.5    [1x36 char]    77            Japan      2720
43.1    [1x36 char]    78            Germany    1985
36.1    [1x36 char]    78            USA        1800
32.8    [1x36 char]    78            Japan      1985
39.4    [1x36 char]    78            Japan      2070
36.1    [1x36 char]    78            Japan      1800
19.9    [1x36 char]    78            USA        3365
19.4    [1x36 char]    78            USA        3735
20.2    [1x36 char]    78            USA        3570
19.2    [1x36 char]    78            USA        3535
20.5    [1x36 char]    78            USA        3155
20.2    [1x36 char]    78            USA        2965
25.1    [1x36 char]    78            USA        2720
20.5    [1x36 char]    78            USA        3430
19.4    [1x36 char]    78            USA        3210
20.6    [1x36 char]    78            USA        3380
20.8    [1x36 char]    78            USA        3070
18.6    [1x36 char]    78            USA        3620
18.1    [1x36 char]    78            USA        3410
19.2    [1x36 char]    78            USA        3425
17.7    [1x36 char]    78            USA        3445
18.1    [1x36 char]    78            USA        3205
17.5    [1x36 char]    78            USA        4080
30    [1x36 char]    78            USA        2155
27.5    [1x36 char]    78            Japan      2560
27.2    [1x36 char]    78            Japan      2300
30.9    [1x36 char]    78            USA        2230
21.1    [1x36 char]    78            Japan      2515
23.2    [1x36 char]    78            USA        2745
23.8    [1x36 char]    78            USA        2855
23.9    [1x36 char]    78            Japan      2405
20.3    [1x36 char]    78            Germany    2830
17    [1x36 char]    78            Sweden     3140
21.6    [1x36 char]    78            Sweden     2795
16.2    [1x36 char]    78            France     3410
31.5    [1x36 char]    78            Germany    1990
29.5    [1x36 char]    78            Japan      2135
21.5    [1x36 char]    79            USA        3245
19.8    [1x36 char]    79            USA        2990
22.3    [1x36 char]    79            USA        2890
20.2    [1x36 char]    79            USA        3265
20.6    [1x36 char]    79            USA        3360
17    [1x36 char]    79            USA        3840
17.6    [1x36 char]    79            USA        3725
16.5    [1x36 char]    79            USA        3955
18.2    [1x36 char]    79            USA        3830
16.9    [1x36 char]    79            USA        4360
15.5    [1x36 char]    79            USA        4054
19.2    [1x36 char]    79            USA        3605
18.5    [1x36 char]    79            USA        3940
31.9    [1x36 char]    79            Germany    1925
34.1    [1x36 char]    79            Japan      1975
35.7    [1x36 char]    79            USA        1915
27.4    [1x36 char]    79            USA        2670
25.4    [1x36 char]    79            Germany    3530
23    [1x36 char]    79            USA        3900
27.2    [1x36 char]    79            France     3190
23.9    [1x36 char]    79            USA        3420
34.2    [1x36 char]    79            USA        2200
34.5    [1x36 char]    79            USA        2150
31.8    [1x36 char]    79            Japan      2020
37.3    [1x36 char]    79            Italy      2130
28.4    [1x36 char]    79            USA        2670
28.8    [1x36 char]    79            USA        2595
26.8    [1x36 char]    79            USA        2700
33.5    [1x36 char]    79            USA        2556
41.5    [1x36 char]    80            Germany    2144
38.1    [1x36 char]    80            Japan      1968
32.1    [1x36 char]    80            USA        2120
37.2    [1x36 char]    80            Japan      2019
28    [1x36 char]    80            USA        2678
26.4    [1x36 char]    80            USA        2870
24.3    [1x36 char]    80            USA        3003
19.1    [1x36 char]    80            USA        3381
34.3    [1x36 char]    80            Germany    2188
29.8    [1x36 char]    80            Japan      2711
31.3    [1x36 char]    80            Japan      2542
37    [1x36 char]    80            Japan      2434
32.2    [1x36 char]    80            Japan      2265
46.6    [1x36 char]    80            Japan      2110
27.9    [1x36 char]    80            USA        2800
40.8    [1x36 char]    80            Japan      2110
44.3    [1x36 char]    80            Germany    2085
43.4    [1x36 char]    80            Germany    2335
36.4    [1x36 char]    80            Germany    2950
30    [1x36 char]    80            Germany    3250
44.6    [1x36 char]    80            Japan      1850
40.9    [1x36 char]    80            France     1835
33.8    [1x36 char]    80            Japan      2145
29.8    [1x36 char]    80            Germany    1845
32.7    [1x36 char]    80            Japan      2910
23.7    [1x36 char]    80            Japan      2420
35    [1x36 char]    80            England    2500
23.6    [1x36 char]    80            USA        2905
32.4    [1x36 char]    80            Japan      2290
27.2    [1x36 char]    81            USA        2490
26.6    [1x36 char]    81            USA        2635
25.8    [1x36 char]    81            USA        2620
23.5    [1x36 char]    81            USA        2725
30    [1x36 char]    81            USA        2385
39.1    [1x36 char]    81            Japan      1755
39    [1x36 char]    81            USA        1875
35.1    [1x36 char]    81            Japan      1760
32.3    [1x36 char]    81            Japan      2065
37    [1x36 char]    81            Japan      1975
37.7    [1x36 char]    81            Japan      2050
34.1    [1x36 char]    81            Japan      1985
34.7    [1x36 char]    81            USA        2215
34.4    [1x36 char]    81            USA        2045
29.9    [1x36 char]    81            USA        2380
33    [1x36 char]    81            Germany    2190
34.5    [1x36 char]    81            France     2320
33.7    [1x36 char]    81            Japan      2210
32.4    [1x36 char]    81            Japan      2350
32.9    [1x36 char]    81            Japan      2615
31.6    [1x36 char]    81            Japan      2635
28.1    [1x36 char]    81            France     3230
NaN    [1x36 char]    81            Sweden     2800
30.7    [1x36 char]    81            Sweden     3160
25.4    [1x36 char]    81            Japan      2900
24.2    [1x36 char]    81            Japan      2930
22.4    [1x36 char]    81            USA        3415
26.6    [1x36 char]    81            USA        3725
20.2    [1x36 char]    81            USA        3060
17.6    [1x36 char]    81            USA        3465
28    [1x36 char]    82            USA        2605
27    [1x36 char]    82            USA        2640
34    [1x36 char]    82            USA        2395
31    [1x36 char]    82            USA        2575
29    [1x36 char]    82            USA        2525
27    [1x36 char]    82            USA        2735
24    [1x36 char]    82            USA        2865
23    [1x36 char]    82            USA        3035
36    [1x36 char]    82            Germany    1980
37    [1x36 char]    82            Japan      2025
31    [1x36 char]    82            Japan      1970
38    [1x36 char]    82            USA        2125
36    [1x36 char]    82            USA        2125
36    [1x36 char]    82            Japan      2160
36    [1x36 char]    82            Japan      2205
34    [1x36 char]    82            Japan      2245
38    [1x36 char]    82            Japan      1965
32    [1x36 char]    82            Japan      1965
38    [1x36 char]    82            Japan      1995
25    [1x36 char]    82            USA        2945
38    [1x36 char]    82            USA        3015
26    [1x36 char]    82            USA        2585
22    [1x36 char]    82            USA        2835
32    [1x36 char]    82            Japan      2665
36    [1x36 char]    82            USA        2370
27    [1x36 char]    82            USA        2950
27    [1x36 char]    82            USA        2790
44    [1x36 char]    82            Germany    2130
32    [1x36 char]    82            USA        2295
28    [1x36 char]    82            USA        2625
31    [1x36 char]    82            USA        2720
```

The summary function provides basic statistical information for each of the variables included in the dataset object. Notice that there are some missing values for Horsepower and MPG, denoted by NaNs.

```clc
summary(cars);
```
```Acceleration: [406x1 double]
min      1st Q        median       3rd Q        max
8      13.7000      15.5000      17.2000      24.8000
Cylinders: [406x1 double]
min      1st Q      median      3rd Q      max
3          4           4          8        8
Displacement: [406x1 double]
min      1st Q      median      3rd Q      max
68        105         151        302      455
Horsepower: [406x1 double]
Columns 1 through 5
min      1st Q        median      3rd Q      max
46      75.5000          95        130      230
Column 6
NaNs
6
MPG: [406x1 double]
Columns 1 through 5
min      1st Q        median      3rd Q      max
9      17.5000          23         29      46.6000
Column 6
NaNs
8
Model: [406x36 char]
Model_Year: [406x1 double]
min      1st Q      median      3rd Q      max
70         73          76         79       82
Origin: [406x7 char]
Weight: [406x1 double]
min       1st Q      median           3rd Q      max
1613       2226      2.8225e+003       3620      5140
```
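The numbers that `summary` reports can be reproduced by hand with base MATLAB functions. The following is a minimal sketch on a made-up horsepower vector (the values are hypothetical, not taken from the MAT file); it counts the NaNs and computes a few statistics on the remaining entries.

```matlab
% Toy horsepower vector with two missing entries (hypothetical values)
hp = [130; 165; NaN; 150; 140; NaN; 220];

nMissing = sum(isnan(hp));     % number of missing values
valid    = hp(~isnan(hp));     % keep only the observed entries

fprintf('min = %g, median = %g, max = %g, NaNs = %d\n', ...
        min(valid), median(valid), max(valid), nMissing);
% prints: min = 130, median = 150, max = 220, NaNs = 2
```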

We can index into a dataset object like a regular matrix.

```clc
cars(1:10, :)
```
```ans =
Acceleration    Cylinders    Displacement    Horsepower
12            8            307             130
11.5            8            350             165
11            8            318             150
12            8            304             150
10.5            8            302             140
10            8            429             198
9            8            454             220
8.5            8            440             215
10            8            455             225
8.5            8            390             190

MPG    Model          Model_Year    Origin     Weight
18     [1x36 char]    70            USA        3504
15     [1x36 char]    70            USA        3693
18     [1x36 char]    70            USA        3436
16     [1x36 char]    70            USA        3433
17     [1x36 char]    70            USA        3449
15     [1x36 char]    70            USA        4341
14     [1x36 char]    70            USA        4354
14     [1x36 char]    70            USA        4312
14     [1x36 char]    70            USA        4425
15     [1x36 char]    70            USA        3850
```

We can access individual columns by referencing them by their names...

```clc
cars(1:10, {'Origin', 'MPG', 'Weight'})
```
```ans =
Origin     MPG    Weight
USA        18     3504
USA        15     3693
USA        18     3436
USA        16     3433
USA        17     3449
USA        15     4341
USA        14     4354
USA        14     4312
USA        14     4425
USA        15     3850
```

The dot-notation allows you to extract the whole content of a variable.

```clc
cars.Horsepower(1:10)
```
```ans =
130
165
150
150
140
198
220
215
225
190
```
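The two indexing styles return different types, which matters when passing the result to other functions. Here is a small sketch on a toy dataset (hypothetical values, Statistics Toolbox assumed): parentheses return another dataset object, while dot notation returns the underlying array.

```matlab
% Toy dataset (hypothetical values); requires the Statistics Toolbox
MPG    = [18; 15; 18];
Weight = [3504; 3693; 3436];
toy    = dataset(MPG, Weight);

sub = toy(1:2, {'Weight'});   % parentheses: the result is still a dataset
w   = toy.Weight(1:2);        % dot notation: the result is a plain double vector

disp(class(sub));
disp(class(w));
```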

Dataset objects store meta information.

```get(cars)
```
```       Description: ''
VarDescription: {}
Units: {}
DimNames: {'Observations'  'Variables'}
UserData: []
ObsNames: {}
VarNames: {1x9 cell}
```

We can add dataset descriptions as well as units for the variables.

```clc
cars = set(cars, 'Description', 'Performance and structural information of automobiles');
cars = set(cars, 'Units', {'m/s^2', '', 'mm', 'hp', 'mpg', '', '', '', 'kg'});

summary(cars)
```
```Performance and structural information of automobiles

Acceleration: [406x1 double, Units = m/s^2]
min      1st Q        median       3rd Q        max
8      13.7000      15.5000      17.2000      24.8000
Cylinders: [406x1 double]
min      1st Q      median      3rd Q      max
3          4           4          8        8
Displacement: [406x1 double, Units = mm]
min      1st Q      median      3rd Q      max
68        105         151        302      455
Horsepower: [406x1 double, Units = hp]
Columns 1 through 5
min      1st Q        median      3rd Q      max
46      75.5000          95        130      230
Column 6
NaNs
6
MPG: [406x1 double, Units = mpg]
Columns 1 through 5
min      1st Q        median      3rd Q      max
9      17.5000          23         29      46.6000
Column 6
NaNs
8
Model: [406x36 char]
Model_Year: [406x1 double]
min      1st Q      median      3rd Q      max
70         73          76         79       82
Origin: [406x7 char]
Weight: [406x1 double, Units = kg]
min       1st Q      median           3rd Q      max
1613       2226      2.8225e+003       3620      5140
```

## Categorical Arrays

Notice that some of the variables take on discrete values. For instance, Cylinders and Origin each take on only a small set of unique values:

```clc
disp('Cylinders:');
unique(cars(:, 'Cylinders'))

disp('Origin:');
unique(cars(:, 'Origin'))
```
```Cylinders:
ans =
Cylinders
3
4
5
6
8
Origin:
ans =
Origin
England
France
Germany
Italy
Japan
Sweden
USA
```

Categorical arrays provide significant memory savings. We will convert Cylinders to an ordinal array, which contains ordering information. The variable Origin will be converted to a nominal array, which does not store ordering.

```clc
Cylinders_cat = ordinal(Cylinders);
Origin_cat = nominal(Origin);

whos Cylinders* Origin*
```
```  Name                 Size            Bytes  Class      Attributes

Cylinders          406x1              3248  double
Cylinders_cat      406x1              1178  ordinal
Origin             406x7              5684  char
Origin_cat         406x1              1366  nominal

```

Now, let's convert the variables of the dataset object.

```cars.Cylinders = ordinal(cars.Cylinders);
cars.Origin = nominal(cars.Origin);
```
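The practical difference between the two categorical types shows up in comparisons. As a rough sketch on toy data (hypothetical values, Statistics Toolbox assumed), an ordinal array supports order comparisons such as `>`, while a nominal array only supports equality tests.

```matlab
% Ordinal: levels are ordered as '4' < '6' < '8'
cyl = ordinal([8; 4; 6; 4]);

% Nominal: levels carry no order
org = nominal({'USA'; 'Japan'; 'USA'});

moreThanFour = cyl > '4';      % order comparison, valid for ordinal arrays
fromUSA      = org == 'USA';   % equality test, valid for both types

disp(getlabels(cyl));
```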

## Filtering

Dataset objects can be easily filtered by criteria.

For example, we can create a logical array that is 1 (true) where the origin is Germany and 0 (false) everywhere else.

```germanyMask = cars.Origin == 'Germany'
```
```germanyMask =
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
1
1
1
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
```

Use the mask to extract all German cars.

```clc
cars(germanyMask, :)
```
```ans =
Acceleration    Cylinders    Displacement    Horsepower
20.5            4             97              46
14.5            4            107              90
12.5            4            121             113
20            4             97              48
14            4            116              90
19            4             97              60
23.5            4             97              54
18            4            121              76
21            4             97              46
15.5            4            116              75
14            4            114              91
16.5            4             98              83
15.5            4             79              67
14.5            4             97              78
16.5            4             90              71
14            4             90              70
15            4            115              95
16.9            4            116              81
14.2            4             90              70
12.2            4             97              71
16.7            6            168             120
14.5            4             97              78
14.1            4             97              78
12.8            4            121             110
21.5            4             90              48
15.9            5            131             103
14.9            4             89              71
14            4             89              71
20.1            5            183              77
14.7            4             98              76
15.8            4             97              78
21.7            4             90              48
23.7            4             90              48
19.9            5            121              67
21.8            4            146              67
15.3            4             89              62
14.2            4            105              74
15.3            4            105              74
24.6            4             97              52

MPG     Model          Model_Year    Origin     Weight
26    [1x36 char]    70            Germany    1835
24    [1x36 char]    70            Germany    2430
26    [1x36 char]    70            Germany    2234
NaN    [1x36 char]    71            Germany    1978
28    [1x36 char]    71            Germany    2123
27    [1x36 char]    71            Germany    1834
23    [1x36 char]    72            Germany    2254
22    [1x36 char]    72            Germany    2511
26    [1x36 char]    73            Germany    1950
24    [1x36 char]    73            Germany    2158
20    [1x36 char]    73            Germany    2582
29    [1x36 char]    74            Germany    2219
26    [1x36 char]    74            Germany    1963
26    [1x36 char]    74            Germany    2300
25    [1x36 char]    75            Germany    2223
29    [1x36 char]    75            Germany    1937
23    [1x36 char]    75            Germany    2694
25    [1x36 char]    76            Germany    2220
29    [1x36 char]    76            Germany    1937
29.5    [1x36 char]    76            Germany    1825
16.5    [1x36 char]    76            Germany    3820
29    [1x36 char]    77            Germany    1940
30.5    [1x36 char]    77            Germany    2190
21.5    [1x36 char]    77            Germany    2600
43.1    [1x36 char]    78            Germany    1985
20.3    [1x36 char]    78            Germany    2830
31.5    [1x36 char]    78            Germany    1990
31.9    [1x36 char]    79            Germany    1925
25.4    [1x36 char]    79            Germany    3530
41.5    [1x36 char]    80            Germany    2144
34.3    [1x36 char]    80            Germany    2188
44.3    [1x36 char]    80            Germany    2085
43.4    [1x36 char]    80            Germany    2335
36.4    [1x36 char]    80            Germany    2950
30    [1x36 char]    80            Germany    3250
29.8    [1x36 char]    80            Germany    1845
33    [1x36 char]    81            Germany    2190
36    [1x36 char]    82            Germany    1980
44    [1x36 char]    82            Germany    2130
```
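Masks can be combined with the logical operators `&` and `|` before indexing. The sketch below uses a toy dataset (hypothetical values, Statistics Toolbox assumed) to select German cars that also exceed 30 MPG.

```matlab
% Toy dataset (hypothetical values); requires the Statistics Toolbox
Origin = nominal({'Germany'; 'USA'; 'Germany'; 'Japan'});
MPG    = [26; 18; 43.1; 31];
toy    = dataset(Origin, MPG);

% Combine two conditions into a single logical mask
mask = toy.Origin == 'Germany' & toy.MPG > 30;

% Row-index the dataset with the mask, keeping all variables
efficientGerman = toy(mask, :);
```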

Create a scatter plot of weight against MPG, grouped by model year.

```gscatter(cars.MPG, cars.Weight, cars.Model_Year, '', 'xos');
xlabel('Miles per Gallon')
ylabel('Weight')
```

We notice a general trend, but with this many points and groups it is hard to extract useful information.

We can use filtering to refine the visualization. Let's extract only the cars made in 1970, 1976, or 1982.

```index = cars.Model_Year == 70 | cars.Model_Year == 76 | cars.Model_Year == 82;
filtered = cars(index,:);
```
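When the list of accepted values grows, chaining `==` with `|` becomes unwieldy. One alternative, sketched here on a toy vector of model years (hypothetical values), is the base MATLAB function `ismember`, which produces the same kind of logical index.

```matlab
% Toy vector of model years (hypothetical values)
years = [70; 71; 76; 80; 82];

% True wherever the year is one of the listed values
index = ismember(years, [70 76 82]);

disp(years(index));
```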

We have a more meaningful scatter plot for this smaller subset.

```gscatter(filtered.MPG, filtered.Weight, filtered.Model_Year, '', 'xos');
xlabel('Miles per Gallon')
ylabel('Weight')
```

Add interactive case names to the plot.

```gname(filtered.Model);
```

## Concatenate and Join

We can combine datasets by either concatenating or joining.

Concatenate

We have a different set of data that corresponds to small cars. Let's combine this with the original dataset. First, we'll create a dataset object from this data.

```% load carsmall

% create dataset and convert variables to categorical arrays
cars_s = dataset(cs);
cars_s.Origin = nominal(cars_s.Origin);
cars_s.Cylinders = ordinal(cars_s.Cylinders);

cars_s.Cylinders = reorderlevels(cars_s.Cylinders, getlabels(cars.Cylinders));
```
```Warning: Ignoring duplicate levels in NEWLEVELS.
```

Concatenate using the matrix concatenation notation.

```cars_all = [cars; cars_s];
% alternatively,
% cars_all = vertcat(cars, cars_s);
```

Join

Joining allows you to take the data in one dataset array and assign it to the rows of another dataset array, based on matching values in a common key variable.

```clc
tabulate(cars_all.Origin);
```
```    Value    Count   Percent
England        1      0.20%
France       18      3.56%
Germany       48      9.49%
Italy        9      1.78%
Japan       94     18.58%
Sweden       13      2.57%
USA      323     63.83%
```

Create a new dataset that maps countries to continents.

```clc
Newdata = dataset(...
{nominal({'England';'France';'Germany';'Italy' ;'Japan';'Sweden';'USA'          }),'Origin'   }, ...
{nominal({'Europe' ;'Europe';'Europe' ;'Europe';'Asia' ;'Europe';'North America'}),'Continent'})
```
```Newdata =
Origin     Continent
England    Europe
France     Europe
Germany    Europe
Italy      Europe
Japan      Asia
Sweden     Europe
USA        North America
```

Join the two datasets to include Continent as a new variable.

```cars_all = join(cars_all, Newdata);

clc
cars_all(1:10:100, :)
```
```ans =
Acceleration    Cylinders    Displacement    Horsepower
12            8            307             130
17.5            4            133             115
15            4            113              95
15            6            199              90
13            6            232             100
12            8            400             170
19            4             71              65
12            8            400             175
14            8            307             130
15            4             98              80

MPG    Model          Model_Year    Origin    Weight
18    [1x36 char]    70            USA       3504
NaN    [1x36 char]    70            France    3090
24    [1x36 char]    70            Japan     2372
21    [1x36 char]    70            USA       2648
19    [1x36 char]    71            USA       2634
13    [1x36 char]    71            USA       4746
31    [1x36 char]    71            Japan     1773
14    [1x36 char]    72            USA       4385
13    [1x36 char]    72            USA       4098
28    [1x36 char]    72            USA       2164

Continent
North America
Europe
Asia
North America
North America
North America
Asia
North America
North America
North America
```
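The same join can be tried on a much smaller example. This is a minimal sketch with made-up data (Statistics Toolbox assumed): `A` holds the observations, `B` is a lookup table with one row per key value, and `Origin` is the key variable shared by both.

```matlab
% Observations (hypothetical values)
Origin = nominal({'USA'; 'Japan'; 'USA'});
MPG    = [18; 24; 15];
A      = dataset(Origin, MPG);

% Lookup table: one row per key value
B = dataset( ...
    {nominal({'Japan'; 'USA'}),          'Origin'   }, ...
    {nominal({'Asia'; 'North America'}), 'Continent'});

% Each row of A picks up the Continent of the matching Origin in B
C = join(A, B);
disp(cellstr(C.Continent));
```

Each key value may appear many times in `A`, but only once in `B`.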

## Dealing with Missing Data

Notice that we have some missing values in our MPG data.

```clc
cars(5:20, 'MPG')
```
```ans =
MPG
17
15
14
14
14
15
NaN
NaN
NaN
NaN
NaN
15
14
NaN
15
14
```

One way to deal with missing data is to substitute an estimate for each missing value. In this case, we will create a regression model that represents the performance measure (MPG) as a function of possible predictor variables (acceleration, cylinders, displacement, horsepower, model year, and weight).

```X = [ones(length(cars.MPG),1) cars.Acceleration, double(cars.Cylinders), ...
cars.Displacement, cars.Horsepower, cars.Model_Year, cars.Weight];
Y = [cars.MPG];
[b,bint,r,rint,stats] = regress(Y, X);
```

Note that cars.Horsepower contains NaNs. The regress function treats NaNs in X or Y as missing values and omits those observations from the fit (listwise deletion).

```cars.regress = X * b;
fprintf('R-squared: %f\n', stats(1));
```
```R-squared: 0.814178
```
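To make the treatment of missing values concrete, here is a rough sketch in base MATLAB on a toy problem (hypothetical values). It uses the backslash operator as a stand-in for `regress`, not as a description of that function's internals: rows containing a NaN are left out of the fit, R-squared is computed on the complete rows, and the fitted coefficients are then used to predict every row, including the one with the missing response.

```matlab
% Toy data: y = 2*x exactly, with one missing response (hypothetical values)
x = (1:5)';
y = [2; NaN; 6; 8; 10];
X = [ones(size(x)) x];            % design matrix with an intercept column

ok = ~any(isnan([X y]), 2);       % rows without any missing value
b  = X(ok, :) \ y(ok);            % least-squares fit on the complete rows

yhat = X * b;                     % predictions for all rows, including row 2

% R-squared on the rows that took part in the fit
sse = sum((y(ok) - yhat(ok)).^2);
sst = sum((y(ok) - mean(y(ok))).^2);
R2  = 1 - sse / sst;
```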

Examine the residuals.

```residuals = cars.MPG - cars.regress;
stem(cars.regress, residuals);
xlabel('model'); ylabel('actual - model');
```

For cars with low or high MPG, the model seems to underestimate the MPG, while for cars in the middle, the model overestimates the true value.

```gname(cars.Model)
```

## Clean Up Our Model

We can potentially improve the model by adding dummy variables to handle diesels, automatic transmissions, and station wagons. In addition, we can filter out the 3- and 5-cylinder engines, which are rare in this data set (the 3-cylinder entries are rotary engines).

A dummy variable is a binary variable that is 1 where an observation satisfies a criterion and 0 everywhere else.

```% Load in as a dataset object
ds = dataset('file','dummy.txt');

% Concatenate
carsall = [cars, ds];
carsall = set(carsall, 'Units', [get(cars, 'Units'), {'', '', '', ''}]);

% Filter out 3- and 5- cylinder engines
index = carsall.Cylinders == '4' | carsall.Cylinders == '6' | carsall.Cylinders == '8';
carsall = carsall(index,:);
```
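The demo reads its dummy variables from `dummy.txt`, whose contents are not shown here. As an illustration only, the sketch below shows one way such flags could be built in base MATLAB, assuming the model names carry markers such as `diesel` or `(sw)`; both the names and the markers are hypothetical.

```matlab
% Hypothetical model names with marker substrings
models = {'example sedan (diesel)'; 'example wagon (sw)'; 'example coupe'};

% 1 where the marker occurs in the name, 0 everywhere else
Diesel = double(~cellfun(@isempty, strfind(models, 'diesel')));
SW     = double(~cellfun(@isempty, strfind(models, '(sw)')));

disp([Diesel SW]);
```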

## Create a New Regression Model

Create a new regression model just by looking at 4, 6, and 8 cylinders and taking into account the car type (station wagon, diesel, automatic).

```X = [ones(length(carsall.MPG),1), carsall.Acceleration, ...
double(carsall.Cylinders), carsall.Displacement, carsall.Horsepower, ...
carsall.Model_Year, carsall.Weight, carsall.SW, carsall.Diesel, ...
carsall.Automatic];
Y = [carsall.MPG];
[b, bint, r, rint, stats] = regress(Y, X);

carsall.regress = X * b;

residuals2 = carsall.MPG - carsall.regress;
stem(carsall.regress, residuals2)
xlabel('model'); ylabel('actual - model');
gname(carsall.Model)
```

## Multivariate Analysis of Variance

We use a multivariate analysis of variance (MANOVA) to see how similar the cars from various countries are in terms of MPG, Acceleration, Weight, and Displacement.

```X = [carsall.MPG, carsall.Acceleration, carsall.Weight, carsall.Displacement];
[d, p, stats] = manova1(X, carsall.Origin);
manovacluster(stats)
```

We see that Japanese and German cars are quite similar, and they are very different from English and American cars.

Let's add another dummy variable that distinguishes Japanese and German cars from the rest. Then redo the regression.

```carsall.dummy = (carsall.Origin == 'Germany' | carsall.Origin == 'Japan');

X = [ones(length(carsall.MPG),1), carsall.Acceleration, ...
double(carsall.Cylinders), carsall.Displacement, carsall.Horsepower, ...
carsall.Model_Year, carsall.Weight, carsall.SW, carsall.Diesel, ...
carsall.Automatic carsall.dummy];
Y = [carsall.MPG];
[b, bint, r, rint, stats] = regress(Y, X);

carsall.regress = X * b;

% Inspect once again
residuals2 = carsall.MPG - carsall.regress;
stem(carsall.regress, residuals2)
xlabel('model'); ylabel('actual - model');
gname(carsall.Model)
```

## Robust Regression

We can also perform robust regression to deal with the outliers that may exist in the dataset.

```X2 = [carsall.Acceleration, double(carsall.Cylinders), ...
carsall.Displacement, carsall.Horsepower, carsall.Model_Year, ...
carsall.Weight, carsall.SW, carsall.Diesel, carsall.Automatic, carsall.dummy];
[robustbeta, stats] = robustfit(X2, Y)
X3 = [ones(length(carsall.MPG),1), X2];

carsall.regress2 = X3 * robustbeta;
```
```robustbeta =
-4.1872
-0.2086
-1.6006
0.0126
-0.0166
0.6444
-0.0048
0.1167
12.4719
-2.6949
1.7508
stats =
ols_s: 2.9910
robust_s: 2.8012
s: 2.8477
resid: [399x1 double]
rstud: [399x1 double]
se: [11x1 double]
covb: [11x11 double]
coeffcorr: [11x11 double]
t: [11x1 double]
p: [11x1 double]
w: [399x1 double]
R: [11x11 double]
dfe: 374
h: [399x1 double]
```
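The effect of robust regression is easier to see on a small example. The sketch below (toy data, hypothetical values, Statistics Toolbox assumed) fits a straight line with one gross outlier, using both ordinary least squares and `robustfit`. Note that `robustfit` adds the constant term itself, which is why the demo builds `X2` without a column of ones and prepends it only when computing the predictions.

```matlab
% Toy data on the line y = 2*x + 1, with one gross outlier
x = (1:10)';
y = 2*x + 1;
y(8) = 60;                        % the true value would be 17

bOLS = [ones(size(x)) x] \ y;     % ordinary least squares, explicit intercept
bRob = robustfit(x, y);           % robust fit, intercept added automatically

fprintf('OLS slope:    %.3f\n', bOLS(2));
fprintf('Robust slope: %.3f\n', bRob(2));
```

The robust slope should stay close to the true value of 2, while the least-squares slope is pulled toward the outlier.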

## Perform Regression Substitution

We have been looking at linear regressions so far, but we might be able to apply some nonlinear regressions to get a better predictive model.

Once we have a regression model, we can go ahead and substitute the missing values with the model predictions.
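The substitution pattern itself can be sketched on a toy vector in base MATLAB (all values hypothetical): a logical mask marks the missing entries, and only those entries are overwritten with the corresponding predictions.

```matlab
% Observed values with gaps, and model predictions for every row (hypothetical)
mpg  = [17; NaN; 14; NaN];
pred = [16.5; 18.9; 14.2; 12.3];

mask = isnan(mpg);           % true where the observation is missing

filled = mpg;
filled(mask) = pred(mask);   % overwrite only the missing entries

disp(filled);
```

Keeping the mask around makes it possible to tell later which values were observed and which were imputed.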

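```carsall.mask = isnan(carsall.MPG);

% Substitute the missing MPG values with model predictions
% (assumption: the robust fit, carsall.regress2, is the model used here)
carsall.MPG(carsall.mask) = carsall.regress2(carsall.mask);

carsall(5:20, 'MPG')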
```
```ans =
MPG
17
15
14
14
14
15
18.881
12.256
13.095
12.597
13.732
15
14
16.498
15
14
```