Stack or Unstack Dataset Arrays

This example shows how to reformat dataset arrays between wide and tall (or long) format using stack and unstack.

Load sample data.

Navigate to the folder containing sample data. Import the data from the comma-separated text file testScores.csv.

cd(matlabroot)
cd('help/toolbox/stats/examples')
ds = dataset('File','testScores.csv','Delimiter',',')
ds = 

    LastName          Sex             Test1    Test2    Test3    Test4
    'HOWARD'          'male'          90       87       93       92   
    'WARD'            'male'          87       85       83       90   
    'TORRES'          'male'          86       85       88       86   
    'PETERSON'        'female'        75       80       72       77   
    'GRAY'            'female'        89       86       87       90   
    'RAMIREZ'         'female'        96       92       98       95   
    'JAMES'           'male'          78       75       77       77   
    'WATSON'          'female'        91       94       92       90   
    'BROOKS'          'female'        86       83       85       89   
    'KELLY'           'male'          79       76       82       80   

Each of the 10 students has 4 test scores, displayed here in wide format.

Perform calculations on dataset array.

With the data in this format, you can, for example, calculate the average test score for each student. The test scores are in columns 3 to 6.

ds.TestAve = mean(double(ds(:,3:6)),2);
ds(:,{'LastName','Sex','TestAve'})
ans = 

    LastName          Sex             TestAve
    'HOWARD'          'male'           90.5  
    'WARD'            'male'          86.25  
    'TORRES'          'male'          86.25  
    'PETERSON'        'female'           76  
    'GRAY'            'female'           88  
    'RAMIREZ'         'female'        95.25  
    'JAMES'           'male'          76.75  
    'WATSON'          'female'        91.75  
    'BROOKS'          'female'        85.75  
    'KELLY'           'male'          79.25  

A new variable with average test scores is added to the dataset array, ds.

Reformat the dataset array into tall format.

Stack the test score variables into a new variable, Scores.

 dsTall = stack(ds,{'Test1','Test2','Test3','Test4'},...
            'newDataVarName','Scores')
dsTall = 

    LastName          Sex             TestAve    Scores_Indicator    Scores
    'HOWARD'          'male'           90.5      Test1               90    
    'HOWARD'          'male'           90.5      Test2               87    
    'HOWARD'          'male'           90.5      Test3               93    
    'HOWARD'          'male'           90.5      Test4               92    
    'WARD'            'male'          86.25      Test1               87    
    'WARD'            'male'          86.25      Test2               85    
    'WARD'            'male'          86.25      Test3               83    
    'WARD'            'male'          86.25      Test4               90    
    'TORRES'          'male'          86.25      Test1               86    
    'TORRES'          'male'          86.25      Test2               85    
    'TORRES'          'male'          86.25      Test3               88    
    'TORRES'          'male'          86.25      Test4               86    
    'PETERSON'        'female'           76      Test1               75    
    'PETERSON'        'female'           76      Test2               80    
    'PETERSON'        'female'           76      Test3               72    
    'PETERSON'        'female'           76      Test4               77    
    'GRAY'            'female'           88      Test1               89    
    'GRAY'            'female'           88      Test2               86    
    'GRAY'            'female'           88      Test3               87    
    'GRAY'            'female'           88      Test4               90    
    'RAMIREZ'         'female'        95.25      Test1               96    
    'RAMIREZ'         'female'        95.25      Test2               92    
    'RAMIREZ'         'female'        95.25      Test3               98    
    'RAMIREZ'         'female'        95.25      Test4               95    
    'JAMES'           'male'          76.75      Test1               78    
    'JAMES'           'male'          76.75      Test2               75    
    'JAMES'           'male'          76.75      Test3               77    
    'JAMES'           'male'          76.75      Test4               77    
    'WATSON'          'female'        91.75      Test1               91    
    'WATSON'          'female'        91.75      Test2               94    
    'WATSON'          'female'        91.75      Test3               92    
    'WATSON'          'female'        91.75      Test4               90    
    'BROOKS'          'female'        85.75      Test1               86    
    'BROOKS'          'female'        85.75      Test2               83    
    'BROOKS'          'female'        85.75      Test3               85    
    'BROOKS'          'female'        85.75      Test4               89    
    'KELLY'           'male'          79.25      Test1               79    
    'KELLY'           'male'          79.25      Test2               76    
    'KELLY'           'male'          79.25      Test3               82    
    'KELLY'           'male'          79.25      Test4               80    

The original test variable names, Test1, Test2, Test3, and Test4, appear as levels in the combined test scores indicator variable, Scores_Indicator.

Plot data grouped by category.

With the data in this format, you can use Scores_Indicator as a grouping variable, and draw box plots of test scores grouped by test.

figure()
boxplot(dsTall.Scores,dsTall.Scores_Indicator)

Reformat the dataset array into wide format.

Reformat dsTall back into its original wide format.

dsWide = unstack(dsTall,'Scores','Scores_Indicator');
dsWide(:,{'LastName','Test1','Test2','Test3','Test4'})
ans = 

    LastName          Test1    Test2    Test3    Test4
    'HOWARD'          90       87       93       92   
    'WARD'            87       85       83       90   
    'TORRES'          86       85       88       86   
    'PETERSON'        75       80       72       77   
    'GRAY'            89       86       87       90   
    'RAMIREZ'         96       92       98       95   
    'JAMES'           78       75       77       77   
    'WATSON'          91       94       92       90   
    'BROOKS'          86       83       85       89   
    'KELLY'           79       76       82       80    

The dataset array is back in wide format. unstack reassigns the levels of the indicator variable, Scores_Indicator, as variable names in the unstacked dataset array.

See Also

| | |

Related Examples

More About

Was this topic helpful?