Calculations on Dataset Arrays

This example shows how to perform calculations on dataset arrays.

Load sample data.

Navigate to the folder containing sample data. Import the data from the comma-separated text file testScores.csv.

cd(matlabroot)
cd('help/toolbox/stats/examples')
ds = dataset('File','testScores.csv','Delimiter',',')
ds = 

    LastName          Sex             Test1    Test2    Test3    Test4
    'HOWARD'          'male'          90       87       93       92   
    'WARD'            'male'          87       85       83       90   
    'TORRES'          'male'          86       85       88       86   
    'PETERSON'        'female'        75       80       72       77   
    'GRAY'            'female'        89       86       87       90   
    'RAMIREZ'         'female'        96       92       98       95   
    'JAMES'           'male'          78       75       77       77   
    'WATSON'          'female'        91       94       92       90   
    'BROOKS'          'female'        86       83       85       89   
    'KELLY'           'male'          79       76       82       80   

The dataset array holds four test scores for each of 10 students in wide format: one row per student and one column per test.
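The import step can also be written without changing the current folder. The following is a minimal sketch, assuming the sample file sits at the same location under matlabroot that the cd commands above point to; that location may differ between releases, so the sketch checks for the file first. The variable name dataFile is introduced here for illustration only.

```matlab
% Build the full path to the sample file instead of calling cd.
% Assumption: the file lives in help/toolbox/stats/examples under matlabroot.
dataFile = fullfile(matlabroot,'help','toolbox','stats','examples','testScores.csv');

if exist(dataFile,'file') == 2
    % Same import call as above, but with an absolute path.
    ds = dataset('File',dataFile,'Delimiter',',');
else
    warning('Sample file not found at %s', dataFile);
end
```

Using fullfile keeps the working folder unchanged and inserts the correct file separator for the platform.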

Average dataset array variables.

Compute the average (mean) test score for each student in the dataset array, and store it in a new variable, TestAvg. Test scores are in columns 3 to 6.

Use double to convert the specified dataset array variables into a numeric array. Then, calculate the mean across the second dimension (across columns) to get the test average for each student.

ds.TestAvg = mean(double(ds(:,3:6)),2);
ds(:,{'LastName','TestAvg'})
ans = 

    LastName          TestAvg
    'HOWARD'           90.5  
    'WARD'            86.25  
    'TORRES'          86.25  
    'PETERSON'           76  
    'GRAY'               88  
    'RAMIREZ'         95.25  
    'JAMES'           76.75  
    'WATSON'          91.75  
    'BROOKS'          85.75  
    'KELLY'           79.25  
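The role of the dimension argument to mean can be seen on a plain numeric matrix. This is a small sketch using the scores of the first three students from the table above; the variable names perStudent and perTest are illustrative and do not appear in the example.

```matlab
% Scores for the first three students, copied from the table above
% (rows are students, columns are Test1 to Test4).
scores = [90 87 93 92;
          87 85 83 90;
          86 85 88 86];

perStudent = mean(scores, 2);   % 3-by-1: average across columns, one value per row
perTest    = mean(scores, 1);   % 1-by-4: average down rows, one value per column

disp(perStudent)                % 90.5, 86.25, 86.25, matching TestAvg above
```

Averaging along dimension 2 yields one value per row, which is why the result can be assigned directly as a new variable with one entry per student.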

Summarize the dataset array using a grouping variable.

Compute the mean and the maximum of the average test score, TestAvg, for each level of the grouping variable Sex.

stats = grpstats(ds,'Sex',{'mean','max'},'DataVars','TestAvg')
stats = 

              Sex             GroupCount    mean_TestAvg    max_TestAvg
    male      'male'          5              83.8            90.5      
    female    'female'        5             87.35           95.25      

This returns a new dataset array containing the specified summary statistics for each level of the grouping variable, Sex.
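To make the grouping logic concrete, the same summary can be approximated with base MATLAB functions. This is only a sketch of the idea, not how grpstats is implemented; the values are copied from the tables above, and the variable names are illustrative.

```matlab
% Sex and TestAvg for the 10 students, copied from the tables above.
sex = {'male';'male';'male';'female';'female'; ...
       'female';'male';'female';'female';'male'};
testAvg = [90.5; 86.25; 86.25; 76; 88; 95.25; 76.75; 91.75; 85.75; 79.25];

% Map each row to a group index. unique sorts the labels alphabetically,
% so group 1 is 'female' and group 2 is 'male'.
[groups, ~, idx] = unique(sex);
idx = idx(:);                                      % force a column for accumarray

groupCount = accumarray(idx, 1);                   % rows per group
groupMean  = accumarray(idx, testAvg, [], @mean);  % mean TestAvg per group
groupMax   = accumarray(idx, testAvg, [], @max);   % max TestAvg per group

for k = 1:numel(groups)
    fprintf('%-6s  n=%d  mean=%.2f  max=%.2f\n', ...
            groups{k}, groupCount(k), groupMean(k), groupMax(k));
end
```

Note that the groups in this sketch come out in alphabetical order, whereas the grpstats output above lists male first.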

Replace data values.

Each test is scored out of 100. Rescale the scores so that each test is out of 25.

scores = double(ds(:,3:6));
newScores = scores*25/100;
ds = replacedata(ds,newScores,3:6)
ds = 

    LastName          Sex             Test1    Test2    Test3    Test4    TestAvg
    'HOWARD'          'male'           22.5    21.75    23.25       23     90.5  
    'WARD'            'male'          21.75    21.25    20.75     22.5    86.25  
    'TORRES'          'male'           21.5    21.25       22     21.5    86.25  
    'PETERSON'        'female'        18.75       20       18    19.25       76  
    'GRAY'            'female'        22.25     21.5    21.75     22.5       88  
    'RAMIREZ'         'female'           24       23     24.5    23.75    95.25  
    'JAMES'           'male'           19.5    18.75    19.25    19.25    76.75  
    'WATSON'          'female'        22.75     23.5       23     22.5    91.75  
    'BROOKS'          'female'         21.5    20.75    21.25    22.25    85.75  
    'KELLY'           'male'          19.75       19     20.5       20    79.25  

The first two lines of code extract the test data and perform the desired calculation. Then, replacedata inserts the new test scores back into the dataset array.

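Because each of the four tests is now scored out of 25, the four rescaled scores sum to a total out of 100 that equals the original average. The variable of test score averages, TestAvg, is therefore now the final score for each student.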
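The claim that TestAvg serves as the final score can be checked numerically: the sum of four scores out of 25 should equal the mean of the four original scores out of 100. A minimal sketch on three students from the table, with illustrative variable names:

```matlab
% Original scores out of 100 for HOWARD, PETERSON, and RAMIREZ.
scores = [90 87 93 92;
          75 80 72 77;
          96 92 98 95];

newScores     = scores*25/100;      % same rescaling as above
totalOutOf100 = sum(newScores, 2);  % final score: sum of four tests out of 25
originalAvg   = mean(scores, 2);    % TestAvg as computed earlier

% The two quantities agree because sum(x*25/100) equals sum(x)/4.
maxDiff = max(abs(totalOutOf100 - originalAvg));
fprintf('Largest difference: %g\n', maxDiff);
```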

Change variable name.

Rename the last variable, TestAvg, to Final.

ds.Properties.VarNames{end} = 'Final';
ds
ds = 

    LastName          Sex             Test1    Test2    Test3    Test4    Final
    'HOWARD'          'male'           22.5    21.75    23.25       23     90.5
    'WARD'            'male'          21.75    21.25    20.75     22.5    86.25
    'TORRES'          'male'           21.5    21.25       22     21.5    86.25
    'PETERSON'        'female'        18.75       20       18    19.25       76
    'GRAY'            'female'        22.25     21.5    21.75     22.5       88
    'RAMIREZ'         'female'           24       23     24.5    23.75    95.25
    'JAMES'           'male'           19.5    18.75    19.25    19.25    76.75
    'WATSON'          'female'        22.75     23.5       23     22.5    91.75
    'BROOKS'          'female'         21.5    20.75    21.25    22.25    85.75
    'KELLY'           'male'          19.75       19     20.5       20    79.25
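Indexing with end works here because TestAvg happens to be the last variable. When the position is not known, the name can be looked up first. The sketch below shows the lookup on a plain cell array of names so that it runs without a dataset array; with the example data, the same indexing would presumably be applied to ds.Properties.VarNames.

```matlab
% Variable names as they stand before the rename.
varNames = {'LastName','Sex','Test1','Test2','Test3','Test4','TestAvg'};

% Locate the variable by name rather than by position.
isTarget = strcmp(varNames, 'TestAvg');
if any(isTarget)
    varNames(isTarget) = {'Final'};
end

disp(varNames)
```

Looking up the name guards against renaming the wrong variable if columns are later added or reordered.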
