# Documentation

## Calculations on Dataset Arrays

This example shows how to perform calculations on dataset arrays.

Navigate to the folder containing sample data. Import the data from the comma-separated text file `testScores.csv`.

```cd(matlabroot) cd('help/toolbox/stats/examples') ds = dataset('File','testScores.csv','Delimiter',',')```
```ds = LastName Sex Test1 Test2 Test3 Test4 'HOWARD' 'male' 90 87 93 92 'WARD' 'male' 87 85 83 90 'TORRES' 'male' 86 85 88 86 'PETERSON' 'female' 75 80 72 77 'GRAY' 'female' 89 86 87 90 'RAMIREZ' 'female' 96 92 98 95 'JAMES' 'male' 78 75 77 77 'WATSON' 'female' 91 94 92 90 'BROOKS' 'female' 86 83 85 89 'KELLY' 'male' 79 76 82 80 ```

There are 4 test scores for each of 10 students, in wide format.

Average dataset array variables.

Compute the average (mean) test score for each student in the dataset array, and store it in a new variable, `TestAvg`. Test scores are in columns 3 to 6.

Use `double` to convert the specified dataset array variables into a numeric array. Then, calculate the mean across the second dimension (across columns) to get the test average for each student.

```ds.TestAvg = mean(double(ds(:,3:6)),2); ds(:,{'LastName','TestAvg'})```
```ans = LastName TestAvg 'HOWARD' 90.5 'WARD' 86.25 'TORRES' 86.25 'PETERSON' 76 'GRAY' 88 'RAMIREZ' 95.25 'JAMES' 76.75 'WATSON' 91.75 'BROOKS' 85.75 'KELLY' 79.25 ```

Summarize the dataset array using a grouping variable.

Compute the mean and maximum average test scores for each gender.

`stats = grpstats(ds,'Sex',{'mean','max'},'DataVars','TestAvg')`
```stats = Sex GroupCount mean_TestAvg max_TestAvg male 'male' 5 83.8 90.5 female 'female' 5 87.35 95.25 ```

This returns a new dataset array containing the specified summary statistics for each level of the grouping variable, `Sex`.

Replace data values.

The denominator for each test score is 100. Convert the test score denominator to 25.

```scores = double(ds(:,3:6)); newScores = scores*25/100; ds = replacedata(ds,newScores,3:6)```
```ds = LastName Sex Test1 Test2 Test3 Test4 TestAvg 'HOWARD' 'male' 22.5 21.75 23.25 23 90.5 'WARD' 'male' 21.75 21.25 20.75 22.5 86.25 'TORRES' 'male' 21.5 21.25 22 21.5 86.25 'PETERSON' 'female' 18.75 20 18 19.25 76 'GRAY' 'female' 22.25 21.5 21.75 22.5 88 'RAMIREZ' 'female' 24 23 24.5 23.75 95.25 'JAMES' 'male' 19.5 18.75 19.25 19.25 76.75 'WATSON' 'female' 22.75 23.5 23 22.5 91.75 'BROOKS' 'female' 21.5 20.75 21.25 22.25 85.75 'KELLY' 'male' 19.75 19 20.5 20 79.25 ```

The first two lines of code extract the test data and perform the desired calculation. Then, `replacedata` inserts the new test scores back into the dataset array.

The variable of test score averages, `TestAvg`, is now the final score for each student.

Change variable name.

Change the variable name to `Final`.

```ds.Properties.VarNames{end} = 'Final'; ds```
```ds = LastName Sex Test1 Test2 Test3 Test4 Final 'HOWARD' 'male' 22.5 21.75 23.25 23 90.5 'WARD' 'male' 21.75 21.25 20.75 22.5 86.25 'TORRES' 'male' 21.5 21.25 22 21.5 86.25 'PETERSON' 'female' 18.75 20 18 19.25 76 'GRAY' 'female' 22.25 21.5 21.75 22.5 88 'RAMIREZ' 'female' 24 23 24.5 23.75 95.25 'JAMES' 'male' 19.5 18.75 19.25 19.25 76.75 'WATSON' 'female' 22.75 23.5 23 22.5 91.75 'BROOKS' 'female' 21.5 20.75 21.25 22.25 85.75 'KELLY' 'male' 19.75 19 20.5 20 79.25```