# Calculations on Data in Tables

This example shows how to perform calculations on data in tables. Tables can contain both numeric and nonnumeric data. After you identify the table variables that contain numeric data, you can access the data in those variables by using either curly braces or dot notation. Then you can perform arithmetic operations or call functions on the numeric data and assign the result back into the table, all in one line of code. You also can use the `rowfun`

function for calculations across the rows of a table and the `varfun`

function for calculations along the variables. If your table has groups of data within it, you can use the `groupsummary`

, `rowfun`

, and `varfun`

functions to perform calculations for each group in the table.

### Read Sample Data into Table

Read data from a CSV (comma-separated values) file, `testScores.csv`

, into a table by using the `readtable`

function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, `School`

, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert `School`

to a `categorical`

variable.

scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)

`scores=`*10×5 table*
LastName School Test1 Test2 Test3
__________ __________ _____ _____ _____
"Jeong" XYZ School 90 87 93
"Collins" XYZ School 87 85 83
"Torres" XYZ School 86 85 88
"Phillips" ABC School 75 80 72
"Ling" ABC School 89 86 87
"Ramirez" ABC School 96 92 98
"Lee" XYZ School 78 75 77
"Walker" ABC School 91 94 92
"Garcia" ABC School 86 83 85
"Chang" XYZ School 79 76 82

### Create Subtable with Numeric Data

Create a subtable that has only the numeric data. Because the first two variables have nonnumeric data, one straightforward way is to index into the table specifying the other variables.

numericScores = scores(:,3:end)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82

As an alternative, you can specify variables by data type using the `vartype`

function. This function is useful when you have a large table with many variables that have different data types. It returns a subscript that you can use to specify table variables.

`numericVars = vartype("numeric")`

numericVars = table vartype subscript: Select table variables matching the type 'numeric' See Access Data in a Table.

numericScores = scores(:,numericVars)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82

### Perform Arithmetic and Call Functions on Table Data

The test scores are based on a 100-point scale. To convert them to scores on a different scale, multiply the scores by the same factor.

However, you cannot use the same syntax that you would use for a numeric array. Instead, extract the data into an array, scale it, and assign it back into the table. In this way, you can perform arithmetic operations or call functions on the contents of a table in one line of code.

To extract the *contents* of a table, use curly braces. For example, this syntax specifies all rows and all table variables of `numericScores`

and returns the result as a numeric matrix. As long as the table variables have compatible data types, you can concatenate their contents in this way.

A = numericScores{:,:}

`A = `*10×3*
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82

If a matrix has the same size as a table and a compatible data type, you can assign the matrix to the contents of the table.

Scale the numeric data so the test scores are on a 25-point scale, and assign it back into the table.

numericScores{:,:} = numericScores{:,:} * 0.25

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
22.5 21.75 23.25
21.75 21.25 20.75
21.5 21.25 22
18.75 20 18
22.25 21.5 21.75
24 23 24.5
19.5 18.75 19.25
22.75 23.5 23
21.5 20.75 21.25
19.75 19 20.5

As an alternative to curly braces, you can use * T.Variables* to specify all table variables.

*represents the same thing as*

`T.Variables`

*syntax.*

`T{:,:}`

For example, subtract the minimum value within each table variable from that variable.

numericScores.Variables = numericScores.Variables - min(numericScores.Variables)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
3.75 3 5.25
3 2.5 2.75
2.75 2.5 4
0 1.25 0
3.5 2.75 3.75
5.25 4.25 6.5
0.75 0 1.25
4 4.75 5
2.75 2 3.25
1 0.25 2.5

You can perform calculations on one variable at a time by using dot notation and variable names. For example, add a correction worth five points to the last set of scores in `Test3`

. The other table variables are unaffected by this operation.

numericScores.Test3 = numericScores.Test3 + 5

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
3.75 3 10.25
3 2.5 7.75
2.75 2.5 9
0 1.25 5
3.5 2.75 8.75
5.25 4.25 11.5
0.75 0 6.25
4 4.75 10
2.75 2 8.25
1 0.25 7.5

### Calculate Across Each Table Row

Find the mean, minimum, and maximum values of the test scores for each student. Calculate these values across each row. Assign them to the original table as new table variables.

One simple, useful way is to extract the data into a matrix, call a function on it, and then assign the output to a new table variable. For example, calculate the mean test scores across each row. Then add them to `scores`

in a new table variable, `TestMean`

. Use curly braces to extract the numeric data from `Test1`

, `Test2`

, and `Test3`

into a matrix. To calculate the mean across rows, specify the dimension as `2`

when you call `mean`

.

vars = ["Test1","Test2","Test3"]; scores.TestMean = mean(scores{:,vars},2)

`scores=`*10×6 table*
LastName School Test1 Test2 Test3 TestMean
__________ __________ _____ _____ _____ ________
"Jeong" XYZ School 90 87 93 90
"Collins" XYZ School 87 85 83 85
"Torres" XYZ School 86 85 88 86.333
"Phillips" ABC School 75 80 72 75.667
"Ling" ABC School 89 86 87 87.333
"Ramirez" ABC School 96 92 98 95.333
"Lee" XYZ School 78 75 77 76.667
"Walker" ABC School 91 94 92 92.333
"Garcia" ABC School 86 83 85 84.667
"Chang" XYZ School 79 76 82 79

Another way to perform calculations across rows is to use the `rowfun`

function. You do not need to extract data from the table when using `rowfun`

. Instead, pass the table and a function to apply to the data as input arguments to `rowfun`

. While the syntax is a little more complex, `rowfun`

can be useful when the function that you apply takes multiple input arguments or returns multiple output arguments.

For example, use the `bounds`

function to find the minimum and maximum test scores. The `bounds`

function returns two output arguments, so apply it to `scores`

by using `rowfun`

. The output of `rowfun`

is a new table that has `TestMin`

and `TestMax`

variables. In this case, also specify `"SeparateInputs"`

as `false`

so that values across each row are combined into a vector before being passed to `bounds`

.

minmaxTest = rowfun(@bounds, ... scores, ... "InputVariables",vars, ... "OutputVariableNames",["TestMin","TestMax"], ... "SeparateInputs",false)

`minmaxTest=`*10×2 table*
TestMin TestMax
_______ _______
87 93
83 87
85 88
72 80
86 89
92 98
75 78
91 94
83 86
76 82

Concatenate `scores`

and `minmaxTest`

so that these values are in one table.

scores = [scores minmaxTest]

`scores=`*10×8 table*
LastName School Test1 Test2 Test3 TestMean TestMin TestMax
__________ __________ _____ _____ _____ ________ _______ _______
"Jeong" XYZ School 90 87 93 90 87 93
"Collins" XYZ School 87 85 83 85 83 87
"Torres" XYZ School 86 85 88 86.333 85 88
"Phillips" ABC School 75 80 72 75.667 72 80
"Ling" ABC School 89 86 87 87.333 86 89
"Ramirez" ABC School 96 92 98 95.333 92 98
"Lee" XYZ School 78 75 77 76.667 75 78
"Walker" ABC School 91 94 92 92.333 91 94
"Garcia" ABC School 86 83 85 84.667 83 86
"Chang" XYZ School 79 76 82 79 76 82

### Calculate Along Each Table Variable

Find the mean score for each test. Calculate these values along the table variables.

The simplest way is to use `mean`

. First use curly braces to extract the numeric data from `Test1`

, `Test2`

, and `Test3`

into a matrix. Then call `mean`

to calculate the mean of each column of the matrix. The output is a numeric vector where each element is the mean of a table variable.

vars = ["Test1","Test2","Test3"]; meanOfEachTest = mean(scores{:,vars})

`meanOfEachTest = `*1×3*
85.7000 84.3000 85.7000

Another way to perform calculations along table variables is to use the `varfun`

function. You do not need to extract data from the table when using `varfun`

. Instead, pass the table and a function to apply to the data as input arguments to `varfun`

.

Calculate the mean scores using `varfun`

. The output is a new table with meaningful names for the table variables.

meanOfEachTest = varfun(@mean, ... scores, ... "InputVariables",vars)

`meanOfEachTest=`*1×3 table*
mean_Test1 mean_Test2 mean_Test3
__________ __________ __________
85.7 84.3 85.7

### Calculate Using Groups of Data Within Table

If your table has one or more *grouping variables*, then you can perform calculations on groups of data within the table. You can use the values in a grouping variable to specify the groups that the rows belong to.

For example, the `School`

variable in `scores`

has two values, `ABC School`

and `XYZ School`

. You can think of these two values as categories that denote groups of data in `scores`

. In this case, you can perform calculations by school.

To apply a function and use grouping variables, you can use the `varfun`

function. You can specify a function, such as `mean`

, and then use `varfun`

to apply it to each table variable that you specify. When you also specify grouping variables, `varfun`

applies the function to each group within each table variable.

Calculate the mean score for each test by school.

vars = ["Test1","Test2","Test3"]; meanScoresBySchool = varfun(@mean, ... scores, ... "InputVariables",vars, ... "GroupingVariables","School")

`meanScoresBySchool=`*2×5 table*
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6

Starting in R2018a, you also can use the `groupsummary`

function to perform calculations on groups of data in each table variable.

meanScoresBySchool = groupsummary(scores,"School","mean",vars)

`meanScoresBySchool=`*2×5 table*
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6

The syntax for `groupsummary`

is a bit simpler. Also, you can use `groupsummary`

to specify multiple methods at once. For example, find both the minimum and maximum scores of each test by school.

minmaxBySchool = groupsummary(scores,"School",["min","max"],vars)

`minmaxBySchool=`*2×8 table*
School GroupCount min_Test1 max_Test1 min_Test2 max_Test2 min_Test3 max_Test3
__________ __________ _________ _________ _________ _________ _________ _________
ABC School 5 75 96 80 94 72 98
XYZ School 5 78 90 75 87 77 93

To use all the predefined methods of `groupsummary`

, specify `"all"`

as the method. Calculate all statistics on the mean test score by school.

allStatsBySchool = groupsummary(scores,"School","all","TestMean")

`allStatsBySchool=`*2×13 table*
School GroupCount mean_TestMean sum_TestMean min_TestMean max_TestMean range_TestMean median_TestMean mode_TestMean var_TestMean std_TestMean nummissing_TestMean nnz_TestMean
__________ __________ _____________ ____________ ____________ ____________ ______________ _______________ _____________ ____________ ____________ ___________________ ____________
ABC School 5 87.067 435.33 75.667 95.333 19.667 87.333 75.667 57.967 7.6136 0 5
XYZ School 5 83.4 417 76.667 90 13.333 85 76.667 29.856 5.464 0 5

Sometimes you might want to find a particular value in one table variable and then find the corresponding value in another table variable. In such cases use `rowfun`

.

For example, find the student in each school who had the highest mean test score. The attached supporting function, `findNameAtMax`

, returns both the highest score and the name of the student who had that score. Use `rowfun`

to apply `findNameAtMax`

to each group of students. The `rowfun`

function is suitable because `findNameAtMax`

has multiple input arguments (last names and test scores) and also returns multiple output arguments.

maxScoresBySchool = rowfun(@findNameAtMax, ... scores, ... "InputVariables",["LastName","TestMean"], ... "GroupingVariables","School", ... "OutputVariableNames",["max_TestMean","LastName"])

`maxScoresBySchool=`*2×4 table*
School GroupCount max_TestMean LastName
__________ __________ ____________ _________
ABC School 5 95.333 "Ramirez"
XYZ School 5 90 "Jeong"

### Supporting Function

function [maxValue,lastName] = findNameAtMax(names,values) % Return maximum value and the last name % from the row at which the maximum value occurred [maxValue,maxIndex] = max(values); lastName = names(maxIndex); end

## See Also

`table`

| `rowfun`

| `varfun`

| `groupsummary`

| `readtable`

| `vartype`