Main Content

varfun

Apply function to table or timetable variables

Description

example

B = varfun(func,A) applies the function func separately to each variable of the table or timetable A and returns the results in the table or timetable B.

The function func must take one input argument and return arrays with the same number of rows each time it is called. The ith value in the output argument, B{:,i}, is equal to func(A{:,i}).

example

B = varfun(func,A,Name,Value) applies the function func separately to each variable of the table or timetable A with additional options specified by one or more Name,Value arguments.

For example, you can use the "GroupingVariables" name-value argument to carry out calculations on groups of data within table variables. For more information about calculations on groups of data, see Calculations on Groups of Data.

Examples

collapse all

Apply an element-wise function to the variables of a table.

Create a table containing numeric variables.

A = table([10.71;-2.05;-0.35;-0.82;1.57],[9.23;3.12;-1.18;0.23;16.41])
A=5×2 table
    Var1     Var2 
    _____    _____

    10.71     9.23
    -2.05     3.12
    -0.35    -1.18
    -0.82     0.23
     1.57    16.41

Round the numeric values in A by using the round function. To specify a function as an input argument to varfun, use the @ symbol. The variables of the output table B have names based on the function and the variable names from A.

B = varfun(@round,A)
B=5×2 table
    round_Var1    round_Var2
    __________    __________

        11             9    
        -2             3    
         0            -1    
        -1             0    
         2            16    

You can apply a function, such as sum or max, that reduces table variables along the first dimension. For example, use varfun to calculate the mean of each variable in a table.

Create a table that contains numeric variables.

A = table([0.71;-2.05;-0.35;-0.82;1.57],[0.23;0.12;-0.18;0.23;0.41])
A=5×2 table
    Var1     Var2 
    _____    _____

     0.71     0.23
    -2.05     0.12
    -0.35    -0.18
    -0.82     0.23
     1.57     0.41

Specify func as the mean function. To specify func as a function handle, use the @ symbol.

func = @mean
func = function_handle with value:
    @mean

Apply the function to all the variables of table A.

B = varfun(func,A)
B=1×2 table
    mean_Var1    mean_Var2
    _________    _________

     -0.188        0.162  

B is a table that contains the mean value from each variable.

To have varfun return a numeric vector instead of a table, specify "OutputFormat" as "uniform". To use the "uniform" output format, func must always return a scalar.

B = varfun(func,A,"OutputFormat","uniform")
B = 1×2

   -0.1880    0.1620

Create a table that has numeric variables with data and a nonnumeric variable that is a grouping variable. Then perform a calculation on each group within the numeric variables.

Read data from a CSV (comma-separated values) file, testScores.csv, into a table by using the readtable function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, School, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert School to a categorical variable.

scores = readtable("testScores.csv","TextType","string");
scores.School = categorical(scores.School)
scores=10×5 table
     LastName       School      Test1    Test2    Test3
    __________    __________    _____    _____    _____

    "Jeong"       XYZ School     90       87       93  
    "Collins"     XYZ School     87       85       83  
    "Torres"      XYZ School     86       85       88  
    "Phillips"    ABC School     75       80       72  
    "Ling"        ABC School     89       86       87  
    "Ramirez"     ABC School     96       92       98  
    "Lee"         XYZ School     78       75       77  
    "Walker"      ABC School     91       94       92  
    "Garcia"      ABC School     86       83       85  
    "Chang"       XYZ School     79       76       82  

Calculate the mean score for each test by school. The variables Test1, Test2, and Test3 are the numeric variables that contain data. The School variable is a grouping variable. When you specify a grouping variable, its unique values specify groups that corresponding values in the data variables belong to.

vars = ["Test1","Test2","Test3"];
meanScoresBySchool = varfun(@mean, ...
                            scores, ...
                            "InputVariables",vars, ...
                            "GroupingVariables","School")
meanScoresBySchool=2×5 table
      School      GroupCount    mean_Test1    mean_Test2    mean_Test3
    __________    __________    __________    __________    __________

    ABC School        5            87.4            87          86.8   
    XYZ School        5              84          81.6          84.6   

The output meanScoresBySchool also contains a variable called GroupCount to indicate the number of entries from the input table scores in that group.

Create a timetable containing sample data. The row times of the timetable can also define groups because row times can be duplicates.

Timestamps = datetime(2023,1,1)+days([0 1 1 2 3 3])';
A = timetable(Timestamps, ...
              [0.71;-2.05;-0.35;-0.82;1.57;0.09], ...
              [0.23;0.12;-0.18;0.23;0.41;0.02], ...
              'VariableNames',["x","y"])
A=6×2 timetable
    Timestamps       x        y  
    ___________    _____    _____

    01-Jan-2023     0.71     0.23
    02-Jan-2023    -2.05     0.12
    02-Jan-2023    -0.35    -0.18
    03-Jan-2023    -0.82     0.23
    04-Jan-2023     1.57     0.41
    04-Jan-2023     0.09     0.02

Compute the mean values of the variables in the timetable. Specify the vector of row times as the grouping variable. The output B is a timetable because the input A is a timetable. When you specify the vector of row times as the grouping variable, you cannot specify any variable as another grouping variable.

B = varfun(@mean,A,"GroupingVariables","Timestamps")
B=4×3 timetable
    Timestamps     GroupCount    mean_x    mean_y
    ___________    __________    ______    ______

    01-Jan-2023        1          0.71      0.23 
    02-Jan-2023        2          -1.2     -0.03 
    03-Jan-2023        1         -0.82      0.23 
    04-Jan-2023        2          0.83     0.215 

Input Arguments

collapse all

Function, specified as a function handle. You can specify a handle for an existing function, define the function in a file, or specify it as an anonymous function. If func corresponds to more than one function file (that is, if func represents a set of overloaded functions), MATLAB® determines which function to call based on the class of the input arguments.

Use the "OutputFormat","cell" name-value argument, if the function func take one input argument and returns arrays with a different numbers of rows each time it is called. Otherwise, func must return arrays with the same number of rows.

Example: func = @sum; calculates the sum of an input.

Example: func = @(x) x.^2; calculates the square of each element of an input.

Input table, specified as a table or a timetable.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: InputVariables=["Var2","Var3"] uses only the variables named Var2 and Var3 in A as the inputs to func.

Specifiers for selecting variables of A to pass to func, specified as "InputVariables" and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, pattern scalar, logical vector, or a function handle.

If you specify "InputVariables" as a function handle, then it must return a logical scalar, and varfun passes only the variables in A where the function returns 1 (true).

Specifiers for selecting variables of A to be grouping variables, specified as "GroupingVariables" and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, pattern scalar, or logical vector.

The unique values in the grouping variables specify groups. Rows in A where the grouping variables have the same values belong to the same groups. varfun applies func to each group of rows within each of the remaining variables of A, rather than to entire variables. For more information on calculations using grouping variables, see Calculations on Groups of Data

Grouping variables can have any of the data types listed in the table.

Values That Specify Groups

Data Type of Grouping Variable

Numbers

Numeric or logical vector

Text

String array or cell array of character vectors

Dates and times

datetime, duration, or calendarDuration vector

Categories

categorical vector

Bins

Vector of binned values, created by binning a continuous distribution of numeric, datetime, or duration values

If any grouping variable contains NaNs or missing values (such as NaTs, undefined categorical values, or missing strings), then the corresponding rows do not belong to any group, and are excluded from the output.

Row labels can be grouping variables. You can group on row labels alone, on one or more variables in A, or on row labels and variables together.

  • If A is a table, then the labels are row names.

  • If A is a timetable, then the labels are row times.

The output, B, has one row for each group of rows from the input, A. If B is a table or timetable, then B has:

  • Variables corresponding to the input table variables that func was applied to.

  • Variables corresponding to the grouping variables.

  • A new variable, GroupCount, whose values are the number of rows of the input A that are in each group.

Note: If B is a timetable, then B also has:

  • Row times, where the first row time from each group of rows in A is the corresponding row time in B. To return B as a table without row times, specify "OutputFormat" as "table".

Format of B, specified as "OutputFormat" and either the value of "auto", "table", "timetable", "uniform", or "cell".

"auto" (default) (since R2023a)

varfun returns an output whose data type matches the data type of the input A.

"table"

varfun returns a table with one variable for each variable in A (or each variable specified with "InputVariables"). For grouped calculations, B also contains the grouping variables and a new GroupCount variable.

"table" allows you to use a function that returns values of different sizes or data types for the different variables in A. However, for ungrouped calculations, func must return arrays with the same number of rows each time it is called. For grouped calculations, func must return values with the same number of rows each time it is called for a given group.

If A is a table, then this is the default output format.

"timetable"

varfun returns a timetable with one variable for each variable in A (or each variable specified with "InputVariables"). For grouped calculations, B also contains the grouping variables and a new GroupCount variable.

varfun creates the row times of B from the row times of A. If the row times assigned to B do not make sense in the context of the calculations performed using func, then specify the output format as "OutputFormat","table".

If A is a timetable, then this is the default output format.

"uniform"

varfun concatenates the output values into a vector. func must return a scalar with the same data type each time it is called.

"cell"

varfun returns the output as a cell array. "cell" allows you to use a function that returns values of different sizes or data types.

Function to call if func fails, specified as "ErrorHandler" and a function handle. Define this function so that it rethrows the error or returns valid outputs for function func.

MATLAB calls the specified error-handling function with two input arguments:

  • A structure with these fields:

    identifier

    Error identifier.

    message

    Error message text.

    index

    Index of the variable for which the error occurred.

    name

    Name of the variable for which the error occurred.

  • The set of input arguments to function func at the time of the error.

For example,

function [A, B] = errorFunc(S, varargin)
warning(S.identifier, S.message)
A = NaN; B = NaN;

Output Arguments

collapse all

Output values, returned as a table, timetable, cell array, or vector.

If B is a table or timetable, then it can store metadata such as descriptions, variable units, variable names, and row names. For more information, see the Properties sections of table or timetable.

More About

collapse all

Calculations on Groups of Data

In data analysis, you commonly perform calculations on groups of data. For such calculations, you split one or more data variables into groups of data, perform a calculation on each group, and combine the results into one or more output variables. You can specify the groups using one or more grouping variables. The unique values in the grouping variables define the groups that the corresponding values of the data variables belong to.

For example, the diagram shows a simple grouped calculation that splits a 6-by-1 numeric vector into two groups of data, calculates the mean of each group, and then combines the outputs into a 2-by-1 numeric vector. The 6-by-1 grouping variable has two unique values, AB and XYZ.

Calculation that splits a data variable based on a grouping variable, performs calculations on individual groups of data by applying the same function, and then concatenates the outputs of those function calls

You can specify grouping variables that have numbers, text, dates and times, categories, or bins.

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

Introduced in R2013b

expand all