Main Content

rowfun

Apply function to table or timetable rows

Description

B = rowfun(func,A) applies the function func to each row of the table or timetable A and returns the results in the table or timetable B.

The number of inputs that the function func accepts must equal the number of variables in A. For example, if func must be called with two input arguments, then A must have two variables. To find the number of variables in a table, use the width function.

example

B = rowfun(func,A,Name,Value) applies the function func to each row of the table A with additional options specified by one or more Name,Value arguments.

For example, you can use the "GroupingVariables" name-value argument to carry out calculations on groups of rows. For more information about calculations on groups of data, see Calculations on Groups of Data.

Examples

collapse all

Create a table, A, with two variables of numeric data.

rng('default')
X = randi(10,[5,1]);
Y = randi(10,[5,1]);
A = table(X,Y)
A=5×2 table
    X     Y 
    __    __

     9     1
    10     3
     2     6
    10    10
     7    10

Apply the function, plus, to each row of A. The function call plus(X,Y) is equivalent to the operation X + Y. The plus function accepts two inputs and returns one output. To specify a function as an input argument to rowfun, use the @ symbol.

B = rowfun(@plus,A,"OutputVariableNames","Sum")
B=5×1 table
    Sum
    ___

    10 
    13 
     8 
    20 
    17 

Append the output table, B, to the input table, A.

C = [A B]
C=5×3 table
    X     Y     Sum
    __    __    ___

     9     1    10 
    10     3    13 
     2     6     8 
    10    10    20 
     7    10    17 

Apply a function that returns multiple outputs to the rows of a table. The rowfun function stores each output from the applied function in a variable of the output table.

Read data from a CSV (comma-separated values) file, testScores.csv, into a table by using the readtable function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, School, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert School to a categorical variable.

scores = readtable("testScores.csv","TextType","string");
scores.School = categorical(scores.School)
scores=10×5 table
     LastName       School      Test1    Test2    Test3
    __________    __________    _____    _____    _____

    "Jeong"       XYZ School     90       87       93  
    "Collins"     XYZ School     87       85       83  
    "Torres"      XYZ School     86       85       88  
    "Phillips"    ABC School     75       80       72  
    "Ling"        ABC School     89       86       87  
    "Ramirez"     ABC School     96       92       98  
    "Lee"         XYZ School     78       75       77  
    "Walker"      ABC School     91       94       92  
    "Garcia"      ABC School     86       83       85  
    "Chang"       XYZ School     79       76       82  

To find the minimum and maximum test scores across each row, use the bounds function. The bounds function returns two output arguments, so apply it to scores by using rowfun. The output of rowfun is a new table that has TestMin and TestMax variables. In this case, also specify "SeparateInputs" as false so that values across each row are combined into a vector before being passed to bounds.

vars = ["Test1","Test2","Test3"];
minmaxTest = rowfun(@bounds, ...
                    scores, ...
                    "InputVariables",vars, ...
                    "OutputVariableNames",["TestMin","TestMax"], ...
                    "SeparateInputs",false)
minmaxTest=10×2 table
    TestMin    TestMax
    _______    _______

      87         93   
      83         87   
      85         88   
      72         80   
      86         89   
      92         98   
      75         78   
      91         94   
      83         86   
      76         82   

You can append the minimum and maximum to scores.

scores = [scores minmaxTest]
scores=10×7 table
     LastName       School      Test1    Test2    Test3    TestMin    TestMax
    __________    __________    _____    _____    _____    _______    _______

    "Jeong"       XYZ School     90       87       93        87         93   
    "Collins"     XYZ School     87       85       83        83         87   
    "Torres"      XYZ School     86       85       88        85         88   
    "Phillips"    ABC School     75       80       72        72         80   
    "Ling"        ABC School     89       86       87        86         89   
    "Ramirez"     ABC School     96       92       98        92         98   
    "Lee"         XYZ School     78       75       77        75         78   
    "Walker"      ABC School     91       94       92        91         94   
    "Garcia"      ABC School     86       83       85        83         86   
    "Chang"       XYZ School     79       76       82        76         82   

Apply a function to data taken from groups of rows of the input table. The output table has one row for each group.

Read data from a CSV (comma-separated values) file, testScores.csv, into a table. The file has test scores for 10 students from two different schools.

scores = readtable("testScores.csv","TextType","string");
scores.School = categorical(scores.School)
scores=10×5 table
     LastName       School      Test1    Test2    Test3
    __________    __________    _____    _____    _____

    "Jeong"       XYZ School     90       87       93  
    "Collins"     XYZ School     87       85       83  
    "Torres"      XYZ School     86       85       88  
    "Phillips"    ABC School     75       80       72  
    "Ling"        ABC School     89       86       87  
    "Ramirez"     ABC School     96       92       98  
    "Lee"         XYZ School     78       75       77  
    "Walker"      ABC School     91       94       92  
    "Garcia"      ABC School     86       83       85  
    "Chang"       XYZ School     79       76       82  

Calculate the mean test score for each student and add it as a new table variable. One way to do that is to extract the numeric test scores and calculate the means along the second dimension. The result is a column vector that you can attach to scores as a new variable.

scores.TestMean = mean(scores{:,["Test1","Test2","Test3"]},2)
scores=10×6 table
     LastName       School      Test1    Test2    Test3    TestMean
    __________    __________    _____    _____    _____    ________

    "Jeong"       XYZ School     90       87       93           90 
    "Collins"     XYZ School     87       85       83           85 
    "Torres"      XYZ School     86       85       88       86.333 
    "Phillips"    ABC School     75       80       72       75.667 
    "Ling"        ABC School     89       86       87       87.333 
    "Ramirez"     ABC School     96       92       98       95.333 
    "Lee"         XYZ School     78       75       77       76.667 
    "Walker"      ABC School     91       94       92       92.333 
    "Garcia"      ABC School     86       83       85       84.667 
    "Chang"       XYZ School     79       76       82           79 

Find the student whose mean test score is the maximum for each school. The attached supporting function, findNameAtMax, returns both the highest score and the name of the student who had that score. To apply findNameAtMax to each group of students, use rowfun. The rowfun function is suitable because findNameAtMax has multiple input arguments (last names and test scores) and also returns multiple output arguments. The variable GroupCount in the output table indicates the number of rows in scores for each school.

maxScoresBySchool = rowfun(@findNameAtMax, ...
                           scores, ...
                           "InputVariables",["LastName","TestMean"], ...
                           "GroupingVariables","School", ...
                           "OutputVariableNames",["max_TestMean","LastName"])
maxScoresBySchool=2×4 table
      School      GroupCount    max_TestMean    LastName 
    __________    __________    ____________    _________

    ABC School        5            95.333       "Ramirez"
    XYZ School        5                90       "Jeong"  

function [maxValue,lastName] = findNameAtMax(names,values)
    % Return maximum value and the last name 
    % from the row at which the maximum value occurred
    [maxValue,maxIndex] = max(values);
    lastName = names(maxIndex);
end

Input Arguments

collapse all

Function, specified as a function handle. You can specify a handle for an existing function, define the function in a file, or specify it as an anonymous function. If func corresponds to more than one function file (that is, if func represents a set of overloaded functions), MATLAB® determines which function to call based on the class of the input arguments.

The function func must accept width(A) inputs. By default, rowfun returns the first output of func. To return more than one output from func, use the "NumOutputs" or "OutputVariableNames" name-value arguments.

Example: func = @minus; takes two inputs and subtracts the second input from the first.

Example: func = @(x,y) x.^2+y.^2; takes two inputs and finds the sum of the squares.

Input table, specified as a table or a timetable.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: InputVariables=["Var2","Var3"] uses only the variables named Var2 and Var3 in A as the inputs to func.

Specifiers for selecting variables of A to pass to func, specified as "InputVariables" and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, pattern scalar, logical vector, or a function handle.

If you specify "InputVariables" as a function handle, then it must return a logical scalar, and rowfun passes only the variables in A where the function returns 1 (true).

Specifiers for selecting variables of A to be grouping variables, specified as "GroupingVariables" and a positive integer, vector of positive integers, string array, character vector, cell array of character vectors, pattern scalar, or logical vector.

The unique values in the grouping variables specify groups. Rows in A where the grouping variables have the same values belong to the same groups. rowfun applies func to each group of rows, rather than separately to each row of A. The output, B, contains one row for each group. For more information on calculations using grouping variables, see Calculations on Groups of Data

Grouping variables can have any of the data types listed in the table.

Values That Specify Groups

Data Type of Grouping Variable

Numbers

Numeric or logical vector

Text

String array or cell array of character vectors

Dates and times

datetime, duration, or calendarDuration vector

Categories

categorical vector

Bins

Vector of binned values, created by binning a continuous distribution of numeric, datetime, or duration values

If any grouping variable contains NaNs or missing values (such as NaTs, undefined categorical values, or missing strings), then the corresponding rows do not belong to any group, and are excluded from the output.

Row labels can be grouping variables. You can group on row labels alone, on one or more variables in A, or on row labels and variables together.

  • If A is a table, then the labels are row names.

  • If A is a timetable, then the labels are row times.

The output, B, has one row for each group of rows from the input, A. If B is a table or timetable, then B has:

  • Variables corresponding to the input table variables that func was applied to.

  • Variables corresponding to the grouping variables.

  • A new variable, GroupCount, whose values are the number of rows of the input A that are in each group.

Note: If B is a timetable, then B also has:

  • Row times, where the first row time from each group of rows in A is the corresponding row time in B. To return B as a table without row times, specify "OutputFormat" as "table".

Indicator for calling func with separate inputs, specified as "SeparateInputs" and either true, false, 1, or 0.

true (default)

func expects separate inputs. rowfun calls func with width(A) inputs, one argument for each data variable.

false

func expects one argument containing all inputs. rowfun creates the input argument to func by concatenating the values in each row of A.

For example, if A is a table that has three variables, and each variable is a numeric vector, then specifying "SeparateInputs",false causes rowfun to concatenate the three numeric vectors into one numeric matrix. The matrix has three columns. Then rowfun passes that matrix as one input argument to func.

Indicator to pass values from cell variables to func, specified as "ExtractCellContents" and either false, true, 0, or 1.

true

rowfun extracts the contents of a variable in A whose data type is cell and passes the values, rather than the cells, to func

For grouped calculations, the values within each group in a cell variable must allow vertical concatenation.

false

rowfun passes the cells of a variable in A whose data type is cell to func.

This is the default behavior.

Variable names for outputs of func, specified as "OutputVariableNames" and a character vector, cell array of character vectors, or string array, with names that are nonempty and distinct. The number of names must equal the number of outputs desired from func.

Furthermore, the variable names must be valid MATLAB identifiers. If valid MATLAB identifiers are not available for use as variable names, MATLAB uses a cell array of N character vectors of the form {'Var1' ... 'VarN'} where N is the number of variables. You can determine valid MATLAB variable names using the function isvarname.

Number of outputs from func, specified as "NumOutputs" and 0 or a positive integer. The integer must be less than or equal to the possible number of outputs from func.

Example: "NumOutputs",2 causes rowfun to call func with two outputs.

Format of B, specified as "OutputFormat" and either the value of "auto", "table", "timetable", "uniform", or "cell".

"auto" (default) (since R2023a)

rowfun returns an output whose data type matches the data type of the input A.

"table"

rowfun returns a table with one variable for each output of func. For grouped calculations, B also contains the grouping variables and a new GroupCount variable.

"table" allows you to use a function that returns values of different sizes or data types. However, for ungrouped calculations, all of the outputs from func must have one row each time it is called. For grouped calculations, all of the outputs from func must have the same number of rows.

If A is a table, then this is the default output format.

"timetable"

rowfun returns a timetable with one variable for each variable in A (or each variable specified with "InputVariables"). For grouped calculations, B also contains the grouping variables and a new GroupCount variable.

rowfun creates the row times of B from the row times of A. If the row times assigned to B do not make sense in the context of the calculations performed using func, then specify the output format as "OutputFormat","table".

If A is a timetable, then this is the default output format.

"uniform"

rowfun concatenates the values returned by func into a vector. All of the outputs from func must be scalars with the same data type.

"cell"

rowfun returns the output as a cell array. "cell" allows you to use a function that returns values of different sizes or data types.

Function to call if func fails, specified as "ErrorHandler" and a function handle. Define this function so that it rethrows the error or returns valid outputs for function func.

MATLAB calls the specified error-handling function with two input arguments:

  • A structure with these fields:

    identifier

    Error identifier.

    message

    Error message text.

    index

    Row or group index at which the error occurred.

  • The set of input arguments to function func at the time of the error.

For example,

function [A, B] = errorFunc(S, varargin)
warning(S.identifier, S.message);
A = NaN; B = NaN;

Output Arguments

collapse all

Output values, returned as a table, timetable, cell array, or vector.

If B is a table or timetable, then it can store metadata such as descriptions, variable units, variable names, and row names. For more information, see the Properties sections of table or timetable.

More About

collapse all

Calculations on Groups of Data

In data analysis, you commonly perform calculations on groups of data. For such calculations, you split one or more data variables into groups of data, perform a calculation on each group, and combine the results into one or more output variables. You can specify the groups using one or more grouping variables. The unique values in the grouping variables define the groups that the corresponding values of the data variables belong to.

For example, the diagram shows a simple grouped calculation that splits a 6-by-1 numeric vector into two groups of data, calculates the mean of each group, and then combines the outputs into a 2-by-1 numeric vector. The 6-by-1 grouping variable has two unique values, AB and XYZ.

Calculation that splits a data variable based on a grouping variable, performs calculations on individual groups of data by applying the same function, and then concatenates the outputs of those function calls

You can specify grouping variables that have numbers, text, dates and times, categories, or bins.

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

Introduced in R2013b

expand all