groupfilter

Filter by group

Description

example

G = groupfilter(T,groupvars,method) filters the data in a table or timetable according to the computation in method, grouping by the unique combinations of grouping variables specified in groupvars. The method argument is a function handle that operates on each nongrouping variable. The output G is a table or timetable containing only the rows of T that satisfy the condition in method. For example, G = groupfilter(T,'Trial',@(x) numel(x) > 5) groups the data in T by Trial, keeping the rows that belong to groups with more than five trials.

example

G = groupfilter(T,groupvars,groupbins,method) bins the data according to groupbins before applying the filter, horizontally concatenating the groups to the output table. For example, G = groupfilter(T,'SaleDate','year',@(x) numel(x) > 5) bins the data in SaleDate by year, and keeps the rows whose bin has more than five elements.

example

G = groupfilter(___,datavars) specifies the table variables to apply the filter to for either of the previous syntaxes. For example, G = groupfilter(T,'Trial',@(x) x == max(x),'Height') keeps the rows of T corresponding to the maximum height for each trial.

G = groupfilter(T,groupvars,groupbins,method,___,'IncludedEdge',LR) specifies the included bin edge as 'left' or 'right' to indicate which end of the bin interval is inclusive when groupbins is specified.

B = groupfilter(A,groupvars,method) filters the data according to the unique combinations of grouping vectors in groupvars when A is a vector or matrix. groupvars can be a column vector, matrix, or cell array of column vectors. The output B is a matrix containing the rows of A that satisfy the condition defined by method.

example

B = groupfilter(A,groupvars,groupbins,method) bins the data according to groupbins.

B = groupfilter(A,groupvars,groupbins,method,'IncludedEdge',LR) specifies whether to include the left or right edge in each bin when groupbins is specified.

example

[B,BG] = groupfilter(A,___) also returns the rows of the grouping vectors corresponding to the unfiltered rows in A.

Examples

collapse all

Create a table containing two variables.

groupID = [1 1 1 2 2 3]';
sample = [3 1 2 9 8 5]';
T = table(groupID,sample)
T=6×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       2         9   
       2         8   
       3         5   

Group by ID number, and return only rows corresponding to groups with more than two samples.

Gnumel = groupfilter(T,'groupID',@(x) numel(x) > 2)
Gnumel=3×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   

Return only rows whose group samples are between 0 and 6.

Gvals = groupfilter(T,'groupID',@(x) min(x) > 0 && max(x) < 6)
Gvals=4×2 table
    groupID    sample
    _______    ______

       1         3   
       1         1   
       1         2   
       3         5   

Create a table containing two variables that represent a day number and temperature.

daynum = [1 1 1 1 2 2 2 2]';
temp = [67 65 71 55 61 79 58 78]';
T = table(daynum,temp)
T=8×2 table
    daynum    temp
    ______    ____

      1        67 
      1        65 
      1        71 
      1        55 
      2        61 
      2        79 
      2        58 
      2        78 

Group by day number, and return the largest two temperatures for each day.

G = groupfilter(T,'daynum',@(x) ismember(x,maxk(x,2)))
G=4×2 table
    daynum    temp
    ______    ____

      1        67 
      1        71 
      2        79 
      2        78 

Create a table of dates and corresponding profits.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 4 10;...
                       2017 4 14; 2017 4 30; 2017 5 25;...
                       2017 5 29; 2017 5 21]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619]';
T = table(timeStamps,profit)
T=9×2 table
    timeStamps     profit
    ___________    ______

    04-Mar-2017     2032 
    02-Mar-2017     3071 
    15-Mar-2017     1185 
    10-Apr-2017     2587 
    14-Apr-2017     1998 
    30-Apr-2017     2899 
    25-May-2017     3112 
    29-May-2017      909 
    21-May-2017     2619 

Group the dates by month, and return only rows that correspond to the maximum profit for that month.

Gmax = groupfilter(T,'timeStamps','month',@(x) x == max(x))
Gmax=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    02-Mar-2017     3071         Mar-2017    
    30-Apr-2017     2899         Apr-2017    
    25-May-2017     3112         May-2017    

Return rows whose month had an average profit greater than $2300.

Gavg = groupfilter(T,'timeStamps','month',@(x) mean(x) > 2300)
Gavg=3×3 table
    timeStamps     profit    month_timeStamps
    ___________    ______    ________________

    10-Apr-2017     2587         Apr-2017    
    14-Apr-2017     1998         Apr-2017    
    30-Apr-2017     2899         Apr-2017    

Create a table containing three variables.

groupID = [1 2 3 1 2 3 1 2 3]';
height = [62 61 59 66 70 72 57 67 71]';
gender = ["M" "F" "F" "M" "M" "F" "M" "M" "M"]';
T = table(groupID,height,gender)
T=9×3 table
    groupID    height    gender
    _______    ______    ______

       1         62       "M"  
       2         61       "F"  
       3         59       "F"  
       1         66       "M"  
       2         70       "M"  
       3         72       "F"  
       1         57       "M"  
       2         67       "M"  
       3         71       "M"  

Group by ID number, and return rows whose group members all have a minimum height of 60.

G1 = groupfilter(T,'groupID',@(x) min(x) >= 60,'height')
G1=3×3 table
    groupID    height    gender
    _______    ______    ______

       2         61       "F"  
       2         70       "M"  
       2         67       "M"  

Return rows whose group members are all male.

G2 = groupfilter(T,'groupID',@(x) all(x == "M"),'gender')
G2=3×3 table
    groupID    height    gender
    _______    ______    ______

       1         62       "M"  
       1         66       "M"  
       1         57       "M"  

Group by ID number and gender, and return rows containing the maximum height for males and females in each group.

G3 = groupfilter(T,{'groupID','gender'},@(x) x == max(x))
G3=5×3 table
    groupID    height    gender
    _______    ______    ______

       2         61       "F"  
       1         66       "M"  
       2         70       "M"  
       3         72       "F"  
       3         71       "M"  

Create a vector of dates and a vector of corresponding profit values.

timeStamps = datetime([2017 3 4; 2017 3 2; 2017 3 15; 2017 3 10; ...
                       2017 3 14; 2017 3 31; 2017 3 25; ...
                       2017 3 29; 2017 3 21; 2017 3 18]);
profit = [2032 3071 1185 2587 1998 2899 3112 909 2619 3085]';

Group by day of the week, and compute the maximum profit for each group. Display the maximum profits and their corresponding groups.

[maxDailyProfit,dayOfWeek] = groupfilter(profit,timeStamps,'dayname',@(x) x == max(x))
maxDailyProfit = 5×1

        3071
        1185
        2899
        3112
        2619

dayOfWeek = 5x1 categorical array
     Thursday 
     Wednesday 
     Friday 
     Saturday 
     Tuesday 

Input Arguments

collapse all

Input data, specified as a table or timetable.

Input array, specified as a vector or matrix.

Grouping variables or vectors, specified as a scalar, vector, matrix, cell array, or function handle.

For table or timetable input, groupvars indicates which columns to use to compute the groups, and can be one of the following:

  • A character vector or string scalar specifying a single table variable name

  • A cell array of character vectors or a string array, where each element is a table variable name

  • A vector of table variable indices

  • A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

  • A function handle that takes a table variable as input and returns a logical scalar

For array input, groupvars can be either a column vector with the same number of rows as A or a group of column vectors arranged in a matrix or a cell array.

Example: 'Age'

Example: {'Height','Weight'}

Computation method, specified as a function handle.

method defines the function used to filter out members from each group. The function must return a logical scalar or a logical column vector with the same number of rows as the input data indicating which group members to select. If the function returns a logical scalar, then either all members of the group are filtered out (when the value is false) or none are filtered out (when the value is true). If the function returns a logical vector, then members of groups are filtered out when the corresponding element is false. Members are kept when the corresponding element is true.

To define the function handle, use a syntax of the form @(inputargs) myfunc where myfunc depends on inputargs. For example, @(x) mean(x) designates the variable x as the input argument to the mean function. You can then define a condition by which to filter, such as @(x) mean(x) > 10, which returns a logical scalar. This computation keeps rows whose group has a mean larger than 10. For more information, see Create Function Handle and Anonymous Functions.

When groupfilter applies the method to more than one nongrouping variable at a time, the method must return true for each variable in order to keep the row.

Binning scheme, specified as one of the following options:

  • 'none', indicating no binning

  • A list of bin edges, specified as a numeric vector, or a datetime vector for datetime grouping variables

  • A number of bins, specified as an integer scalar

  • A time duration, specified as a scalar of type duration or calendarDuration, indicating bin widths (for datetime or duration grouping variables only)

  • A cell array listing binning rules for each grouping variable

  • A time bin for datetime and duration grouping variables only, specified as one of the following character vectors:

    ValueDescriptionData Type
    'second'

    Each bin is 1 second.

    datetime and duration
    'minute'

    Each bin is 1 minute.

    datetime and duration
    'hour'

    Each bin is 1 hour.

    datetime and duration
    'day'

    Each bin is 1 calendar day. This value accounts for Daylight Saving Time shifts.

    datetime and duration
    'week'Each bin is 1 calendar week.datetime only
    'month'Each bin is 1 calendar month.datetime only
    'quarter'Each bin is 1 calendar quarter.datetime only
    'year'

    Each bin is 1 calendar year. This value accounts for leap days.

    datetime and duration
    'decade'Each bin is 1 decade (10 calendar years).datetime only
    'century'Each bin is 1 century (100 calendar years).datetime only
    'secondofminute'

    Bins are seconds from 0 to 59.

    datetime only
    'minuteofhour'

    Bins are minutes from 0 to 59.

    datetime only
    'hourofday'

    Bins are hours from 0 to 23.

    datetime only
    'dayofweek'

    Bins are days from 1 to 7. The first day of the week is Sunday.

    datetime only
    'dayname'Bins are full day names such as 'Sunday'.datetime only
    'dayofmonth'Bins are days from 1 to 31.datetime only
    'dayofyear'Bins are days from 1 to 366.datetime only
    'weekofmonth'Bins are weeks from 1 to 6.datetime only
    'weekofyear'Bins are weeks from 1 to 54.datetime only
    'monthname'Bins are full month names such as 'January'.datetime only
    'monthofyear'

    Bins are months from 1 to 12.

    datetime only
    'quarterofyear'Bins are quarters from 1 to 4.datetime only

When multiple grouping variables are specified, you can provide a single binning rule that is applied to all grouping variables, or a cell array containing a binning method for each grouping variable such as {'none',[0 2 4 Inf]}.

Included bin edge, specified as either 'left' or 'right', indicating which end of the bin interval is inclusive.

This argument can be specified only when groupbins is specified and the value applies all binning schemes for all grouping variables or vectors.

Data variables for table or timetable input, specified as a scalar, vector, cell array, or function handle. datavars indicates which columns of the input table to apply the methods to, and can be one of the following options:

  • A character vector or string scalar specifying a single table variable name

  • A cell array of character vectors or string array where each element is a table variable name

  • A vector of table variable indices

  • A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it

  • A function handle that takes a table variable as input and returns a logical scalar

When datavars is not specified, groupfilter applies the computations to each nongrouping variable.

Example: 'Profit'

Example: {'Income','Expenses'}

Example: @isnumeric

Output Arguments

collapse all

Output table, returned as a table containing the unfiltered rows of data.

Output array, returned as a vector or matrix containing the unfiltered data.

Grouping vectors for array input data, returned as a column vector or cell array of column vectors. BG contains the rows of the grouping vectors corresponding to the unfiltered rows in A.

Introduced in R2019b