# findgroups

Find groups and return group numbers

## Syntax

``G = findgroups(A)``
``G = findgroups(A1,...,AN)``
``[G,ID] = findgroups(A)``
``[G,ID1,...,IDN] = findgroups(A1,...,AN)``
``G = findgroups(T)``
``[G,TID] = findgroups(T)``

## Description

`G = findgroups(A)` returns `G`, a vector of group numbers created from the grouping variable `A`. The output argument `G` contains integer values from 1 to `N`, indicating `N` distinct groups for the `N` unique values in `A`. For example, if `A` is `{'b','a','a','b'}`, then `findgroups` returns `G` as `[2 1 1 2]`. You can use `G` to split groups of data out of other variables. Use `G` as an input argument to `splitapply` in the Split-Apply-Combine Workflow.

`findgroups` treats empty character vectors and `NaN`, `NaT`, and undefined categorical values in `A` as missing values and returns `NaN` as the corresponding elements of `G`.
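As a minimal sketch of the missing-value behavior described above, the following uses a small made-up cell array. The data and the variable name `isMissingGroup` are illustrative assumptions, not part of the shipped examples.

```matlab
% Illustrative data: the third element is an empty character vector.
A = {'b';'a';'';'b'};

% Group numbers follow the sorted unique values: 'a' is group 1, 'b' is group 2.
G = findgroups(A);

% The empty character vector counts as missing, so its group number is NaN.
isMissingGroup = isnan(G);
```

Here `isMissingGroup` flags the elements that belong to no group, which can be handy for filtering data before further processing.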

`G = findgroups(A1,...,AN)` creates group numbers from `A1,...,AN`. The `findgroups` function defines groups as the unique combinations of values across `A1,...,AN`. For example, if `A1` is `{'a','a','b','b'}` and `A2` is `[0 1 0 0]`, then `findgroups(A1,A2)` returns `G` as `[1 2 3 3]`, because the combination `'b' 0` occurs twice.

`[G,ID] = findgroups(A)` also returns the unique values for each group in `ID`. For example, if `A` is `{'b','a','a','b'}`, then `findgroups` returns `G` as `[2 1 1 2]` and `ID` as `{'a','b'}`. The arguments `A` and `ID` are the same data type, but need not be the same size.

`[G,ID1,...,IDN] = findgroups(A1,...,AN)` also returns the unique values for each group across `ID1,...,IDN`. The values across `ID1,...,IDN` define the groups. For example, if `A1` is `{'a','a','b','b'}` and `A2` is `[0 1 0 0]`, then `findgroups(A1,A2)` returns `G` as `[1 2 3 3]`, and `ID1` and `ID2` as `{'a','a','b'}` and `[0 1 0]`.
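The two-variable case above can be sketched as follows, using the same values written as column vectors. The names `groups` and `counts` are illustrative assumptions; counting members with `splitapply(@numel,...)` is one possible way to inspect the groups, not the only one.

```matlab
% Grouping variables from the description, as column vectors.
A1 = {'a';'a';'b';'b'};
A2 = [0;1;0;0];

% One group per unique (A1,A2) combination.
[G,ID1,ID2] = findgroups(A1,A2);

% Row k of this table describes group k.
groups = table(ID1,ID2);

% Count the members of each group.
counts = splitapply(@numel,A2,G);
```

Reading `groups` next to `counts` shows which combination each group number stands for and how many elements fall into it.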

`G = findgroups(T)` returns `G`, a vector of group numbers created from the variables in table `T`. The `findgroups` function treats all the variables in `T` as grouping variables.

`[G,TID] = findgroups(T)` also returns `TID`, a table that contains the unique values for each group. `TID` contains the unique combinations of values across the variables of `T`. The variables in `T` and `TID` have the same names, but the tables need not have the same number of rows.
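A minimal sketch of the table syntax, assuming a small hand-built table. The variables `Site` and `Wet` and the name `sameNames` are hypothetical and chosen only for illustration.

```matlab
% Illustrative table with two grouping variables.
Site = categorical({'north';'south';'north';'south';'north'});
Wet  = [true;false;true;true;true];
T = table(Site,Wet);

% Every variable in T acts as a grouping variable.
[G,TID] = findgroups(T);

% TID has one row per group and the same variable names as T.
sameNames = isequal(TID.Properties.VariableNames,T.Properties.VariableNames);
```

In this sketch `T` has five rows but only three distinct `Site`/`Wet` combinations, so `TID` has three rows.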

## Examples

Use group numbers to split patient height measurements into groups by gender. Then calculate the mean height for each group.

Load patient heights and genders from the data file `patients.mat`.

```
load patients
whos Gender Height
```

```
  Name          Size            Bytes  Class     Attributes

  Gender      100x1             11412  cell                
  Height      100x1               800  double              
```

Specify groups by gender with `findgroups`.

`G = findgroups(Gender);`

Compare the first five elements of `Gender` and `G`. Where `Gender` contains `'Female'`, `G` contains `1`. Where `Gender` contains `'Male'`, `G` contains `2`.

`Gender(1:5)`
```
ans = 5x1 cell
    {'Male'  }
    {'Male'  }
    {'Female'}
    {'Female'}
    {'Female'}
```
`G(1:5)`
```
ans = 5×1

     2
     2
     1
     1
     1
```

Split the `Height` variable into two groups of heights using `G`. Apply the `mean` function to each group. The result contains the mean heights of female and male patients, respectively.

`splitapply(@mean,Height,G)`
```
ans = 2×1

   65.1509
   69.2340
```

Calculate mean blood pressures for groups of patients from measurements grouped by gender and status as a smoker.

Load blood pressure readings, gender, and smoking data for patients from the data file `patients.mat`.

```
load patients
whos Systolic Diastolic Gender Smoker
```

```
  Name             Size            Bytes  Class      Attributes

  Diastolic      100x1               800  double               
  Gender         100x1             11412  cell                 
  Smoker         100x1               100  logical              
  Systolic       100x1               800  double               
```

Specify groups using gender and smoking information about the patients. `G` contains integers from one to four because there are four possible combinations of values from `Smoker` and `Gender`.

```
G = findgroups(Smoker,Gender);
G(1:10)
```

```
ans = 10×1

     4
     2
     1
     1
     1
     1
     3
     2
     2
     1
```

Calculate the mean blood pressure for each group.

```
meanSystolic = splitapply(@mean,Systolic,G);
meanDiastolic = splitapply(@mean,Diastolic,G);
mBP = [meanSystolic,meanDiastolic]
```

```
mBP = 4×2

  119.4250   79.0500
  119.3462   79.8846
  129.0000   89.2308
  129.5714   90.3333
```

Calculate the median heights for groups of patients, and display the results in a table. To define the groups of patients, use the additional output argument from `findgroups`.

Load patient heights and genders from the data file `patients.mat`.

```
load patients
whos Gender Height
```

```
  Name          Size            Bytes  Class     Attributes

  Gender      100x1             11412  cell                
  Height      100x1               800  double              
```

Specify groups by gender with `findgroups`. The values in the output argument `gender` define the groups that `findgroups` finds in the grouping variable.

`[G,gender] = findgroups(Gender);`

Calculate the median heights. Create a table that contains the median heights.

```
medianHeight = splitapply(@median,Height,G);
T = table(gender,medianHeight)
```

```
T=2×2 table
      gender      medianHeight
    __________    ____________

    {'Female'}         65     
    {'Male'  }         69     
```

Calculate mean blood pressures for groups of patients, and display the results in a table. To define the groups of patients, use the additional output arguments from `findgroups`.

Load blood pressure readings, gender, and smoking data for 100 patients from the data file `patients.mat`.

```
load patients
whos Systolic Diastolic Gender Smoker
```

```
  Name             Size            Bytes  Class      Attributes

  Diastolic      100x1               800  double               
  Gender         100x1             11412  cell                 
  Smoker         100x1               100  logical              
  Systolic       100x1               800  double               
```

Specify groups using gender and smoking information about the patients. Calculate mean blood pressure for each group. The values across the output arguments `gender` and `smoker` define the groups that `findgroups` finds in the grouping variables.

```
[G,gender,smoker] = findgroups(Gender,Smoker);
meanSystolic = splitapply(@mean,Systolic,G);
meanDiastolic = splitapply(@mean,Diastolic,G);
```

Create a table with the mean blood pressure for each group of patients.

`T = table(gender,smoker,meanSystolic,meanDiastolic)`
```
T=4×4 table
      gender      smoker    meanSystolic    meanDiastolic
    __________    ______    ____________    _____________

    {'Female'}    false        119.42            79.05   
    {'Female'}    true            129           89.231   
    {'Male'  }    false        119.35           79.885   
    {'Male'  }    true         129.57           90.333   
```

Calculate mean blood pressures for patients using grouping variables that are in a table.

Load gender and smoking data for 100 patients into a table.

```
load patients
T = table(Gender,Smoker);
T(1:5,:)
```

```
ans=5×2 table
      Gender      Smoker
    __________    ______

    {'Male'  }    true  
    {'Male'  }    false 
    {'Female'}    false 
    {'Female'}    false 
    {'Female'}    false 
```

Specify groups of patients using the `Gender` and `Smoker` variables in `T`.

`G = findgroups(T);`

Calculate mean blood pressures from the data variables `Systolic` and `Diastolic`.

```
meanSystolic = splitapply(@mean,Systolic,G);
meanDiastolic = splitapply(@mean,Diastolic,G);
mBP = [meanSystolic,meanDiastolic]
```

```
mBP = 4×2

  119.4250   79.0500
  129.0000   89.2308
  119.3462   79.8846
  129.5714   90.3333
```

Create a table of mean blood pressures for patients grouped by gender and status as a smoker or nonsmoker.

Load gender and smoking data for patients into a table.

```
load patients
T = table(Gender,Smoker);
```

Specify groups of patients using the `Gender` and `Smoker` variables in `T`. The output table `TID` identifies the groups.

```
[G,TID] = findgroups(T);
TID
```

```
TID=4×2 table
      Gender      Smoker
    __________    ______

    {'Female'}    false 
    {'Female'}    true  
    {'Male'  }    false 
    {'Male'  }    true  
```

Calculate mean blood pressures from the data variables `Systolic` and `Diastolic`. Append mean blood pressures to `TID`.

```
TID.meanSystolic = splitapply(@mean,Systolic,G);
TID.meanDiastolic = splitapply(@mean,Diastolic,G)
```

```
TID=4×4 table
      Gender      Smoker    meanSystolic    meanDiastolic
    __________    ______    ____________    _____________

    {'Female'}    false        119.42            79.05   
    {'Female'}    true            129           89.231   
    {'Male'  }    false        119.35           79.885   
    {'Male'  }    true         129.57           90.333   
```

## Input Arguments

`A`: Grouping variable, specified as a vector, a cell array of character vectors, or a string array. The unique values in `A` identify groups.

If `A` is a vector, then it can be numeric or of data type `categorical`, `calendarDuration`, `datetime`, `duration`, `logical`, or `string`.

`T`: Grouping variables, specified as a table. `findgroups` treats each table variable as a separate grouping variable. The variables can be numeric or of data type `categorical`, `calendarDuration`, `datetime`, `duration`, `logical`, or `string`.

## Output Arguments

`G`: Group numbers, returned as a vector of positive integers. For `N` groups identified in the grouping variables, every integer between 1 and `N` specifies a group. `G` contains `NaN` where any grouping variable contains an empty character vector or a `NaN`, `NaT`, or undefined categorical value.

• If the grouping variables are vectors, then `G` and the grouping variables are all the same size.

• If the grouping variables are in a table, the length of `G` is equal to the number of rows of the table.

`ID`: Values that identify each group, returned as a vector or cell array of character vectors. The values of `ID` are the sorted unique values of `A`.
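Because `ID` holds the sorted unique values and `G` indexes into them, indexing `ID` with `G` rebuilds the grouping variable. The following is a minimal sketch with made-up numeric data; the name `rebuilt` is illustrative, and the indexing step assumes `G` contains no `NaN` values (that is, `A` has no missing values).

```matlab
% Illustrative grouping variable with no missing values.
A = [30;10;20;10;30];

[G,ID] = findgroups(A);

% ID holds the sorted unique values; indexing ID with G rebuilds A.
rebuilt = ID(G);
```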

`TID`: The unique values that identify each group, returned as a table. The variables of `TID` have the sorted unique values from the corresponding variables of `T`. However, `TID` and `T` need not have the same number of rows.

The Split-Apply-Combine workflow is common in data analysis. In this workflow, the analyst splits the data into groups, applies a function to each group, and combines the results. The diagram shows a typical example of the workflow and the parts of the workflow implemented by `findgroups` and `splitapply`.
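The workflow described above can be sketched with a hand-written loop next to the equivalent `splitapply` call. The data and the names `site`, `reading`, `meanByLoop`, and `meanBySplitapply` are illustrative assumptions; the loop is only a way to make the apply and combine steps visible, not a description of how `splitapply` is implemented.

```matlab
% Illustrative data: five readings taken at three sites.
site    = {'x';'y';'x';'z';'y'};
reading = [1;4;3;10;6];

% Split: assign a group number to each reading.
[G,siteID] = findgroups(site);

% Apply and combine, written out by hand.
nGroups = max(G);
meanByLoop = zeros(nGroups,1);
for k = 1:nGroups
    meanByLoop(k) = mean(reading(G == k));
end

% The same apply-and-combine steps with splitapply.
meanBySplitapply = splitapply(@mean,reading,G);
```

Both approaches give one mean per group, in the order of the group identifiers in `siteID`.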