
## Categorize Numeric Data

This example shows how to bin numeric data into an ordinal array using `ordinal`. Binning is useful for discretizing continuous data.

### Load sample data.

The dataset array, `hospital`, contains variables measured on a sample of patients. Compute the minimum, median, and maximum of the variable `Age`.

```
load hospital
quantile(hospital.Age,[0,.5,1])
```
```
ans =
    25    39    50
```

The patient ages range from 25 to 50.

### Convert a numeric array to an ordinal array.

Group patients into three age categories: `Under 30`, `30-39`, and `Over 40`.

```
hospital.AgeCat = ordinal(hospital.Age,{'Under 30','30-39','Over 40'},...
                          [],[25,30,40,50]);
getlevels(hospital.AgeCat)
```
```
ans =
     Under 30      30-39      Over 40
```

The last input argument to `ordinal` specifies the endpoints of the categories. The first category begins at age 25, the second at age 30, and the third at age 40. Despite its label, the last category, `Over 40`, contains ages 40 and above: it begins at 40 and ends at 50, the maximum age in the data set. To specify three categories, you must specify four endpoints, because the last endpoint is the upper bound of the last category.
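The endpoint convention can be checked on a small hand-made vector. The following is a minimal sketch, assuming the `ordinal` class from the Statistics Toolbox is available. The variables `ages`, `edges`, `labels`, and `catLabels` are illustrative and not part of the `hospital` data. The ages are chosen to sit exactly on, and just below, each endpoint.

```matlab
% Illustrative ages placed on and just below each endpoint.
ages   = [25 29 30 39 40 50];
edges  = [25 30 40 50];                  % four endpoints -> three categories
labels = {'Under 30','30-39','Over 40'};

ageCat = ordinal(ages,labels,[],edges);

% Convert to a cell array of level labels for easy inspection.
catLabels = cellstr(ageCat)
```

If the binning follows the convention described above, ages 30 and 40 land in the categories that begin at those endpoints, and age 50 lands in the last category because the final endpoint is its upper bound.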

### Explore categories.

Display the age and age category for the second patient.

```
dataset({hospital.Age(2),'Age'},...
        {hospital.AgeCat(2),'AgeCategory'})
```
```
ans =
    Age    AgeCategory
    43     Over 40
```

When you discretize a numeric array into categories, the categorical array loses all information about the actual numeric values. In this example, `AgeCat` is not numeric, and you cannot recover the raw data values from it.
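As a small illustration of this loss of information, the sketch below converts an ordinal array back to numbers with `double`. It applies the same labels and endpoints used in this example to a hypothetical three-element vector `ages`; the name `codes` is also illustrative. The result holds the position of each element's level, not the original ages.

```matlab
% Hypothetical ages, one per category.
ages   = [27 43 35];
ageCat = ordinal(ages,{'Under 30','30-39','Over 40'},[],[25,30,40,50]);

% double returns each element's level index (1, 2, or 3 here),
% so the original values 27, 43, and 35 are not recoverable.
codes = double(ageCat)
```

Any two ages that fall in the same category produce the same code, which is why the raw values should be kept in a separate variable, as `hospital.Age` is here.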

### Categorize a numeric array into quartiles.

The variable `Weight` has weight measurements for the sample patients. Categorize the patient weights into four categories, by quartile.

```
p = 0:.25:1;
breaks = quantile(hospital.Weight,p);
hospital.WeightQ = ordinal(hospital.Weight,{'Q1','Q2','Q3','Q4'},...
                           [],breaks);
getlevels(hospital.WeightQ)
```
```
ans =
     Q1      Q2      Q3      Q4
```
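To see how quartile breakpoints divide a sample, the following sketch repeats the same steps on a small synthetic vector and counts the members of each category with `levelcounts`. The vector `weights` and the names `weightQ` and `counts` are made up for illustration and are not part of the `hospital` data.

```matlab
% Synthetic weights (illustrative only, not from the hospital data).
weights = 101:108;

p       = 0:.25:1;
breaks  = quantile(weights,p);           % minimum, three quartiles, maximum
weightQ = ordinal(weights,{'Q1','Q2','Q3','Q4'},[],breaks);

% Number of observations assigned to each quartile category.
counts = levelcounts(weightQ)
```

With eight distinct, evenly spaced values, each quartile should receive two observations. With real data that contains ties, such as the patient weights, the four groups are generally not exactly equal in size.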

### Explore categories.

Display the weight and weight quartile for the second patient.

```
dataset({hospital.Weight(2),'Weight'},...
        {hospital.WeightQ(2),'WeightQuartile'})
```
```
ans =
    Weight    WeightQuartile
    163       Q3
```

### Summary statistics grouped by category levels.

Compute the mean systolic and diastolic blood pressure for each age and weight category.

```
grpstats(hospital,{'AgeCat','WeightQ'},'mean','DataVars','BloodPressure')
```
```
ans =
                   AgeCat      WeightQ    GroupCount    mean_BloodPressure
    Under 30_Q1    Under 30    Q1          6            123.17    79.667
    Under 30_Q2    Under 30    Q2          3            120.33    79.667
    Under 30_Q3    Under 30    Q3          2             127.5      86.5
    Under 30_Q4    Under 30    Q4          4               122        78
    30-39_Q1       30-39       Q1         12            121.75     81.75
    30-39_Q2       30-39       Q2          9            119.56    82.556
    30-39_Q3       30-39       Q3          9               121    83.222
    30-39_Q4       30-39       Q4         11            125.55    87.273
    Over 40_Q1     Over 40     Q1          7            122.14    84.714
    Over 40_Q2     Over 40     Q2         13            123.38    79.385
    Over 40_Q3     Over 40     Q3         14            123.07    84.643
    Over 40_Q4     Over 40     Q4         10             124.6      85.1
```

The variable `BloodPressure` is a matrix with two columns, so `mean_BloodPressure` also has two columns. The first column is systolic blood pressure, and the second column is diastolic blood pressure. The group with the highest mean diastolic blood pressure, `87.273`, is `30-39_Q4`: patients aged 30–39 in the highest weight quartile.
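Rather than scanning the output by eye, you can locate the group with the highest mean diastolic pressure programmatically. The following is a minimal sketch. It assumes that the `hospital` sample data is available and that `grpstats` names the output variable `mean_BloodPressure` and labels the rows as in the output shown earlier. The names `stats`, `maxDiastolic`, `idx`, and `topGroup` are illustrative.

```matlab
load hospital

% Recreate the two ordinal grouping variables used in this example.
hospital.AgeCat = ordinal(hospital.Age,{'Under 30','30-39','Over 40'},...
                          [],[25,30,40,50]);
hospital.WeightQ = ordinal(hospital.Weight,{'Q1','Q2','Q3','Q4'},...
                           [],quantile(hospital.Weight,0:.25:1));

stats = grpstats(hospital,{'AgeCat','WeightQ'},'mean',...
                 'DataVars','BloodPressure');

% Column 2 of mean_BloodPressure holds the diastolic means.
[maxDiastolic,idx] = max(stats.mean_BloodPressure(:,2));

% Observation (row) name of the group with the largest diastolic mean.
topGroup = stats.Properties.ObsNames{idx}
```

Based on the grouped output for this example, `topGroup` should be `30-39_Q4` and `maxDiastolic` should be approximately `87.273`.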