Accelerating the pace of engineering and science

# coxphfit

Cox proportional hazards regression

## Description

example

b = coxphfit(X,T) returns a p-by-1 vector, b, of coefficient estimates for a Cox proportional hazards regression of the observed responses in an n-by-1 vector, T, on the predictors in an n-by-p matrix X.

The model does not include a constant term, and X cannot contain a column of 1s.

example

b = coxphfit(X,T,Name,Value) returns a vector of coefficient estimates, with additional options specified by one or more Name,Value pair arguments.

example

[b,logl,H,stats] = coxphfit(___) also returns the loglikelihood, logl, a structure, stats, that contains additional statistics, and a two-column matrix, H, that contains the T values in the first column and the estimated baseline cumulative hazard, in the second column. You can use any of the input arguments in the previous syntaxes.

## Examples

expand all

Navigate to a folder containing sample data.

```cd(matlabroot)
cd('help/toolbox/stats/examples')```

`load lightbulb`

The first column of the light bulb data has the lifetime (in hours) of two different types of bulbs. The second column has the binary variable indicating whether the bulb is fluorescent or incandescent. 0 indicates that the bulb is incandescent, and 1 indicates that it is fluorescent. The third column contains the censorship information, where 0 indicates the bulb was observed until failure, and 1 indicates the bulb was censored.

Fit a Cox proportional hazards model for the lifetime of the light bulbs, also accounting for censoring. The predictor variable is the type of bulb.

```b = coxphfit(lightbulb(:,2),lightbulb(:,1),...
'censoring',lightbulb(:,3))```
```b =

4.7262```

The estimate of the hazard ratio is exp(b) = 112.8646. This means that the hazard for the incandescent bulbs is 112.86 times the hazard for the fluorescent bulbs.

### Change the Algorithm Parameters

Navigate to a folder containing sample data.

```cd(matlabroot)
cd('help/toolbox/stats/examples')```

`load lightbulb`

The first column of the data has the lifetime (in hours) of two types of bulbs. The second column has the binary variable indicating whether the bulb is fluorescent or incandescent. 0 indicates that the bulb is incandescent, and 1 indicates that it is fluorescent. The third column contains the censorship information, where 0 indicates the bulb is observed until failure, and 1 indicates the item (bulb) is censored.

Fit a Cox proportional hazards model, also accounting for censoring. The predictor variable is the type of bulb.

```b = coxphfit(lightbulb(:,2),lightbulb(:,1),...
'censoring',lightbulb(:,3))```
```b =

4.7262```

Display the default control parameters for the algorithm coxphfit uses to estimate the coefficients.

`statset('coxphfit')`
```ans =

Display: 'off'
MaxFunEvals: 200
MaxIter: 100
TolBnd: 1.0000e-06
TolFun: 1.0000e-08
TolTypeFun: []
TolX: 1.0000e-08
TolTypeX: []
Jacobian: []
DerivStep: []
FunValCheck: []
Robust: []
RobustWgtFun: []
WgtFun: []
Tune: []
UseParallel: []
UseSubstreams: []
Streams: {}
OutputFcn: []
```

Save the options under a different name and change how the results will be displayed and the maximum number of iterations, Display and MaxIter.

```coxphopt = statset('coxphfit');
coxphopt.Display = 'final';
coxphopt.MaxIter = 50;```

Run coxphfit with the new algorithm parameters.

```b = coxphfit(lightbulb(:,2),lightbulb(:,1),...
'censoring',lightbulb(:,3),'options',coxphopt)```
```Successful convergence: Norm of gradient less than OPTIONS.TolFun

b =

4.7262```

coxphfit displays a report on the final iteration. Changing the maximum number of iterations did not affect the coefficient estimate.

### Fit and Compare Cox and Weibull Survivor Functions

Generate Weibull data depending on predictor X.

```rng('default') % for reproducibility
X = 4*rand(100,1);
A = 50*exp(-0.5*X);
B = 2;
y = wblrnd(A,B); ```

The response values are generated from a Weibull distribution with a shape parameter depending on the predictor variable X and a scale parameter of 2.

Fit a Cox proportional hazards model.

```[b,logL,H,stats] = coxphfit(X,y);
[b logL]```
```ans =

0.9409 -331.1479```

The coefficient estimate is 0.9409 and the log likelihood value is –331.1479.

Request the model statistics.

`stats`
```stats =

covb: 0.0158
beta: 0.9409
se: 0.1256
z: 7.4889
p: 6.9462e-14```

The covariance matrix of the coefficient estimates, covb, contains only one value, which is equal to the variance of the coefficient estimate in this example. The coefficient estimate, beta, is the same as b and is equal to 0.9409. The standard error of the coefficient estimate, se, is 0.1256, which is the square root of the variance 0.0158. The z-statistic, z, is beta/se = 0.9409/0.1256 = 7.4880. The p-value, p, indicates that the effect of X is significant.

Plot the Cox estimate of the baseline survivor function together with the known Weibull function.

```stairs(H(:,1),exp(-H(:,2)),'LineWidth',2)
xx = linspace(0,100);
line(xx,1-wblcdf(xx,50*exp(-0.5*mean(X)),B),'color','r','LineWidth',2)
xlim([0,50])
legend('Estimated Survivor Function','Weibull Survivor Function')```

The fitted model gives a close estimate to the survivor function of the actual distribution.

## Input Arguments

expand all

### X — Observations on predictor variablesmatrix

Observations on predictor variables, specified as an n-by-p matrix of p predictors for each of n observations.

The model does not include a constant term, thus X cannot contain a column of 1s.

Data Types: single | double

### T — Time-to-event datavector

Time-to-event data, specified as an n-by-1 vector.

Data Types: single | double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'baseline',0,'censoring',censoreddata,'frequency',freq specifies that coxphfit calculates the baseline hazard rate relative to 0, considering the censoring information in the vector censoreddata, and the frequency of observations on T and X given in the vector freq.

### 'baseline' — X values at which to compute the baseline hazardmean(X) (default) | scalar value

X values at which to compute the baseline hazard, specified as the comma-separated pair consisting of 'baseline' and a scalar value.

The default is mean(X), so the hazard rate at X is h(t)*exp((X-mean(X))*b). Enter 0 to compute the baseline relative to 0, so the hazard rate at X is h(t)*exp(X*b). Changing the baseline does not affect the coefficient estimates, but the hazard ratio changes.

Example: 'baseline',0

Data Types: single | double

### 'censoring' — Indicator for censoringarray of 0s (default) | array of 0s and 1s

Indicator for censoring, specified as the comma-separated pair consisting of 'censoring' and a Boolean array of the same size as T. Use 1 for observations that are right censored and 0 for observations that are fully observed. The default is all observations are fully observed.

Example: 'censoring',cens

Data Types: logical

### 'frequency' — Frequency of observationsarray of 1s (default) | vector of nonnegative integer counts

Frequency of observations, specified as the comma-separated pair consisting of 'frequency' and an array that is the same size as T containing nonnegative integer counts.

The jth element of this vector gives the number of times the method observes the jth element of T and the jth row of X. The default is one observation per row of X and T.

Example: 'frequency',freq

Data Types: single | double

### 'init' — Initial values for estimated coefficientsvector

Initial values for estimated coefficients, specified as the comma-separated pair consisting of 'init' and a vector containing the coefficient initial values.

Example: 'init',initcoef

Data Types: single | double

### 'options' — Algorithm control parametersstructure

Algorithm control parameters for the iterative algorithm used to estimate b, specified as the comma-separated pair consisting of 'options' and a structure. A call to statset creates this argument. For parameter names and default values, type statset('coxphfit'). You can set the options under a new name and use that in the name-value pair argument.

Example: 'options',statset('coxphfit')

Data Types: char

## Output Arguments

expand all

### b — Coefficient estimatesvector

Coefficient estimates for a Cox proportional hazards regression, returned as a p-by-1 vector.

### logl — Loglikelihoodscalar

Loglikelihood of the fitted model, returned as a scalar.

You can use log likelihood values to compare different models and assess the significance of effects of terms in the model.

### H — Estimated baseline cumulative hazardtwo-column matrix

Estimated baseline cumulative hazard rate evaluated at T values, returned as a two-column matrix. The first column of the matrix contains T values, and the second column contains cumulative hazard rate estimates.

### stats — Coefficient statistics structure

Coefficient statistics, returned as a structure that contains the following fields.

 beta Coefficient estimates (same as b) se Standard errors of coefficient estimates, b z z-statistics for b (that is, b divided by standard error) p p-values for b covb Estimated covariance matrix for b

expand all

### Cox Proportional Hazards Regression

Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to remove the effect of confounding variables and to quantify the effect of predictor variables. The method represents the effects of explanatory and confounding variables as a multiplier of a common baseline hazard function, h0(t).

For a baseline relative to 0, this model corresponds to

${h}_{X}\left(t\right)={h}_{0}\left(t\right){e}^{\sum _{i}{X}_{i}{b}_{i}},$

where hX(t) is the hazard rate at X and h0(t) is the baseline hazard rate function. The baseline hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is a loglinear regression. The assumption is that the baseline hazard function depends on time, t, but the predictor variables do not depend on time.

## References

[1] Cox, D.R., and D. Oakes. Analysis of Survival Data. London: Chapman & Hall, 1984.

[2] Lawless, J. F. Statistical Models and Methods for Lifetime Data. Hoboken, NJ: Wiley-Interscience, 2002.

[3] Kleinbaum, D. G., and M. Klein. Survival Analysis. Statistics for Biology and Health. 2nd edition. Springer, 2005.