Cox proportional hazards regression is a semiparametric method
for adjusting survival rate estimates to quantify the effect of predictor
variables. The method represents the effects of explanatory variables
as a multiplier of a common baseline hazard function, *h*_{0}(*t*).
The baseline hazard function is the nonparametric part of the Cox proportional
hazards regression function, whereas the effect of the predictor variables
is a loglinear regression. With the baseline at 0 (all predictor values
equal to 0), this model corresponds to

$$h\left({X}_{i},t\right)={h}_{0}(t)\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}{x}_{ij}{b}_{j}}\right],$$

where $${X}_{i}=({x}_{i1},{x}_{i2},\cdots ,{x}_{ip})$$ is
the vector of predictor variables for the *i*th subject, *h*(*X*_{i},*t*)
is the hazard rate at time *t* for *X*_{i},
and *h*_{0}(*t*)
is the baseline hazard rate function.

The Cox proportional hazards model relates the hazard rate for
individuals or items at the value *X*_{i},
to the hazard rate for individuals or items at the baseline value.
It produces an estimate for the hazard ratio:

$$HR({X}_{i})=\frac{h\left({X}_{i},t\right)}{{h}_{0}\left(t\right)}=\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}{x}_{ij}{b}_{j}}\right].$$

The hazard ratio represents the relative risk of instantaneous failure
for individuals or items having the predictor variable value *X*_{i} compared
to the ones having the baseline values. For example, if the predictor
variable is smoking status, where nonsmoking is the baseline category,
the hazard ratio shows the relative instantaneous failure rate of smokers
compared to the baseline category, that is, nonsmokers. For a baseline
value *X*^{*} and the
predictor variable value *X*_{i},
the hazard ratio is

$$HR({X}_{i})=\frac{h\left({X}_{i},t\right)}{h\left({X}^{*},t\right)}=\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}\left({x}_{ij}-{x}_{j}{}^{*}\right){b}_{j}}\right].$$

For example, if the baseline is the mean of the predictor variables (`mean(X)`),
then the hazard ratio becomes

$$HR({X}_{i})=\frac{h\left({X}_{i},t\right)}{h\left(\overline{X},t\right)}=\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}\left({x}_{ij}-{\overline{x}}_{j}\right){b}_{j}}\right].$$
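The three baseline choices above differ only in the reference point that is subtracted from the predictor values. The following is a minimal Python sketch of that arithmetic, not a call to `coxphfit`; the function name `hazard_ratio`, the coefficients, and the data are all hypothetical values chosen for illustration.

```python
import math

def hazard_ratio(x, b, baseline=None):
    """Return exp(sum_j (x_j - x*_j) * b_j); baseline=None means x* = 0."""
    if baseline is None:
        baseline = [0.0] * len(x)
    return math.exp(sum((xj - sj) * bj for xj, sj, bj in zip(x, baseline, b)))

# Hypothetical coefficients: smoking status (0 or 1) and age in years.
b = [0.7, 0.03]
# Hypothetical predictor rows, one per subject.
X = [[1, 60], [0, 50], [1, 40]]

# Column means, playing the role of mean(X).
x_bar = [sum(col) / len(X) for col in zip(*X)]

hr_zero = hazard_ratio(X[0], b)          # baseline at 0
hr_mean = hazard_ratio(X[0], b, x_bar)   # baseline at the column means
# Smoker versus nonsmoker of the same age: only the smoking term survives.
hr_smoker_vs_non = hazard_ratio([1, 60], b, [0, 60])
```

Because the model is loglinear, changing the baseline only rescales every hazard ratio by a common factor: here `hr_zero` equals `hr_mean` multiplied by the hazard ratio of `x_bar` relative to 0.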

Hazard rates are related to survival rates, such that the survival
rate at time *t* for an individual with the explanatory
variable value *X*_{i} is

$${S}_{{X}_{i}}\left(t\right)={S}_{0}{\left(t\right)}^{HR({X}_{i})},$$

where *S*_{0}(*t*)
is the survivor function with the baseline hazard rate function *h*_{0}(*t*),
and *HR*(*X*_{i})
is the hazard ratio of the predictor variable value *X*_{i} relative
to the baseline value.
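As a small numeric illustration of this relationship, the sketch below raises a baseline survival probability to the power of a hazard ratio. The baseline value 0.80 and the coefficient 0.7 are hypothetical, and `survival_for_subject` is an illustrative helper name.

```python
import math

def survival_for_subject(s0_t, hr):
    """Survival probability S_X(t) = S_0(t) ** HR(X) at one time point."""
    return s0_t ** hr

s0 = 0.80            # hypothetical baseline survival probability at time t
hr = math.exp(0.7)   # hypothetical hazard ratio for one coefficient b = 0.7

s_subject = survival_for_subject(s0, hr)
```

A hazard ratio greater than 1 lowers the survival probability below the baseline value, a ratio below 1 raises it, and a ratio of exactly 1 leaves it unchanged.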

When you have variables that do not satisfy the proportional hazards (PH) assumption, consider one of two extensions of the Cox proportional hazards model: the stratified Cox model or the Cox model with time-dependent variables.

If the variables that do not satisfy the PH assumption are categorizable, use the stratified Cox model:

$${h}_{s}\left({X}_{i},t\right)={h}_{0s}(t)\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{p}{x}_{ij}{b}_{j}}\right],$$

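where the subscript *s* indicates the *s*th stratum. Each stratum has its own
baseline hazard rate function *h*_{0s}(*t*), while the coefficients *b*_{j} are
shared across strata. You can include stratification variables in `coxphfit`
by using the name-value pair `'Strata'`.

If the variables that do not satisfy the PH assumption are time-dependent variables, use the Cox model with time-dependent variables: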

$$h\left({X}_{i},t\right)={h}_{0}(t)\mathrm{exp}\left[{\displaystyle \sum _{j=1}^{{p}_{1}}{x}_{ij}{b}_{j}}+{\displaystyle \sum _{k=1}^{{p}_{2}}{x}_{ik}(t){c}_{k}}\right],$$

where *x*_{ij} is a time-independent predictor with coefficient *b*_{j}, and
*x*_{ik}(*t*) is a time-dependent predictor with coefficient *c*_{k}. For an
example of how to include time-dependent variables in `coxphfit`,
see Cox Proportional Hazards Model with Time-Dependent Covariates.

A point estimate of the effect of each explanatory variable,
that is, the estimated hazard ratio for a one-unit increase in that
variable, is exp(*b*), given that all other variables are
held constant, where *b* is the coefficient estimate
for that variable. The coefficient estimates are found by maximizing
the partial likelihood function of the model. The partial likelihood
function for the proportional hazards regression model is based on
the observed order of events. It is the product of the partial likelihoods
of the failures, one for each failure time. If there are *n* failures
at *n* distinct failure times, $${t}_{1}<{t}_{2}<\cdots <{t}_{n}$$,
then the partial likelihood is

$$L=\left[\frac{h\left({X}_{1},{t}_{1}\right)}{{\displaystyle {\sum}_{j=1}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{2},{t}_{2}\right)}{{\displaystyle {\sum}_{j=2}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \cdot \cdot \cdot \times \left[\frac{h\left({X}_{n},{t}_{n}\right)}{h\left({X}_{n},{t}_{n}\right)}\right]={\displaystyle \prod _{i=1}^{n}\frac{h\left({X}_{i},{t}_{i}\right)}{{\displaystyle {\sum}_{j=i}^{n}h\left({X}_{j},t{}_{j}\right)}}}.$$

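You can also write the partial likelihood in terms of risk sets as

$$L={\displaystyle \prod _{i=1}^{n}\frac{h({X}_{i},{t}_{i})}{{\displaystyle \sum _{j\in {R}_{i}}h({X}_{j},{t}_{j})}}},$$

where *R*_{i} is the risk set at time *t*_{i}, that is, the set of
subjects still under observation just before *t*_{i}.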
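The risk-set form of the partial likelihood can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm `coxphfit` uses: event times are assumed distinct, censoring is handled by an `events` indicator (an addition not spelled out in the formulas above), and the baseline hazard is dropped because it cancels between the numerator and the denominator when both are evaluated at the same time. The function name and data are hypothetical.

```python
import math

def log_partial_likelihood(times, events, X, b):
    """Log partial likelihood for distinct event times.

    events[i] is 1 for an observed failure and 0 for a censored observation.
    """
    # Relative risk exp(x . b) for every subject; h_0(t) cancels in each ratio.
    risk = [math.exp(sum(xj * bj for xj, bj in zip(x, b))) for x in X]
    logl = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone whose time is at least t_i.
        denom = sum(r for r, t in zip(risk, times) if t >= t_i)
        logl += math.log(risk[i] / denom)
    return logl

# Hypothetical data: four subjects, one binary predictor, one censored time.
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1]
X = [[1.0], [0.0], [1.0], [0.0]]

logl_null = log_partial_likelihood(times, events, X, [0.0])
logl_b = log_partial_likelihood(times, events, X, [0.5])
```

With all coefficients at 0 every subject has the same relative risk, so each failure contributes the log of one over the size of its risk set. A coefficient estimate is the value of `b` that maximizes this function.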

You can use a likelihood ratio test to assess the significance
of adding a term or terms to a model. Consider two nested models, where
the first model has *p* predictor variables and
the second model has *p* + *r* predictor
variables. Let *L*_{1} and *L*_{2} be the maximized partial likelihoods
of the two models. Then, under the null hypothesis that the *r* additional
coefficients are 0, −2 log(*L*_{1}/*L*_{2})
has approximately a chi-square distribution with *r* degrees of
freedom (the number of terms being tested).
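The test statistic and its p-value can be computed directly from the two maximized log partial likelihoods. The sketch below uses only the Python standard library; `chi2_sf` is an illustrative helper that evaluates the chi-square upper tail for integer degrees of freedom, and the log-likelihood values are hypothetical rather than the result of any fit.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    y = x / 2.0
    # Closed forms for df = 2 (a = 1) and df = 1 (a = 1/2) start a recurrence.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(logl_small, logl_large, r):
    """Return (statistic, p-value) for nested models differing by r terms."""
    stat = 2.0 * (logl_large - logl_small)  # equals -2 * log(L1 / L2)
    return stat, chi2_sf(stat, r)

# Hypothetical maximized log partial likelihoods for the two models.
stat, p_value = likelihood_ratio_test(-100.0, -97.0, 2)
```

A small p-value suggests that the additional *r* terms improve the fit by more than chance alone would explain.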

When you have tied events, `coxphfit` approximates
the partial likelihood of the model by either Breslow’s method
(default) or Efron’s method, instead of computing the exact
partial likelihood. Computing the exact partial likelihood is
expensive because it requires considering every possible ordering
of the tied events at each tied event time.

The simplest approximation method is Breslow’s method. This method uses the same denominator for each tied set.

$$L={\displaystyle \prod _{i=1}^{d}{\displaystyle \prod _{j\in {D}_{i}}\frac{h({X}_{j},{t}_{j})}{{\displaystyle \sum _{k\in {R}_{i}}h({X}_{k},{t}_{k})}}}},$$

where *d* is the number of distinct event times, *D*_{i} is the set of subjects that fail at the *i*th distinct event time *t*_{i}, and *R*_{i} is the risk set at *t*_{i}.

Efron’s method is more accurate than Breslow’s method, yet simple. This method adjusts the denominator of the tied events as follows:

$$L={\displaystyle \prod _{i=1}^{d}{\displaystyle \prod _{j\in {D}_{i}}\frac{h({X}_{j},{t}_{j})}{{\displaystyle \sum _{k\in {R}_{i}}h({X}_{k},{t}_{k})-\frac{j-1}{{d}_{i}}{\displaystyle \sum _{k\in {D}_{i}}h({X}_{k},{t}_{k})}}}}},$$

where *d*_{i} is the number of tied events at time *t*_{i} (the size of *D*_{i}), and *j* = 1, …, *d*_{i} indexes the tied events within *D*_{i}.

For example, assume that the first two events are tied, that
is, *t*_{1} = *t*_{2} and $${t}_{2}<{t}_{3}<\cdots <{t}_{n}$$.
In Breslow’s method, the denominators of the first two terms
are the same:

$$L=\left[\frac{h\left({X}_{1},{t}_{1}\right)}{{\displaystyle {\sum}_{j=1}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{2},{t}_{2}\right)}{{\displaystyle {\sum}_{j=1}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{3},{t}_{3}\right)}{{\displaystyle {\sum}_{j=3}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{4},{t}_{4}\right)}{{\displaystyle {\sum}_{j=4}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \cdot \cdot \cdot \times \left[\frac{h\left({X}_{n},{t}_{n}\right)}{h\left({X}_{n},{t}_{n}\right)}\right].$$

In Efron’s method, the denominator of the second term is adjusted by removing half of the contribution of the two tied events:

$$L=\left[\frac{h\left({X}_{1},{t}_{1}\right)}{{\displaystyle {\sum}_{j=1}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{2},{t}_{2}\right)}{0.5h\left({X}_{1},{t}_{1}\right)+0.5h\left({X}_{2},{t}_{2}\right)+{\displaystyle {\sum}_{j=3}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{3},{t}_{3}\right)}{{\displaystyle {\sum}_{j=3}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \left[\frac{h\left({X}_{4},{t}_{4}\right)}{{\displaystyle {\sum}_{j=4}^{n}h\left({X}_{j},{t}_{j}\right)}}\right]\times \cdot \cdot \cdot \times \left[\frac{h\left({X}_{n},{t}_{n}\right)}{h\left({X}_{n},{t}_{n}\right)}\right].$$
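To make the difference between the two approximations concrete, the following Python sketch evaluates both on a small data set with one tie. It is an illustration of the formulas above, not the `coxphfit` implementation; the function name `log_pl_ties`, the `method` argument, and the data are hypothetical.

```python
import math

def log_pl_ties(times, events, X, b, method="breslow"):
    """Log partial likelihood with ties, by Breslow's or Efron's method."""
    risk = [math.exp(sum(xj * bj for xj, bj in zip(x, b))) for x in X]
    logl = 0.0
    # Loop over the distinct event times.
    for t_i in sorted({t for t, e in zip(times, events) if e}):
        # Relative risks of the subjects that fail at t_i (the set D_i).
        tied = [r for r, t, e in zip(risk, times, events) if e and t == t_i]
        # Sum of relative risks over the risk set R_i.
        at_risk = sum(r for r, t in zip(risk, times) if t >= t_i)
        d_i = len(tied)
        for j, r_j in enumerate(tied):  # j = 0, ..., d_i - 1
            # Efron subtracts j / d_i of the tied sum; Breslow subtracts nothing.
            frac = j / d_i if method == "efron" else 0.0
            logl += math.log(r_j) - math.log(at_risk - frac * sum(tied))
    return logl

# Hypothetical data: the first two events are tied.
times = [1.0, 1.0, 2.0, 3.0]
events = [1, 1, 1, 1]
X = [[0.0], [1.0], [0.0], [1.0]]

logl_breslow = log_pl_ties(times, events, X, [0.0], method="breslow")
logl_efron = log_pl_ties(times, events, X, [0.0], method="efron")
```

With all coefficients at 0, the tied pair contributes denominators 4 and 4 under Breslow's method but 4 and 3 under Efron's method, so the two values differ only in the second tied term.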

You can specify an approximation method by using the name-value
pair `'Ties'` in `coxphfit`.

The Cox proportional hazards model can incorporate the
frequency or weights of observations. Let *w*_{i} be
the weight of the *i*th observation. Then, the partial
likelihoods of the Cox model with weights are as follows:

Partial likelihood with weights

$$L={\displaystyle \prod _{i=1}^{n}\frac{{w}_{i}h({X}_{i},{t}_{i})}{{\displaystyle \sum _{j\in {R}_{i}}{w}_{j}h({X}_{j},{t}_{j})}}}$$

Partial likelihood with weights and Breslow’s method

$$L={\displaystyle \prod _{i=1}^{d}{\displaystyle \prod _{j\in {D}_{i}}\frac{{w}_{j}h({X}_{j},{t}_{j})}{{\left[{\displaystyle \sum _{k\in {R}_{i}}{w}_{k}h({X}_{k},{t}_{k})}\right]}^{\frac{1}{{d}_{i}}{\displaystyle \sum _{j\in {D}_{i}}{w}_{j}}}}}}$$

Partial likelihood with weights and Efron’s method

$$L={\displaystyle \prod _{i=1}^{d}{\displaystyle \prod _{j\in {D}_{i}}\frac{{w}_{j}h({X}_{j},{t}_{j})}{{\left[{\displaystyle \sum _{k\in {R}_{i}}{w}_{k}h({X}_{k},{t}_{k})-\frac{j-1}{{d}_{i}}{\displaystyle \sum _{k\in {D}_{i}}{w}_{k}h({X}_{k},{t}_{k})}}\right]}^{\frac{1}{{d}_{i}}{\displaystyle \sum _{j\in {D}_{i}}{w}_{j}}}}}}$$

You can specify the frequency or weights of observations
by using the name-value pair `'Frequency'` in `coxphfit`.


- Hazard and Survivor Functions for Different Groups
- Survivor Functions for Two Groups
- Cox Proportional Hazards Model for Censored Data
- Cox Proportional Hazards Model with Time-Dependent Covariates
