Cox proportional hazards regression is a semiparametric method for adjusting survival rate estimates to quantify the effect of predictor variables. The method represents the effects of explanatory variables as a multiplier of a common baseline hazard function, h0(t). The baseline hazard function is the nonparametric part of the Cox proportional hazards regression function, whereas the impact of the predictor variables is a loglinear regression. For a baseline relative to 0, this model corresponds to

$$h(X_i,t) = h_0(t)\exp\left(\sum_{j=1}^{p} x_{ij} b_j\right),$$

where Xi = (xi1, xi2, ..., xip) is the vector of predictor variables for the ith subject, bj is the coefficient of the jth predictor variable, h(Xi,t) is the hazard rate at time t for Xi, and h0(t) is the baseline hazard rate function.
The Cox proportional hazards model relates the hazard rate for individuals or items at the value Xi to the hazard rate for individuals or items at the baseline value. It produces an estimate for the hazard ratio:

$$HR(X_i) = \frac{h(X_i,t)}{h_0(t)} = \exp\left(\sum_{j=1}^{p} x_{ij} b_j\right).$$
The model is based on the assumption that the baseline hazard function depends on time, t, but the predictor variables do not. This assumption is also called the proportional hazards assumption, which states that the hazard ratio does not change over time for any individual.
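The hazard ratio can be computed directly from a coefficient vector. The following is a minimal Python sketch, not the implementation of any particular library; the function name `hazard_ratio`, the coefficient values, and the two covariate vectors are hypothetical and chosen only for illustration. Because the baseline hazard cancels in the ratio, time does not appear anywhere in the computation, which is the proportional hazards assumption in code form.

```python
import math

def hazard_ratio(x, b, baseline=None):
    """Return exp(sum_j (x_j - x0_j) * b_j), the hazard ratio of x relative to a baseline x0."""
    if baseline is None:
        baseline = [0.0] * len(x)  # zero baseline: HR = h(X, t) / h0(t)
    return math.exp(sum((xj - x0) * bj for xj, x0, bj in zip(x, baseline, b)))

# Hypothetical coefficients for [smoking indicator, age in decades]
b = [0.7, 0.2]
smoker = [1.0, 5.0]
nonsmoker = [0.0, 5.0]

# Same age, so only the smoking coefficient contributes: HR = exp(0.7)
hr = hazard_ratio(smoker, b, baseline=nonsmoker)
print(hr)
```

The ratio depends only on the difference between the two covariate vectors, so shifting both by the same amount leaves it unchanged.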
The hazard ratio represents the relative risk of instantaneous failure for individuals or items having the predictor variable value Xi compared to those having the baseline values. For example, if the predictor variable is smoking status, where nonsmoking is the baseline category, the hazard ratio shows the relative instantaneous failure rate of smokers compared to nonsmokers. For a baseline relative to X* and the predictor variable value Xi, the hazard ratio is

$$HR(X_i) = \frac{h(X_i,t)}{h(X^*,t)} = \exp\left[\sum_{j=1}^{p} \left(x_{ij} - x_j^*\right) b_j\right].$$
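For example, if the baseline is the mean values of the predictor variables ($X^* = \bar{X}$), then the hazard ratio becomes

$$HR(X_i) = \frac{h(X_i,t)}{h(\bar{X},t)} = \exp\left[\sum_{j=1}^{p} \left(x_{ij} - \bar{x}_j\right) b_j\right].$$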
Hazard rates are related to survival rates, such that the survival rate at time t for an individual with the explanatory variable value Xi is

$$S_{X_i}(t) = S_0(t)^{HR(X_i)},$$
where S0(t) is the survivor function with the baseline hazard rate function h0(t), and HR(Xi) is the hazard ratio of the predictor variable value Xi relative to the baseline value.
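The relationship between the baseline survivor function and the survival rate for a given hazard ratio can be illustrated with a short Python sketch. The baseline survival values and the hazard ratio below are hypothetical, and `survival_at` is an illustrative helper rather than a library function.

```python
def survival_at(s0_t, hr):
    """Survival rate S_0(t) ** HR for a subject with hazard ratio hr."""
    return s0_t ** hr

# Hypothetical baseline survivor function evaluated on a time grid
baseline_survival = [1.0, 0.9, 0.75, 0.6]
hr = 2.0  # hypothetical hazard ratio relative to the baseline

# A hazard ratio above 1 lowers survival at every time point
adjusted = [survival_at(s, hr) for s in baseline_survival]
print(adjusted)
```

A hazard ratio of 1 reproduces the baseline curve, and a hazard ratio below 1 raises it.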
When you have variables that do not satisfy the proportional hazards (PH) assumption, you can consider using two extensions of Cox proportional hazards model: the stratified Cox model and the Cox model with time-dependent variables.
If the variables that do not satisfy the PH assumption are categorizable, use the stratified Cox model:

$$h_s(X_i,t) = h_{0s}(t)\exp\left(\sum_{j=1}^{p} x_{ij} b_j\right),$$

where the subscript s indicates the sth stratum. The stratified Cox model has a different baseline hazard rate function for each stratum but shares coefficients. Therefore, it has the same hazard ratio across all strata if the predictor variable values are the same. You can include stratification variables in coxphfit by using a name-value argument.
If the variables that do not satisfy the PH assumption are time-dependent variables, use the Cox model with time-dependent variables:

$$h(X_i,t) = h_0(t)\exp\left[\sum_{j=1}^{p_1} x_{ij} b_j + \sum_{k=1}^{p_2} x_{ik}(t)\, c_k\right],$$

where xij is an element of a time-independent predictor with coefficient bj, and xik(t) is an element of a time-dependent predictor with coefficient ck. For an example of how to include time-dependent variables in the model, see Cox Proportional Hazards Model with Time-Dependent Covariates.
A point estimate of the effect of each explanatory variable, that is, the estimated hazard ratio for a one-unit increase in that variable with all other variables held constant, is exp(b), where b is the coefficient estimate for that variable. The coefficient estimates are found by maximizing the partial likelihood function of the model. The partial likelihood function for the proportional hazards regression model is based on the observed order of events. It is the product of the partial likelihoods of failures estimated for each failure time. If there are n failures at n distinct failure times, t1 < t2 < ... < tn, then the partial likelihood is

$$L(b) = \prod_{i=1}^{n} \frac{\exp(X_i b)}{\sum_{j=i}^{n} \exp(X_j b)}, \qquad X_i b = \sum_{j=1}^{p} x_{ij} b_j.$$
You can rewrite the partial likelihood by using a risk set Ri:

$$L(b) = \prod_{i=1}^{n} \frac{\exp(X_i b)}{\sum_{j \in R_i} \exp(X_j b)},$$
where Ri represents the index set of subjects who are under study but do not experience the event until the ith failure time.
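The risk-set form of the partial likelihood translates almost line by line into code. The sketch below is a plain Python illustration under two assumptions: event times are distinct, and a subject belongs to the risk set at a failure time if its observed time is at or after that failure time. The function name and the small data set are hypothetical.

```python
import math

def log_partial_likelihood(times, events, X, b):
    """Log partial likelihood for distinct event times.

    times:  observed time for each subject
    events: 1 if the subject failed at that time, 0 if censored
    X:      list of covariate rows, one per subject
    b:      coefficient vector
    """
    # Linear predictor X_i b for every subject
    eta = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
    logl = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue  # censored subjects only appear in denominators
        # Risk set R_i: subjects still under study at time t_i
        risk = [k for k, t_k in enumerate(times) if t_k >= t_i]
        logl += eta[i] - math.log(sum(math.exp(eta[k]) for k in risk))
    return logl

# Hypothetical data: four subjects, the third one censored at time 7
times = [2, 5, 7, 9]
events = [1, 1, 0, 1]
X = [[1.0], [0.0], [1.0], [0.0]]
print(log_partial_likelihood(times, events, X, [0.5]))
```

Working with the logarithm turns the product into a sum, which is the quantity a numerical optimizer would maximize over b.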
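You can use a likelihood ratio test to assess the significance of adding a term or terms to a model. Consider two nested models, where the first model has p predictor variables and the second model has p + r predictor variables. Let L1 and L2 be the maximized partial likelihoods of the first and second models. Under the null hypothesis that the r added coefficients are zero, the statistic –2*log(L1/L2), which equals –2*(log L1 – log L2), has approximately a chi-square distribution with r degrees of freedom (the number of terms being tested).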
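A likelihood ratio test needs only the two maximized log partial likelihoods and the number of added terms. The following Python sketch uses the standard library only; `chi2_sf` is an illustrative helper that evaluates the chi-square upper tail for integer degrees of freedom through a standard recurrence for the regularized incomplete gamma function, and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def likelihood_ratio_test(logl_reduced, logl_full, r):
    """Return the statistic -2*(log L1 - log L2) and its chi-square p-value."""
    stat = -2.0 * (logl_reduced - logl_full)
    return stat, chi2_sf(stat, r)

# Hypothetical maximized log partial likelihoods; the full model adds r = 2 terms
stat, p_value = likelihood_ratio_test(-120.4, -116.1, 2)
print(stat, p_value)
```

A small p-value suggests that the added terms improve the fit beyond what chance alone would explain.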
When you have tied events, the partial likelihood of the model is approximated by either Breslow’s method (default) or Efron’s method, instead of being computed exactly. Computing the exact partial likelihood is expensive because it requires considering every possible ordering of the tied events within the risk sets for the tied event times.
The simplest approximation method is Breslow’s method. This method uses the same denominator for each tied set:

$$L(b) = \prod_{i=1}^{d} \prod_{j \in D_i} \frac{\exp(X_j b)}{\sum_{k \in R_i} \exp(X_k b)},$$

where d is the number of distinct event times, and Di is the index set of all subjects whose event time is equal to the ith event time.
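Efron’s method is more accurate than Breslow’s method, yet still simple. This method adjusts the denominator of the tied events as follows:

$$L(b) = \prod_{i=1}^{d} \frac{\exp\left(\sum_{j \in D_i} X_j b\right)}{\prod_{m=0}^{d_i-1}\left[\sum_{k \in R_i} \exp(X_k b) - \frac{m}{d_i}\sum_{k \in D_i} \exp(X_k b)\right]},$$

where di is the number of indexes in Di.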
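For example, assume that the first two events are tied, that is, t1 = t2 and D1 = {1, 2}, and that all n subjects are at risk at that time. In Breslow’s method, the denominators of the first two terms are the same:

$$\sum_{k=1}^{n} \exp(X_k b).$$

Efron’s method adjusts the denominator of the second term:

$$\sum_{k=1}^{n} \exp(X_k b) - \frac{1}{2}\left[\exp(X_1 b) + \exp(X_2 b)\right].$$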
You can specify an approximation method by using a name-value argument.
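The difference between the two tie approximations is easiest to see side by side. The Python sketch below is an illustration only, not the code any fitting function uses internally; the function name, the `method` argument, and the data are hypothetical. Both methods share the numerator, and only Efron’s method shrinks the denominator for successive tied failures.

```python
import math
from collections import defaultdict

def log_partial_likelihood_ties(times, events, X, b, method="breslow"):
    """Log partial likelihood with Breslow's or Efron's approximation for ties."""
    eta = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
    # D_i: indexes of the subjects that fail at each distinct event time
    tied = defaultdict(list)
    for k, (t, failed) in enumerate(zip(times, events)):
        if failed:
            tied[t].append(k)
    logl = 0.0
    for t, D in tied.items():
        d = len(D)
        risk_sum = sum(math.exp(eta[k]) for k, t_k in enumerate(times) if t_k >= t)
        tied_sum = sum(math.exp(eta[k]) for k in D)
        logl += sum(eta[k] for k in D)
        for m in range(d):
            # Breslow keeps the full risk-set sum for every tied failure;
            # Efron subtracts the fraction m/d of the tied subjects' terms
            shrink = (m / d) * tied_sum if method == "efron" else 0.0
            logl -= math.log(risk_sum - shrink)
    return logl

# Hypothetical data: the first two subjects fail at the same time
times = [1, 1, 2]
events = [1, 1, 1]
X = [[0.0], [0.0], [0.0]]
breslow = log_partial_likelihood_ties(times, events, X, [0.0], "breslow")
efron = log_partial_likelihood_ties(times, events, X, [0.0], "efron")
print(breslow, efron)
```

When no event times are tied, every tied set has one member and the two methods return the same value.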
The Cox proportional hazards model can incorporate the frequency or weights of observations. Let wi be the weight of the ith observation. Then, the partial likelihoods of the Cox model with weights are as follows:
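Partial likelihood with weights

$$L(b) = \prod_{i=1}^{n} \left[\frac{\exp(X_i b)}{\sum_{j \in R_i} w_j \exp(X_j b)}\right]^{w_i}$$

Partial likelihood with weights and Breslow’s method

$$L(b) = \prod_{i=1}^{d} \frac{\exp\left(\sum_{j \in D_i} w_j X_j b\right)}{\left[\sum_{k \in R_i} w_k \exp(X_k b)\right]^{\sum_{j \in D_i} w_j}}$$

Partial likelihood with weights and Efron’s method

$$L(b) = \prod_{i=1}^{d} \frac{\exp\left(\sum_{j \in D_i} w_j X_j b\right)}{\prod_{m=0}^{d_i-1}\left[\sum_{k \in R_i} w_k \exp(X_k b) - \frac{m}{d_i}\sum_{k \in D_i} w_k \exp(X_k b)\right]^{\frac{1}{d_i}\sum_{j \in D_i} w_j}}$$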
You can specify the frequency or weights of observations by using a name-value pair argument.
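Observation weights enter the partial likelihood in two places. The Python sketch below assumes distinct event times and assumes the common weighted form in which each failure’s log contribution is multiplied by its weight wi and each risk-set term is multiplied by its weight wj; the function name and the data are hypothetical.

```python
import math

def weighted_log_partial_likelihood(times, events, X, b, w):
    """Weighted log partial likelihood for distinct event times (assumed form)."""
    eta = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
    logl = 0.0
    for i, (t_i, failed) in enumerate(zip(times, events)):
        if not failed:
            continue
        # Weighted risk-set sum: each subject at risk contributes w_k * exp(X_k b)
        denom = sum(w[k] * math.exp(eta[k])
                    for k, t_k in enumerate(times) if t_k >= t_i)
        # The failure's own contribution is scaled by its weight w_i
        logl += w[i] * (eta[i] - math.log(denom))
    return logl

# Hypothetical data: the first subject counts twice
times = [1, 2]
events = [1, 1]
X = [[0.0], [0.0]]
w = [2.0, 1.0]
print(weighted_log_partial_likelihood(times, events, X, [0.0], w))
```

With all weights equal to 1, the function reduces to the unweighted log partial likelihood.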