Main Content

This example shows how to fit and analyze a Cox proportional hazards model using a `CoxModel`

object. Cox proportional hazards models relate to lifetime or failure time data. For background, see What Is Survival Analysis? and Cox Proportional Hazards Model.

Generate the data for three lifetime models with the following types of hazard rates. These models correspond to three stratification levels; see Extension of Cox Proportional Hazards Model.

Bathtub, whose failure rate is high at the beginning, decreases to a low level, then climbs toward a constant level

Logarithmically increasing: $$\mathrm{log}(x)/10$$

Constant $$1/4$$

The three models share a predictor with three multipliers for the base hazard rates:

`1`

`1/20`

`1/100`

In total, the data has nine types of population members, one from each stratification level and one from each predictor level. The functions for creating the bathtub and logarithmic hazard rates are in the Helper Functions section at the end of this example.

Plot the three hazard rates when the predictor value is `1`

.

t = linspace(1,40); plot(t,hazard(t),t,log(t)/10,t,1/4*ones(size(t))) legend('Hazard 1','Hazard 2','Hazard 3','Location','northeast') ylim([0 1])

Create pseudorandom data for the lifetimes associated with the nine models. Create 9000 samples chosen randomly (about 1000 of each type) by inverting the cumulative distributions. (For details, see Inverse transform sampling).

N = 9000; rng default mults = [1;1/20;1/100]; % Three predictors strata = randi(3,N,1); % Three strata t1 = zeros(N,1); a0 = randi(3,N,1); % Predictor a1 = mults(a0); v1 = rand(N,1); for i = 1:N switch strata(i) case 1 % Bathtub t1(i) = zeropt(a1(i),v1(i)); case 2 % Logarithmic t1(i) = zeroptold(a1(i),v1(i)); case 3 % Constant t1(i) = 1 + exprnd(4/a1(i)); end end

Place data into a table.

a3 = categorical(a1); tbldata = table(t1,a3,strata,'VariableNames',["Lifetime" "Predictors" "Strata"]);

Fit a Cox proportional hazards model to the table data.

coxMdl = fitcox(tbldata,'Lifetime ~ Predictors',"Stratification",'Strata');

Plot the probability of survival as a function of time for predictor value `1`

and the three stratification levels.

oo = categorical(1); plotSurvival(coxMdl,[oo;oo;oo],[1;2;3]) xlim([1,30])

Plot the probability of survival as a function of time for predictor value `1/20`

and the three stratification levels.

tt = categorical(1/20); figure plotSurvival(coxMdl,[tt;tt;tt],[1;2;3]) xlim([1,200])

Plot the probability of survival as a function of time for predictor value `1/100`

and the three stratification levels.

uu = categorical(1/100); figure plotSurvival(coxMdl,[uu;uu;uu],[1;2;3]) xlim([1,1000])

Even though the hazard rates are proportional for the different predictors, the three survival plots are not proportional.

Examine the coefficients of the fitted model.

disp(coxMdl.Coefficients)

Beta SE zStat pValue ______ ________ ______ ______ Predictors_0.05 1.5301 0.031783 48.143 0 Predictors_1 4.5593 0.052149 87.427 0

Notice that the `pValue`

entries are `0`

, which means that the listed predictor `Beta`

values are not zero.

View the confidence intervals for the model coefficients at the 0.01 level of significance.

coefci(coxMdl,0.01)

`ans = `*2×2*
1.4483 1.6120
4.4249 4.6936

To infer the hazard for the 0.01 level predictor, recall the definition of the Cox model:

$$h({X}_{i},t)={h}_{0}(t)\mathrm{exp}(\sum \phantom{\rule{0.5em}{0ex}}{x}_{ij}{b}_{j})$$.

The `fitcox`

function uses dummy variables with a reference group to handle categorical data. In this case, the function treats the 0.01 level predictor as the reference group, and encodes the predictor as all 0s when fitting the model. If you enter all 0s into the hazard function, you get

$$h([0,0],t)={h}_{0}(t)\mathrm{exp}(0*b1+0*b2)={h}_{0}(t)\mathrm{exp}(0)={h}_{0}(t).$$

$${h}_{0}(t)$$ is the baseline hazard, which is stored in `coxMdl.Hazard`

. Therefore, to get the hazard for the 0.01 level predictor, you can examine `coxMdl.Hazard`

. Plot the baseline cumulative hazard for the three stratification levels.

figure hold on for i = 1:3 pred3 = find(coxMdl.Hazard(:,3) == i); % Find indices of stratification i plot(coxMdl.Hazard(pred3,1),coxMdl.Hazard(pred3,2)) end xlabel('Time') ylabel('Cumulative Hazard') xlim([1 300]) legend('X = 0.01, Stratification 1',... 'X = 0.01, Stratification 2',... 'X = 0.01, Stratification 3','Location','northwest') hold off

The cumulative hazard for the other predictor values is $$\mathrm{exp}(Beta)$$ times the baseline cumulative hazard, where $$Beta$$ is the inferred coefficient.

disp(exp(coxMdl.Coefficients.Beta))

4.6188 95.5127

These relative hazard values are close to the theoretical values for the data, which was generated with multipliers 1, 1/20, and 1/100. The baseline value corresponds to the 1/100 multiplier, so the theoretical multipliers are 5 and 100.

View the `linhyptest`

table.

linhyptest(coxMdl)

`ans=`*2×2 table*
Predictor pValue
___________________ ______
{'Empty Model' } 0
{'Predictors_0.05'} 0

Again, the model requires the `1/20`

value predictor and the 1 value predictor.

Check whether the data supports the hypothesis that the data is from a Cox proportional hazards model.

supports = coxMdl.ProportionalHazardsPValueGlobal

supports = 0.9730

The null hypothesis for this test is that the data is from a Cox proportional hazards model. To reject this hypothesis, `supports`

must be less than 0.05 or some other small significance level. The statistic indicates support for the hypothesis that the data is consistent with a Cox model.

Calculate the hazard ratio for the predictor values `1`

, `1/20`

, and `1/100`

compared to a baseline of `0`

for the three stratification levels.

hazards = hazardratio(coxMdl,... categorical([1;1;1;1/20;1/20;1/20;1/100;1/100;1/100]),... [1;2;3;1;2;3;1;2;3],'Baseline',0)

`hazards = `*9×1*
95.5127
95.5127
95.5127
4.6188
4.6188
4.6188
1.0000
1.0000
1.0000

The hazard ratios are near their theoretical values of 100, 5, and 1, as explained in the previous section. The hazard ratios do not depend on the stratification level.

Stratification level 3 has a constant hazard rate of 1/4. Theoretically, a constant hazard rate means an exponential survival function, whose logarithm is a straight line. Plot the survival of level 3 stratification using the predictor value `1/20`

, which leads to an exponential rate of 1/80.

tt = categorical(1/20); h = figure; axes1 = axes('Parent',h); plotSurvival(coxMdl,axes1,tt,3); xlim([1 400]); axes1.YScale = 'log'; hold on tms = linspace(1,400); semilogy(tms,exp(-tms/80),'r--') hold off

The data matches the theoretical line for probabilities well above 1/100.

This function creates the bathtub hazard rate.

function h = hazard(x) h = exp(-2*(x - 1)) + (1 + tanh(x/10 - 3)); end

This function creates the integral of the bathtub hazard rate from `1`

to `x`

.

function eh = exphazard(x) eh = 1/2 - exp(-2*(x-1))/2; eh2 = (10*log(cosh(x/10 - 3)) - 10*log(cosh(1/10 - 3)) + x - 1); eh = eh + eh2; end

This function solves for the root of the cumulative hazard rate with multiplier `a`

to level `v`

.

function zz = zeropt(a,v) zz = fzero(@(x)(1 - exp(-a*exphazard(x)) - v),[1 100*max(1,1/a)]); end

This function creates the integral of the logarithmic hazard rate with multiplier `1/10`

from `1`

to `x`

.

function cr = cumrisk(x) cr = 1/10*(x.*(log(x) - 1) + 1); end

This function solves for the root of the cumulative hazard rate with multiplier `a`

to level `v`

.

function zz = zeroptold(a,v) zz = fzero(@(x)(1 - exp(-a*cumrisk(x)) - v),[1 50*max(1,1/a)]); end