Cox Proportional Hazards Model with Time-Dependent Covariates
This example shows how to convert survival data to counting process form and then construct a Cox proportional hazards model with time-dependent covariates.
Contents
Step 1. Compare standard layout and counting process form.
A Cox model with time-dependent covariates requires survival data to be in counting process form and not in standard layout. To see the difference between survival data in standard layout and in counting process form, load the following sample data.
load(fullfile(matlabroot,'examples','stats','simplesurvivaldata.mat'));
This sample data contains two tables: relapseS and relapseCP. These two tables represent the same simple survival data in standard layout and in counting process form, respectively.
Display the data in standard layout.
relapseS
relapseS = 2x5 table ID Time Censoring Age StopTreatment __ ____ _________ ___ _____________ 1 5 0 20 NaN 2 20 1 30 12
This data represents two patients whose treatment status changes over time. Patient 1 was not taking treatment for the interval from week 0 to 5 and relapsed at the end of the interval. Patient 2 was taking treatment for the interval from week 0 to 12, but not for the interval from week 12 to 20. Patient 2 did not relapse and left the study after week 20.
Now display the same data in counting process form.
relapseCP
relapseCP = 3x6 table ID tStart tStop Censoring Age TreatmentStatus __ ______ _____ _________ ___ _______________ 1 0 5 0 20 0 2 0 12 1 30 1 2 12 20 1 30 0
In counting process form, each row represents the risk interval (tStart,tStop] instead of a single value of an event time. Censoring is 0 if the event is observed at the end of the risk interval, and 1 if it is not. TreatmentStatus corresponds to a time-dependent covariate, which represents the same information with StopTreatment in standard layout. Note that a Cox model assumes time-dependent covariates to be constant in each risk interval.
Step 2. Load sample data.
Next, load sample data to convert.
load(fullfile(matlabroot,'examples','stats','survivaldatacp.mat'));
This sample data contains a table labS, which is simulated survival data including repeated measurement for each patient in standard layout.
Display the simulated survival data in standard layout.
labS
labS = 6x7 table ID Time Censoring Sex Lab_0 Lab_50 Lab_100 __ ____ _________ ___ _____ ______ _______ 1 46 0 1 0.3 NaN NaN 2 138 1 0 0.2 0.23 0.39 3 94 0 1 0.18 0.22 NaN 4 50 0 0 0.21 0.2 NaN 5 106 0 0 0.25 0.21 0.42 6 98 0 0 0.21 0.22 NaN
In standard layout, each row of the table shows information for one patient.
- ID indicates the ID of a patient. You do not include ID as an input of a Cox model. Include ID in a data set to confirm that the data set is correctly converted to counting process form.
- Time represents time to event in days, which corresponds to a response variable.
- Censoring has the censorship information for each patient, where 1 indicates censored data and 0 indicates that the exact time to event is observed at the end of the observation period.
- Sex is a time-independent predictor where 1 indicates female, and 0 indicates male.
- Lab_0, Lab_50, and Lab_100 represent three consecutive laboratory results measured at day 0, 50, and 100, which correspond to a time-dependent predictor.
Step 3. Convert survival data to counting process form.
To convert the survival data labS to counting process form, execute the code below. This code converts Time to a risk interval (tStart,tStop] and combines three vectors of the time-dependent predictor, Lab_0, Lab_50, and Lab_100, into one vector, Lab.
mTime = [0 50 100]; % Measurement time threeLabs = [labS.Lab_0 labS.Lab_50 labS.Lab_100]; nLabMeasure = sum(sum(~isnan(threeLabs))); % Number of lab measurements data = zeros(nLabMeasure,6); % One row for each observation oID = 0; % Observation ID for i = 1 : size(labS,1) idx = find(mTime <= labS.Time(i)); for j = 1 : length(idx)-1 oID = oID + 1; data(oID,:) = [labS.ID(i) mTime(j:j+1) 1 labS.Sex(i) threeLabs(i,j)]; end oID = oID + 1; data(oID,:) = [labS.ID(i) mTime(length(idx)) labS.Time(i) ... labS.Censoring(i) labS.Sex(i) threeLabs(i,length(idx))]; end labCP = table(data(:,1),data(:,2),data(:,3),data(:,4),data(:,5),data(:,6), ... 'VariableNames', {'ID','tStart','tStop','Censoring','Sex','Lab'});
Display the survival data in counting process form.
labCP
labCP = 13x6 table ID tStart tStop Censoring Sex Lab __ ______ _____ _________ ___ ____ 1 0 46 0 1 0.3 2 0 50 1 0 0.2 2 50 100 1 0 0.23 2 100 138 1 0 0.39 3 0 50 1 1 0.18 3 50 94 0 1 0.22 4 0 50 1 0 0.21 4 50 50 0 0 0.2 5 0 50 1 0 0.25 5 50 100 1 0 0.21 5 100 106 0 0 0.42 6 0 50 1 0 0.21 6 50 98 0 0 0.22
In counting process form, each row of table labCP shows information of one observation corresponding to one risk interval. Note that a Cox model assumes Lab to be constant in the risk interval (tStart,tStop]. The value in Censoring is 0 if an event is observed at the end of the risk interval, and 1 if an event is not observed.
For example, patient 3 has two laboratory measurements at day 0 and 50, so there are two rows of data for patient 3 in counting process form. A Cox model assumes the lab results 0.18 and 0.22 to be constant in the interval (0,50] and (50,94], respectively. Censoring is 1 in (0,50] and 0 in (50,94] because the exact event time of patient 3 is observed at day 94.
Step 4. Adjust zero-length risk interval.
Find a patient who has a zero-length risk interval.
idxInvalid = labCP.ID(find(labCP.tStart == labCP.tStop))
idxInvalid = 4
Review the data for patient 4.
labCP(find(labCP.ID==idxInvalid),:)
ans = 2x6 table ID tStart tStop Censoring Sex Lab __ ______ _____ _________ ___ ____ 4 0 50 1 0 0.21 4 50 50 0 0 0.2
The time to event of patient 4 coincides with the measurement day 50. However, (50,50] is an invalid risk interval for a Cox model because the model does not accept a zero length interval. Adjust the risk interval to be valid. You can choose any value less than the time unit as an adjustment amount. The choice of an adjustment amount is arbitrary, and it does not change the result.
idxAdjust = find(labCP.ID==idxInvalid); labCP.tStop(idxAdjust(1)) = labCP.tStop(idxAdjust(1))-0.5; labCP.tStart(idxAdjust(2)) = labCP.tStart(idxAdjust(2))-0.5; labCP(idxAdjust,:)
ans = 2x6 table ID tStart tStop Censoring Sex Lab __ ______ _____ _________ ___ ____ 4 0 49.5 1 0 0.21 4 49.5 50 0 0 0.2
Step 5. Construct a Cox proportional hazards model.
Fit a Cox proportional hazards model with the time-independent variable Sex and time-dependent variable Lab.
X = [labCP.Sex labCP.Lab]; T = [labCP.tStart labCP.tStop]; b = coxphfit(X,T,'Censoring',labCP.Censoring,'Baseline',0)
b = 2.0054 29.7530
For details on how to assess a Cox proportional hazards model, see docid:stats_ug.btoo9tf-1.