We studied the consistency of the semi-parametric maximum likelihood estimator (SMLE) under the Cox regression model with right-censored (RC) data.
Consistency proofs of the MLE are often based on the Shannon-Kolmogorov inequality, which requires finite E(lnL), where L is the likelihood function.
This study establishes one property of the semi-parametric MLE (SMLE), namely its consistency. Under the Cox model with RC data, E(lnL) may not exist, so our proof uses the Kullback-Leibler information inequality instead.
Original manuscript: Consistency of the Semi-parametric MLE under the Cox Model with Right-Censored Data
Manuscript submitted on 14-05-2020.
We studied the consistency of the semi-parametric maximum likelihood estimator (SMLE) under the Cox model with right-censored (RC) data.
Let Y be a random survival time, X a p-dimensional random covariate. Conditional on X = x, Y satisfies the Cox model if its hazard function satisfies
$$h(y|x) = h_o(y)\,e^{x'\beta}, \quad y < \tau_Y, \qquad (1.1)$$
where $h_o$ is the baseline hazard function, i.e., $h_o(y) = f_o(y)/S_o(y-)$, $f_o$ is a density function, $S_o(y) = S(y|0) = P(Y > y \mid X = 0)$, $F_o = 1 - S_o$, $\tau_Y = \sup\{t : S_Y(t) > 0\}$, $h(y|x) = f(y|x)/S(y-|x)$, and $S(\cdot|\cdot)$ ($f(\cdot|\cdot)$ or $F(\cdot|\cdot)$) is the conditional survival function (density function (df) or cumulative distribution function (cdf)) of Y given X = x. The restriction $y < \tau_Y$ is not in the original definition of the PH model, but is necessary if $S_o$ is discontinuous at $\tau_Y$ (see Remark 1 in [1]).
In this paper, we make use of the following assumption:
AS1. Suppose that C is a random variable with df $f_C(t)$ and survival function $S_C(t)$; X takes at least p + 1 values, say $0, x_1, ..., x_p$, where $x_1, ..., x_p$ are linearly independent; and (Y, X) and C are independent. Let $(Y_1, X_1, C_1), ..., (Y_n, X_n, C_n)$ be i.i.d. random vectors from (Y, X, C), $M = \min(Y, C)$ and $\delta = 1(Y \le C)$, where 1(A) is the indicator function of the event A. Let $(M_1, \delta_1, X_1), ..., (M_n, \delta_n, X_n)$ be i.i.d. RC observations from $(M, \delta, X)$, whose df is as follows:
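$$f_{M,\delta,X}(t, s, x) = \big[f(t|x)\,S_C(t-)\big]^{s}\,\big[f_C(t)\,S(t|x)\big]^{1-s}\,f_X(x), \qquad (1.2)$$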
and $S(t|x)$ is a function of $(S_o, \beta)$ (see Eq. (1.1)), but not of $f_X$ and $f_C$ (the df's of X and C).
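To make the data structure in AS1 concrete, here is a minimal Python sketch that simulates right-censored observations $(M_i, \delta_i, X_i)$. The specific choices (p = 1, X equal to 0 or 1 with probability 1/2 each, an exponential baseline with rate `base_rate`, and exponential censoring with rate `cens_rate`) are illustrative assumptions, not part of the paper, and the function name `simulate_rc_cox` is hypothetical.

```python
import math
import random

def simulate_rc_cox(n, beta, base_rate=1.0, cens_rate=0.5, seed=0):
    """Draw n i.i.d. right-censored observations (M_i, delta_i, X_i).

    Illustrative assumptions (not from the paper): p = 1, X is 0 or 1 with
    probability 1/2 each, the baseline survival is S_o(t) = exp(-base_rate * t),
    and C ~ Exponential(cens_rate) is independent of (Y, X), as AS1 requires.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([0, 1])
        # S(t|x) = S_o(t) ** exp(x * beta), so Y | X = x is exponential
        # with rate base_rate * exp(x * beta).
        y = rng.expovariate(base_rate * math.exp(x * beta))
        c = rng.expovariate(cens_rate)
        m = min(y, c)                  # M = min(Y, C)
        delta = 1 if y <= c else 0     # delta = 1(Y <= C)
        data.append((m, delta, x))
    return data

sample = simulate_rc_cox(5, beta=0.7)
for m, delta, x in sample:
    print(f"M = {m:.3f}, delta = {delta}, X = {x}")
```

With β = 0.7 and the default rates, roughly a quarter of the observations are censored; any baseline and censoring law satisfying AS1 could be substituted.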
Due to (AS1) and Eq. (1.2), the generalized likelihood function can be written as:
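$$L(S, b) = \prod_{i=1}^{n} \big[S(M_i-|X_i) - S(M_i|X_i)\big]^{\delta_i}\,\big[S(M_i|X_i)\big]^{1-\delta_i}, \qquad (1.3)$$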
which coincides with the standard form of the generalized likelihood [2]. Eq. (1.3) is identical to the following expression:
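$$L(S, b) = \prod_{i=1}^{n} \big[S(M_i - \eta_n|X_i) - S(M_i|X_i)\big]^{\delta_i}\,\big[S(M_i|X_i)\big]^{1-\delta_i}, \qquad (1.4)$$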
where $\eta_n = \min\{|M_i - M_j| : M_i \ne M_j,\ i, j \in \{1, 2, ..., n\}\}$. This form allows $S_o$ to be arbitrary (discrete, continuous, or otherwise) and is thus more convenient in the later proofs. If Y is continuous, then $S(t|x) = (S(t|0))^{\exp(x'\beta)} = (S_o(t))^{\exp(x'\beta)}$, but
(1.5) |
If Y is discrete, then $S(t|x) = \prod_{s \le t}(1 - h(s|x)) = \prod_{s \le t}(1 - h_o(s)e^{x'\beta})$. If Y has a mixture distribution, then $S(t|x) = p\,(S_{01}(t))^{\exp(x'\beta)} + (1 - p)\prod_{s \le t}(1 - h_{02}(s)e^{x'\beta})$, where $p \in (0, 1)$, $h_{01}$ and $h_{02}$ are two hazard functions, $h_o(t) = p\,h_{01}(t) + (1 - p)\,h_{02}(t)$ and $S_o(t) = p\,S_{01}(t) + (1 - p)\,S_{02}(t)$.
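The continuous and discrete forms of $S(t|x)$ above can be compared numerically. The following is a minimal Python sketch, assuming the discrete baseline is stored as a dict mapping support points to hazards $h_o(s)$ (a hypothetical representation chosen for illustration); the function names are also hypothetical. It rejects parameter values with $h_o(s)e^{x'\beta} > 1$, since the product is then no longer a survival function.

```python
import math

def linear_predictor(x, beta):
    """x'beta for covariate vector x and coefficient vector beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def cox_survival_continuous(s0_t, x, beta):
    """S(t|x) = S_o(t) ** exp(x'beta): the form used when Y is continuous."""
    return s0_t ** math.exp(linear_predictor(x, beta))

def cox_survival_discrete(h0, t, x, beta):
    """S(t|x) = prod_{s <= t} (1 - h_o(s) exp(x'beta)): the form for discrete Y.

    h0 maps each support point s to the baseline hazard h_o(s).
    """
    risk = math.exp(linear_predictor(x, beta))
    surv = 1.0
    for s in sorted(h0):
        if s > t:
            break
        hazard = h0[s] * risk  # conditional hazard h(s|x)
        if hazard > 1.0:
            raise ValueError(f"h(s|x) = {hazard} > 1 at s = {s}")
        surv *= 1.0 - hazard
    return surv

# Same baseline, two formulas: they disagree once the baseline is discrete.
h0 = {1: 0.2, 2: 0.25, 3: 0.5}
beta = (math.log(2),)
s0_2 = cox_survival_discrete(h0, 2, (0.0,), beta)   # S_o(2) = 0.8 * 0.75
print(s0_2)                                         # about 0.6
print(cox_survival_discrete(h0, 2, (1.0,), beta))   # (1 - 0.4)(1 - 0.5), about 0.3
print(cox_survival_continuous(s0_2, (1.0,), beta))  # 0.6 ** 2, about 0.36
```

The gap between 0.3 and 0.36 shows that, for discrete Y, $S(t|x)$ cannot in general be written as $(S_o(t))^{\exp(x'\beta)}$.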
The SMLE of $(S_o, \beta)$, denoted by $(\hat S_o, \hat\beta)$, maximizes $L(S, b)$ over all possible survival functions $S$ and $b \in R^p$. The SMLE of $S(t|x)$ is denoted by $\hat S(t|x)$, which is a function of $(\hat S_o, \hat\beta)$. The computation of the SMLE under the Cox model has been studied in [3], but its consistency has not been established under this model. The simulation results in [3] suggest that the SMLE is more efficient than the partial likelihood estimator under the Cox model.
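For a discrete baseline, the objective that the SMLE maximizes can be evaluated directly. The Python sketch below assumes the usual right-censored generalized likelihood $\prod_i [S(M_i-|X_i) - S(M_i|X_i)]^{\delta_i}[S(M_i|X_i)]^{1-\delta_i}$, with $S(\cdot|x)$ built from the discrete Cox form given earlier; the function name and the dict representation of $h_o$ are hypothetical, and the maximization itself (studied in [3]) is not shown.

```python
import math

def generalized_loglik(data, h0, beta):
    """ln L(S, b) for right-censored data under a discrete Cox model.

    data: list of (m, delta, x), with x a tuple of covariates.
    h0:   dict mapping support points s to baseline hazards h_o(s).
    Returns -inf when some factor is zero, e.g. an exact observation at a
    time where S(.|x) has no jump (ln 0 = -inf).
    """
    times = sorted(h0)
    total = 0.0
    for m, delta, x in data:
        risk = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
        if max(h0.values(), default=0.0) * risk > 1.0:
            raise ValueError("h_o(s) exp(x'b) > 1: not a valid discrete hazard")
        s_before = 1.0  # S(m-|x): product over support points s < m
        for s in times:
            if s >= m:
                break
            s_before *= 1.0 - h0[s] * risk
        s_at = s_before * (1.0 - h0.get(m, 0.0) * risk)  # S(m|x)
        factor = s_before - s_at if delta == 1 else s_at
        if factor <= 0.0:
            return -math.inf
        total += math.log(factor)
    return total

h0 = {1: 0.5, 2: 0.5}
data = [(1, 1, (0.0,)), (2, 0, (0.0,))]
print(generalized_loglik(data, h0, (0.0,)))                # ln(0.5) + ln(0.25) = ln(0.125)
print(generalized_loglik([(1.5, 1, (0.0,))], h0, (0.0,)))  # -inf: no jump at 1.5
```

An exact observation at a time where the candidate $S$ has no jump contributes $\ln 0 = -\infty$, so with a likelihood of this form any maximizer must put mass at every exact observation.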
The partial likelihood estimator is a common estimator under the Cox model; it maximizes the partial likelihood
$$\prod_{i \in D} \frac{e^{X_i'b}}{\sum_{j \in R_i} e^{X_j'b}},$$
where $D$ is the collection of indices of the exact observations and $R_i = \{j : M_j \ge M_i\}$ is the risk set. The asymptotic properties of this estimator are well understood [4].
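The partial likelihood above can be written out in a few lines. This is a minimal Python sketch, assuming the data are stored as $(M_i, \delta_i, X_i)$ tuples as in AS1; the function name is hypothetical, and ties among exact times are handled by the Breslow convention (each tied event keeps the full risk set), which is one of several common choices.

```python
import math

def cox_log_partial_likelihood(data, b):
    """ln of prod_{i in D} exp(X_i'b) / sum_{j in R_i} exp(X_j'b).

    data: list of (m, delta, x), with x a tuple of covariates.
    D = {i : delta_i = 1} and R_i = {j : M_j >= M_i}. Tied exact times
    each keep the full risk set (Breslow convention).
    """
    def lin(x):
        return sum(xi * bi for xi, bi in zip(x, b))

    total = 0.0
    for m_i, d_i, x_i in data:
        if d_i != 1:
            continue  # censored cases enter only through the risk sets
        denom = sum(math.exp(lin(x_j)) for m_j, _, x_j in data if m_j >= m_i)
        total += lin(x_i) - math.log(denom)
    return total

data = [(1.0, 1, (0.0,)), (2.0, 1, (1.0,)), (3.0, 0, (0.0,))]
print(cox_log_partial_likelihood(data, (0.0,)))  # -ln 6: risk sets of sizes 3 and 2
```

Unlike the generalized likelihood, this function never touches the baseline hazard, which is why the partial likelihood estimator only targets β.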
The consistency of the SMLE under the continuous Cox model with interval-censored (IC) data has been established in [5], making use of the following result:
The Shannon-Kolmogorov (S-K) inequality. Let $f_o$ and $f$ be two densities with respect to (w.r.t.) a measure $\mu$ such that $\int f_o(t)\ln f_o(t)\,d\mu(t)$ is finite. Then $\int f_o(t)\ln f_o(t)\,d\mu(t) \ge \int f_o(t)\ln f(t)\,d\mu(t)$, with equality iff $f = f_o$ a.e. w.r.t. $\mu$.
Under the Cox model with IC data, the S-K inequality becomes $E(\ln L(S_o, \beta)) \ge E(\ln L(S, b))$ for all $(S, b)$, where $L(\cdot, \cdot)$ here denotes the likelihood function of the Cox model with IC data (which differs from $L(\cdot, \cdot)$ in Eq. (1.3)), $S$ is a baseline survival function and $b \in R^p$. Their approach cannot be extended to the Cox model with RC data, because the key assumption of the S-K inequality, namely that $E(\ln L(S_o, \beta))$ is finite, may not hold [3]. Indeed, if Y has a df and β = 0, then L
A related inequality is as follows.
The Kullback-Leibler (K-L) information inequality. Let $f_o$ and $f$ be two densities w.r.t. a measure $\mu$. Then $\int f_o(t)\ln(f_o/f)(t)\,d\mu(t) \ge 0$, with equality iff $f = f_o$ a.e. w.r.t. $\mu$.
The K-L inequality says that $\int f_o(t)\ln(f_o/f)(t)\,d\mu(t)$ exists, though it may be $\infty$. The two inequalities are not equivalent. In fact,
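For the counting measure, the K-L information reduces to a finite sum. The Python sketch below (the name `kl_divergence` is illustrative) uses the conventions $0\ln(0/q) = 0$ and $p\ln(p/0) = +\infty$, so it shows both that the quantity always exists in $[0, \infty]$ and that it may equal $\infty$.

```python
import math

def kl_divergence(f0, f):
    """K-L information sum_t f0(t) ln(f0(t)/f(t)) w.r.t. counting measure.

    f0, f: dicts mapping support points to probabilities.
    Conventions: 0 ln(0/q) = 0, and p ln(p/0) = +inf for p > 0.
    """
    total = 0.0
    for t, p in f0.items():
        if p == 0.0:
            continue
        q = f.get(t, 0.0)
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total

f0 = {0: 0.5, 1: 0.5}
print(kl_divergence(f0, f0))                  # 0.0: equality iff f = f0
print(kl_divergence(f0, {0: 0.25, 1: 0.75}))  # 0.5 ln(4/3) > 0
print(kl_divergence(f0, {0: 1.0}))            # inf: f0 charges a point where f = 0
```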
In this note, we show that the SMLE under the Cox model with RC data is consistent, making use of the Kullback-Leibler information inequality [6].
2. The Main Results

Notice that, under the assumption that $h_o$ exists, $S_o$, $f_o$, $F_o$ and $h_o$ are equivalent, in the sense that any one of them determines the other three. Thus, the Cox model is applicable only to distributions whose density functions exist, that is, Y is either continuous, discrete, or a mixture of the two. Since the expression of $S(t|x)$ differs among these three cases, for simplicity we prove the consistency of the SMLE under the Cox model only in the first two cases.
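In the discrete case the equivalence of $S_o$, $f_o$, $F_o$ and $h_o$ amounts to bookkeeping: $S(t) = \prod_{s \le t}(1 - h(s))$, $f(t) = S(t-)h(t)$, $F = 1 - S$, and $h(t) = (S(t-) - S(t))/S(t-)$ wherever $S(t-) > 0$. The Python sketch below (function names hypothetical; distributions stored as dicts over a finite support, with no mass before the first support point) goes from $h$ to $(S, f, F)$ and back.

```python
def from_hazard(h):
    """Return (S, f, F) on the support of a discrete hazard h (dict t -> h(t))."""
    S, f, F = {}, {}, {}
    surv = 1.0  # S(t-) before the current support point
    for t in sorted(h):
        f[t] = surv * h[t]  # f(t) = S(t-) h(t)
        surv *= 1.0 - h[t]  # S(t) = S(t-) (1 - h(t))
        S[t] = surv
        F[t] = 1.0 - surv
    return S, f, F

def hazard_from_survival(S):
    """Recover h(t) = (S(t-) - S(t)) / S(t-) at points with S(t-) > 0."""
    h = {}
    prev = 1.0  # S(t-) at the first support point
    for t in sorted(S):
        if prev > 0.0:
            h[t] = (prev - S[t]) / prev
        prev = S[t]
    return h

h = {1: 0.2, 2: 0.5, 3: 1.0}
S, f, F = from_hazard(h)
print(f)                        # masses 0.2, 0.4, 0.4 (up to rounding)
print(hazard_from_survival(S))  # recovers h up to rounding
```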
Theorem 1. Under the Cox model with RC data, if Y is either continuous or discrete and $S_o(\tau_M) < 1$, then the SMLE $(\hat S_o, \hat\beta)$ is consistent for all $t \in D$ (see Eq. (1.2)).
The proof of Theorem 1 makes use of a modified K-L inequality. The K-L inequality requires that $f_o$ and $f$ both be densities w.r.t. the measure $\mu$, that is, $\int f(t)\,d\mu(t) = 1$. In our case, however, we encounter $\int f(t)\,d\mu(t) \in [0, 1]$.
Lemma 1 (the modified K-L inequality). If $f_1, f_2 \ge 0$, $\mu_1$ is a measure, $\int f_1(t)\,d\mu_1(t) = 1$ and $\int f_2(t)\,d\mu_1(t) \le 1$, then $\int f_1(t)\ln\frac{f_1(t)}{f_2(t)}\,d\mu_1(t) \ge 0$, with equality iff $f_1 = f_2$ a.e. w.r.t. $\mu_1$.
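Proof. In view of the K-L inequality, it suffices to prove $\int f_1(t)\ln\frac{f_1(t)}{f_2(t)}\,d\mu_1(t) > 0$ under the additional assumption that $\int f_2(t)\,d\mu_1(t) < 1$. Let $\mu_2$ be a measure (e.g., a point mass at a point outside the domain of $\mu_1$), let $\mu = \mu_1 + \mu_2$, and extend $f_1$ and $f_2$ so that $\int f_1(t)\,d\mu_2(t) = 0$ and $\int f_2(t)\,d\mu(t) = 1$. Since $\int f_1(t)\,d\mu(t) = \int f_2(t)\,d\mu(t) = 1$, $f_1$ and $f_2$ are df's w.r.t. $\mu$. As $f_1 = 0$ a.e. w.r.t. $\mu_2$, the K-L inequality gives $\int f_1\ln(f_1/f_2)\,d\mu_1 = \int f_1\ln(f_1/f_2)\,d\mu \ge 0$, and equality would force $f_1 = f_2$ a.e. w.r.t. $\mu$, which is impossible because $\int f_2\,d\mu_2 > 0 = \int f_1\,d\mu_2$. Hence the inequality is strict in this case, which also proves the equality statement.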
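The extension step in the proof of Lemma 1 can be checked numerically on a finite support. In the Python sketch below (names hypothetical), `f2` has total mass below 1; `augment` adds one extra atom, playing the role of $\mu_2$, on which $f_1 = 0$ and $f_2$ carries the missing mass, so both become densities w.r.t. $\mu = \mu_1 + \mu_2$ while $\int f_1\ln(f_1/f_2)$ is unchanged.

```python
import math

def kl_sum(f1, f2):
    """sum_t f1(t) ln(f1(t)/f2(t)) w.r.t. counting measure (0 ln 0 = 0).

    Assumes f2(t) > 0 wherever f1(t) > 0, so the sum is finite.
    """
    return sum(p * math.log(p / f2[t]) for t, p in f1.items() if p > 0.0)

def augment(f1, f2, extra_point="t*"):
    """Add one atom (the measure mu_2) with f1 = 0 there and f2 = 1 - sum(f2)."""
    if not math.isclose(sum(f1.values()), 1.0):
        raise ValueError("f1 must have total mass 1")
    missing = 1.0 - sum(f2.values())
    if missing < 0.0:
        raise ValueError("f2 must have total mass at most 1")
    if extra_point in f1 or extra_point in f2:
        raise ValueError("extra_point must lie outside the original support")
    g1, g2 = dict(f1), dict(f2)
    g1[extra_point] = 0.0
    g2[extra_point] = missing
    return g1, g2

f1 = {"a": 0.5, "b": 0.5}
f2 = {"a": 0.3, "b": 0.3}                # sub-density: total mass 0.6
g1, g2 = augment(f1, f2)
print(kl_sum(f1, f2) == kl_sum(g1, g2))  # True: the new atom adds nothing
print(kl_sum(f1, f2) > 0.0)              # True: strict, since f2 has mass < 1
```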
Proof of Theorem 1. Let $\Omega_0$ be the subset of the sample space $\Omega$ on which the empirical distribution function (edf) $\hat F_n(t, s, x)$ based on $(M_i, \delta_i, X_i)$, $i = 1, ..., n$, converges to $F(t, s, x)$, the cdf of $(M, \delta, X)$. It is well known that $P(\Omega_0) = 1$. Notice that the SMLE $(\hat S_o, \hat\beta)$ is a function of $(\omega, n)$, say $(\hat S_{o,n}(t)(\omega), \hat\beta_n(\omega))$, where $\omega \in \Omega$ and n is the sample size. Hereafter, fix an $\omega \in \Omega_0$. Since $\hat\beta$ $(= \hat\beta_n(\omega))$ is a sequence of vectors in $R^p$, it has a convergent subsequence with limit $\beta_*$, where the components of $\beta_*$ may be $\pm\infty$. Moreover, $\hat S_o$ $(= \hat S_{o,n}(\cdot)(\omega))$ is a sequence of bounded non-increasing functions, so Helly's selection theorem ensures that every subsequence of $\hat S_o$ has a further convergent subsequence. Without loss of generality (WLOG), we assume that $\hat S_o \to S_*$ and $\hat\beta \to \beta_*$. Of course, $(\beta_*, S_*)$ depends on $\omega$ $(\in \Omega_0)$. We prove in Theorem 2 for the discrete case and in Theorem 3 for the continuous case that:
$$(S_*(t), \beta_*) = (S_o(t), \beta) \quad \text{for all } t \in D. \qquad (2.1)$$
Since ω can be arbitrary in Ω_{0 } and P(Ω_{0 }) = 1, the SMLE is consistent.
Before we prove Theorems 2 and 3, we present some preliminary results.
Lemma 2 (Proposition 17 in Royden (1968), page 231). Suppose that $\mu_n$ is a sequence of measures on the measurable space $(J, \mathcal{B})$ such that $\mu_n(B) \to \mu(B)$ for all $B \in \mathcal{B}$, $g_n$ and $f_n$ are non-negative measurable functions, and $(f_n, g_n)(x) \to (f, g)(x)$. Then,
Corollary 1. Suppose that $\mu_n$ is a sequence of measures on the measurable space $(J, \mathcal{B})$ such that $\mu_n(B) \to \mu(B)$ for all $B \in \mathcal{B}$, and that $f$ and $f_n$ ($n \ge 1$) are integrable functions that are bounded below uniformly in $n$, with $f(x) = \lim_{n\to\infty} f_n(x)$. Then $\int f\,d\mu \le \liminf_{n\to\infty} \int f_n\,d\mu_n$.
Proof. Let $k = \inf_n \inf_x f_n(x)$. If $k \ge 0$, the corollary follows from Lemma 2. Otherwise, let $f_n^-(x) = 0 \wedge f_n(x)$, $f_n^+(x) = 0 \vee f_n(x)$, $f^-(x) = 0 \wedge f(x)$ and $f^+(x) = 0 \vee f(x)$. Then $f_n^+ \to f^+$ and $f_n^- \to f^-$ pointwise, as $f_n \to f$. Hence
$$\liminf_{n\to\infty}\int f_n\,d\mu_n = \liminf_{n\to\infty}\Big[\int f_n^+\,d\mu_n + \int f_n^-\,d\mu_n\Big] \ge \int \lim_{n\to\infty} f_n^+\,d\mu + \int \lim_{n\to\infty} f_n^-\,d\mu = \int f^+\,d\mu + \int f^-\,d\mu = \int f\,d\mu,$$
where the inequality follows from Lemma 2, as $f_n^+$ is nonnegative and $|f_n^-(x)| \le |k|$.
Theorem 2. Under the discrete Cox model with RC data, Eq. (2.1) holds.
Proof. For the given $\omega \in \Omega_0$ and $(S_*, \beta_*)$ in the proof of Theorem 1, we have, as assumed, $(\hat S_o, \hat\beta)(\omega) \to (S_*, \beta_*)$. Defining $h_*(t) = (S_*(t-) - S_*(t))/S_*(t-)$ and $h_*(t|x) = h_*(t)\,e^{\beta_*'x}$ (for $S_*(t-) > 0$) yields $S_*(t|x)$ and $f_*(t|x)$, which are continuous functions of $S_*$ and $\beta_*$. Consequently, $\hat S(\cdot|\cdot) \to S_*(\cdot|\cdot)$.
Let $G_n(S_o, \beta) = \ln L(S_o, \beta)/n$ (see Eq. (1.3)). Then the SMLE $(\hat S_o, \hat\beta)$ satisfies
(2.2) |
. |
where B is a measurable set in R^{p+1}. To apply Lemma 2,
(2.3) |
(2.4) |
(2.5) |
(2.6) |
(2.7) |
(2.8) |
(2.9) |
and $v_n$ converges setwise to a finite measure $v$ (see (2.9)); by an argument similar to those for (2.4), (2.6), (2.7) and (2.8), we have:
(2.10) |
Thus, $\int \ln\frac{S(t|x)}{S_*(t|x)}\,dF(t, 0, x) + \int \ln\frac{f(t|x)}{f_*(t|x)}\,dF(t, 1, x) = 0$. Hence, $(S_o(t), \beta) = (S_*(t), \beta_*)$ for all $t \in D$ by the second statement of the K-L inequality.
Theorem 3. Under the Cox model with RC data, if Y is continuous, then Eq. (2.1) holds.
Proof. For the given $\omega \in \Omega_0$ and $(S_*, \beta_*)$ in the proof of Theorem 1, as well as $(\hat S_o, \hat\beta)(\omega)$ and $\hat S(t|x)(\omega)$, we have $S_*(t|x) = (S_*(t))^{\exp(\beta_*'x)}$. By an argument similar to that proving Eq. (2.8), we can show:
(2.11) |
In view of Eq. (1.4), and since Y is continuous, we define:
(2.12) |
(2.13) |
as S_{*} is a monotone function, S_{*}^{'} exists a.e., and so do S_{*}^{'}(t|x) and F_{*}^{'}(t|x). We have
(2.14) |
The reason is as follows. For each $(t, x)$ such that $F'(t|x) > 0$ and Eq. (2.13) holds, $F_*'(t|x)/F'(t|x)$ $(= f_*(t|x)/f(t|x))$ is finite. Then there exists $n_o$ such that $G(t, x, n) < 1 + F_*'(t|x)/F'(t|x)$ for $n \ge n_o$. On the other hand, $G(t, x, n)$ is finite for $n = 1, ..., n_o$. Thus, $G(t, x, n) < k$ for some $k$. Since Eq. (2.13) holds a.e. and $\int 1\,dF(t, s, x) = 1$, Eq. (2.14) holds.
We shall prove in Lemma 3 that
(2.15) |
. |
(2.16) |
. |
The last inequality further implies that $\int \ln\frac{S(t|x)}{S_*(t|x)}\,dF(t, 0, x) + \int \ln\frac{f(t|x)}{f_*(t|x)}\,dF(t, 1, x) = 0$. Thus, $(S_o(t), \beta) = (S_*(t), \beta_*)$ for all $t \in D$ by the second statement of the K-L inequality and by assumption AS1.
Lemma 3. Inequality (2.15) holds.
Proof. Let k ≥ 1 and , where B is a measurable set and
. |
Not applicable.
Not applicable.
None.
The author declares no conflict of interest, financial or otherwise.
The author would like to thank the editor and two referees for their invaluable comments.
[1] Q. Yu, "A note on the proportional hazards model with discontinuous data", Stat. Probab. Lett., vol. 77, no. 7, pp. 735-739. [http://dx.doi.org/10.1016/j.spl.2006.11.008]
[2] J. Kiefer and J. Wolfowitz, "Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters", Ann. Math. Stat., vol. 27, pp. 887-906. [http://dx.doi.org/10.1214/aoms/1177728066]
[3] G.Y.C. Wong, M.P. Osborne, Q.G. Diao, and Q.Q. Yu, "The piece-wise Cox model with right-censored data", Comm. Statist. Simul. Comput., vol. 46, pp. 7894-7908. [http://dx.doi.org/10.1080/03610918.2016.1255968]
[4] D.R. Cox and D. Oakes, Analysis of Survival Data. Chapman & Hall, NY.
[5] Q.Q. Yu and Q.G. Diao, "Consistency of the semi-parametric MLE under the Cox model with linearly time-dependent covariates and interval-censored data", J. Adv. Stat., vol. 4, no. 1.
[6] S. Kullback and R.A. Leibler, "On information and sufficiency", Ann. Math. Stat., vol. 22, pp. 79-86. [http://dx.doi.org/10.1214/aoms/1177729694]