The data set used in this section is cervix.dat and was taken from Carroll et al. (1993). Missing values of X are coded as -9.
The do-file is cervix.do.
The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.
Variables:| D | dummy for case (cervical cancer) (1=case,0=control) |
| X | true exposure to herpes (1=yes,0=no,-9=missing) |
| W | measured exposure (1=yes,0=no) |
Read data
infile D X W using cervix.dat, clear
Estimates using true exposure only in validation sample (iteration log not shown):
logit D X if X~= -9
Logit estimates Number of obs = 115
LR chi2(1) = 2.95
Prob > chi2 = 0.0860
Log likelihood = -72.17832 Pseudo R2 = 0.0200
------------------------------------------------------------------------------
D | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
X | .6813592 .3999444 1.70 0.088 -.1025174 1.465236
_cons | -1.011601 .2919371 -3.47 0.001 -1.583787 -.4394147
------------------------------------------------------------------------------
Estimates using imperfect measure of exposure:
logit D W
Logit estimates Number of obs = 2044
LR chi2(1) = 23.93
Prob > chi2 = 0.0000
Log likelihood = -1321.3944 Pseudo R2 = 0.0090
------------------------------------------------------------------------------
D | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
W | .4528744 .0928123 4.88 0.000 .2709657 .6347831
_cons | -.802962 .0656468 -12.23 0.000 -.9316275 -.6742966
------------------------------------------------------------------------------
Estimate joint model using gllamm for Table 14.3
Collapse and reshape data:
gen cons=1
collapse (sum) wt2=cons, by(D X W)
gen patt=_n
list patt D X W wt2, clean
patt D X W wt2
1. 1 0 -9 0 701
2. 2 0 -9 1 535
3. 3 0 0 0 33
4. 4 0 0 1 11
5. 5 0 1 0 16
6. 6 0 1 1 16
7. 7 1 -9 0 318
8. 8 1 -9 1 375
9. 9 1 0 0 13
10. 10 1 0 1 3
11. 11 1 1 0 5
12. 12 1 1 1 18
Stack responses X,W,D into variable y and create a key to which response is of which type, var=1,2,3, respectively:
gen y1 = X
gen y2 = W
gen y3 = D
reshape long y, i(patt) j(var)
(note: j = 1 2 3)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 12 -> 36
Number of variables 8 -> 7
j variable (3 values) -> var
xij variables:
y1 y2 y3 -> y
-----------------------------------------------------------------------------
drop if y==-9
Create dummies for the three response types:
tab var, gen(d)
Dummy v for being in the validation sample and nv for not being in the validation sample:
gen v = X> -9 gen nv = 1-v
Interactions between variables and dummies for fixed part:
gen v_d1 = v*d1 gen X_v_d2 = X*v*d2 gen D_v_d2 = D*v*d2 gen D_nv_d2 = D*nv*d2 gen X_D_v_d2 = X*D*v*d2 gen X_v_d3 = X*v*d3
Interactions between variables and dummies for random part:
gen nv_d2 = nv*d2 * already defined: gen D_nv_d2 = D*nv*d2 gen nv_d3 = nv*d3 cons def 1 [z2_1_1]nv_d2=1 cons def 2 [z2_1_2]nv_d2=0 cons def 3 [pat1_1l]nv_d2 = [y]X_v_d2 cons def 4 [pat1_1l]D_nv_d2=[y]X_D_v_d2 cons def 5 [pat1_1l]nv_d3=[y]X_v_d3 cons def 6 [y]d1=[p2_1]_cons
Non-differential measurement error (a2=a3=0) for Table 14.3:
eq load: nv_d2 nv_d3
gllamm y d1 d2 X_v_d2 d3 X_v_d3, i(patt) l(logit) f(binom) nocons nip(2) /*
*/ ip(fn) eqs(load) constr(1 2 3 5 6) frload(1) weight(wt)
number of level 1 units = 4203
number of level 2 units = 2044
Condition Number = 13.474097
gllamm model with constraints:
( 1) [z2_1_1]nv_d2 = 1
( 2) [z2_1_2]nv_d2 = 0
( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
( 4) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
( 5) [y]d1 - [p2_1]_cons = 0
log likelihood = -2805.194770938573
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d1 | .0062648 .1647347 0.04 0.970 -.3166092 .3291388
d2 | -1.062541 .2244703 -4.73 0.000 -1.502495 -.6225876
X_v_d2 | 1.812779 .3653273 4.96 0.000 1.096751 2.528807
d3 | -1.097045 .1487521 -7.37 0.000 -1.388593 -.8054958
X_v_d3 | .9579213 .2365887 4.05 0.000 .494216 1.421627
------------------------------------------------------------------------------
Probabilities and locations of random effects
------------------------------------------------------------------------------
***level 2 (patt)
loc1: 1, 0
var(1): .24999755
loadings for random effect 1
nv_d2: 1.8127791 (.36532727)
nv_d3: .95792129 (.23658867)
prob: 0.5016, 0.4984
------------------------------------------------------------------------------
Response probabilities:
gllapred mu, mu
Probabilities of W for given X and D (Table 14.4):
list X D mu if v==1&d2==1, clean
X D mu
6. 0 0 .25682409
9. 0 0 .25682409
12. 1 0 .67923049
15. 1 0 .67923049
22. 0 1 .25682409
25. 0 1 .25682409
28. 1 1 .67923049
31. 1 1 .67923049
Posterior probabilities of true exposure X (Table 14.5):
gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean
W D p1 p2
1. 0 0 .23651644 .76348356
2. 0 0 .23651644 .76348356
3. 1 0 .65495805 .34504195
4. 1 0 .65495805 .34504195
17. 0 1 .44671496 .55328504
18. 0 1 .44671496 .55328504
19. 1 1 .83185431 .16814569
20. 1 1 .83185431 .16814569
drop p1 p2 mu
Differential measurement error for Table 14.3:
eq load: nv_d2 D_nv_d2 nv_d3
gllamm y d1 d2 X_v_d2 D_v_d2 D_nv_d2 X_D_v_d2 d3 X_v_d3, i(patt) /*
*/ l(logit) f(binom) nocons nip(2) /*
*/ ip(fn) eqs(load) constr(1 2 3 4 5 6) frload(1) weight(wt)
number of level 1 units = 4203
number of level 2 units = 2044
Condition Number = 37.207216
gllamm model with constraints:
( 1) [z2_1_1]nv_d2 = 1
( 2) [z2_1_2]nv_d2 = 0
( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
( 4) - [y]X_D_v_d2 + [pat1_1l]D_nv_d2 = 0
( 5) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
( 6) [y]d1 - [p2_1]_cons = 0
log likelihood = -2802.626223555175
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d1 | -.0242144 .1826211 -0.13 0.895 -.3821452 .3337164
d2 | -.7911943 .2586129 -3.06 0.002 -1.298066 -.2843224
X_v_d2 | 1.0986 .4962029 2.21 0.027 .1260605 2.07114
D_v_d2 | -.6751662 .6907708 -0.98 0.328 -2.029052 .6787197
D_nv_d2 | -.6587914 .6424359 -1.03 0.305 -1.917943 .6003598
X_D_v_d2 | 1.648709 .9550386 1.73 0.084 -.2231321 3.52055
d3 | -.8937903 .2212583 -4.04 0.000 -1.327449 -.460132
X_v_d3 | .6020423 .3966732 1.52 0.129 -.1754229 1.379508
------------------------------------------------------------------------------
Probabilities and locations of random effects
------------------------------------------------------------------------------
***level 2 (patt)
loc1: 1, 0
var(1): .24996336
loadings for random effect 1
nv_d2: 1.0986002 (.49620285)
D_nv_d2: 1.6487091 (.95503858)
nv_d3: .60204233 (.39667322)
prob: 0.4939, 0.5061
------------------------------------------------------------------------------
Response probabilities:
gllapred mu, mu
Probabilities of W for given X and D (Table 14.4):
list X D mu if v==1&d2==1, clean
X D mu
6. 0 0 .31191228
9. 0 0 .31191228
12. 1 0 .57625194
15. 1 0 .57625194
22. 0 1 .18749643
25. 0 1 .18749643
28. 1 1 .78261125
31. 1 1 .78261125
Posterior probabilities of true exposure X (Table 14.5):
gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean
W D p1 p2
1. 0 0 .32653164 .67346836
2. 0 0 .32653164 .67346836
3. 1 0 .5925908 .4074092
4. 1 0 .5925908 .4074092
17. 0 1 .27582333 .72417667
18. 0 1 .27582333 .72417667
19. 1 1 .85594791 .14405209
20. 1 1 .85594791 .14405209
Carroll, R. J., Gail, M. H. and Lubin, J. H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.
Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized
Latent Variable Modeling: Multilevel, Longitudinal and Structural
Equation Models. Boca Raton, FL: Chapman & Hall/ CRC Press.
Outline
Datasets and do-files