### Generalized Latent Variable Modeling by Skrondal and Rabe-Hesketh Section 14.3: Herpes and cervical cancer: A latent class covariate measurement error model for a case-control study

The data set used in this section is cervix.dat and was taken from Carroll et al. (1993). Missing values of X are coded as -9.

The do-file is cervix.do.

Variables:
 D dummy for case (cervical cancer) (1=case,0=control) X true exposure to herpes (1=yes,0=no,-9=missing) W measured exposure (1=yes,0=no)

```
infile D X W using cervix.dat, clear

```

Estimates using true exposure only in validation sample (iteration log not shown):
```
logit D X if X~= -9

Logit estimates                                   Number of obs   =        115
LR chi2(1)      =       2.95
Prob > chi2     =     0.0860
Log likelihood =  -72.17832                       Pseudo R2       =     0.0200

------------------------------------------------------------------------------
D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
X |   .6813592   .3999444     1.70   0.088    -.1025174    1.465236
_cons |  -1.011601   .2919371    -3.47   0.001    -1.583787   -.4394147
------------------------------------------------------------------------------
```

Estimates using imperfect measure of exposure:
```
logit D W

Logit estimates                                   Number of obs   =       2044
LR chi2(1)      =      23.93
Prob > chi2     =     0.0000
Log likelihood = -1321.3944                       Pseudo R2       =     0.0090

------------------------------------------------------------------------------
D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
W |   .4528744   .0928123     4.88   0.000     .2709657    .6347831
_cons |   -.802962   .0656468   -12.23   0.000    -.9316275   -.6742966
------------------------------------------------------------------------------
```

Estimate joint model using gllamm for Table 14.3

Collapse and reshape data:
```
gen cons=1
collapse (sum) wt2=cons, by(D X W)
gen patt=_n
list patt D X W wt2, clean

patt   D    X   W   wt2
1.      1   0   -9   0   701
2.      2   0   -9   1   535
3.      3   0    0   0    33
4.      4   0    0   1    11
5.      5   0    1   0    16
6.      6   0    1   1    16
7.      7   1   -9   0   318
8.      8   1   -9   1   375
9.      9   1    0   0    13
10.     10   1    0   1     3
11.     11   1    1   0     5
12.     12   1    1   1    18
```

Stack responses X,W,D into variable y and create a key to which response is of which type, var=1,2,3, respectively:
```
gen y1 = X
gen y2 = W
gen y3 = D

reshape long y, i(patt) j(var)

(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       12   ->      36
Number of variables                   8   ->       7
j variable (3 values)                     ->   var
xij variables:
y1 y2 y3   ->   y
-----------------------------------------------------------------------------

drop if y==-9

```

Create dummies for the three response types:
```
tab var, gen(d)

```

Dummy v for being in the validation sample and nv for not being in the validation sample:
```
gen v = X> -9
gen nv = 1-v

```

Interactions between variables and dummies for fixed part:
```
gen v_d1 = v*d1
gen X_v_d2 = X*v*d2
gen D_v_d2 = D*v*d2
gen D_nv_d2 = D*nv*d2
gen X_D_v_d2 = X*D*v*d2
gen X_v_d3 = X*v*d3

```

Interactions between variables and dummies for random part:
```
gen nv_d2 = nv*d2
* already defined: gen D_nv_d2 = D*nv*d2
gen nv_d3 = nv*d3

cons def 1 [z2_1_1]nv_d2=1
cons def 2 [z2_1_2]nv_d2=0
cons def 3 [pat1_1l]nv_d2 = [y]X_v_d2
cons def 4 [pat1_1l]D_nv_d2=[y]X_D_v_d2
cons def 5 [pat1_1l]nv_d3=[y]X_v_d3
cons def 6 [y]d1=[p2_1]_cons

```

Non-differential measurement error (a2=a3=0) for Table 14.3:
```
gllamm y d1 d2 X_v_d2 d3 X_v_d3, i(patt) l(logit) f(binom) nocons nip(2) /*

number of level 1 units = 4203
number of level 2 units = 2044

Condition Number = 13.474097

gllamm model with constraints:
( 1)  [z2_1_1]nv_d2 = 1
( 2)  [z2_1_2]nv_d2 = 0
( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
( 4) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
( 5)  [y]d1 - [p2_1]_cons = 0

log likelihood = -2805.194770938573

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
d1 |   .0062648   .1647347     0.04   0.970    -.3166092    .3291388
d2 |  -1.062541   .2244703    -4.73   0.000    -1.502495   -.6225876
X_v_d2 |   1.812779   .3653273     4.96   0.000     1.096751    2.528807
d3 |  -1.097045   .1487521    -7.37   0.000    -1.388593   -.8054958
X_v_d3 |   .9579213   .2365887     4.05   0.000      .494216    1.421627
------------------------------------------------------------------------------

Probabilities and locations of random effects
------------------------------------------------------------------------------

***level 2 (patt)

loc1: 1, 0
var(1): .24999755

nv_d2: 1.8127791 (.36532727)
nv_d3: .95792129 (.23658867)

prob: 0.5016, 0.4984
------------------------------------------------------------------------------
```

Response probabilities:
```
gllapred mu, mu

```

Probabilities of W for given X and D (Table 14.4):
```
list X D mu if v==1&d2==1, clean

X   D          mu
6.   0   0   .25682409
9.   0   0   .25682409
12.   1   0   .67923049
15.   1   0   .67923049
22.   0   1   .25682409
25.   0   1   .25682409
28.   1   1   .67923049
31.   1   1   .67923049
```

Posterior probabilities of true exposure X (Table 14.5):
```
gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean

W   D          p1          p2
1.   0   0   .23651644   .76348356
2.   0   0   .23651644   .76348356
3.   1   0   .65495805   .34504195
4.   1   0   .65495805   .34504195
17.   0   1   .44671496   .55328504
18.   0   1   .44671496   .55328504
19.   1   1   .83185431   .16814569
20.   1   1   .83185431   .16814569

drop p1 p2 mu

```

Differential measurement error for Table 14.3:
```
gllamm y d1 d2 X_v_d2 D_v_d2 D_nv_d2 X_D_v_d2 d3 X_v_d3, i(patt) /*
*/ l(logit) f(binom) nocons nip(2) /*

number of level 1 units = 4203
number of level 2 units = 2044

Condition Number = 37.207216

gllamm model with constraints:
( 1)  [z2_1_1]nv_d2 = 1
( 2)  [z2_1_2]nv_d2 = 0
( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
( 4) - [y]X_D_v_d2 + [pat1_1l]D_nv_d2 = 0
( 5) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
( 6)  [y]d1 - [p2_1]_cons = 0

log likelihood = -2802.626223555175

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
d1 |  -.0242144   .1826211    -0.13   0.895    -.3821452    .3337164
d2 |  -.7911943   .2586129    -3.06   0.002    -1.298066   -.2843224
X_v_d2 |     1.0986   .4962029     2.21   0.027     .1260605     2.07114
D_v_d2 |  -.6751662   .6907708    -0.98   0.328    -2.029052    .6787197
D_nv_d2 |  -.6587914   .6424359    -1.03   0.305    -1.917943    .6003598
X_D_v_d2 |   1.648709   .9550386     1.73   0.084    -.2231321     3.52055
d3 |  -.8937903   .2212583    -4.04   0.000    -1.327449    -.460132
X_v_d3 |   .6020423   .3966732     1.52   0.129    -.1754229    1.379508
------------------------------------------------------------------------------

Probabilities and locations of random effects
------------------------------------------------------------------------------

***level 2 (patt)

loc1: 1, 0
var(1): .24996336

nv_d2: 1.0986002 (.49620285)
D_nv_d2: 1.6487091 (.95503858)
nv_d3: .60204233 (.39667322)

prob: 0.4939, 0.5061
------------------------------------------------------------------------------
```

Response probabilities:
```
gllapred mu, mu

```

Probabilities of W for given X and D (Table 14.4):
```
list X D mu if v==1&d2==1, clean

X   D          mu
6.   0   0   .31191228
9.   0   0   .31191228
12.   1   0   .57625194
15.   1   0   .57625194
22.   0   1   .18749643
25.   0   1   .18749643
28.   1   1   .78261125
31.   1   1   .78261125
```

Posterior probabilities of true exposure X (Table 14.5):
```
gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean

W   D          p1          p2
1.   0   0   .32653164   .67346836
2.   0   0   .32653164   .67346836
3.   1   0    .5925908    .4074092
4.   1   0    .5925908    .4074092
17.   0   1   .27582333   .72417667
18.   0   1   .27582333   .72417667
19.   1   1   .85594791   .14405209
20.   1   1   .85594791   .14405209
```

References

Carroll, R. J., Gail, M. H. and Lubin, J. H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: Chapman & Hall/ CRC Press.

Outline
Datasets and do-files