Generalized Latent Variable Modeling by Skrondal and Rabe-Hesketh
Section 14.3: Herpes and cervical cancer:

A latent class covariate measurement error model for a case-control study

The data set used in this section is cervix.dat and was taken from Carroll et al. (1993). Missing values of X are coded as -9.

The do-file is cervix.do.

The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.

Variables:
Ddummy for case (cervical cancer) (1=case,0=control)
Xtrue exposure to herpes (1=yes,0=no,-9=missing)
Wmeasured exposure (1=yes,0=no)

Read data

infile D X W using cervix.dat, clear


Estimates using true exposure only in validation sample (iteration log not shown):

logit D X if X~= -9

Logit estimates                                   Number of obs   =        115
                                                  LR chi2(1)      =       2.95
                                                  Prob > chi2     =     0.0860
Log likelihood =  -72.17832                       Pseudo R2       =     0.0200

------------------------------------------------------------------------------
           D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |   .6813592   .3999444     1.70   0.088    -.1025174    1.465236
       _cons |  -1.011601   .2919371    -3.47   0.001    -1.583787   -.4394147
------------------------------------------------------------------------------

Estimates using imperfect measure of exposure:

logit D W

Logit estimates                                   Number of obs   =       2044
                                                  LR chi2(1)      =      23.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -1321.3944                       Pseudo R2       =     0.0090

------------------------------------------------------------------------------
           D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           W |   .4528744   .0928123     4.88   0.000     .2709657    .6347831
       _cons |   -.802962   .0656468   -12.23   0.000    -.9316275   -.6742966
------------------------------------------------------------------------------

Estimate joint model using gllamm for Table 14.3

Collapse and reshape data:

gen cons=1
collapse (sum) wt2=cons, by(D X W)
gen patt=_n
list patt D X W wt2, clean

       patt   D    X   W   wt2  
  1.      1   0   -9   0   701  
  2.      2   0   -9   1   535  
  3.      3   0    0   0    33  
  4.      4   0    0   1    11  
  5.      5   0    1   0    16  
  6.      6   0    1   1    16  
  7.      7   1   -9   0   318  
  8.      8   1   -9   1   375  
  9.      9   1    0   0    13  
 10.     10   1    0   1     3  
 11.     11   1    1   0     5  
 12.     12   1    1   1    18  

Stack responses X,W,D into variable y and create a key to which response is of which type, var=1,2,3, respectively:

gen y1 = X
gen y2 = W
gen y3 = D


reshape long y, i(patt) j(var)

(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       12   ->      36
Number of variables                   8   ->       7
j variable (3 values)                     ->   var
xij variables:
                               y1 y2 y3   ->   y
-----------------------------------------------------------------------------

drop if y==-9


Create dummies for the three response types:

tab var, gen(d)


Dummy v for being in the validation sample and nv for not being in the validation sample:

gen v = X> -9
gen nv = 1-v


Interactions between variables and dummies for fixed part:

gen v_d1 = v*d1
gen X_v_d2 = X*v*d2
gen D_v_d2 = D*v*d2
gen D_nv_d2 = D*nv*d2
gen X_D_v_d2 = X*D*v*d2
gen X_v_d3 = X*v*d3


Interactions between variables and dummies for random part:

gen nv_d2 = nv*d2
* already defined: gen D_nv_d2 = D*nv*d2
gen nv_d3 = nv*d3


cons def 1 [z2_1_1]nv_d2=1
cons def 2 [z2_1_2]nv_d2=0
cons def 3 [pat1_1l]nv_d2 = [y]X_v_d2
cons def 4 [pat1_1l]D_nv_d2=[y]X_D_v_d2
cons def 5 [pat1_1l]nv_d3=[y]X_v_d3
cons def 6 [y]d1=[p2_1]_cons


Non-differential measurement error (a2=a3=0) for Table 14.3:

eq load: nv_d2 nv_d3
gllamm y d1 d2 X_v_d2 d3 X_v_d3, i(patt) l(logit) f(binom) nocons nip(2) /*
  */ ip(fn) eqs(load) constr(1 2 3 5 6) frload(1)  weight(wt)

number of level 1 units = 4203
number of level 2 units = 2044
 
Condition Number = 13.474097
 
gllamm model with constraints:
 ( 1)  [z2_1_1]nv_d2 = 1
 ( 2)  [z2_1_2]nv_d2 = 0
 ( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
 ( 4) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
 ( 5)  [y]d1 - [p2_1]_cons = 0
 
log likelihood = -2805.194770938573
 
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   .0062648   .1647347     0.04   0.970    -.3166092    .3291388
          d2 |  -1.062541   .2244703    -4.73   0.000    -1.502495   -.6225876
      X_v_d2 |   1.812779   .3653273     4.96   0.000     1.096751    2.528807
          d3 |  -1.097045   .1487521    -7.37   0.000    -1.388593   -.8054958
      X_v_d3 |   .9579213   .2365887     4.05   0.000      .494216    1.421627
------------------------------------------------------------------------------
 
 
Probabilities and locations of random effects
------------------------------------------------------------------------------

 
***level 2 (patt)
 
    loc1: 1, 0
  var(1): .24999755
 
    loadings for random effect 1
    nv_d2: 1.8127791 (.36532727)
    nv_d3: .95792129 (.23658867)
 
    prob: 0.5016, 0.4984
------------------------------------------------------------------------------

Response probabilities:

gllapred mu, mu 


Probabilities of W for given X and D (Table 14.4):

list X D mu if v==1&d2==1, clean

       X   D          mu  
  6.   0   0   .25682409  
  9.   0   0   .25682409  
 12.   1   0   .67923049  
 15.   1   0   .67923049  
 22.   0   1   .25682409  
 25.   0   1   .25682409  
 28.   1   1   .67923049  
 31.   1   1   .67923049  

Posterior probabilities of true exposure X (Table 14.5):

gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean

       W   D          p1          p2  
  1.   0   0   .23651644   .76348356  
  2.   0   0   .23651644   .76348356  
  3.   1   0   .65495805   .34504195  
  4.   1   0   .65495805   .34504195  
 17.   0   1   .44671496   .55328504  
 18.   0   1   .44671496   .55328504  
 19.   1   1   .83185431   .16814569  
 20.   1   1   .83185431   .16814569  

drop p1 p2 mu


Differential measurement error for Table 14.3:

eq load: nv_d2 D_nv_d2 nv_d3
gllamm y d1 d2 X_v_d2 D_v_d2 D_nv_d2 X_D_v_d2 d3 X_v_d3, i(patt) /*
  */ l(logit) f(binom) nocons nip(2) /*
  */ ip(fn) eqs(load) constr(1 2 3 4 5 6) frload(1) weight(wt)

number of level 1 units = 4203
number of level 2 units = 2044
 
Condition Number = 37.207216
 
gllamm model with constraints:
 ( 1)  [z2_1_1]nv_d2 = 1
 ( 2)  [z2_1_2]nv_d2 = 0
 ( 3) - [y]X_v_d2 + [pat1_1l]nv_d2 = 0
 ( 4) - [y]X_D_v_d2 + [pat1_1l]D_nv_d2 = 0
 ( 5) - [y]X_v_d3 + [pat1_1l]nv_d3 = 0
 ( 6)  [y]d1 - [p2_1]_cons = 0
 
log likelihood = -2802.626223555175
 
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |  -.0242144   .1826211    -0.13   0.895    -.3821452    .3337164
          d2 |  -.7911943   .2586129    -3.06   0.002    -1.298066   -.2843224
      X_v_d2 |     1.0986   .4962029     2.21   0.027     .1260605     2.07114
      D_v_d2 |  -.6751662   .6907708    -0.98   0.328    -2.029052    .6787197
     D_nv_d2 |  -.6587914   .6424359    -1.03   0.305    -1.917943    .6003598
    X_D_v_d2 |   1.648709   .9550386     1.73   0.084    -.2231321     3.52055
          d3 |  -.8937903   .2212583    -4.04   0.000    -1.327449    -.460132
      X_v_d3 |   .6020423   .3966732     1.52   0.129    -.1754229    1.379508
------------------------------------------------------------------------------
 
 
Probabilities and locations of random effects
------------------------------------------------------------------------------

 
***level 2 (patt)
 
    loc1: 1, 0
  var(1): .24996336
 
    loadings for random effect 1
    nv_d2: 1.0986002 (.49620285)
    D_nv_d2: 1.6487091 (.95503858)
    nv_d3: .60204233 (.39667322)
 
    prob: 0.4939, 0.5061
------------------------------------------------------------------------------

Response probabilities:

gllapred mu, mu 


Probabilities of W for given X and D (Table 14.4):

list X D mu if v==1&d2==1, clean

       X   D          mu  
  6.   0   0   .31191228  
  9.   0   0   .31191228  
 12.   1   0   .57625194  
 15.   1   0   .57625194  
 22.   0   1   .18749643  
 25.   0   1   .18749643  
 28.   1   1   .78261125  
 31.   1   1   .78261125  

Posterior probabilities of true exposure X (Table 14.5):

gllapred p, p
sort patt
list W D p1 p2 if nv==1, clean

       W   D          p1          p2  
  1.   0   0   .32653164   .67346836  
  2.   0   0   .32653164   .67346836  
  3.   1   0    .5925908    .4074092  
  4.   1   0    .5925908    .4074092  
 17.   0   1   .27582333   .72417667  
 18.   0   1   .27582333   .72417667  
 19.   1   1   .85594791   .14405209  
 20.   1   1   .85594791   .14405209  

References

Carroll, R. J., Gail, M. H. and Lubin, J. H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: Chapman & Hall/ CRC Press.

Outline
Datasets and do-files