Generalized Latent Variable Modeling by Skrondal and Rabe-Hesketh
Section 14.5: Physician advice and drinking

Endogenous treatment model

The data set used in this section is kenkel.dat. We thank the Journal of Applied Econometrics for making these data, used in Kenkel and Terza (2001), available in the Journal of Applied Econometrics Data Archive.

The do-file is kenkel.do.

I. Easy method for estimating the model using the gllamm wrapper ssm

The programs used are gllamm and ssm.

Use the commands ssc describe gllamm and ssc describe ssm and follow the instructions to download the programs. For more information on gllamm see http://www.gllamm.org and for more information on ssm see http://www.gllamm.org/wrappers.html.
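
As a minimal sketch of the download step, assuming an internet connection and that both packages are hosted on SSC (as the ssc describe commands above imply), the programs could be installed and checked like this:

```stata
* Sketch only: install gllamm and its wrapper ssm from SSC.
* The replace option overwrites any older copy that is already installed.
ssc install gllamm, replace
ssc install ssm, replace

* Confirm that Stata can find both programs on the ado-path.
which gllamm
which ssm
```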


ssm drinks advice black hieduc, s(advice = black hieduc hlthins regmed heart) /*
  */ adapt q(16) family(poiss) link(log)

II. More difficult method for estimating the model using gllamm

The program used is gllamm. The first three models can also be estimated using Stata's own commands poisson, xtpoisson and probit, but we use gllamm throughout to make the syntax of the final model easier to understand.

Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.


Read data and display first ten records:

insheet using kenkel.dat, clear
list in 1/10, clean

. list in 1/10, clean

       drinks   advice   black   hlthins   regmed   heart   hieduc  
  1.       60        0       0         1        1       0        1  
  2.       84        0       0         1        1       0        0  
  3.        4        0       0         1        1       0        1  
  4.        0        1       0         1        0       0        1  
  5.       12        0       0         1        1       0        0  
  6.       12        0       0         1        1       0        0  
  7.        5        0       0         1        1       0        1  
  8.        7        0       0         1        1       0        0  
  9.        0        0       0         1        0       0        1  
 10.      1.5        1       0         1        1       1        1  

The dependent variable for the analysis is the number of alcoholic beverages consumed in the last two weeks. This is calculated as the product of self-reported drinking frequency (the number of days in the past two weeks with any drinking) and drinking intensity (the average number of drinks on a day with any drinking). We round this to the nearest integer to obtain a proper count.

replace drinks=round(drinks,1)
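
A quick check of the rounding step, offered as a sketch that assumes the kenkel.dat data are in memory and the replace command above has just been run:

```stata
* Sketch: every value of drinks should now be a whole number.
* assert stops with an error if any observation violates the condition.
assert drinks == floor(drinks)

* Summary of the rounded count.
summarize drinks
```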


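Collapse the data to one record per distinct combination of the variables and generate the frequency weight variable wt2 to speed up estimation. The gllamm option weight(wt) takes a variable stub: gllamm looks for level-1 weights in wt1 and level-2 weights in wt2, so the counts in wt2 are applied at level 2 (id) and the data are weighted appropriately.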

disp _N

2467


gen one=1
collapse (sum) wt2=one, by(black hlthins regmed heart hieduc drinks advice)
disp _N

737
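
Because wt2 counts how many original records each collapsed row stands for, the weights should add up to the original sample size. A small sketch of that check, assuming the collapsed data are in memory:

```stata
* Sketch: the frequency weights must sum to the pre-collapse sample size.
quietly summarize wt2
display r(sum)

* 2467 is the value of _N displayed before the collapse.
assert r(sum) == 2467
```

The same total reappears as "number of level 1 units = 2467" in the gllamm output for the models fitted to the collapsed data.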


gen id=_n
gen cons=1
list in 1/10, clean

       drinks   advice   black   hlthins   regmed   heart   hieduc   wt2   id   cons  
  1.        0        0       0         0        0       0        0     7    1      1  
  2.        0        1       0         0        0       0        0     3    2      1  
  3.        1        0       0         0        0       0        0     1    3      1  
  4.        1        1       0         0        0       0        0     1    4      1  
  5.        2        0       0         0        0       0        0     2    5      1  
  6.        2        1       0         0        0       0        0     3    6      1  
  7.        3        0       0         0        0       0        0     1    7      1  
  8.        4        0       0         0        0       0        0     4    8      1  
  9.        5        0       0         0        0       0        0     1    9      1  
 10.        6        0       0         0        0       0        0     1   10      1  

Poisson model for drinking for Table 14.7:

gllamm drinks advice cons hieduc black, i(id) weight(wt) family(poisson) link(log) /*
  */ nocons init 

number of level 1 units = 2467
 
Condition Number = 3.8352842
 
gllamm model
 
log likelihood = -32939.148
 
------------------------------------------------------------------------------
      drinks |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      advice |    .473367    .010918    43.36   0.000     .4519681    .4947658
        cons |   2.650541   .0084928   312.09   0.000     2.633896    2.667187
      hieduc |  -.1826093   .0107983   -16.91   0.000    -.2037736   -.1614451
       black |  -.3096866   .0168905   -18.33   0.000    -.3427914   -.2765818
------------------------------------------------------------------------------

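Overdispersed Poisson model for drinking for Table 14.7, with a normally distributed random intercept for each individual, estimated by 10-point adaptive quadrature (iteration log not shown):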

gllamm drinks advice cons hieduc black, i(id) weight(wt) family(poisson) link(log) /*
    */ nocons adapt nip(10) 

number of level 1 units = 2467
number of level 2 units = 2467
 
Condition Number = 3.9794068
 
gllamm model
 
log likelihood = -8857.8425
 
------------------------------------------------------------------------------
      drinks |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      advice |   .5954294   .0815787     7.30   0.000      .435538    .7553207
        cons |   1.429521   .0600222    23.82   0.000      1.31188    1.547162
      hieduc |   .0277592   .0740012     0.38   0.708    -.1172806     .172799
       black |  -.2835224   .1090915    -2.60   0.009    -.4973378    -.069707
------------------------------------------------------------------------------
 
 
Variances and covariances of random effects
------------------------------------------------------------------------------

 
***level 2 (id)
 
    var(1): 2.899832 (.1132476)
------------------------------------------------------------------------------

Probit model for advice for Table 14.7 (iteration log not shown):

gllamm advice cons hieduc black hlthins regmed heart, i(id) weight(wt) family(binomial) /*
    */ link(probit) nocons init

number of level 1 units = 2467
 
Condition Number = 6.4702064
 
gllamm model
 
log likelihood = -1419.9041
 
------------------------------------------------------------------------------
      advice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cons |  -.4785403   .0849039    -5.64   0.000     -.644949   -.3121317
      hieduc |  -.2520195   .0560241    -4.50   0.000    -.3618247   -.1422143
       black |   .3031406   .0780889     3.88   0.000     .1500891    .4561921
     hlthins |  -.2708712   .0704249    -3.85   0.000    -.4089013    -.132841
      regmed |   .1801328   .0738763     2.44   0.015     .0353379    .3249278
       heart |   .1661613   .0757854     2.19   0.028     .0176246     .314698
------------------------------------------------------------------------------
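
As noted at the start of this part, the models without a shared random effect can also be fitted with Stata's own commands. The following sketch covers the ordinary Poisson and probit models only and assumes the collapsed data are still in memory, with wt2 supplied as frequency weights:

```stata
* Sketch: ordinary Poisson model for drinking with Stata's poisson command.
* The constant is added automatically and is reported as _cons.
poisson drinks advice hieduc black [fweight=wt2]

* Sketch: probit model for advice with Stata's probit command.
probit advice hieduc black hlthins regmed heart [fweight=wt2]
```

The coefficient estimates should agree with the corresponding gllamm output above up to numerical precision.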

Prepare data for endogenous treatment model for Table 14.7:

Stack drinking and advice into a single variable resp and create a variable type = 1 for drinking and type = 2 for advice.

rename drinks resp1 
gen resp2 = advice
reshape long resp, i(id) j(type)

(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      737   ->    1474
Number of variables                  11   ->      11
j variable (2 values)                     ->   type
xij variables:
                            resp1 resp2   ->   resp
-----------------------------------------------------------------------------
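
The reshaped data can be checked with a short sketch, assuming the reshape above has just been run. Note that advice was copied into resp2 rather than renamed, so it remains available as a covariate for the drinking response:

```stata
* Sketch: each id should contribute exactly one row per response type.
isid id type

* In the advice rows (type 2) the stacked response equals the advice covariate.
assert resp == advice if type == 2
```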

Create dummies d1 for type=1 (drinking) and d2 for type=2 (advice).

tab type, gen(d)
sort id type
list in 1/10, clean

       id   type   advice   black   hlthins   regmed   heart   hieduc   wt2   cons   resp   d1   d2  
  1.    1      1        0       0         0        0       0        0     7      1      0    1    0  
  2.    1      2        0       0         0        0       0        0     7      1      0    0    1  
  3.    2      1        1       0         0        0       0        0     3      1      0    1    0  
  4.    2      2        1       0         0        0       0        0     3      1      1    0    1  
  5.    3      1        0       0         0        0       0        0     1      1      1    1    0  
  6.    3      2        0       0         0        0       0        0     1      1      0    0    1  
  7.    4      1        1       0         0        0       0        0     1      1      1    1    0  
  8.    4      2        1       0         0        0       0        0     1      1      1    0    1  
  9.    5      1        0       0         0        0       0        0     2      1      2    1    0  
 10.    5      2        0       0         0        0       0        0     2      1      0    0    1  

Create interactions between d1 and covariates in drinking model:

gen d1_advice = d1*advice
gen d1_hieduc = d1*hieduc
gen d1_black = d1*black


Create interactions between d2 and covariates in advice model (use foreach to save typing):

foreach var in hieduc black hlthins regmed heart {
    gen d2_`var' = d2*`var'
}
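
Each interaction should be zero in the rows belonging to the other response, so that a covariate affects only its own equation. A sketch of that check, assuming the interaction variables above have been created:

```stata
* Sketch: drinking-equation covariates vanish in the advice rows (type 2).
assert d1_advice == 0 & d1_hieduc == 0 & d1_black == 0 if type == 2

* Advice-equation covariates vanish in the drinking rows (type 1).
foreach var in hieduc black hlthins regmed heart {
    assert d2_`var' == 0 if type == 1
}

* Inspect a few rows to see the pattern.
list id type resp d1_advice d2_heart in 1/4, clean
```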


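Endogenous treatment model for Table 14.7. A single random effect at the id level enters both responses through the equation fac; the factor loading for one response is fixed at 1 and the other is estimated. The options fv(type) and lv(type) assign the Poisson family with log link to type 1 (drinking) and the binomial family with probit link to type 2 (advice).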

eq fac: d2 d1
gllamm resp d1_advice d1 d1_hieduc d1_black d2 d2_hieduc d2_black d2_hlthins /*
    */ d2_regmed d2_heart, nocons i(id) weight(wt) family(poisson binom)     /*
    */ link(log probit) fv(type) lv(type) eq(fac) adapt nip(15) 

number of level 1 units = 4934
number of level 2 units = 2467
 
Condition Number = 23.071575
 
gllamm model
 
log likelihood = -10254.241
 
------------------------------------------------------------------------------
        resp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   d1_advice |  -2.412288   .2308401   -10.45   0.000    -2.864727    -1.95985
          d1 |   2.323782   .0932252    24.93   0.000     2.141064      2.5065
   d1_hieduc |  -.2842432    .098614    -2.88   0.004     -.477523   -.0909633
    d1_black |   .1103129   .1439681     0.77   0.444    -.1718595    .3924853
          d2 |  -1.117855   .1629308    -6.86   0.000    -1.437194   -.7985166
   d2_hieduc |  -.3988385    .104094    -3.83   0.000    -.6028591   -.1948179
    d2_black |   .6040407   .1528167     3.95   0.000     .3045254     .903556
  d2_hlthins |  -.3270483   .0957591    -3.42   0.001    -.5147327   -.1393639
   d2_regmed |   .3851485   .1000492     3.85   0.000     .1890557    .5812413
    d2_heart |   .5077518   .1105428     4.59   0.000     .2910919    .7244118
------------------------------------------------------------------------------
 
 
Variances and covariances of random effects
------------------------------------------------------------------------------

 
***level 2 (id)
 
    var(1): 2.4751449 (.69051946)
 
    loadings for random effect 1
    d2: 1 (fixed)
    d1: 1.4303603 (.15172818)
 
------------------------------------------------------------------------------

There are some small discrepancies between these estimates and Table 14.7.
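
The advice coefficients in the three drinking models are on the log scale, so exponentiating them gives multiplicative effects on the expected number of drinks (conditional on the random effect where one is present). A sketch using the estimates printed in the output above:

```stata
* Sketch: rate ratios implied by the advice coefficients reported above.
display "Poisson:               " exp(.473367)
display "Overdispersed Poisson: " exp(.5954294)
display "Endogenous treatment:  " exp(-2.412288)
```

These come to roughly 1.6, 1.8 and 0.09. The two simpler models associate advice with more drinking, whereas the endogenous treatment model suggests a large reduction once the shared random effect is included.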

References

Kenkel, D. S. and Terza, J. V. (2001). The effect of physician advice on alcohol consumption: Count regression with an endogenous treatment effect. Journal of Applied Econometrics 16, 165-184.

Miranda, A. and Rabe-Hesketh, S. (2005). Maximum likelihood estimation of endogenous switching and sample selection models for binary, count, and ordinal variables. Submitted for publication.

Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2002). Reliable estimation of generalised linear mixed models using adaptive quadrature. The Stata Journal 2, 1-21.

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: Chapman & Hall/CRC Press.
