www.gllamm.org

Data for Generalized Latent Variable Modeling

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC.


Section 9.2

Data

xerop.dat (ASCII, no variable names)

Variables (names as in book):
id resp cons age xero cosine sine female height stunted time age1 season time2

Some Stata commands

* read the data:
infile id resp cons age xero cosine sine female /*
  */ height stunted time age1 season time2 using xerop.dat, clear

* GEE:
xtgee resp age xero female cosine sine height stunted, i(id) /*
  */ corr(exch) l(logit) f(binom) robust eform

* Random intercept model:
gllamm resp age xero female cosine sine height stunted, i(id) /*
  */ l(logit) f(binom) trace adapt
gllamm, eform

* do-file available: ichs.do

See here for more commands and output

Acknowledgement

We thank Al Sommer, Keith West, Joanne Katz, Scott Zeger and Patrick Heagerty for allowing us to make the data available.

References

Zeger, S. L. and Karim, M. R. (1991). Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association 86, 79-86.

Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data. Oxford: Oxford University Press.


Section 9.3

Data

mi.dat (ASCII, tab delimited, variable names)

Variables:
q [Q-wave]
l [LDH]
c [CPK]
h [History]
count

Some Stata commands

* read data:
insheet using mi.dat, clear

* prepare data for analysis using gllamm:
rename q y1
rename h y2
rename l y3
rename c y4
gen wt2 = count
gen patt=_n

reshape long y, i(patt) j(var)
tab var, gen(d)

* do-file available: myoc.do
Click here for a talk including gllamm commands for these data.
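For readers who want to check the data preparation outside Stata, the renaming, `reshape long` and `tab var, gen(d)` steps above can be sketched in plain Python. This is an illustrative sketch, not the gllamm workflow itself: it assumes each row of mi.dat has already been parsed into a dict with keys y1 to y4 (after the renames) and count, and the function name `reshape_long` and the example row are hypothetical.

```python
def reshape_long(rows):
    """Mimic `gen wt2 = count`, `gen patt = _n`, `reshape long y, i(patt) j(var)`
    and `tab var, gen(d)` for rows holding y1..y4 and count."""
    long_rows = []
    for patt, row in enumerate(rows, start=1):   # patt = _n, numbered from 1
        for var in range(1, 5):                  # j(var) takes the values 1..4
            rec = {
                "patt": patt,
                "var": var,
                "y": row[f"y{var}"],             # y1..y4 stacked into one column
                "wt2": row["count"],             # frequency weight repeated per row
            }
            for k in range(1, 5):                # indicator dummies d1..d4 for var
                rec[f"d{k}"] = int(var == k)
            long_rows.append(rec)
    return long_rows


# Hypothetical response pattern, not a row from mi.dat
for rec in reshape_long([{"y1": 1, "y2": 0, "y3": 1, "y4": 1, "count": 7}]):
    print(rec)
```

Each response pattern becomes four rows, one per diagnostic indicator, with the pattern frequency carried along as wt2; the dummies d1 to d4 identify which indicator a row refers to, which is the long layout gllamm uses for multivariate responses.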

Source and Reference

Rindskopf, D. and Rindskopf, W. (1986). The value of latent class analysis in medical diagnosis. Statistics in Medicine 5, 21-27.

Section 9.4

Data

mislevy.dat (ASCII, tab delimited, variable names)

Variables (as in Table 9.4):
y1 y2 y3 y4 cwm cwf cbm cbf

Some Stata commands

* read data:
insheet using mislevy.dat, clear

* do-file available: mislevy.do
See here for more commands and output

Source and Reference

Mislevy, R. J. (1985). Estimation of latent group effects. Journal of the American Statistical Association 80, 993-997.

Section 9.5

Data

gum.dat (ASCII, tab delimited, variable names)

Variables (as in Table 9.8):
Study d1 n1 d0 n0

Some Stata commands

* read data:
insheet using gum.dat, clear

* do-file available: gum.do

See here for more commands and output
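As a quick check on gum.dat, a per-study log odds ratio and its approximate (Woolf) variance can be computed from the four counts. This sketch assumes, without confirmation from this page, that d1 and n1 are the number of quitters and the number of participants in the treated arm and d0 and n0 the corresponding numbers in the control arm. The function name is hypothetical, and no continuity correction is applied, so a zero cell would raise an error.

```python
import math


def log_odds_ratio(d1, n1, d0, n0):
    """Return (log odds ratio, Woolf variance) for one study.

    Assumes d1/n1 are events/totals in the treated arm and d0/n0 in the
    control arm. No continuity correction: all four cells must be positive.
    """
    a, b = d1, n1 - d1      # treated arm: events, non-events
    c, d = d0, n0 - d0      # control arm: events, non-events
    lor = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d
    return lor, var


# Hypothetical counts, not taken from gum.dat
lor, var = log_odds_ratio(20, 100, 10, 100)
print(round(lor, 3), round(var, 4))
```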

Source and Reference

Silagy, C. (2003). Nicotine replacement therapy for smoking cessation (Cochrane review). The Cochrane Library, Issue 4. Chichester: Wiley.

Section 9.6

Data

wemp.dat (ASCII, no variable names)

Variables (as in book):
case y HUnemp Time Child1 Child5 Age
(y is the response variable: wife's employment status)

Some Stata commands

* read data:
infile case y hunemp time child1 child5 age using wemp.dat, clear

Acknowledgement

We thank Dave Stott for providing us with these data.

Reference

Davies, R. B., Elias, P. and Penn, R. (1992). The relationship between a husband's unemployment and his wife's participation in the labour force. Oxford Bulletin of Economics and Statistics 54, 145-171.

Section 9.7

Data

snow.dat (ASCII, no variable names)

Variables:
v6: capture on trapping day 6 (1: yes, 0: no)
v5: capture on trapping day 5 (1: yes, 0: no)
v4: capture on trapping day 4 (1: yes, 0: no)
v3: capture on trapping day 3 (1: yes, 0: no)
v2: capture on trapping day 2 (1: yes, 0: no)
v1: capture on trapping day 1 (1: yes, 0: no)
wt2: frequency (-99 means missing)

Some Stata commands

* read data:
infile v6 v5 v4 v3 v2 v1 wt2 using snow.dat, clear
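Because wt2 is a frequency weight and -99 marks a missing value, any summary computed from these data must first drop or recode that code (in Stata, `mvdecode wt2, mv(-99)` does this). The plain-Python sketch below parses whitespace-separated lines in the column order of the infile command above and, over the patterns with a known frequency, counts the animals and the captures on each trapping day. The function name and the sample lines are hypothetical.

```python
def summarize_captures(lines):
    """Parse snow.dat-style lines (v6 v5 v4 v3 v2 v1 wt2) and return
    (animals in patterns with known frequency, captures per trapping day)."""
    total = 0
    per_day = {day: 0 for day in range(1, 7)}
    for line in lines:
        fields = [int(float(tok)) for tok in line.split()]
        if len(fields) != 7:
            continue                      # skip blank or malformed lines
        v6, v5, v4, v3, v2, v1, wt2 = fields
        if wt2 == -99:                    # -99 means missing: skip this pattern
            continue
        total += wt2
        for day, hit in zip((6, 5, 4, 3, 2, 1), (v6, v5, v4, v3, v2, v1)):
            per_day[day] += hit * wt2     # each capture weighted by frequency
    return total, per_day


# Hypothetical capture histories, not rows from snow.dat
sample = ["0 0 0 0 0 1 3", "1 0 0 0 0 1 2", "0 0 0 0 0 0 -99"]
print(summarize_captures(sample))
```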

Sources and References

Agresti, A. (1994). Simple capture-recapture models permitting unequal catchability and variable sampling effort. Biometrics 50, 494-500.

Coull, B. A. and Agresti, A. (1999). The use of mixed logit models to reflect heterogeneity in capture-recapture studies. Biometrics 55, 294-301.


Section 11.2

Data

dmft.dat (ASCII, no variable names)

Variables:
dmft1: DMFT before treatment (not used)
dmft2: response variable (DMFT after treatment)
male: dummy for child being male
ethnic: ethnic group (1: brown, 2: white, 3: black)
school: treatment group (1: educ, 2: all, 3: control, 4: enrich, 5: rinse, 6: hygiene)
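The data preparation commands for these data create dummies with `tab school, gen(s)` and `tab ethnic, gen(eth)`, but only the renamed dummies enter the models, so control (school = 3) and brown (ethnic = 1) act as reference categories. A plain-Python sketch of the same treatment coding, with hypothetical names `dummy_code`, `SCHOOL` and `ETHNIC`:

```python
SCHOOL = {1: "educ", 2: "all", 3: "control", 4: "enrich", 5: "rinse", 6: "hygiene"}
ETHNIC = {1: "brown", 2: "white", 3: "black"}


def dummy_code(value, labels, reference):
    """Return {label: 0/1} dummies for every category except the reference."""
    if value not in labels:
        raise ValueError(f"unexpected code {value!r}")
    return {name: int(code == value)
            for code, name in labels.items() if code != reference}


# A hypothetical child in the rinse group who is black
row = {**dummy_code(5, SCHOOL, reference=3), **dummy_code(3, ETHNIC, reference=1)}
print(row)
```

A child in the control group who is brown gets zeros on all seven dummies, so the intercept refers to that combination.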

Some Stata commands

* read data:
infile dmft1 dmft2 male ethnic school using dmft.dat, clear

* prepare data:
rename dmft2 y
tab school, gen(s)
rename s1 educ
rename s2 all
rename s4 enrich
rename s5 rinse
rename s6 hygiene

tab ethnic, gen(eth)
rename eth2 white
rename eth3 black

* Poisson model
poisson y educ enrich rinse hygiene all male white black

* Normal intercept model
gen id=_n
gllamm y educ enrich rinse hygiene all male white black, i(id) /*
  */ f(poiss) l(log) adapt

* ZIP model
zip y educ enrich rinse hygiene all male white black, inflate(_cons)
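In the standard zero-inflated Poisson (ZIP) formulation, each zero comes either from a structural-zero class with probability pi or from a Poisson process with mean mu, so P(Y = 0) = pi + (1 - pi) exp(-mu) and P(Y = k) = (1 - pi) exp(-mu) mu^k / k! for k >= 1. With `inflate(_cons)`, pi is a single constant modelled on the logit scale. A plain-Python sketch of this probability mass function (the function name and the value of the logit constant are hypothetical):

```python
import math


def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson probability P(Y = k) with Poisson mean mu
    and structural-zero probability pi."""
    if k < 0:
        return 0.0
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson


# pi from a hypothetical logit-scale constant, as with inflate(_cons)
gamma = -1.0
pi = 1 / (1 + math.exp(-gamma))
print([round(zip_pmf(k, 2.0, pi), 4) for k in range(4)])
```

Setting pi = 0 recovers the ordinary Poisson model fitted by `poisson`.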

Acknowledgement

We would like to thank the Royal Statistical Society for making the data used in Böhning et al. (1999) available at the Royal Statistical Society Datasets Website.

Reference

Böhning, D., Dietz, E., Schlattmann, P., Mendonça, L. and Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. Journal of the Royal Statistical Society, Series A 162, 195-209.

Section 11.3

Data

epilep.dat (ASCII, tab delimited, variable names)

Variables (as in book):
subj y treat visit v4 lage lbas lbas_trt cons id
(y is the response variable)

Some Stata commands

* read data
insheet using epilep.dat, clear
See here for a talk including gllamm commands for these data.

See also:
Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2002). Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal 2, 1-21.

References

Thall, P. F. and Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657-671.


Section 11.4

Data

lips.txt (ASCII, comma delimited, no variable names)

Variables:
o: observed count
e: expected count
smr: crude SMR
x [Agric]
r1-r56: dummies indicating which counties are neighbours (for spatial modelling)

Some Stata commands

* read data
insheet o e smr x r1-r56 n using lips.txt, clear

* prepare data
gen area = _n
gen lne = ln(e)
replace x = (x-8.39)/10

* independence model (normal random effects)
gllamm o x, i(area) offset(lne) f(poiss) nip(15) adapt
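Passing ln(e) as an offset means the Poisson mean for area j is e_j * exp(b0 + b1*x_j + u_j), so exp(b0 + b1*x_j + u_j) is a relative risk on the same scale as the crude SMR o_j/e_j. The plain-Python sketch below recomputes the derived quantities from one parsed row; the function names and example numbers are hypothetical, and the SMR is computed here as o/e (the smr column in lips.txt may use a different scaling).

```python
import math


def prepare_area(o, e, x):
    """Derived quantities for one area, mirroring `gen lne = ln(e)` and
    `replace x = (x-8.39)/10`, plus the crude SMR o/e."""
    if e <= 0:
        raise ValueError("expected count must be positive to take ln(e)")
    return {
        "lne": math.log(e),            # offset for the Poisson model
        "x": (x - 8.39) / 10,          # centred and rescaled covariate
        "smr": o / e,                  # crude standardized mortality ratio
    }


def poisson_mean(e, beta0, beta1, x_scaled, u=0.0):
    """Mean count e * exp(beta0 + beta1 * x + u), i.e. with offset ln(e)."""
    return math.exp(math.log(e) + beta0 + beta1 * x_scaled + u)


print(prepare_area(o=9, e=1.5, x=16))   # hypothetical values
```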

Reference

Clayton, D. G. and Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43, 671-681.

Section 12.4

Data

angina.dat (ASCII, tab delimited, variable names)

Variables:
Subject: subject j
Test: test i
dose: dose of ISDN (not used)
secondp: time to angina in placebo condition
Seconds: time to angina after treatment with ISDN
Uncensored: censoring indicator (1=uncensored, 0=censored)
Bypass: dummy for heart bypass
After: constructed variable as in book
Lin: constructed variable as in book

Some Stata commands

* read data
insheet using angina.dat, clear

Source

Danahy, D. T., Burwell, D. T., Aronow, W. S. and Prakash, R. (1976). Sustained hemodynamic and antianginal effect of high dose oral isosorbide dinitrate. Circulation 55, 381-387.

Reference

Pickles, A. and Crouchley, R. (1995). A comparison of frailty models for multivariate survival data. Statistics in Medicine 14, 1447-1461.


Section 13.4

Data

bes.dat (ASCII, tab delimited, variable names)

For each voting occasion there are three rows of data, one for each of the three parties.

Variables:
occ: voting occasion
serialno: voter identifier
year: year of election
constit: identifier for constituency
wt14: weighting variable (not used)
male [Male]: dummy for voter being male
age87: voter's age in 1987
manual [Manual]: dummy for father being a manual worker
fv: party voted for (1=Conservative, 2=Labour, 3=Liberal)
right: position of voter on left-right dimension
price [Inflation]: perceived inflation since the last election
rl: position of party on left-right dimension
party: political party (1=Conservative, 2=Labour, 3=Liberal)
rank: rank assigned to the party
age [Age]: age at time of election divided by 10

Some Stata commands

* read data
insheet using bes.dat, clear

* do-file available: bes.do

Acknowledgement

We would like to thank Anthony Heath for allowing us to make the data available.

Source

British Election Panel 1987-1992 from the UK Data Archive.

Reference

Skrondal, A. and Rabe-Hesketh, S. (2003). Multilevel logistic regression for polytomous data and rankings. Psychometrika 68, 267-287.

Section 13.5

Data

materia.dat (ASCII, no variable names)

Variables (same as first four columns of Table 13.5):
item1 item2 item3 item4 wt2

Some Stata commands

* read data
infile item1 item2 item3 item4 wt2 using materia.dat, clear
See the gllamm manual (Section 9.4) for gllamm commands.

Reference

Croon, M. A. (1989). Latent class models for the analysis of rankings. In: G. De Soete and K. C. Klauer (Editors), New Developments in Psychological Choice Modeling. Amsterdam: Elsevier, pp. 99-121.

Section 13.6

Data

coff.dat (ASCII, tab delimited, variable names)

Variables:
ind: identifier for choice situation
alt: alternative (see Table 13.9)
id: subject identifier
set: choice set (see Table 13.9)
choice: alternative chosen (1, 2, 3)
brand1: dummy for Philips brand
brand2: dummy for Braun brand
cap1: dummy for capacity of 6 cups
cap2: dummy for capacity of 10 cups
price1: dummy for price being 39
price2: dummy for price being 69
filter: dummy for filter
therm: dummy for thermos
ch: dummy for alternative chosen (1=yes, 0=no)

Dataset for predictions in Table 13.12

coffpred.dat (ASCII, tab delimited, variable names)

Variables (same as coff.dat):
ind alt id set brand1 brand2 cap1 cap2 price1 price2 filter therm ch

Some Stata commands

* read data
insheet using coff.dat, clear

* do-file available: coff.do
Note that there are errors in Table 13.10; see remarks.

Acknowledgement

We would like to thank Michel Wedel for making the data available.

Reference

Haaijer, M. E., Wedel, M., Vriens, M. and Wansbeek, T. J. (1998). Utility covariances and context effects in conjoint MNP models. Marketing Science 17, 236-252.

Section 14.2

Data

diet.dat (ASCII, tab delimited, variable names)

Variables (-99 is missing):
id: subject identifier
fiber1: first fiber measurement
fiber2: second fiber measurement
age: age of subject
bus [Transp]: working for London Transport
chd: coronary heart disease (1: yes, 0: no)

Some Stata commands

* read data
insheet using diet.dat, clear

* recode -99 to missing
mvdecode _all, mv(-99)
See this paper on gllamm and cme syntax:

Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2003). Maximum likelihood estimation of generalized linear models with covariate measurement error. The Stata Journal 3, 385-410.

Acknowledgement

We would like to thank David Clayton for making the data available.

References

Clayton, D. G. (1992). Models for the analysis of cohort and case-control studies with inaccurately measured exposures. In: J. H. Dwyer, M. Feinleib, P. Lippert and H. Hoffmeister (Eds), Statistical Models for Longitudinal Studies of Health. New York: Oxford University Press.

Rabe-Hesketh, S., Pickles, A. and Skrondal, A. (2003). Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation. Statistical Modelling 3, 215-232.


Section 14.3

Data

cervix.dat (ASCII, no variable names)

Variables:
D: dummy for case (cervical cancer) (1=case, 0=control)
X: true exposure to herpes (1=yes, 0=no, -9=missing)
W: measured exposure (1=yes, 0=no)

Some Stata commands

* read data
infile D X W using cervix.dat, clear

* do-file available: cervix.do

See here for more commands and output
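Because the true exposure X is observed only for part of the sample (coded -9 elsewhere), the rows with X known form a validation subsample in which the misclassification of the measured exposure W can be tabulated directly. The plain-Python sketch below estimates sensitivity P(W=1 | X=1) and specificity P(W=0 | X=0) separately for controls and cases, since misclassification may differ by case status. It assumes rows parsed as (D, X, W) tuples in the column order of the infile command above; the function name and sample rows are hypothetical.

```python
def misclassification(rows):
    """Estimate P(W=1 | X=1) and P(W=0 | X=0) separately for controls (D=0)
    and cases (D=1), using only rows where the true exposure X is observed."""
    result = {}
    for d in (0, 1):
        valid = [(x, w) for dd, x, w in rows if dd == d and x != -9]
        exposed = [w for x, w in valid if x == 1]
        unexposed = [w for x, w in valid if x == 0]
        sens = sum(exposed) / len(exposed) if exposed else None
        spec = (len(unexposed) - sum(unexposed)) / len(unexposed) if unexposed else None
        result[d] = {"sensitivity": sens, "specificity": spec}
    return result


# Hypothetical (D, X, W) rows, not taken from cervix.dat
rows = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1), (1, -9, 1)]
print(misclassification(rows))
```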

Reference and Source

Carroll, R. J., Gail, M. H. and Lubin, J. H. (1993). Case-control studies with errors in covariates. Journal of the American Statistical Association 88, 185-199.


Section 14.4

Data

wjobs.dat (ASCII, no variable names)

Variables (names as in book):
depress: depression (response variable)
risk: baseline risk score
Tx: randomized to receive job training (1: yes, 0: no); this is r_j in the book
basedep: baseline depression score
age: age in years
motivate: motivation to attend training
educ: school grade completed
assert: assertiveness
single: dummy for being single
econ: economic hardship
nonwhite: dummy for not being white
x10: not used
c1: complier in treatment group (1: complied, 0: did not comply); this is c_j in the book
c2: not used

Some Stata commands

* read data
infile depress risk Tx basedep age motivate educ /*
  */ assert single econ nonwhite x10 c1 c2 using wjobs.dat, clear
  
* do-file available: cace.do

See here for more commands and output

Acknowledgement

We would like to thank Amiram Vinokur and Bengt Muthén for making the data available.

References

Vinokur, A. D., Price, R. H. and Schul, Y. (1995). Impact of the JOBS intervention on unemployed workers varying in risk for depression. American Journal of Community Psychology 23, 39-74.

Little, R. J. A. and Yau, L. H. Y. (1998). Statistical techniques for analyzing data from prevention trials. Psychological Methods 3, 147-159.


Section 14.5

Data

kenkel.dat (ASCII, tab delimited, variable names)

Variables (names as in book):
drinks advice black hlthins regmed heart hieduc

Some Stata commands

* read data
insheet using kenkel.dat, clear

* do-file available: kenkel.do

See here for more commands and output

Acknowledgement

We would like to thank the Journal of Applied Econometrics for making the data used in Kenkel and Terza (2001) available at the Journal of Applied Econometrics Data Archive.

Reference

Kenkel, D. S. and Terza, J. V. (2001). The effect of physician advice on alcohol consumption: Count regression with an endogenous treatment effect. Journal of Applied Econometrics 16, 165-184.

Section 14.6

Data

prothro.dat (ASCII, tab delimited, variable names)
prothros.dat (ASCII, tab delimited, variable names)

Variables in prothro.dat (marker data):
id: subject identifier
N: number of measurements of prothrombin marker
treat: treatment group (1: prednisone, 0: placebo)
time: time when prothrombin was measured
pro: prothrombin (marker) measurement
var: equals 1 (dummy for marker model)

Variables in prothros.dat (survival data):
id: subject identifier
N: number of measurements of prothrombin marker
treat: treatment group (1: prednisone, 0: placebo)
time: time to death or censoring
death: dummy for death (versus censoring)
var: equals 0 (dummy for marker model)

Some Stata commands

* read survival data:
insheet using prothros.dat, clear

* read marker data:
insheet using prothro.dat, clear

* do-file available: prothrobin.do

See here for explanations of commands and output
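Both files share id, N, treat and time, and differ in the response (pro or death) and in var, which is 1 for marker rows and 0 for survival rows. The var dummy suggests that the two files are meant to be stacked into one dataset in which var indicates the sub-model (marker or survival) each row belongs to, as is usual when gllamm models responses of different types jointly; how prothrobin.do does this is not shown on this page. The plain-Python sketch below illustrates such stacking, assuming each file has been read into a list of dicts; the function name `stack_joint` and the sample records are hypothetical.

```python
def stack_joint(marker_rows, survival_rows):
    """Stack marker records (var=1) and survival records (var=0) into one list.

    Response columns that do not apply to a record type are set to None,
    and the result is sorted by subject, with the survival record first.
    """
    stacked = [{**r, "death": None, "var": 1} for r in marker_rows]
    stacked += [{**r, "pro": None, "var": 0} for r in survival_rows]
    # sort() is stable, so marker rows keep their original time order
    stacked.sort(key=lambda r: (r["id"], r["var"]))
    return stacked


# Hypothetical records for one subject, not taken from the data files
marker = [
    {"id": 1, "N": 2, "treat": 1, "time": 0.0, "pro": 80},
    {"id": 1, "N": 2, "treat": 1, "time": 0.5, "pro": 95},
]
survival = [{"id": 1, "N": 2, "treat": 1, "time": 3.2, "death": 1}]
for rec in stack_joint(marker, survival):
    print(rec)
```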

Acknowledgement

We thank Per Kragh Andersen for providing us with these data.

Reference

Andersen, P. K., Borgan, Ø., Gill, R. D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. New York: Springer.