FAQs: Creating frequency weights for gllamm

www.gllamm.org

How do I create frequency weights to speed up gllamm?

Title		Creating frequency weights for gllamm
Author		Minjeong Jeon and Sophia Rabe-Hesketh, University of California, Berkeley
Date		July 2012

Frequency weights are a simple and effective way to speed up gllamm. For instance, if several level-2 units are identical (same responses and same covariate values), they can be replaced by a single unit with a level-2 weight. gllamm then evaluates the likelihood contribution of each distinct unit only once, which can make estimation considerably faster.

Using frequency weights means that the data are in collapsed form: each distinct record is kept once, together with a count of how many times it occurs. Creating frequency weights is therefore the same as collapsing the data, and Stata's collapse command can be used to generate them.

Suppose a dataset contains students nested within schools. This hypothetical dataset has four variables:

schid    stuid    y    sex

schid and stuid are school and student identifiers, respectively; y is a binary response variable and sex is a binary explanatory variable, both measured on students. Many students in the same school will have the same response and sex; therefore, we can collapse the data and create level-1 frequency weights using the commands:

generate cons=1
collapse (count) wt1 = cons, by(schid sex y)

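Above, the collapse command creates the weight variable wt1 by counting the number of cases in each combination of schid, sex, and y. That is, wt1 is the number of students in the same school who have the same response and sex. After collapse, the dataset contains only schid, sex, y, and wt1; the student identifier stuid is dropped because individual students are no longer distinguished.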
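The level-1 steps above can be tried end to end on simulated data. The following is a minimal sketch, not the authors' own example: the simulated values, the seed, the sample sizes, and the scalar n_students are all assumptions made for illustration. It assumes gllamm is installed (for example via ssc install gllamm) and that the weight(wt) option picks up wt1 as the level-1 weight, with missing higher-level weights treated as 1, as we understand the gllamm documentation.

```stata
* Simulated stand-in for the hypothetical student data (illustrative only)
clear
set seed 12345
set obs 2000
generate stuid = _n
generate schid = ceil(_n/50)                 // 40 schools, 50 students each
generate sex   = runiform() < 0.5            // binary covariate
generate u     = rnormal()
bysort schid (stuid): replace u = u[1]       // one random intercept per school
generate y = runiform() < invlogit(-0.5 + 0.5*sex + 0.7*u)
drop u
scalar n_students = _N                       // remember the original sample size

* Collapse to one row per (school, sex, response) combination
generate cons = 1
collapse (count) wt1 = cons, by(schid sex y)

* Sanity checks: the weights add up to the original number of students,
* and each school contributes at most 2 x 2 = 4 rows
quietly summarize wt1
assert r(sum) == n_students
assert _N <= 160

* Random-intercept logistic regression on the collapsed data;
* weight(wt) is a stub, so gllamm looks for wt1 (level 1), wt2 (level 2), ...
gllamm y sex, i(schid) family(binomial) link(logit) weight(wt)
```

Fitting the same model on the uncollapsed data without the weight() option should give the same estimates, only more slowly, which is a useful check when setting up weights for the first time.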

Level-2 frequency weights are most likely to be useful when the response variable is binary and the number of level-1 units per level-2 unit is small, as in longitudinal data or item responses in item response models. Each weight is the number of level-2 units that share the same set of responses and covariate values across their level-1 units. To create the weights, the data should be in wide form, with one row for each level-2 unit and a separate variable for each level-1 unit. For example, consider a longitudinal dataset in wide form with variables

y1    y2    y3    y4    sex

y1 to y4 are the responses at time-points 1 to 4 and sex is a binary explanatory variable. Many subjects will have the same sex and the same responses at the four time-points; therefore, we can collapse the data and create level-2 frequency weights using the commands:

generate cons=1
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)
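If the data are stored in long form to begin with, they would first need to be reshaped to wide form so that the collapse above can compare whole response patterns. A minimal sketch on a tiny made-up dataset follows; the variable names id and occasion and all data values are hypothetical.

```stata
* Hypothetical long-form data: one row per subject (id) and occasion
clear
input id occasion y sex
1 1 0 0
1 2 1 0
1 3 1 0
1 4 1 0
2 1 0 0
2 2 1 0
2 3 1 0
2 4 1 0
3 1 1 1
3 2 1 1
3 3 0 1
3 4 0 1
end

* One row per subject, with responses y1-y4 as separate variables
reshape wide y, i(id) j(occasion)

* Count how many subjects share each response pattern and sex
generate cons = 1
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)
list
```

Here subjects 1 and 2 have identical responses and sex, so they collapse into a single row with wt2 equal to 2, while subject 3 forms its own row with wt2 equal to 1.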

Before we can run gllamm, we must reshape the data to long form:

generate pattern = _n
reshape long y, i(pattern) j(occasion)

Here pattern is the new level-2 identifier, which should be used in the i() option of the gllamm command.
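Putting the level-2 steps together, the whole workflow might look like the sketch below. It uses simulated data (the seed, sample size, and coefficient values are assumptions for illustration only) and assumes gllamm is installed. It also assumes, as we understand the gllamm documentation, that weight(wt) treats wt2 as the level-2 weight and sets the missing level-1 weight to 1.

```stata
* Simulated wide-form longitudinal data (illustrative only)
clear
set seed 2012
set obs 1000
generate sex = runiform() < 0.5
generate u   = rnormal()                     // subject-level random intercept
forvalues t = 1/4 {
    generate y`t' = runiform() < invlogit(-0.5 + 0.5*sex + u)
}
drop u

* Collapse to one row per distinct response pattern and sex
generate cons = 1
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)

* Sanity checks: weights add up to the number of subjects, and
* there are at most 2^4 x 2 = 32 distinct patterns
quietly summarize wt2
assert r(sum) == 1000
assert _N <= 32
scalar n_patterns = _N

* Reshape to long form; wt2 and sex are constant within pattern
* and are carried along automatically
generate pattern = _n
reshape long y, i(pattern) j(occasion)
assert _N == 4 * n_patterns

* Random-intercept logistic regression with level-2 frequency weights
gllamm y sex, i(pattern) family(binomial) link(logit) weight(wt)
```

In this sketch gllamm works with at most 32 level-2 units instead of 1000, which is where the speed-up comes from.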

Examples and documentation

Description of the weight() option on pp. 22-23, and examples in Sections 3.2.2, 4.1.1-4.1.2, 8.3.1-8.3.2, 8.4, and 9.3 of Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2004). GLLAMM Manual. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 160.
Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC.
- Section 9.4, Arithmetic reasoning: Item response models
Exercises 10.3, 10.7, 14.5, and 16.11 and pp. 929-930 in Rabe-Hesketh, S. and Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Third Edition). Volume II: Categorical Responses, Counts, and Survival. College Station, TX: Stata Press.
- Solution to exercise 10.3
- Datasets for the book