www.gllamm.org

## How do I create frequency weights to speed up gllamm?

**Title:** Creating frequency weights for gllamm

**Author:** Minjeong Jeon and Sophia Rabe-Hesketh, University of California, Berkeley

**Date:** July 2012

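Frequency weights are a simple and effective way to speed up `gllamm`. Suppose many level-2 units are identical, with the same responses and covariate values for all their level-1 units. You can then keep one copy of each distinct unit and add a level-2 weight recording how many times it occurs. `gllamm` evaluates each distinct unit's likelihood contribution only once, which can make estimation dramatically faster than fitting the uncollapsed data.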

Frequency weights imply that the data are in collapsed form. Each row stands for a group of identical units, and its weight records how many units are in the group. Creating frequency weights is therefore the same as collapsing the data, and Stata's `collapse` command can generate them.
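The following sketch shows why collapsing leaves the likelihood unchanged. It uses a two-level random-intercept model in generic notation of our own, not the manual's:

- $u$ is the level-2 random effect, with density $g$.
- $f$ is the level-1 response density.
- $w^{(1)}_{ij}$ counts identical level-1 units $i$ within level-2 unit $j$.
- $w^{(2)}_j$ counts identical level-2 units.

When all weights equal 1, this is the ordinary likelihood. Collapsing only replaces repeated identical factors by powers and repeated identical terms by multiples:

$$
L_j = \int \prod_{i} f\big(y_{ij} \mid \mathbf{x}_{ij}, u\big)^{w^{(1)}_{ij}} \, g(u) \, du,
\qquad
\log L = \sum_{j} w^{(2)}_{j} \log L_j
$$

Each $L_j$ requires numerical integration over $u$. Reducing the number of distinct level-2 units therefore reduces the number of integrals `gllamm` has to approximate.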

Suppose a dataset contains students nested within schools, with four variables:

```
schid    stuid    y    sex
```
`schid` and `stuid` are the school and student identifiers, `y` is a binary response, and `sex` is a binary explanatory variable. Because `y` and `sex` each take only two values, many students in the same school will share the same response and sex. We can therefore collapse the data and create level-1 frequency weights with the commands:
```stata
generate cons = 1                               // constant to be counted
collapse (count) wt1 = cons, by(schid sex y)    // one row per school/sex/response combination
```
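The `collapse` command replaces the data in memory with one row per combination of `schid`, `sex`, and `y`. It creates the weight variable `wt1` by counting the students in each combination, so `wt1` is the number of students in the same school who have the same response and sex. Variables not named in `by()` or in the statistics list, such as `stuid`, are dropped.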
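The following is a sketch under stated assumptions. It simulates a hypothetical school dataset (50 schools, 20 students each, made-up parameter values), applies the same collapse, and fits a random-intercept logit with level-1 weights. It assumes `gllamm` is installed from SSC (`ssc install gllamm`). It also relies on our reading of the `weight()` option in the GLLAMM Manual: `weight(wt)` takes a variable-name stub, so `gllamm` reads `wt1` as level-1 weights, `wt2` as level-2 weights, and so on.

```stata
* Requires gllamm from SSC: ssc install gllamm
clear
set seed 1234
set obs 50                                  // hypothetical: 50 schools
generate schid = _n
generate u = rnormal()                      // school-level random intercept
expand 20                                   // hypothetical: 20 students per school
generate stuid = _n                         // student identifier (dropped by collapse)
generate sex = runiform() < 0.5
generate y = runiform() < invlogit(-0.5 + 0.8*sex + u)
drop u

* Collapse identical students within each school into level-1 frequency weights
generate cons = 1
collapse (count) wt1 = cons, by(schid sex y)

* Assumption: weight(wt) is a stub, so gllamm picks up wt1 as the level-1 weight
gllamm y sex, i(schid) weight(wt) family(binomial) link(logit) adapt
```

The model is fitted to at most 200 rows (at most four sex-by-response combinations per school) instead of 1,000, while the weights still sum to 1,000.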

Level-2 frequency weights are most likely to be useful when the response is binary and each level-2 unit has few level-1 units. Examples are longitudinal data with a few occasions and item responses in item response models. The weights represent the number of level-2 units that share the same set of responses and covariate values for their level-1 units. The data should be in wide form, with one row per level-2 unit and a separate variable for each level-1 unit. For example, consider a longitudinal dataset in wide form with variables

```
y1    y2    y3    y4    sex
```
`y1` to `y4` are the responses at time points 1 to 4, and `sex` is a binary explanatory variable. Many subjects will have the same sex and the same responses at all four time points. We can therefore collapse the data and create level-2 frequency weights with the commands:
```stata
generate cons = 1                                  // constant to be counted
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)   // one row per distinct response pattern and sex
```
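Longitudinal data that start out in long form (one row per subject and occasion) must first be reshaped to wide form before collapsing. The following toy sketch uses hypothetical variables `id` and `occasion` and made-up responses. They are chosen so that subjects 1 and 2 share one pattern and subjects 3 and 4 share another.

```stata
* Hypothetical long-form data: 4 subjects (id) observed at 4 occasions
clear
set obs 16
generate id = ceil(_n/4)                   // subjects 1-4
generate occasion = mod(_n - 1, 4) + 1     // occasions 1-4 within each subject
generate sex = id > 2                      // subjects 3 and 4 have sex = 1
generate y = occasion > 2                  // every subject: y = 0 0 1 1

* Wide form: one row per subject with y1-y4 (sex is constant within id)
reshape wide y, i(id) j(occasion)

* Collapse identical subjects into level-2 frequency weights
generate cons = 1
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)
list
```

The four subjects collapse to two rows, one for each value of `sex`, and each row has `wt2` equal to 2.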

Before we can run `gllamm`, we must reshape the data to long form:

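```stata
generate pattern = _n                       // identifier for each distinct pattern (new level-2 unit)
reshape long y, i(pattern) j(occasion)      // one row per pattern and occasion
```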
Here `pattern` is the new level-2 identifier, and it should be used in the `i()` option of the `gllamm` command. Because `sex` and `wt2` are constant within each pattern, `reshape` copies them to every occasion.
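The following end-to-end sketch puts these steps together. It simulates hypothetical wide-form data for 500 subjects (made-up parameter values), collapses them to level-2 weights, reshapes to long form, and fits a random-intercept logit. It assumes `gllamm` is installed from SSC (`ssc install gllamm`). It also assumes, based on our reading of the `weight()` description in the GLLAMM Manual, that `weight(wt)` makes `gllamm` use `wt2` as the level-2 weight and treat level-1 weights as 1 when no `wt1` exists.

```stata
* Requires gllamm from SSC: ssc install gllamm
clear
set seed 2012
set obs 500                                 // hypothetical: 500 subjects
generate sex = runiform() < 0.5
generate u = rnormal()                      // subject-level random intercept
forvalues t = 1/4 {
    generate y`t' = runiform() < invlogit(-1 + 0.5*sex + u)
}
drop u

* Collapse identical subjects into patterns with level-2 frequency weights
generate cons = 1
collapse (count) wt2 = cons, by(y1 y2 y3 y4 sex)

* One row per pattern; reshape to long form for gllamm
generate pattern = _n
reshape long y, i(pattern) j(occasion)

* Assumption: weight(wt) is a stub, so gllamm picks up wt2 as the level-2 weight
gllamm y sex, i(pattern) weight(wt) family(binomial) link(logit) adapt
```

With four binary responses and a binary covariate there are at most 32 distinct patterns (16 response sequences for each value of `sex`). `gllamm` therefore works with at most 128 rows in long form instead of 2,000.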

### Examples and documentation

• Description of the `weight()` option on pp. 22-23, and examples in Sections 3.2.2, 4.1.1-4.1.2, 8.3.1-8.3.2, 8.4, and 9.3 of Rabe-Hesketh, S., Skrondal, A. and Pickles, A. (2004). GLLAMM Manual. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 160.
• Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall/CRC.
• Exercises 10.3, 10.7, 14.5, and 16.11 and pp. 929-930 in Rabe-Hesketh, S. and Skrondal, A. (2012). Multilevel and Longitudinal Modeling Using Stata (Third Edition). Volume II: Categorical Responses, Counts, and Survival. College Station, TX: Stata Press.