Thompson and Baker (1981). Composite link functions in generalized linear models.

Composite link model for blood type data.

The data set used in this section is **blood.dta**

The do-file is **blood.do**.

The program used is **gllamm**.
Use the command **ssc describe gllamm**
and follow instructions to download **gllamm**.
For more information on **gllamm** see
http://www.gllamm.org.

Read data

use blood, clear list, cleany lnN A O B cA cAB cO cB ind clus id 1. 179 6.045005 2 0 0 1 0 0 0 1 1 1 2. 6 6.045005 1 1 0 1 0 0 0 2 1 1 3. 202 6.045005 1 0 1 0 1 0 0 3 1 1 4. 35 6.045005 1 1 0 1 0 0 0 4 1 1 5. 0 6.045005 0 2 0 0 0 1 0 0 1 1 6. 0 6.045005 0 1 1 0 0 0 1 0 1 1 7. 0 6.045005 1 0 1 0 1 0 0 0 1 1 8. 0 6.045005 0 1 1 0 0 0 1 0 1 1 9. 0 6.045005 0 0 2 0 0 0 1 0 1 1

**y**are the observed frequencies of blood types (phenotypes) A, AB, O, B, respectively. This is the response variable of the model.**lnN**is the natural log of the total number of subjects, to be used as an offset.-
**A**,**O**,**B**together specify combinations of genes from the two parents, e.g. observation 1 is AA, observation 2 is AO.**A**,**O**and**B**will be used as explanatory variables in the model. (These are columns 2 to 4 of the*X*matrix on page 127 of Thompson and Baker, 1981). -
**cA**,**cAB**,**cO**, and**cB**are the rows of the*C*matrix on page 127 of Thompson and Baker (1981).**cA**is a dummy for combinations of genes described by**A**,**O**,**B**being consistent with blood type A. This is the weight vector for the composite link of observation 1.**cAB**is a dummy for combinations of genes described by**A**,**O**,**B**being consistent with blood type AB. This is the weight vector for the composite link of observation 2.**cO**is a dummy for combinations of genes described by**A**,**O**,**B**being consistent with blood type O. This is the weight vector for the composite link of observation 3.**cB**is a dummy for combinations of genes described by**A**,**O**,**B**being consistent with blood type B. This is the weight vector for the composite link of observation 4.

**ind**will be used to associate the above four weight variables with the correct observations.**clus**is an identifier for the groups of observations whose linear predictors may be compined into a single composite link.**id**will be used in the**i()**option required by**gllamm**but it is arbitrary here.

The expected frequency for blood type A is given by the composite link

exp(ln N+ 2a) + exp(lnN+a+o) + exp(lnN+o+a),

wherea= ln(p) is the log relative frequency of gene A ando= ln(r) is the log relative frequency of gene O. The first term corresponds to genes (from mother and father, respectively) AA, the second to AO and the third to OA.

Estimate the model usinggllammwith a composite link. Thecompos()option specifies:

- the grouping variable
clussuch that linear predictors within a group can be combined with each other to form a composite link for an observation within the group,- the variable
indindicating to which response the composite links defined by the subsequent weight variables correspond: composite link based on weight variable 1 goes to whereindequals 1, etc.,- the weight variables
cAcABcOcBfor the four composite links:

gllamm y A O B, i(id) offset(lnN) fam(poiss) compos(clus ind cA cAB cO cB) init nocons eformnumber of level 1 units = 9 Condition Number = 4.918322 gllamm model log likelihood = -13.200915 ------------------------------------------------------------------------------ y | exp(b) Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- A | .25156 .0172837 -20.09 0.000 .2198665 .2878221 O | .6984284 .0240529 -10.42 0.000 .6528414 .7471986 B | .0500116 .0077113 -19.43 0.000 .0369679 .0676577 lnN | (offset) ------------------------------------------------------------------------------

The coefficients are the estimates of the gene frequencies of A (0.25), O (0.90) and B (0.05).

Obtain predicted counts

gllapred mu, mu list y mu, cleany mu 1. 179 174.99317 2. 6 10.618296 3. 202 205.85252 4. 35 30.536004 5. 0 . 6. 0 . 7. 0 . 8. 0 . 9. 0 .

Perform goodness-of-fit test by estimating saturated model

* store current log-likelihood local ll=e(ll) * dummies for the four counts: gen n=_n if y>0 quietly tab n, gen(dummy) * estimate saturated model using poisson quietly poisson y dummy*, nocons * compare log-likelihoods local deviance = 2*(e(ll)-`ll') local pval = chiprob(`deviance',1) disp "deviance = " `deviance' ", p-value = " `pval'deviance = 3.1732899, p-value = .82544424