Thompson and Baker (1981). Composite link functions in generalized linear models. Journal of the Royal Statistical Society, Series C 30, 125-131.

Composite link model for blood type data.

The data set used in this section is blood.dta

The do-file is blood.do.

The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.

Read data

use blood, clear
list, clean

         y        lnN   A   O   B   cA   cAB   cO   cB   ind   clus   id  
  1.   179   6.045005   2   0   0    1     0    0    0     1      1    1  
  2.     6   6.045005   1   1   0    1     0    0    0     2      1    1  
  3.   202   6.045005   1   0   1    0     1    0    0     3      1    1  
  4.    35   6.045005   1   1   0    1     0    0    0     4      1    1  
  5.     0   6.045005   0   2   0    0     0    1    0     0      1    1  
  6.     0   6.045005   0   1   1    0     0    0    1     0      1    1  
  7.     0   6.045005   1   0   1    0     1    0    0     0      1    1  
  8.     0   6.045005   0   1   1    0     0    0    1     0      1    1  
  9.     0   6.045005   0   0   2    0     0    0    1     0      1    1  

The expected frequency for blood type A is given by the composite link

exp(lnN + 2a) + exp(lnN + a + o) + exp(lnN + o + a),

where a = ln(p) is the log relative frequency of gene A and o = ln(r) is the log relative frequency of gene O. The first term corresponds to genes (from mother and father, respectively) AA, the second to AO and the third to OA.
Estimate the model using gllamm with a composite link. The compos() option specifies:

gllamm y A O B, i(id) offset(lnN) fam(poiss) compos(clus ind cA cAB cO cB) init nocons eform

number of level 1 units = 9
 
Condition Number = 4.918322
 
gllamm model
 
log likelihood = -13.200915
 
------------------------------------------------------------------------------
           y |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           A |     .25156   .0172837   -20.09   0.000     .2198665    .2878221
           O |   .6984284   .0240529   -10.42   0.000     .6528414    .7471986
           B |   .0500116   .0077113   -19.43   0.000     .0369679    .0676577
         lnN |   (offset)
------------------------------------------------------------------------------

The coefficients are the estimates of the gene frequencies of A (0.25), O (0.90) and B (0.05).
Obtain predicted counts

gllapred mu, mu
list y mu, clean

         y          mu  
  1.   179   174.99317  
  2.     6   10.618296  
  3.   202   205.85252  
  4.    35   30.536004  
  5.     0           .  
  6.     0           .  
  7.     0           .  
  8.     0           .  
  9.     0           .  
Perform goodness-of-fit test by estimating saturated model

 * store current log-likelihood
local ll=e(ll)

 * dummies for the four counts:
gen n=_n if y>0
quietly tab n, gen(dummy)

 * estimate saturated model using poisson
quietly poisson y dummy*, nocons

 * compare log-likelihoods
local deviance = 2*(e(ll)-`ll')
local pval = chiprob(`deviance',1)
disp "deviance = " `deviance' ", p-value = " `pval'

deviance = 3.1732899, p-value = .82544424