Blood types example

Thompson and Baker (1981). Composite link functions in generalized linear models. Journal of the Royal Statistical Society, Series C 30, 125-131.

Composite link model for blood type data.

The data set used in this section is blood.dta

The program used is gllamm. Use the command ssc describe gllamm and follow instructions to download gllamm. For more information on gllamm see http://www.gllamm.org.

Read data


use blood, clear
list, clean

         y        lnN   A   O   B   cA   cAB   cO   cB   ind   clus   id  
  1.   179   6.045005   2   0   0    1     0    0    0     1      1    1  
  2.     6   6.045005   1   1   0    1     0    0    0     2      1    1  
  3.   202   6.045005   1   0   1    0     1    0    0     3      1    1  
  4.    35   6.045005   1   1   0    1     0    0    0     4      1    1  
  5.     0   6.045005   0   2   0    0     0    1    0     0      1    1  
  6.     0   6.045005   0   1   1    0     0    0    1     0      1    1  
  7.     0   6.045005   1   0   1    0     1    0    0     0      1    1  
  8.     0   6.045005   0   1   1    0     0    0    1     0      1    1  
  9.     0   6.045005   0   0   2    0     0    0    1     0      1    1

y are the observed frequencies of blood types (phenotypes) A, AB, O, B, respectively. This is the response variable of the model.
lnN is the natural log of the total number of subjects, to be used as an offset.
A, O, B together specify combinations of genes from the two parents, e.g. observation 1 is AA, observation 2 is AO. A, O and B will be used as explanatory variables in the model. (These are columns 2 to 4 of the X matrix on page 127 of Thompson and Baker, 1981).
cA, cAB, cO, and cB are the rows of the C matrix on page 127 of Thompson and Baker (1981).
- cA is a dummy for combinations of genes described by A, O, B being consistent with blood type A. This is the weight vector for the composite link of observation 1.
- cAB is a dummy for combinations of genes described by A, O, B being consistent with blood type AB. This is the weight vector for the composite link of observation 2.
- cO is a dummy for combinations of genes described by A, O, B being consistent with blood type O. This is the weight vector for the composite link of observation 3.
- cB is a dummy for combinations of genes described by A, O, B being consistent with blood type B. This is the weight vector for the composite link of observation 4.
ind will be used to associate the above four weight variables with the correct observations.
clus is an identifier for the groups of observations whose linear predictors may be compined into a single composite link.
id will be used in the i() option required by gllamm but it is arbitrary here.

The expected frequency for blood type A is given by the composite link

exp(lnN + 2a) + exp(lnN + a + o) + exp(lnN + o + a),
where a = ln(p) is the log relative frequency of gene A and o = ln(r) is the log relative frequency of gene O. The first term corresponds to genes (from mother and father, respectively) AA, the second to AO and the third to OA.

Estimate the model using gllamm with a composite link. The compos() option specifies:

the grouping variable clus such that linear predictors within a group can be combined with each other to form a composite link for an observation within the group,
the variable ind indicating to which response the composite links defined by the subsequent weight variables correspond: composite link based on weight variable 1 goes to where ind equals 1, etc.,
the weight variables cA cAB cO cB for the four composite links:


gllamm y A O B, i(id) offset(lnN) fam(poiss) compos(clus ind cA cAB cO cB) init nocons eform

number of level 1 units = 9
 
Condition Number = 4.918322
 
gllamm model
 
log likelihood = -13.200915
 
------------------------------------------------------------------------------
           y |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           A |     .25156   .0172837   -20.09   0.000     .2198665    .2878221
           O |   .6984284   .0240529   -10.42   0.000     .6528414    .7471986
           B |   .0500116   .0077113   -19.43   0.000     .0369679    .0676577
         lnN |   (offset)
------------------------------------------------------------------------------

The coefficients are the estimates of the gene frequencies of A (0.25), O (0.90) and B (0.05).

Obtain predicted counts


gllapred mu, mu
list y mu, clean

         y          mu  
  1.   179   174.99317  
  2.     6   10.618296  
  3.   202   205.85252  
  4.    35   30.536004  
  5.     0           .  
  6.     0           .  
  7.     0           .  
  8.     0           .  
  9.     0           .

Perform goodness-of-fit test by estimating saturated model


 * store current log-likelihood
local ll=e(ll)

 * dummies for the four counts:
gen n=_n if y>0
quietly tab n, gen(dummy)

 * estimate saturated model using poisson
quietly poisson y dummy*, nocons

 * compare log-likelihoods
local deviance = 2*(e(ll)-`ll')
local pval = chiprob(`deviance',1)
disp "deviance = " `deviance' ", p-value = " `pval'

deviance = 3.1732899, p-value = .82544424

Thompson and Baker (1981). Composite link functions in generalized linear models. Journal of the Royal Statistical Society, Series C 30, 125-131. Composite link model for blood type data.

Thompson and Baker (1981). Composite link functions in generalized linear models. Journal of the Royal Statistical Society, Series C 30, 125-131.

Composite link model for blood type data.