by Terry M. Therneau Ph.D.

Faculty, Mayo Clinic

About a year ago there was a query about how to do "type 3" tests for a Cox model on the R help list, which someone wanted because SAS does it. The SAS addition looked suspicious to me, but as the author of the survival package I thought I should understand the issue more deeply. It took far longer than I expected but has been illuminating.

First off, what exactly is this 'type 3' computation of which SAS is so deeply enamored? Imagine that we are dealing with a data set that has interactions. In my field of biomedical statistics all data relationships have interactions: an effect is never precisely the same for young vs old, fragile vs robust, long vs short duration of disease, etc. We may not have the sample size or energy to model them, but they exist nonetheless. Assume as an example that we have a treatment effect that increases with age; how then would one describe a main effect for treatment? One approach is to select an age distribution of interest and use the mean treatment effect, averaged over that age distribution.

To compute this, one can start by fitting a sufficiently rich model, get predicted values for our age distribution, and then average them. This requires almost by definition a model that includes an age by treatment interaction: we need reasonably unbiased estimates of the treatment effects at individual ages a,b,c,... before averaging, or we are just fooling ourselves with respect to this overall approach. The SAS type 3 method for linear models is exactly this. It assumes as the "reference population of interest" a uniform distribution over any categorical variables and the observed distribution of the data set for any continuous ones, followed by a computation of the average predicted value. Least squares means are also an average prediction taken over the reference population.
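To make the recipe concrete, here is a minimal sketch in R of "fit a rich model, then average predictions over the reference population." All variable names, effect sizes, and the simulated data are hypothetical; the point is only the mechanics of averaging predictions.

```r
set.seed(42)
n <- 200
dat <- data.frame(
  age = runif(n, 50, 85),
  trt = factor(sample(c("A", "B"), n, replace = TRUE))
)
## simulate a true treatment effect that grows with age (an age-by-treatment
## interaction), so a single "main effect" is not well defined a priori
dat$y <- 10 + 0.1 * dat$age +
  ifelse(dat$trt == "B", 1 + 0.05 * dat$age, 0) + rnorm(n)

## a model rich enough to include the age-by-treatment interaction
fit <- lm(y ~ age * trt, data = dat)

## reference population: the observed ages, set to each treatment in turn
refA <- dat; refA$trt <- factor("A", levels = levels(dat$trt))
refB <- dat; refB$trt <- factor("B", levels = levels(dat$trt))

## the "main effect" of treatment is the difference of the two
## population-averaged predictions
avg_effect <- mean(predict(fit, newdata = refB)) -
              mean(predict(fit, newdata = refA))
```

Under the default treatment coding this difference collapses to the linear combination beta_trtB + mean(age) * beta_age:trtB of the fitted coefficients, which is exactly the kind of contrast in coef(fit) discussed below.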

A primary statistical issue with type 3 is the choice of reference. Assume for instance that age had been coded as a categorical variable with levels of 50-59, 60-69, 70-79 and 80+. A type 3 test answers the question of what the treatment effect would be in a population of subjects in which 1/4 were aged 50-59, another 1/4 were 60-69, etc. Since I will never encounter a set of subjects with said pattern in real life, such an average is irrelevant. A nice satire of the situation can be found under the nom de plume of Guernsey McPearson (also have a look at *Multi-Centre Trials and the Finally Decisive Argument*). To be fair, there are other cases where the uniform distribution is precisely the right population, e.g., a designed experiment that lost perfect balance due to a handful of missing response values. But these are rare to non-existent in my world, and type 3 remains an answer to the question that nobody asked.

Average population prediction also highlights a serious deficiency in R. Working out the algebra, type 3 tests for a linear model turn out to be a contrast, C %*% coef(fit), for a particular contrast vector or matrix C. This fits neatly into the SAS package, which has a simple interface for user-specified contrasts. (The SAS type 3 algorithm is at its heart simply an elegant way to derive C for their default reference population.) The original S package took a different view, which R has inherited, of pre- rather than post-processing. Several of the common contrasts one might want to test can be obtained by clever coding of the design matrix X, before the fit, causing the contrast of interest to appear as one of the coefficients of the fitted model. This is a nice idea when it works, but there are many cases where it is insufficient: a linear trend test or all possible pairwise comparisons, for example.
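For readers who want to see such a post-fit contrast done by hand, here is a small self-contained sketch (hypothetical data) of a Wald test of H0: C %*% beta = 0. A production-quality routine would also need to handle rank-deficient fits and multi-row C; this shows only the core computation.

```r
set.seed(1)
dat <- data.frame(
  grp = factor(rep(c("a", "b", "c"), each = 30)),
  y   = rnorm(90) + rep(c(0, 0.5, 1), each = 30)
)
fit <- lm(y ~ grp, data = dat)

## With the default treatment coding the b-vs-c comparison is NOT a single
## coefficient, but it is the contrast (0, 1, -1) of the coefficient vector:
## mean(b) - mean(c) = (b0 + beta_b) - (b0 + beta_c) = beta_b - beta_c
C <- rbind(c(0, 1, -1))

est  <- C %*% coef(fit)                    # estimated mean(b) - mean(c)
V    <- C %*% vcov(fit) %*% t(C)           # variance of the contrast
wald <- drop(t(est) %*% solve(V) %*% est)  # chi-square on nrow(C) df
pval <- pchisq(wald, df = nrow(C), lower.tail = FALSE)
```

Nothing here is specific to lm(); the same three lines work for any fit that supplies coef() and vcov(), which is why a general post-fit contrast function seems so feasible.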

*R needs a general and well thought out post-fit contrasts function. Population averaged estimates could be one option of said routine, with the SAS population one possible choice.*

Also, I need to mention a couple more things:

- The standard methods for computing type 3 that I see in the help lists are flawed, giving seriously incorrect answers unless sum-to-zero constraints are used for the fit (contr.sum). This includes both the use of drop.terms and the Anova() function in the car package.
- For coxph models, my original goal, the situation is even more complex. In particular, which average does one want: average log hazard ratio, average hazard ratio, ratio of average hazards, or something else? Only one of these can be rewritten as a contrast in the coefficients, and thus the clever linear models algorithms do not transfer.

Regarding one of your last points: it was my understanding that contrasts should be set to e.g. contr.sum and NOT contr.treatment when running the model if you want to get Type III SS using the Anova() function in the car package.

Posted by: Gustaf Granath | October 28, 2014 at 19:55

I always appreciate Terry's perspective, coming from a viewpoint of doing data analysis (as opposed to primarily building software). He said:

"R needs a general and well thought out post-fit contrasts function. Population averaged estimates could be one option of said routine, with the SAS population one possible choice."

I could not agree more whole-heartedly. This concept of post-fit contrasts analysis can be very difficult to do in R in general, futzing around with the design matrix and trying to remember if it is the first or last level of a factor that is set to zero, etc. However, the commercial package asreml-r has a remarkable "predict" function that makes it simple to do such post-fit processing to easily calculate lsmeans-type contrasts or whatever narrow-space, intermediate-space, broad-space, or weighted-according-to-population-proportions-space predictions your heart desires.

See the examples on the following page for how to do predictions with one line of asreml code or many lines of tedious coefficient manipulations (using MCMCglmm, but it would be similar with lm, nlme, etc).

http://www.inside-r.org/packages/cran/agridat/docs/stroup.splitplot

For some reason, that page is missing the primary source document:

Walter W. Stroup, 1989. Predictable functions and prediction space in the mixed model procedure. Applications of Mixed Models in Agriculture and Related Disciplines.

Hey David Smith, what happened to the "Source" section of the Rd files when they were translated to inside-r.org ?

Kevin Wright

Posted by: Kevin Wright | October 28, 2014 at 21:10

@Kevin, that looks like a bug at inside-r.org -- I don't think there was special consideration of the fields in dataset help files.

Posted by: David Smith | October 29, 2014 at 06:54

I have posted this comment on behalf of the author. (ed.)

The first comment (Gustaf Granath's) is correct.

The first sentence after "a couple more things" should say

"...giving seriously incorrect answers unless sum-to-zero constraints are used for the fit (contr.sum)."

I was thinking one thing and typed another. As a footnote, the actual constraint is this: let Z be the design matrix for the reference population, i.e., the result of model.matrix() on the reference data set. If the column sums of Z are 0 for the factor of interest (the one we are testing), then the naive algorithm works. For a uniform reference distribution both contr.sum and contr.helmert suffice, but not for other reference distributions.
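This condition is easy to check directly. A sketch with a hypothetical three-level factor, building the reference design matrix under three codings and inspecting the column sums:

```r
## uniform reference population: exactly one observation per factor level
ref <- data.frame(g = factor(c("a", "b", "c")))

Z_trt  <- model.matrix(~ g, ref, contrasts.arg = list(g = "contr.treatment"))
Z_sum  <- model.matrix(~ g, ref, contrasts.arg = list(g = "contr.sum"))
Z_helm <- model.matrix(~ g, ref, contrasts.arg = list(g = "contr.helmert"))

colSums(Z_trt)[-1]   # treatment coding: factor columns do NOT sum to zero
colSums(Z_sum)[-1]   # sum-to-zero coding: factor columns sum to zero
colSums(Z_helm)[-1]  # Helmert coding: factor columns sum to zero
```

For a non-uniform reference population one would replicate the rows of ref in proportion to the desired weights, and the same colSums() check applies.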

Posted by: Joseph Rickert | October 29, 2014 at 09:39

OK, thanks for the clarification. Maybe a good idea to edit the post so people don't get confused (more than they already are).

Posted by: Gustaf Granath | October 29, 2014 at 09:55

Bill Venables has a very nice essay about linear models that goes into detail about what is wrong with type III sums of squares in section 5. Here's a link:

http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

Nicholas Lewin-Koh

Posted by: Nicholas | October 30, 2014 at 09:12

Dear all,

Many thanks for this very good point.

Regarding the post-fit contrasts function, I have used the 'multcomp' package (http://cran.r-project.org/web/packages/multcomp/multcomp.pdf) from Bretz, Hothorn, and Westfall, associated with their 2010 book (http://www.crcpress.com/product/isbn/9781584885740). I am not sure that it could help solve the type III issue initially discussed here, but some functions in the package were notably very helpful for handling user-defined contrasts.

If you have ever used it, I would be grateful to have your expert opinion on it.

Thanks again

Marc

Posted by: Marc Lamarine | October 31, 2014 at 03:29

@Kevin mentions asreml's prediction capability - the method is described in two papers: "Prediction in Linear Mixed Models", Welham et al, 2004 presents the basic ideas, while "An efficient computing strategy for prediction in mixed linear models" Gilmour et al, 2004 provides details of the math/computing involved.

In addition, there is a recently released package, 'predictmeans', available on CRAN which produces predictions from aov, lm, lme, lmer, glm, and other models, apparently using the approach described in the Welham et al. paper.

I've not used the package much, so cannot comment on its reliability, but I mention it in case it is of interest.

Cheers,

Alec

Posted by: Alec Zwart | November 12, 2014 at 21:10