Pooling of variances: the skeleton in the mixed model closet?

This talk is intended to solicit opinions and advice on an issue that I worry about intermittently: when is it appropriate to pool variances and when should that be avoided. The issue is not new: one instance is inference about the difference of means from two groups. If you pool, you get the two-sample t-test; if you do not, you have the Behrens-Fisher problem, which has various approximate solutions, including the Welch-Satterthwaite approximate degrees of freedom. Modern computing power lets us consider models with multiple “non-pooled” variances. For example, the combined analysis of an experiment repeated at multiple sites and multiple years requires at least two estimated variances: the error variance, i.e., that between replicates at the same location, and the treatment*environment variance component, i.e. the consistency of the treatment differences across environments (sites, years, or their combination). Both of these variances could be pooled or split into components. For example, a different error variance could be estimated for each site, for each combination of site and year, or for each treatment. The treatment*environment interaction could be split into separate components for treatment*year, treatment*site and treatment*site*year. In my experience, inference with pooled variances is more sensitive to heterogeneity among treatments than heterogeneity among environments. I will share examples that illustrate this.

Philip Dixon
Philip Dixon
Iowa State University