Goodhart’s law and plant breeding. Is cross-validation misleading about the success of genomic prediction?
Goodhart’s law says that “when a measure becomes a target, it ceases to be a good measure.” Plant breeding aims to increase the genetic value of varieties, and the Breeder’s Equation states that the rate of genetic gain is proportional to the accuracy of selection. Over the past two decades, Genomic Prediction has been tested in many plant breeding systems and for many traits, using statistical models of ever-increasing complexity. These models are almost universally evaluated empirically, with cross-validation used to quantify their accuracy. However, cross-validation can make Genomic Prediction look much better than it really is. I am concerned that our singular focus on accuracy has led to the discovery of many ways to improve measured accuracy that do not improve genetic gain. I will lay out this argument with preliminary results from a literature survey and simulations. I also hope for a broader discussion of whether similar issues occur in other areas of applied statistics in agriculture, and how to solve them.