Integrating genomic, weather, and secondary trait data for multiclass classification

The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology and automated data collection have enabled scientists to characterize genotypes and phenotypes more precisely. However, these technologies produce high-dimensional data sets that can be difficult to incorporate into statistical models. To leverage the different types and dimensions of the data collected for predicting desirable phenotypes, we proposed a three-stage classification method for multiclass traits that integrates three data types (genomic, weather, and secondary trait data), and we illustrated it using chickpea data. The classifiers obtained with our method were highly sparse, allowing a straightforward interpretation of the relationships between the response and the selected predictors.
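The abstract does not spell out the three stages, so the following is only a minimal sketch of the general idea of combining several data blocks into one sparse multiclass classifier. It swaps in a single L1-penalized multinomial logistic regression fitted by proximal gradient descent, which is not the authors' three-stage method. The function names, the simulated data (three marker dosages, one temperature covariate, one secondary trait), and the penalty settings are all hypothetical.

```python
import math
import random

# Hypothetical illustration: one L1-penalized multinomial model on combined
# data blocks, NOT the authors' three-stage method.


def standardize_columns(cols):
    """Center each column and scale it to unit variance (constant columns keep sd=1)."""
    out = []
    for col in cols:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        out.append([(v - mean) / sd for v in col])
    return out


def combine_blocks(*blocks):
    """Join feature blocks (lists of rows for the same lines) side by side,
    standardizing every column so one L1 penalty treats all blocks alike."""
    cols = [list(col) for block in blocks for col in zip(*block)]
    return [list(row) for row in zip(*standardize_columns(cols))]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(W, b, x):
    """Linear score for each class."""
    return [bk + sum(w * xj for w, xj in zip(wk, x)) for wk, bk in zip(W, b)]


def fit_l1_multinomial(X, y, n_classes, lam=0.05, lr=0.3, n_iter=500):
    """Proximal gradient (ISTA) for L1-penalized multinomial logistic regression;
    intercepts are left unpenalized."""
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        gW = [[0.0] * p for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, yi in zip(X, y):
            probs = softmax(scores(W, b, x))
            for k in range(n_classes):
                # Gradient of the mean cross-entropy loss.
                r = (probs[k] - (1.0 if k == yi else 0.0)) / n
                gb[k] += r
                for j in range(p):
                    gW[k][j] += r * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            for j in range(p):
                v = W[k][j] - lr * gW[k][j]
                # Soft-thresholding (the L1 proximal step) sets small weights to exactly 0.
                W[k][j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return W, b


def predict(W, b, X):
    out = []
    for x in X:
        s = scores(W, b, x)
        out.append(s.index(max(s)))
    return out


def selected_features(W, names):
    """Predictors with a nonzero weight for at least one class."""
    return [name for j, name in enumerate(names) if any(row[j] != 0.0 for row in W)]


# Simulated toy data (assumption), not chickpea data: only "temp" carries signal.
rng = random.Random(1)
n = 60
y = [i % 3 for i in range(n)]
genomic = [[rng.choice([0, 1, 2]) for _ in range(3)] for _ in range(n)]
weather = [[2.0 * (yi - 1) + rng.gauss(0, 0.3)] for yi in y]
secondary = [[rng.gauss(0, 1)] for _ in range(n)]
names = ["snp1", "snp2", "snp3", "temp", "trait"]

X = combine_blocks(genomic, weather, secondary)
W, b = fit_l1_multinomial(X, y, n_classes=3)
acc = sum(p == t for p, t in zip(predict(W, b, X), y)) / n
print(selected_features(W, names), acc)
```

Standardizing every column before fitting keeps a single penalty from favoring one data block merely because of its measurement scale. Predictors whose weights the soft-thresholding step leaves at exactly zero for every class drop out of the classifier, which is the kind of sparsity that makes the retained predictors easy to interpret.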

Réka Howard
University of Nebraska–Lincoln