Classification Methods for High-dimensional MiFi® Data
Microbe Finder (MiFi®) is a diagnostic tool, developed by researchers at Oklahoma State University, that uses high-throughput sequencing to measure the abundance of a pathogen in a sample. Data are generated as follows. First, a pathogen’s unique RNA sequence is decomposed into many shorter sequences called “e-probes”. Then, for each e-probe, the number of “hits” in a sample is recorded, along with a single summary measure called the “total score”. The goal is to train a classifier on MiFi® data. The challenge is that the number of e-probes ranges from 10 to 10,000, depending on the pathogen of interest, while only a few samples are available for training. A variety of classifiers are available for this high-dimensional problem; alternatively, a simple univariate classifier based on the total score alone could be considered. This talk will briefly review the available approaches and compare their performance on several pathogens.
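As a rough illustration of the univariate option, the sketch below fits a threshold on total scores and estimates its accuracy by leave-one-out cross-validation, which suits a setting with only a few samples. It is not the method presented in the talk: the midpoint-of-class-means rule, the 0/1 labels, the assumption that positive samples score higher, the function names, and the six example scores are all assumptions made up for this example.

```python
from statistics import mean

def fit_threshold(scores, labels):
    """Return the midpoint between the mean total scores of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def predict(score, threshold):
    """Call a sample positive (1) when its total score exceeds the threshold.

    Assumption: positive samples tend to have higher total scores.
    """
    return 1 if score > threshold else 0

def leave_one_out_accuracy(scores, labels):
    """Estimate accuracy by holding out each sample in turn."""
    correct = 0
    for i in range(len(scores)):
        # Refit the threshold without sample i, then classify sample i.
        train_scores = scores[:i] + scores[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        t = fit_threshold(train_scores, train_labels)
        correct += predict(scores[i], t) == labels[i]
    return correct / len(scores)

# Made-up total scores for six samples (illustrative values only, not MiFi® data).
scores = [0.2, 0.5, 0.9, 3.1, 4.0, 2.7]
labels = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(scores, labels)
accuracy = leave_one_out_accuracy(scores, labels)
print(f"threshold = {threshold:.2f}, leave-one-out accuracy = {accuracy:.2f}")
```

The same leave-one-out loop could wrap any of the high-dimensional classifiers applied to the per-e-probe hit counts, which would put both kinds of approach on an equal footing when comparing them.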