Tianzhou "Charles" Ma of the Department of Biostatistics defends his dissertation on "Differential Expression and Feature Selection in the Analysis of Multiple Omics Studies".
Graduate faculty of the University and all other interested parties are invited to attend.
With the rapid advances in high-throughput technologies over the past decades, many kinds of omics data have been generated by numerous labs and accumulated in the public domain. These studies were designed for different biological purposes, such as identifying genes that are differentially expressed between two conditions or selecting biomarkers that predict a clinical outcome. Effective meta-analysis of omics data from multiple studies can improve statistical power, accuracy, and reproducibility relative to a single study. This dissertation covers several methods for differential expression (section 1) and feature selection (section 2) in the analysis of multiple omics studies.
In the first section, we proposed a fully Bayesian hierarchical model for RNA-seq meta-analysis that models count data directly, integrates information across genes and across studies, and captures differential signals across studies through latent variables. A Dirichlet process mixture prior is further placed on the latent variables to categorize detected biomarkers according to their differential expression patterns across studies. Using both simulations and a real application to multiple brain regions of HIV-1 transgenic rats, we demonstrated that our method improves sensitivity and accuracy and yields more biologically meaningful findings. In a follow-up paper, we extended this Bayesian model to jointly integrate transcriptomic data from two platforms: microarray and RNA-seq.
In the second section, we considered a general framework for variable screening with multiple omics studies and proposed a novel two-step screening procedure for high-dimensional regression analysis within this framework. Compared with the one-step procedure and the rank-based sure independence screening procedure, our procedure greatly reduced false negative errors while maintaining a low false positive rate. Theoretically, we showed that our procedure possesses the sure screening property under weaker assumptions on signal strength and allows the number of features to grow exponentially with the sample size.
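For readers unfamiliar with marginal screening, the sketch below illustrates the basic idea in the spirit of sure independence screening (Fan and Lv). Each feature is scored by its absolute marginal correlation with the outcome, and only the top-ranked features are kept. This is not the dissertation's two-step procedure. The pooling rule (averaging absolute correlations across studies), the function name `marginal_screen`, the cutoff `d`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def marginal_screen(studies, d):
    """Hypothetical helper: keep the d features with the largest average
    absolute marginal correlation with the outcome across studies.
    Not the dissertation's method; a naive multi-study SIS-style sketch."""
    scores = []
    for X, y in studies:
        # Standardize features and outcome (population std, ddof=0)
        Xc = (X - X.mean(axis=0)) / X.std(axis=0)
        yc = (y - y.mean()) / y.std()
        # Pearson correlation of each feature with y in this study
        scores.append(np.abs(Xc.T @ yc) / len(y))
    avg = np.mean(scores, axis=0)          # assumed pooling rule: simple average
    top = np.argsort(avg)[::-1][:d]        # indices of the d highest scores
    return np.sort(top)

# Simulated example (assumed data): 3 studies, n = 100, p = 1000,
# with true signals in features 0, 1 and 2.
rng = np.random.default_rng(0)
studies = []
for _ in range(3):
    X = rng.standard_normal((100, 1000))
    y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + rng.standard_normal(100)
    studies.append((X, y))

kept = marginal_screen(studies, d=20)
print(kept)
```

Averaging across studies is only one of many possible pooling choices; a screening rule designed for multiple studies would also need to handle signals that appear in some studies but not others.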
Detecting important biomarkers that are either differentially expressed or predictive of clinical outcomes is essential for identifying potential drug targets and understanding disease mechanisms. Such findings in basic science can be translated into preventive medicine or potential treatments for disease, promoting human health and improving the global healthcare system.