Li Zhu of the Department of Biostatistics defends her dissertation on "Bayesian variable selection model and differential co-expression network analysis for multi-omics data integration".
Committee Chairperson: George C. Tseng, ScD, Department of Biostatistics
Robert Krafty, PhD, Department of Biostatistics
Lu Tang, PhD, Department of Biostatistics
Daniel E. Weeks, PhD, Department of Human Genetics
Wei Chen, Department of Pediatrics
Graduate faculty of the University and all other interested parties are invited to attend.
Due to the large accumulation of omics data sets in public repositories, numerous studies have been designed to analyze omics data for various purposes. However, the analysis of a single data set often suffers from limited sample size, low statistical power, and lack of reproducibility across studies, and data integration is therefore gaining increasing attention. This dissertation focuses on developing methods for variable selection in regression and clustering for multi-omics data integration, and for identification of differential co-expression networks in the transcriptomic meta-analysis setting.
In the first paper, we propose a Bayesian indicator variable selection model that incorporates a multi-layer overlapping group structure (MOG) in the regression setting. The model is motivated by a structure commonly encountered in multi-omics applications, in which a biological pathway contains tens to hundreds of genes and a gene can contain multiple experimentally measured features (such as its mRNA expression, copy number variation, and methylation levels at possibly multiple sites). We evaluate the model in simulations and two breast cancer examples, and demonstrate that it not only enhances prediction accuracy but also improves variable selection and model interpretation, leading to deeper biological insight into the disease. In the second paper, we extend MOG to Gaussian mixture models for clustering, aiming to identify disease subtypes and detect subtype-relevant omics features.
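To make the multi-layer overlapping group structure concrete, the sketch below builds only the grouping itself, not the Bayesian model: each measured feature belongs to exactly one gene-level group, while a gene (and therefore all of its features) can belong to several pathway-level groups, which is where the overlap comes from. All feature, gene, and pathway names and the `build_mog` helper are hypothetical, chosen for illustration.

```python
from collections import defaultdict

# Hypothetical features: each measured feature is tagged with its gene (layer 1 grouping).
feature_to_gene = {
    "TP53_mRNA": "TP53",
    "TP53_CNV": "TP53",
    "TP53_meth_site1": "TP53",
    "BRCA1_mRNA": "BRCA1",
    "BRCA1_meth_site1": "BRCA1",
    "MYC_mRNA": "MYC",
}

# Hypothetical pathways (layer 2 grouping): a gene may appear in several pathways,
# so the pathway-level groups overlap.
pathway_to_genes = {
    "pathway_A": {"TP53", "BRCA1"},
    "pathway_B": {"TP53", "MYC"},
}

def build_mog(feature_to_gene, pathway_to_genes):
    """Return gene-level groups of features and each feature's set of pathway groups."""
    gene_groups = defaultdict(list)
    for feat, gene in feature_to_gene.items():
        gene_groups[gene].append(feat)
    # A feature inherits every pathway membership of its gene.
    feature_pathways = {
        feat: {p for p, genes in pathway_to_genes.items() if gene in genes}
        for feat, gene in feature_to_gene.items()
    }
    return dict(gene_groups), feature_pathways

gene_groups, feature_pathways = build_mog(feature_to_gene, pathway_to_genes)
print(gene_groups["TP53"])                   # ['TP53_mRNA', 'TP53_CNV', 'TP53_meth_site1']
print(sorted(feature_pathways["TP53_CNV"]))  # ['pathway_A', 'pathway_B']
```

Here a TP53 feature sits in one gene group but in two pathway groups, the kind of overlap a variable selection prior must account for rather than assuming disjoint groups.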
In the third paper, we present a meta-analytic framework for detecting differential co-expression networks (MetaDCN). Differential co-expression (DC) analysis, unlike conventional differential expression (DE) analysis, detects alterations of gene-gene correlations in case/control comparisons, alterations that are likely to be missed by DE analysis.
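As an illustration of why DC can catch what DE misses, the sketch below applies a standard Fisher z-test for a difference in Pearson correlation to a single gene pair in one study. This is a generic textbook test, not the MetaDCN framework, which works across multiple studies and at the network level; the simulated data and the `diff_corr_z` name are assumptions for illustration only.

```python
import math
import numpy as np

def diff_corr_z(x_case, y_case, x_ctrl, y_ctrl):
    """Fisher z-test for a difference in Pearson correlation between cases and controls."""
    r_case = np.corrcoef(x_case, y_case)[0, 1]
    r_ctrl = np.corrcoef(x_ctrl, y_ctrl)[0, 1]
    n_case, n_ctrl = len(x_case), len(x_ctrl)
    # Fisher transform: atanh(r) is approximately normal with variance 1 / (n - 3).
    se = math.sqrt(1.0 / (n_case - 3) + 1.0 / (n_ctrl - 3))
    z = (math.atanh(r_case) - math.atanh(r_ctrl)) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return r_case, r_ctrl, z, p

rng = np.random.default_rng(0)
n = 100
# Controls (simulated): gene B tracks gene A closely, i.e. strong co-expression.
a_ctrl = rng.normal(size=n)
b_ctrl = a_ctrl + 0.3 * rng.normal(size=n)
# Cases (simulated): both genes are still centered at 0, but the coupling is lost.
a_case = rng.normal(size=n)
b_case = rng.normal(size=n)

r_case, r_ctrl, z, p = diff_corr_z(a_case, b_case, a_ctrl, b_ctrl)
print(f"r_case={r_case:.2f}, r_ctrl={r_ctrl:.2f}, z={z:.1f}, p={p:.1e}")
```

Because each gene has roughly the same mean in both groups, a per-gene DE test would have little to detect, while the correlation test flags the pair strongly.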
Public health significance: The methods proposed in papers 1 and 2 can not only predict disease outcomes or identify disease subtypes but also determine relevant biomarkers, which can potentially facilitate the design of test assays to monitor disease progression, predict disease subtypes, and guide treatment decisions. The method developed in paper 3 provides a novel framework for identifying differentially co-expressed genes, helping us better understand how gene-gene interactions are altered in disease mechanisms and providing potential new molecular targets for drug development.