Biostatistics guest speaker Michael Sohn, University of Pennsylvania, will present "Statistical Methods in Microbiome Data Analysis."
Microbiome studies involve new computational and statistical challenges because of the characteristics of microbiome data: high sparsity, over-dispersion, and high dimensionality. I am going to present two methods that account for these characteristics: 1) a GLM-based latent variable ordination method and 2) a compositional mediation model.
1) GLM-based latent variable ordination method: Distance-based ordination methods, such as principal coordinate analysis (PCoA), cannot distinguish a location effect (i.e., a difference in means) from a dispersion effect (i.e., a difference in variation) when the dispersion effect is strong. In other words, PCoA may falsely display a location effect when there is a strong dispersion effect but no location effect. To resolve this potential problem, I proposed, as an ordination method, a zero-inflated quasi-Poisson factor model whose estimated factor loadings are used to display the similarity of samples.
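For readers unfamiliar with PCoA, the following is a minimal sketch of the classical algorithm (double-centering the squared distances, then eigendecomposition). It illustrates the distance-based approach being contrasted here, not the proposed zero-inflated quasi-Poisson factor model. The function name `pcoa`, the toy data, and the choice of Euclidean distance are assumptions made only for illustration.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical PCoA (metric MDS) on a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower's centering).
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_axes]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (from non-Euclidean distances) are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

rng = np.random.default_rng(0)
# Hypothetical toy data: both groups share the same mean (no location
# effect), but group B has a larger spread (a dispersion effect).
group_a = rng.normal(0.0, 1.0, size=(20, 5))
group_b = rng.normal(0.0, 3.0, size=(20, 5))
x = np.vstack([group_a, group_b])

# Pairwise Euclidean distances between all 40 samples.
dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
coords = pcoa(dist, n_axes=2)  # one row of ordination coordinates per sample
```

Plotting `coords` colored by group is one way to inspect how an ordination renders two groups that differ only in dispersion. Real microbiome analyses typically use ecological distances such as Bray-Curtis rather than Euclidean distance.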
2) Compositional mediation model: The causal mediation model has been extended to incorporate nonlinearity, treatment-mediator interaction, and multiple mediators. These models, however, are not directly applicable when the mediators are components of a composition. I proposed a causal compositional mediation model that uses the algebra for compositions in the simplex space and an L1-penalized linear regression for compositional data in high-dimensional settings. The estimators of the direct and indirect (or mediation) effects are defined under the potential outcomes framework to establish a causal interpretation. The model involves a novel integration of statistical methods from high-dimensional regression analysis, compositional data analysis, and causal inference.
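As background on the "algebra for compositions in the simplex space," the sketch below shows the standard Aitchison operations (closure, perturbation, powering) and the centered log-ratio transform. It is an illustration of these general operations under assumed toy values, not the speaker's mediation model. The function names are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so they sum to 1 (a point on the simplex)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)

def clr(x):
    """Centered log-ratio transform; requires strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

# Hypothetical three-taxon composition and a shift applied to it.
comp = closure([10, 30, 60])
shift = closure([2, 1, 1])
moved = perturb(comp, shift)               # "add" the shift on the simplex
back = perturb(moved, power(shift, -1.0))  # "subtract" it again
```

Perturbation on the simplex corresponds to ordinary addition in clr coordinates, which is why linear models are usually written in log-ratio terms for compositional data. Because the log is undefined at zero, sparse microbiome counts need some zero-handling step (for example, a pseudocount) before these transforms apply; that step is an assumption outside this sketch.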