Research

Statistical Methods for Human Genetics

We have developed statistical genetics methods in several areas. One of them, TADA, is a popular method for analyzing rare variants from family sequencing data. In genome-wide association studies (GWAS), it is often difficult to identify the causal variants, their target genes and the underlying molecular mechanisms. We developed methods that integrate GWAS with genetic data on molecular traits to identify the intermediate molecular phenotypes linking genetic variants to phenotypes. These methods extend two popular approaches, Mendelian Randomization (MR) and Transcriptome-wide Association Studies (TWAS). We addressed key challenges of these approaches and improved the accuracy of identifying causal relationships between molecular traits and complex diseases.
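As background for the MR work listed below, the sketch shows the standard inverse-variance weighted (IVW) MR estimator, a textbook baseline computed from GWAS summary statistics. It is not the pleiotropy-aware method from the lab's MR paper; the function name `ivw_mr` and the toy numbers are hypothetical, and the standard error uses the common first-order approximation that ignores uncertainty in the exposure effects.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian Randomization estimate (illustrative sketch).

    Each variant j yields a Wald ratio beta_outcome[j] / beta_exposure[j] with
    approximate standard error se_outcome[j] / |beta_exposure[j]|. The IVW
    estimate is the mean of these ratios weighted by 1 / se(ratio)^2.
    """
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        weight = (bx * bx) / (sy * sy)  # 1 / se(Wald ratio)^2
        num += weight * (by / bx)       # weighted Wald ratio
        den += weight
    estimate = num / den
    se = 1.0 / math.sqrt(den)           # fixed-effect standard error
    return estimate, se


# Hypothetical toy data: every variant's outcome effect is half its exposure effect.
est, se = ivw_mr([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.02])
print(est, se)
```

Because every variant is assumed to act on the outcome only through the exposure, a variant with a direct (pleiotropic) effect on the outcome biases this estimate. That limitation is what pleiotropy-aware MR methods are designed to address.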

Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS. He X, Fuller CK, Song Y, Meng Q, Zhang B, Yang X, Li H. Am J Hum Genet, 2013 May 2;92(5):667-80

Synaptic, transcriptional, and chromatin genes disrupted in autism. De Rubeis S, He X, Goldberg A, Poultney C, Samocha K, et al. Nature, 2014 Nov 13;515(7526):209

Mendelian Randomization Accounting for Correlated and Uncorrelated Pleiotropic Effects Using Genome-Wide Summary Statistics. Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Nature Genetics, 2020 May 25

Adjusting for genetic confounders in transcriptome-wide association studies improves discovery of risk genes of complex traits. Zhao S*, Crouse W*, Qian S, Luo K, Stephens M#, He X#. Nature Genetics, 2024 Feb;56(2):336-347

Regulatory Variations and Gene Mapping of Complex Traits

Most variants associated with complex traits are located in non-coding regions. Identifying functional variants in these regions in a tissue-specific manner is therefore critical for mapping the causal variants of complex traits. We have been working with experimental collaborators to identify regulatory variants and to leverage these findings to study human genetics. We found that a particular class of variants, those affecting the mRNA modification N6-methyladenosine (m6A), contributes significantly to the heritability of complex traits. These variants act largely independently of transcription or splicing, representing a novel path from genetic to phenotypic variation. We have also used chromatin accessibility profiles in iPS cell-derived neurons to narrow down putative causal variants of neuropsychiatric disorders.

Genetic Analyses Support the Contribution of mRNA N6-methyladenosine (m6A) Modification to Human Disease Heritability. Zhang Z, Luo K, Zou Z, Qiu M, Tian J, Sieh L, Shi H, Zou Y, Wang G, Morrison J, Zhu A, Qiao M, Li Z, Stephens M*, He X*, He C*. Nature Genetics, 2020 Jun 29

Allele-specific open chromatin in human iPSC neurons elucidates functional disease variants. Zhang S, Zhang H, Zhou Y, Qiao M, Zhao S, Kozlova A, Shi J, Sanders A, Wang G, Luo K, Sengupta S, West S, Qian S, Stret M, Avramopoulos D, Cowan C, Chen M, Pang Z, Gejman P, He X*, Duan J*. Science, 2020 Jul 31;369(6503):561-565

Regulatory Sequences and Evolution

Genes need to be expressed at the correct time and place, and disruption of normal expression patterns can lead to disease. Gene expression is controlled by enhancer sequences, which read information about the cellular environment to drive expression patterns appropriate for specific cellular conditions or cell types. A major research challenge is to decipher the rules that govern this process. During my PhD with Dr. Saurabh Sinha, I developed quantitative models of how enhancer sequences interact with regulatory proteins (transcription factors) to drive gene expression. These models aimed to capture the underlying physical process and helped uncover several mechanisms that may be important for generating precise spatial patterns of gene expression during early fruit fly development. I have also studied how the cis-regulatory sequences involved in fruit fly development evolve. My lab at the University of Chicago has continued to develop methods to study gene regulation. We developed a deep learning-based method to predict the functional effects of non-coding variants in the context of neurodevelopment. We also developed a method to identify the target genes of regulatory proteins from single-cell CRISPR screening data.
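The sketch illustrates the general idea behind thermodynamic (statistical-mechanics) models of enhancers. It enumerates every configuration of bound and unbound sites, gives each configuration a statistical weight, and averages over configurations to obtain site occupancies. It is a deliberately simplified toy, not the model from the 2010 PLoS Comput Biol paper. The function name `site_occupancies`, the per-site weights (K times TF concentration), and the single pairwise cooperativity factor `omega` are all hypothetical choices for illustration.

```python
from itertools import product


def site_occupancies(weights, coop_pairs, omega):
    """Mean occupancy of each binding site in a toy thermodynamic enhancer model.

    weights[i] : statistical weight of site i when bound, e.g. K_i * [TF] (hypothetical)
    coop_pairs : iterable of (i, j) site pairs whose joint binding is multiplied by omega
    omega      : cooperativity factor (omega > 1 favours simultaneous binding)
    """
    n = len(weights)
    z = 0.0           # partition function: sum of weights over all configurations
    occ = [0.0] * n   # summed weights of configurations in which site i is bound
    for config in product((0, 1), repeat=n):
        w = 1.0
        for i, bound in enumerate(config):
            if bound:
                w *= weights[i]
        for i, j in coop_pairs:
            if config[i] and config[j]:
                w *= omega  # cooperative interaction between two bound sites
        z += w
        for i, bound in enumerate(config):
            if bound:
                occ[i] += w
    # Normalise by the partition function to get occupancy probabilities.
    return [o / z for o in occ]


# Two identical sites, without and with cooperative binding (hypothetical values).
print(site_occupancies([1.0, 1.0], [], 1.0))
print(site_occupancies([1.0, 1.0], [(0, 1)], 3.0))
```

With two sites of weight 1, each site is bound half of the time when binding is independent. A cooperativity factor of 3 raises each occupancy to 2/3. Occupancies like these can then be combined into an expression readout, which is where mechanisms such as synergistic activation or short-range repression enter a full model.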

Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. He X, Samee MA, Blatti C, Sinha S. PLoS Comput Biol, 2010 Sep 16;6(9). pii: e1000935

Evolutionary Origins of Transcription Factor Binding Site Clusters. He X, Duque TS, Sinha S. Mol Biol Evol, 2012;29(3):1059-70

Annotating functional effects of non-coding variants in neuropsychiatric cell types by deep transfer learning. Lai B, Qian S, Zhang H, Zhang S, Kozlova A, Duan J, Xu J*, He X*. PLoS Comput Biol, 2022 May 16;18(5)

A new Bayesian factor analysis method improves detection of genes and biological processes affected by perturbations in single-cell CRISPR screening. Zhou Y, Luo K*, Liang L*, Chen M#, He X#. Nature Methods, 2023 Sep 28

Cancer Genomics

Cancer is largely a genetic disease, in which somatic mutations give cancer cells survival advantages and drive tumorigenesis. Identifying driver events, and understanding how they link to changes in cellular behavior and the tumor microenvironment, are major challenges in the field. We have developed a method, driverMAPS, that models the complex pattern of positive selection acting on cancer driver genes, leading to substantially better detection of driver genes than existing methods. We are also interested in how these driver events lead to downstream effects, in particular escape from the immune system.

Detailed modeling of positive selection improves detection of cancer driver genes. Zhao S, Liu J, Nanga P, Liu Y, Cicek AE, Knoblauch N, He C, Stephens M*, He X*. Nature Communications, 2019 Jul 30;10(1):3399