Systems Genetics of Complex Diseases


Traditionally, genetic studies have focused on the correlation between genetic variation and organism-level phenotypes. The emerging field of systems genetics studies how DNA variation affects molecular phenotypes such as gene expression, metabolite levels, and DNA methylation. Such studies provide a natural bridge from genetic changes to phenotypes and can thus help elucidate the mechanisms by which variants affect disease risk.

A major focus of the lab is to develop methods that extract information from systems genetics data, especially eQTL studies, which identify genetic variants associated with gene expression levels. One method we developed, called Sherlock, combines eQTL data with data from genome-wide association studies (GWAS) in a novel fashion. It can take advantage of loci linked to gene expression in trans (i.e., located far from the gene itself on the chromosome, or on a different chromosome), and can discover genes that would be impossible to find by GWAS alone. We are expanding this work in several directions. We are developing methods that identify not individual genes but the gene pathways that underlie diseases. We are also interested in building models that fully integrate QTL data at different levels (epigenome, mRNAs, proteins, etc.) for a better understanding of diseases.
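As a rough illustration of the basic eQTL association that such methods start from, the sketch below regresses the expression level of one gene on the genotype dosage (0, 1, or 2 copies of the alternate allele) of one variant. It is a minimal example under simplifying assumptions (no covariates, no multiple-testing correction), not the Sherlock method; the function name `eqtl_effect` and the toy data are hypothetical.

```python
import math


def eqtl_effect(genotypes, expression):
    """Least-squares slope and Pearson correlation of expression on dosage.

    genotypes:  allele dosages (0, 1, or 2) for each individual
    expression: expression level of one gene in the same individuals
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    beta = sxy / sxx                 # change in expression per allele copy
    r = sxy / math.sqrt(sxx * syy)   # strength of the linear association
    return beta, r


# Toy data (hypothetical): expression rises with each copy of the allele.
dosage = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta, r = eqtl_effect(dosage, expr)
```

In a real eQTL study this test is repeated across many variant-gene pairs, including trans pairs, which is why stringent significance thresholds are needed.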

Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS. He X, Fuller CK, Song Y, Meng Q, Zhang B, Yang X, Li H. Am J Hum Genet, 2013 May 2;92(5):667-80

Statistical Genetics of Rare Variants


Genome-wide association studies (GWAS), which correlate genotypes with phenotypic traits, have been successful in mapping risk loci for many common diseases. Yet the loci found by these studies often have small effects, and for some diseases only a small number of risk loci have been found through GWAS. There is growing interest in identifying rare disease variants (often defined as those with frequency below 5%) through whole-exome or whole-genome sequencing studies. The statistical challenge is that the power to detect rare disease variants is often low.

We have been addressing the challenges of rare variant genetics by integrating multiple types of data to maximize power. Each person inherits variants from their parents, some of which may predispose the person to certain diseases. Meanwhile, new mutations may occur spontaneously during reproduction, and if they disrupt key genes, these de novo mutations can also increase disease risk. We have developed a model that combines de novo mutations and inherited variants to test the role of a gene. This method powered one of the largest sequencing studies of autism, and the findings shed new light on the biological processes disrupted in autism. Ongoing research along this line includes incorporating functional annotations of variants into the analysis and combining nucleotide-level variation with copy-number variation.
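The idea of pooling evidence from de novo and inherited variation for a single gene can be sketched as a sum of log Bayes factors. The example below is a deliberately simplified illustration, not the published model: it assumes fixed relative risks (`gamma_dn`, `gamma_cc`) rather than prior distributions, a Poisson model for de novo counts given a per-gene mutation rate `mu`, and a binomial model for case versus control counts as a stand-in for inherited variants. All function names, parameter names, and numbers are hypothetical.

```python
import math


def log_bf_de_novo(x, n_trios, mu, gamma_dn):
    """Log Bayes factor for x de novo mutations in n_trios families.

    Null:        x ~ Poisson(2 * n_trios * mu)
    Alternative: x ~ Poisson(2 * n_trios * mu * gamma_dn)
    """
    lam0 = 2 * n_trios * mu
    return x * math.log(gamma_dn) - lam0 * (gamma_dn - 1)


def log_bf_case_control(x_case, x_ctrl, n_case, n_ctrl, gamma_cc):
    """Log Bayes factor for rare-variant counts in cases and controls.

    Conditional on the total count, the case count is binomial with
    success probability p0 under the null and p1 under the alternative.
    """
    p0 = n_case / (n_case + n_ctrl)
    p1 = gamma_cc * n_case / (gamma_cc * n_case + n_ctrl)
    return (x_case * math.log(p1 / p0)
            + x_ctrl * math.log((1 - p1) / (1 - p0)))


def combined_log_bf(x_dn, n_trios, mu, gamma_dn,
                    x_case, x_ctrl, n_case, n_ctrl, gamma_cc):
    """Treat the two data types as independent, so log Bayes factors add."""
    return (log_bf_de_novo(x_dn, n_trios, mu, gamma_dn)
            + log_bf_case_control(x_case, x_ctrl, n_case, n_ctrl, gamma_cc))


# Hypothetical gene: 3 de novo hits in 1000 trios, 8 vs 2 rare variants
# in equally sized case and control samples.
score = combined_log_bf(3, 1000, 2e-6, 20.0, 8, 2, 1000, 1000, 2.0)
```

Because the evidence is additive on the log scale, a gene with modest support from each data type can still reach a high combined score, which is the intuition behind the gain in power.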

Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. He X, Sanders SJ, Liu L, De Rubeis S, Lim ET, Sutcliffe JS, Schellenberg GD, Gibbs RA, Daly MJ, Buxbaum JD, State MW, Devlin B, Roeder K. PLoS Genetics, 2013 Aug;9(8):e1003671

Synaptic, transcriptional, and chromatin genes disrupted in autism. De Rubeis S, He X, Goldberg A, Poultney C, Samocha K, et al. Nature, 2014 Nov 13;515(7526):209

Cis-Regulatory Sequences

Genes need to be expressed at the correct time and place, and disruption of normal expression patterns can increase a person's susceptibility to disease. Gene expression is controlled by enhancer sequences, which read information about the cellular environment to drive expression patterns appropriate for specific cellular conditions or cell types. A major research challenge is to decipher the rules that govern this process and to use that knowledge to improve our ability to interpret DNA variation in non-coding sequences.

We have developed quantitative models of how enhancer sequences interact with regulatory proteins (transcription factors) to drive gene expression. These models attempt to capture the underlying physical process and are able to reproduce the complex spatial patterns of gene expression observed in early fruit fly development. More recently, we have been working with experimental collaborators to map enhancer elements in the human brain and to develop quantitative models of how these sequences function. Our goal is to use this knowledge to interpret non-coding mutations that predispose to mental disorders such as autism and schizophrenia.
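The physical picture behind such models can be illustrated with a small statistical-mechanics calculation. The sketch below is a generic toy example under stated assumptions, not the lab's published model: each binding site has a dimensionless strength `q[i]` (binding affinity times transcription factor concentration), adjacent bound sites interact through a cooperativity factor `omega`, and site occupancy is obtained by enumerating all bound/unbound configurations. The function name and parameters are hypothetical.

```python
from itertools import product


def site_occupancies(q, omega=1.0):
    """Equilibrium probability that each binding site is occupied.

    q:     list where q[i] = affinity * TF concentration for site i
    omega: cooperativity factor applied when neighboring sites are
           both bound (omega > 1 means cooperative binding)
    """
    n = len(q)
    partition = 0.0        # sum of weights over all configurations
    bound = [0.0] * n      # summed weight of configurations with site i bound
    for state in product((0, 1), repeat=n):
        weight = 1.0
        for i, occupied in enumerate(state):
            if occupied:
                weight *= q[i]
                if i > 0 and state[i - 1]:
                    weight *= omega
        partition += weight
        for i, occupied in enumerate(state):
            if occupied:
                bound[i] += weight
    return [b / partition for b in bound]


# Two sites of equal strength, without and with cooperative binding.
independent = site_occupancies([1.0, 1.0], omega=1.0)
cooperative = site_occupancies([1.0, 1.0], omega=10.0)
```

Exhaustive enumeration scales as 2^n and is only practical for a handful of sites; models of real enhancers typically rely on dynamic programming over the sequence instead. Mapping occupancy to expression output (for example through activator and repressor effects) is a further modeling step not shown here.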


Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. He X, Samee MA, Blatti C, Sinha S. PLoS Comput Biol, 2010 Sep 16;6(9):e1000935

A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. He X, Chen CC, Hong F, Fang F, Sinha S, Ng HH, Zhong S. PLoS ONE, 2009 Dec 1;4(12):e8155

Evolutionary Genomics of Gene Regulation


An influential hypothesis in evolution is that changes in when and where genes are expressed, rather than changes in gene function per se, largely drive the evolution of different phenotypes across species. We are interested in the general principles of how regulatory sequences and regulatory relationships between genes evolve. What is the origin of new regulatory sequences? Do regulatory networks change all the time, even when no new functions are selected? How does a complex biological process or system that requires many changes, e.g. a new cell type, evolve?

We have studied the evolution of cis-regulatory sequences in the context of fruit fly development. We found that the basic units of these sequences, called transcription factor binding sites, can turn over rather rapidly even when the sequences encode the same information, i.e. generate similar patterns of gene expression. In another study, we used a combination of theory and simulation to demonstrate how redundancy (multiple units of similar function) is built into regulatory sequences by evolution, even though redundancy is never directly selected. We are very interested in combining genomic data across species (regulatory elements, transcriptomes, networks, etc.) to infer the key drivers of novel phenotypes.

Evolution of regulatory sequences in 12 Drosophila species. Kim J*, He X*, Sinha S. PLoS Genet, 2009 Jan;5(1):e1000330

Evolutionary Origins of Transcription Factor Binding Site Clusters. He X, Duque TS, Sinha S. Mol Biol Evol, 2012, 29(3):1059-70

Alignment and prediction of regulatory sequences based on a probabilistic model of evolution. He X, Ling X, Sinha S. PLoS Comput Biol, 2009 Mar;5(3):e1000299