Deciphering Enhancer Sequence Using Thermodynamics-Based Models and Convolutional Neural Networks

Themes: Conversion

Keywords: Modeling, Transcriptomics

Overview

Overview of the data used in this study. (A) Schematic of the wild-type (WT) enhancer of the Drosophila rhomboid (rho) gene. Binding sites of the three TFs were identified using the PWMs employed in (14). All annotated sites agree with those found in (14), and, except for D1 and S2, all sites match the in vitro footprinted sites characterized previously (14). (B) The levels of the three regulators and the expression of rho driven by the wild-type enhancer at 17 equidistant points along 0–40% of the ventral–dorsal (V–D) axis. (C) The expression of rho driven by perturbed enhancers (shown in brown) representing mutagenesis of binding sites of one or more TFs. Each panel’s title denotes the TF(s) whose sites were mutagenized. (D) Deleting an activator’s site (or a combination of such sites) is expected to reduce peak expression of rho (at bin 8 on the V–D axis); we therefore defined the effect of a variant enhancer (Y-axis) as the difference between the expression it drives and the wild-type expression at this position on the axis. The effect of the T2 single-site deletion is not shown because T2 overlaps the SNA site S5. (E) Schematic of synergistic activation, where the activation driven by two bound activators (right) is greater than the sum of their individual activation effects (left and middle).
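The two definitions in panels (D) and (E) can be written down in a few lines. The Python sketch below is illustrative only: the function names, the 1-based reading of "bin 8", and the expression profiles are assumptions made for this example, not the study's code or data.

```python
import numpy as np

N_BINS = 17    # number of equidistant points along the V-D axis (from the caption)
PEAK_BIN = 8   # bin of peak rho expression; assumed here to be counted from 1


def variant_effect(variant_expr, wt_expr, peak_bin=PEAK_BIN):
    """Effect of a variant enhancer: variant minus wild-type expression at the peak bin.

    Negative values mean the variant drives lower peak expression than wild type.
    """
    variant_expr = np.asarray(variant_expr, dtype=float)
    wt_expr = np.asarray(wt_expr, dtype=float)
    if variant_expr.shape != wt_expr.shape:
        raise ValueError("variant and wild-type profiles must have the same shape")
    idx = peak_bin - 1  # convert the assumed 1-based bin number to a 0-based index
    return float(variant_expr[idx] - wt_expr[idx])


def is_synergistic(joint_activation, activation_a, activation_b):
    """True when two bound activators together exceed the sum of their individual effects."""
    return joint_activation > activation_a + activation_b


# Made-up profiles for illustration only (not data from the study):
# a bell-shaped wild-type profile peaking at bin 8, and a variant at 60% of it.
bins = np.arange(1, N_BINS + 1)
wt_profile = np.exp(-0.5 * ((bins - PEAK_BIN) / 2.0) ** 2)
variant_profile = 0.6 * wt_profile

effect = variant_effect(variant_profile, wt_profile)
```

With these invented profiles the effect is negative, which is the direction panel (D) expects for an activator site deletion. The synergy check is a direct restatement of panel (E) and says nothing about how any particular model implements cooperativity.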

Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer ‘grammar’ in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, such as different cell types and experimental conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference.
