Applying machine learning (deep neural networks) to genome-wide maps of epigenetic modifications (e.g. DNA accessibility or gene expression) has emerged as a powerful tool to predict genomic activity directly from DNA sequence. While high prediction accuracy can generally be achieved, several shortcomings limit the widespread use of “sequence-to-activity” models in clinical settings. A key challenge is deriving human-interpretable molecular mechanisms from “black-box” models. We still lack a full understanding of how the genetic code gives rise to epigenetic changes, which makes it difficult to develop new treatments for epigenome-related malignancies.

Our lab is pioneering technologies that characterize, in near-native settings, the functional properties of individual transcription factors (TFs), the master regulators of cell fate, and how they relay genetic information to the epigenome. Instead of relying on existing regulatory sequences, we deploy large libraries of synthetic ones to generate detailed maps of how TFs interact with each other and with the nuclear environment.