Applying machine learning, in particular deep neural networks, to genome-wide maps of molecular activity (e.g. gene expression or DNA accessibility) has emerged as a powerful way to predict genomic activity directly from DNA sequence. While high prediction accuracy can generally be achieved when testing against non-coding elements in our genome, several shortcomings limit the widespread use of “sequence-to-activity” models in clinical settings. A key challenge is deriving human-interpretable molecular mechanisms from “black-box” models. We still lack a full understanding of how our genetic code leads to epigenetic and gene expression changes, which makes it difficult to develop new treatments that target malignancies caused by transcriptional dysregulation.

Our lab is pioneering technologies that characterize, in near-native settings, the functional properties of individual transcription factors (TFs) – the master regulators of cell fate – and how they relay genetic information to the epigenome. Instead of relying on existing regulatory sequences, we deploy large libraries of synthetic ones to generate detailed maps of how TFs interact with each other and with the nuclear environment.