Translation, the process by which mRNA is decoded into protein, is an essential step in gene expression, but our understanding of how it is regulated remains limited. I develop probabilistic models for ribosome profiling datasets, which provide a high-resolution, genome-wide snapshot of where ribosomes sit on mRNAs. From these data, I extract information about both local and global translation kinetics.
RNA structure influences many different cellular processes. Given a sequence, the goal is to predict the set of base pairs it forms under a given energy model while keeping algorithmic complexity low, generality (the range of structures that can be represented) high, and accuracy high. I work on statistical models that are trained on a set of known structures and can incorporate various kinds of experimental data. I have also worked on models and algorithms for more complex "pseudoknotted" structures.
I have worked on predicting regulatory modules for gene expression, and I am also interested in genetic variation as it relates to translation. During an internship at Microsoft Research, I worked on sequence assembly algorithms for polyploid genomes.
As an undergraduate, I worked on (1) algorithms for computing initial conditions for simulations of biological processes modeled as hybrid systems, and (2) a review of DNA sequencing using biological and synthetic nanopores.
I am generally interested in probabilistic models of generative processes and in problems where domain knowledge is useful, but I have also worked with other machine learning tools. During an internship at Google, I worked on machine learning models for predicting ad click-through rates.