machine learning, probabilistic inference, data science, online education
New: I recently showed off the Codewebs engine at the NIPS demo track. Our software is a web app that gives real-time feedback on student code submitted to a MOOC. The feedback from Codewebs is based entirely on the massive collection of submissions by other students. I'm happy to do live demos for anyone interested. This is joint work with Chris Piech, Andy Nguyen, and Leo Guibas. We also have a paper that was recently accepted to WWW.
Check out our visualization of 40,000 Octave/Matlab implementations of linear regression! This is part of my ongoing Codewebs project for analyzing and providing detailed feedback to students in a programming-based MOOC, with Chris Piech, Andy Nguyen, and Leo Guibas. Data from Andrew Ng's course on Machine Learning offered through
Also check out Ben Lorica's blog post, and Hal Hodson's article at New Scientist about our work!
New: Read my post at Stanford Online's Signal blog (with Jane Manning and Marc Sanders), "Those Chatty Seniors!", in which we analyze and discuss the demographics of MOOC forum posters (the tl;dr is that older people talk more). The Stanford Daily also covered our work in a recent article.
I am a postdoctoral fellow working in the Computer Science Department at Stanford University and am supported by an NSF/CRA CI (Computing Innovations) fellowship. I am a member of the Geometric Computation Group which is headed by Leonidas Guibas. I am also part of the recently started Lytics Lab, a multidisciplinary group focused on Learning Analytics.
I received a Ph.D. in Robotics from the School of Computer Science at Carnegie Mellon University in 2011, where I worked with Carlos Guestrin. During graduate school, I was fortunate enough to spend two happy summers interning in Seattle, first with Intel Research working with Ali Rahimi, then at Microsoft Research working with Ashish Kapoor.
Before coming to CMU, I studied math (also) at Stanford University. And before Stanford, I attended Oakton High School in Vienna, Virginia, and for a time, also Lynbrook High School in San Jose, California.
My research interests in wordle form. The right wordle is generated from my most recent publications on online education and the left wordle is generated from my work on probabilistic inference and learning with combinatorially structured data.
I am interested in theoretical and applied problems in machine learning. My main interests lie in designing computationally efficient probabilistic reasoning and learning algorithms which allow computers to deal with the uncertainty and complexity inherent in real world data. My work has focused on tackling applications whose mathematical abstractions involve probabilistic reasoning with combinatorially structured objects such as matchings, rankings, and trees. These problems are challenging both statistically and computationally due to structural constraints (like mutual exclusivity) which cause interactions between objects that traditional techniques in machine learning have been ill-equipped to handle. Portions of my work thus address:
While being dedicated to pushing on core research problems, I am also committed to problems with real world applications and impact. My past work has contributed solutions to a variety of applications such as predicting preference over webpages and political elections, tracking with camera networks, and reconstructing temporal orderings of events (such as the onset of symptoms in neurodegenerative diseases) from noisy and incomplete data.
I now focus most of my energies on applications with educational impact. The recent surge in popularity of massive open online courses (MOOCs), with platforms such as Coursera and edX, has made it possible for almost anyone to take free university courses. However, while new technologies allow for scalable content delivery, we remain limited in our ability to evaluate and give feedback on open-ended assignments at scale. I approach these challenges fundamentally as machine learning (ML) problems, in which we can leverage the massive datasets now collected by online learning platforms. My work has thus focused on ML-driven education and has contributed algorithms for giving feedback in MOOCs via crowdsourcing or semi-automated methods.
A full list of publications is here. Contact me for preprints of papers that are under submission.
Note: Every now and then, Tomasz and I get emails from people about this code. While we're always happy to help out, I would like to point out that we wrote this code many years ago. Nowadays it is much more popular (and effective) to use collapsed samplers or online algorithms over the mean field + variational EM algorithm that was proposed in the first LDA paper.
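For readers unfamiliar with the term, here is a minimal sketch of what a collapsed Gibbs sampler for LDA can look like. It is not the code referred to in the note above, and it is not tuned for speed. The function name `collapsed_gibbs_lda`, the argument names, and the default hyperparameters `alpha` and `beta` are all assumptions made for illustration. Documents are assumed to be lists of integer word ids.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        num_sweeps=50, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical helper).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_{k,w}
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Initialize every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(num_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # p(z = k | all other assignments) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]

                # Record the new assignment and restore the counts.
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return doc_topic, topic_word, assignments
```

The topic proportions and topic-word distributions are integrated out ("collapsed"), so the sampler only tracks count tables and per-token topic assignments. Point estimates of the distributions can be read off the smoothed counts afterwards.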