Jonathan Chung-Kuan Huang

machine learning, probabilistic inference, data science, online education

jhuang11@stanford.edu
Room S297
James H. Clark Center
Stanford University
Stanford, CA 94305
P: (650) 248-4441

New! I recently demoed the Codewebs engine in the NIPS demo track. Our software is a web app that gives real-time feedback on student code submitted to a MOOC; the feedback from Codewebs is based entirely on the massive collection of submissions from other students. I'm happy to give live demos for anyone interested. This is joint work with Chris Piech, Andy Nguyen, and Leo Guibas. We also have a paper that was recently accepted to WWW.

New! I recently co-organized the NIPS 2013 Workshop on Data Driven Education with Sumit Basu and Kalyan Veeramachaneni.

New! Check out our visualization of 40,000 Octave/Matlab implementations of linear regression! This is part of my ongoing Codewebs project, with Chris Piech, Andy Nguyen, and Leo Guibas, for analyzing and providing detailed feedback to students in a programming-based MOOC. The data come from Andrew Ng's course on Machine Learning offered through Coursera.
Also check out Ben Lorica's blog post and Hal Hodson's article at New Scientist about our work!

New! Read my post at Stanford Online's Signal blog (with Jane Manning and Marc Sanders), Those Chatty Seniors!, in which we analyze and discuss the demographics of MOOC forum posters (the tl;dr: older people talk more). The Stanford Daily also covered our work in a recent article.

I am a postdoctoral fellow working in the Computer Science Department at Stanford University and am supported by an NSF/CRA CI (Computing Innovations) fellowship. I am a member of the Geometric Computation Group which is headed by Leonidas Guibas. I am also part of the recently started Lytics Lab, a multidisciplinary group focused on Learning Analytics.

I received a Ph.D. in Robotics from the School of Computer Science at Carnegie Mellon University in 2011, where I worked with Carlos Guestrin. During graduate school, I was fortunate enough to spend two happy summers interning in Seattle, first with Intel Research working with Ali Rahimi, then at Microsoft Research working with Ashish Kapoor.

Before coming to CMU, I studied math, also at Stanford University. Before Stanford, I attended Oakton High School in Vienna, Virginia, and, for a time, Lynbrook High School in San Jose, California.

Here is an "official" bio and photo.

My research interests in wordle form. The right wordle is generated from my most recent publications on online education and the left wordle is generated from my work on probabilistic inference and learning with combinatorially structured data.

I am interested in theoretical and applied problems in machine learning. My main interests lie in designing computationally efficient probabilistic reasoning and learning algorithms which allow computers to deal with the uncertainty and complexity inherent in real world data. My work has focused on tackling applications whose mathematical abstractions involve probabilistic reasoning with combinatorially structured objects such as matchings, rankings, and trees. These problems are challenging both statistically and computationally due to structural constraints (like mutual exclusivity) which cause interactions between objects that traditional techniques in machine learning have been ill-equipped to handle. Portions of my work thus address:

  • Compact, probabilistic formulations for reasoning jointly with large collections of structured data,
  • Efficient algorithms for reasoning and learning that exploit problem structure,
  • Theoretical analyses of computational and statistical complexity as well as approximation quality.

While dedicated to pushing on core research problems, I am also committed to applications with real-world impact. My past work has contributed solutions to a variety of applications, such as predicting preferences over webpages and in political elections, tracking with camera networks, and reconstructing temporal orderings of events (such as the onset of symptoms in neurodegenerative diseases) from noisy and incomplete data.

I now focus most of my energies on applications with educational impact. The recent surge in popularity of massive open online courses (MOOCs), on platforms such as Coursera and EdX, has made it possible for almost anyone to take free university courses. However, while new technologies allow for scalable content delivery, we remain limited in our ability to scalably evaluate open-ended assignments and give feedback on them. I approach these challenges fundamentally as machine learning (ML) problems, in which we can leverage the massive datasets now collected by online learning platforms. My work has thus focused on ML-driven education and has contributed algorithms for giving feedback in MOOCs via crowdsourced or semi-automated methods.

A full list of publications is here. Contact me for preprints of papers that are under submission.

PyMallows
Jonathan Huang
Python routines for fitting and simulating from a generalized Mallows model. Learning algorithms are implemented for both full and partial rankings.
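To give a flavor of what simulating from a Mallows model involves, here is a minimal, illustrative sketch (not the PyMallows code itself; the function name and defaults are my own) of drawing a permutation from a Mallows model with Kendall-tau distance via the standard repeated-insertion method:

```python
import math
import random

def sample_mallows(n, theta, sigma0=None, rng=random):
    """Draw one permutation from a Mallows model with dispersion
    theta and central ranking sigma0, using repeated insertion."""
    if sigma0 is None:
        sigma0 = list(range(n))
    perm = []
    for i in range(n):
        # Item i can be inserted at positions 0..i; inserting at
        # position j creates (i - j) new discordant pairs, so the
        # insertion weight is exp(-theta * (i - j)).
        weights = [math.exp(-theta * (i - j)) for j in range(i + 1)]
        r = rng.uniform(0, sum(weights))
        cum, pos = 0.0, i
        for j, w in enumerate(weights):
            cum += w
            if r <= cum:
                pos = j
                break
        perm.insert(pos, sigma0[i])
    return perm
```

As theta grows, samples concentrate around sigma0; theta = 0 gives the uniform distribution over permutations.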
PROPS: Probabilistic Reasoning on Permutations toolbox
Jonathan Leonard Long and Jonathan Huang
C++/Python library for reasoning/learning with distributions on permutations.
Littlewood-Richardson rule
Jonathan Huang
A matlab implementation of the Littlewood-Richardson rule.
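For context, the Littlewood-Richardson rule computes the coefficients $c^{\nu}_{\lambda\mu}$ in the expansion of a product of two Schur functions; a standard small example (not specific to this implementation) is:

```latex
s_\lambda \, s_\mu \;=\; \sum_\nu c^{\nu}_{\lambda\mu}\, s_\nu,
\qquad\text{e.g.}\qquad
s_{(1)}\, s_{(1)} \;=\; s_{(2)} + s_{(1,1)}.
```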

Latent Dirichlet Allocation (LDA)
Jonathan Huang and Tomasz Malisiewicz
An implementation of the mean field inference/learning algorithms from Blei et al. (2003).

Sample output on 20 Newsgroups dataset: [Link]

Note: Every now and then, Tomasz and I get emails from people about this code. While we're always happy to help out, I would like to point out that we wrote this code many years ago. Nowadays it is much more popular (and effective) to use collapsed samplers or online algorithms instead of the mean field + variational EM algorithm that was proposed in the original LDA paper.
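The collapsed samplers mentioned above fit in a few lines; here is an illustrative toy collapsed Gibbs sampler for LDA (my own sketch for exposition, with made-up names and defaults, not a replacement for a tuned library implementation):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of token lists. Returns per-document topic counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    z = [[rng.randrange(K) for _ in d] for d in docs]  # token topics
    ndk = [[0] * K for _ in docs]                # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]   # topic-word counts
    nk = [0] * K                                 # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Full conditional for z_{di} with theta, phi
                # integrated out (Dirichlet-multinomial conjugacy).
                weights = [(ndk[d][t] + alpha) *
                           (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                r = rng.uniform(0, sum(weights))
                cum = 0.0
                for t, wt in enumerate(weights):
                    cum += wt
                    if r <= cum:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk
```

The per-document topic counts can be normalized (after adding alpha) into topic proportions; the topic-word counts play the same role for the topics themselves.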

HLNfit
Jonathan Huang and Tomasz Malisiewicz
Code for fitting a Hierarchical Logistic Normal distribution.
...

Adaptive Fourier-Domain Inference on the Symmetric Group

Algebraic Methods in Machine Learning Workshop, NIPS '08
Whistler, Canada

...

Probability Distributions on Permutations: Compact Representations and Inference

Machine Learning Lunch Seminar, 2008
Carnegie Mellon University

...

Exploiting Independence and Its Generalizations for Reasoning about Permutation Data

Machine Learning Lunch Seminar, 2010
Carnegie Mellon University

...

Politics, Preferences and Permutations: Probabilistic Reasoning with Rankings

Seminar, 2011
Microsoft Research Cambridge (UK)