# Ashok Cutkosky

Learning

Background

I was born in Columbia, Missouri, and received my AB in Mathematics from Harvard in 2013. My undergraduate work centered on mathematics, computer science, and biology. After coming to Stanford, I developed a deep interest in the theory and practice of machine learning as well as medicine. I’ve just finished a master’s in medicine and I’m currently in the computer science PhD program.

Research Goals

Any learning algorithm that can be deployed in a neuromorphic chip must be very robust: it must be able to deal with many different problems without need for human-supervised tuning and it must do so in a highly noisy environment. The types of noise present in such a scenario are quite varied: noise in input data, noise accrued during the actual computation itself, and potentially even unknown delays in data processing.

Current machine learning algorithms aren’t suited for a neuromorphic chip because they typically require many extra variables (called hyperparameters) that must be carefully tuned to the problem at hand, and there is no clear recipe for doing this. Hyperparameter tuning also creates another issue: the level of knowledge required to use machine learning successfully is much higher than it should be, since it takes a lot of experience to understand how to tune hyperparameters effectively. Although they seem very different, the problems of dealing with noise, handling diverse problem types, and selecting hyperparameters and models all have a common underlying source: uncertainty in the structure of the data.

To illustrate these ideas, consider the problem facing a meteorologist. Each day, he or she needs to provide a prediction about the weather for the next several days. These predictions must naturally be based on some data collected from various instruments (radar, thermometers, barometers, etc.). The first thing the meteorologist needs to do is to perform model selection. That is, he or she needs to decide the general way in which the future weather depends on the past measurements. Is the temperature going to be a simple linear function of the temperatures over the last week, or is it some complicated trigonometric polynomial involving the wind speed at noon two days ago? Once our intrepid meteorologist has finished selecting a model, the next step is to fit the model. That is, find a particular linear function or trigonometric polynomial that predicts the data well.
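As a rough illustration of model selection followed by fitting, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function names, the made-up weekly temperature cycle, the fixed 7-day period, and the choice to compare candidates by error on held-out days. It is not a description of how real forecasts are produced.

```python
import math

def fit_linear(ts, ys):
    """Fit y = slope * t + intercept by ordinary least squares."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    intercept = mean_y - slope * mean_t
    return lambda t: slope * t + intercept

def fit_sinusoid(ts, ys, period=7.0):
    """Fit y = offset + amplitude * sin(2*pi*t/period).

    Simplification: the offset is taken to be the mean of ys, and the
    amplitude is then fitted by least squares on what is left over.
    """
    offset = sum(ys) / len(ys)
    waves = [math.sin(2 * math.pi * t / period) for t in ts]
    amplitude = (sum(w * (y - offset) for w, y in zip(waves, ys))
                 / sum(w * w for w in waves))
    return lambda t: offset + amplitude * math.sin(2 * math.pi * t / period)

def mean_squared_error(model, ts, ys):
    return sum((model(t) - y) ** 2 for t, y in zip(ts, ys)) / len(ts)

def select_model(candidates, train, validation):
    """Fit every candidate on the training days, score it on held-out days."""
    train_t, train_y = train
    val_t, val_y = validation
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_t, train_y)
        scores[name] = mean_squared_error(model, val_t, val_y)
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical data: four weeks of temperatures with a weekly cycle.
days = list(range(28))
temps = [15.0 + 5.0 * math.sin(2 * math.pi * d / 7.0) for d in days]

best_name, scores = select_model(
    {"linear": fit_linear, "sinusoid": fit_sinusoid},
    train=(days[:21], temps[:21]),
    validation=(days[21:], temps[21:]),
)
```

Note that the sketch still needs a person to supply the list of candidate models and the period of the sinusoid, which is exactly the kind of prior knowledge about the data that the meteorologist has to bring to the problem.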

Figure: Comparison of different learning algorithms with different hyperparameter settings. By adaptively choosing learning rates based on both the observed gradients and the parameter vectors, we can derive an algorithm that achieves very low error across many different datasets as long as the initial guess for a hyperparameter is an underestimate; the algorithm is robust even to extremely small underestimates (red line).

Our meteorologist’s life is made even more difficult by inherent uncertainties in the measurements. Wind values may fluctuate rapidly, and measurement devices may make errors in recording. These sources of noise obscure the true data, making all aspects of the problem more difficult. In order to solve all of these problems and predict the weather, our meteorologist needs to know something about the structure of the weather data. This knowledge will inform the model selection process as well as the optimization procedure. In practice, the meteorologist acquires this requisite knowledge through experience by some combination of education and simply looking at lots of weather data.

My goal is to streamline and simplify this process by designing algorithms that can adaptively choose and optimize models in response to observed information. The learning algorithms I have been working on are all versions of gradient descent. The fundamental idea of gradient descent is extremely simple. Let’s go back to the weather prediction example to see it in action. Suppose the meteorologist predicts that it will rain on Monday, but instead it is sunny. This is unfortunate, but it provides an opportunity for improvement. The meteorologist uses the observed fact that it was sunny on Monday to calculate a quantity called the gradient, which describes how to adjust the model in order to make better predictions in the future. An important aspect of this method is a quantity called the learning rate. The learning rate is a number that tells the meteorologist how seriously to take mistakes. A large learning rate indicates that an incorrect prediction should result in a significant change to the model, while a small learning rate indicates that an incorrect prediction was probably just due to noise and bad luck and so shouldn’t be grounds for very much change.
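The update described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the text: a model with a single weight, a squared-error loss, and hypothetical names (`gradient_step`, `run_online`). It shows plain gradient descent with a fixed learning rate, not the adaptive algorithms this research aims for.

```python
def gradient_step(weight, feature, outcome, learning_rate):
    """One gradient-descent update for the loss 0.5 * (weight * feature - outcome) ** 2."""
    prediction = weight * feature
    # The gradient says in which direction, and how strongly, the mistake
    # (prediction - outcome) suggests the weight should move.
    gradient = (prediction - outcome) * feature
    return weight - learning_rate * gradient

def run_online(observations, learning_rate, weight=0.0):
    """Process observations one at a time, updating the weight after each."""
    for feature, outcome in observations:
        weight = gradient_step(weight, feature, outcome, learning_rate)
    return weight

# Hypothetical stream of 50 identical observations whose best weight is 2.0.
observations = [(1.0, 2.0)] * 50

fast_weight = run_online(observations, learning_rate=0.5)   # takes mistakes seriously
slow_weight = run_online(observations, learning_rate=0.01)  # treats mistakes as mostly noise
```

On this noise-free stream the larger learning rate ends up essentially at 2.0, while the smaller one is still well short of it after the same 50 observations. On noisy data the trade-off can reverse, which is why choosing the learning rate is a tuning problem in the first place.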

The major challenges are to find good estimates of the gradients and choose learning rates in an optimal manner. In order to prove the correctness of my algorithms for determining these parameters I am using the machinery of online learning. This is a beautiful field in which one uses worst-case analysis to measure the performance of learning algorithms, thus sidestepping the need for statistical assumptions about the environment and providing a substantial degree of robustness to any results.
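For readers who want the formal version: the usual worst-case yardstick in online learning is regret. The notation below is assumed for illustration and does not come from the text above: $w_t$ is the model used on round $t$, $\ell_t$ is the loss revealed on that round, and $u$ is any fixed comparison model chosen in hindsight.

$$
R_T(u) = \sum_{t=1}^{T} \ell_t(w_t) - \sum_{t=1}^{T} \ell_t(u)
$$

A standard textbook bound for gradient descent with a fixed learning rate $\eta$, convex losses, gradients $g_t$, and starting point $w_1 = 0$ is

$$
R_T(u) \le \frac{\|u\|^2}{2\eta} + \frac{\eta}{2} \sum_{t=1}^{T} \|g_t\|^2 .
$$

If every gradient satisfies $\|g_t\| \le G$, the right-hand side is smallest at $\eta = \|u\| / (G\sqrt{T})$, giving $R_T(u) \le \|u\| G \sqrt{T}$. The catch is that this choice of $\eta$ depends on $\|u\|$ and $G$, which are not known in advance; this is one concrete sense in which a learning rate is a hyperparameter that normally has to be tuned.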

My hope is that this work will result in learning algorithms that eliminate the need for hyperparameter tuning and are robust across many different problems. These algorithms would be useful in neuromorphic hardware, and would also be easy to use for anyone who wants to do some data analysis without any need to understand the underlying mechanisms.

Publications

| ID | Article |
| --- | --- |
| C49 | A Cutkosky and K Boahen, Stochastic and Adversarial Online Learning Without Hyperparameters, Advances in Neural Information Processing Systems 30, Curran Associates, Inc., pp. 5066-5074, 2017. |
| C48 | A Cutkosky and K Boahen, Online Learning Without Prior Information, Conference on Learning Theory, pp. 643-677, 2017. |
| C44 | A Cutkosky and K Boahen, Online Convex Optimization with Unconstrained Domains and Losses, Advances in Neural Information Processing Systems 29, Curran Associates, Inc., pp. 748-756, 2016. |
| C43 | A Cutkosky and K Boahen, Bloom Features, IEEE International Conference on Computation Science and Computational Intelligence, IEEE Computer Society, pp. 547-552, 2015. |

Miscellaneous

In my spare time I enjoy performing card magic, reading science fiction, and playing frisbee.

My personal site is here.