Alon Kipnis

math/stats • information theory • ambitious data science


I am a postdoctoral research scholar and a lecturer in the Department of Statistics at Stanford University. I received my PhD in electrical engineering, also from Stanford. My research focuses on the mathematical principles of data science and information processing.

My recent work explores subtle classification and detection problems involving a vast number of features. My approach, based on multiple hypothesis testing and Higher Criticism (HC), is particularly appealing when only a few of the many possible features are useful, and those useful features are individually weak. It enables the use of HC and other tests as unsupervised, untrained discriminators: no model is specified for the features, hence no learning or tuning is needed. This property makes the method useful in a host of real-world applications, including text classification, detecting mutations in short-sequence genomic data, trend prediction in high-dimensional data, social-graph analysis, automatic selection of words for topic modeling, and early detection of economic or health crises. As part of this research, I conducted an extensive set of computational experiments to verify the usefulness of the method in these applications.
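The HC statistic underlying this approach is simple to compute from the per-feature p-values. Below is a minimal sketch following the standard Donoho-Jin definition; the default `alpha0` (the fraction of ordered p-values scanned) is an illustrative choice, not a recommendation from my papers:

```python
import math

def higher_criticism(pvals, alpha0=0.5):
    """Higher Criticism statistic over a collection of p-values.

    A large value suggests that a small fraction of the p-values are
    unusually small, i.e. that a few weak-but-useful features exist.
    """
    n = len(pvals)
    p_sorted = sorted(pvals)
    hc = -math.inf
    # scan only the smallest alpha0-fraction of the ordered p-values
    for i, p in enumerate(p_sorted[: max(1, int(alpha0 * n))], start=1):
        denom = math.sqrt(p * (1.0 - p))
        if denom == 0:
            continue
        z = math.sqrt(n) * (i / n - p) / denom
        hc = max(hc, z)
    return hc
```

On perfectly uniform p-values the statistic is near zero; planting a handful of very small p-values makes it spike, which is exactly the sparse-and-weak signal the method targets.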

Another line of my work studies the effect of compression and communication constraints on modern estimation and learning procedures. The disproportionate size of datasets relative to available computing and communication resources, together with the wide adoption of cloud computing infrastructure, makes such constraints among the most limiting factors in modern data science applications. My work studies the fundamental limits of inference under these constraints and ways to adapt standard methods in statistics and machine learning to scenarios where the data undergoes lossy compression.

My PhD work addressed the effect of data compression on sampling real-world analog data. It provided the first complete characterization of the fundamental trade-off between sampling rate, compression bitrate, and system performance for any digital system processing real-world signals. This trade-off extends the classical Shannon-Nyquist sampling theorem to the (practical) situation where the samples are quantized or compressed in a lossy manner. In particular, my work shows that, for most signal models, a bitrate constraint imposes a new sampling rate, smaller than Nyquist, above which the signal is optimally represented.


Selected Works

Two-sample testing for large high-dimensional multinomials

My recent work shows that the HC test has interesting optimality properties when applied to detecting differences between discrete distributions. In the most challenging situation, when the difference between the distributions is sparse and weak, the adapted HC test experiences a phase transition phenomenon.

It turns out that HC performs better than any other known test in terms of this phase transition analysis. Based on this insight, I developed and applied an HC-based test to classify text documents and detect authorship; see this page for more details. When applied to authorship attribution challenges, this technique performs as well as state-of-the-art methods, but without hand-crafting or tuning.
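To give a flavor of the two-sample construction: each word in the pooled vocabulary is tested for a difference in frequency between the two corpora, and HC combines the resulting p-values into one discrepancy score. The sketch below is a simplification — it uses a normal-approximation proportion test per word rather than the exact binomial allocation test from the paper:

```python
import math

def word_pvalues(counts1, counts2):
    """Per-word two-sample proportion test (normal approximation).

    counts1, counts2: dicts mapping word -> count in each corpus.
    Returns one two-sided p-value per word in the pooled vocabulary.
    A simplified stand-in for the exact binomial test used in the paper.
    """
    n1, n2 = sum(counts1.values()), sum(counts2.values())
    pvals = []
    for w in set(counts1) | set(counts2):
        x1, x2 = counts1.get(w, 0), counts2.get(w, 0)
        p = (x1 + x2) / (n1 + n2)                      # pooled frequency
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        if se == 0:
            continue
        z = (x1 / n1 - x2 / n2) / se
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # two-sided p-value
    return pvals

def hc_statistic(pvals, alpha0=0.5):
    """Higher Criticism over the per-word p-values."""
    n = len(pvals)
    hc = 0.0
    for i, p in enumerate(sorted(pvals)[: max(1, int(alpha0 * n))], 1):
        d = math.sqrt(p * (1 - p))
        if d > 0:
            hc = max(hc, math.sqrt(n) * (i / n - p) / d)
    return hc
```

Two word-frequency tables that agree on most of the vocabulary but differ sharply on a few words produce a large HC score, while identical tables score zero — no model fitting or tuning is involved.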

Phase Transition

Gaussian Approximation of Quantization Error for Estimation from Compressed Data

This work considers the performance of estimating or learning from datasets that undergo bit-level data compression. In general, it is difficult to apply existing results from statistics and learning theory to such data because of the non-linearity of the compression operation. To address this issue, this work establishes a distributional connection between the lossy compressed representation of a high-dimensional signal using a random spherical code and the observation of the same signal under additive white Gaussian noise (AWGN). This connection allows us to relate the risk of an estimator fed with an AWGN-corrupted version of the data to the risk attained by the same estimator fed with the lossy compressed version. We demonstrate the usefulness of this connection by deriving several novel results for inference problems under compression constraints, including sparse regression and compressed sensing under quantization or communication constraints.
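A toy numerical illustration of the general idea — not the paper's spherical-code construction — uses dithered uniform quantization, whose error is known to be additive, uniform, and independent of the input. A linear shrinkage estimator then attains essentially the same risk on the quantized data as on data corrupted by Gaussian noise of matched variance; all parameter choices below are illustrative:

```python
import math
import random

random.seed(0)

def dithered_quantize(x, step):
    """Uniform quantizer with subtractive dither: the resulting error is
    uniform on [-step/2, step/2] and independent of the input."""
    u = random.uniform(-step / 2, step / 2)
    return step * round((x + u) / step) - u

n = 200_000
step = 0.5
sig2 = 1.0                       # prior variance of the signal
noise2 = step ** 2 / 12          # variance of the quantization error
c = sig2 / (sig2 + noise2)       # linear (Wiener) shrinkage coefficient

risk_q = risk_g = 0.0
for _ in range(n):
    theta = random.gauss(0, 1)
    y_q = dithered_quantize(theta, step)                # quantized observation
    y_g = theta + random.gauss(0, math.sqrt(noise2))    # AWGN surrogate
    risk_q += (c * y_q - theta) ** 2
    risk_g += (c * y_g - theta) ** 2
risk_q /= n
risk_g /= n
```

Both empirical risks concentrate around the Wiener risk sig2 * noise2 / (sig2 + noise2), so for this estimator the quantized channel and the AWGN surrogate are interchangeable — a much weaker, simulation-level analogue of the distributional equivalence established in the paper.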

Sub-Nyquist sampling under quantization and lossy compression constraints

My PhD work provided a unified treatment of processing data under three fundamental information-inhibiting processes: sampling, bit-level data compression, and random noise. While the effect of each of these processes individually has long been well understood, my work shows that their combination undermines standard conceptions. The highlight of this work is a fundamental result showing that sampling at a rate smaller than Nyquist is optimal when the compression/quantization bitrate of the digital representation is restricted. Such a restriction may be the result of limited memory or communication rates, as in self-driving cars, or of limited processing power, as in most modern data science applications.

In other words, while sampling at or above the Nyquist rate is needed for optimality when the bitrate is unlimited (which is never the case in practice), for each finite bitrate there exists a new critical sampling rate that is smaller than Nyquist and depends on the bitrate. Sampling at this new rate achieves the optimal performance under the bitrate constraint.

Sub-Nyquist sampling is optimal under bitrate constraints

You can read more about my past work and future plans on my Research Statement page.


Room 208 at Sequoia Hall. My email address is alonkipnis at gmail.