Instructor: John Duchi (jduchi@stanford.edu)
Lectures: Tuesday/Thursday, 13:15–14:30, Building 200 (History Corner), Room 030
TA: Yuming Kuang (ymkuang@stanford.edu)
Office hours: John: Tuesday/Thursday, 14:30–15:30, in 126 Sequoia Hall. Yuming: Wednesday, 13:30–15:30, in 216 Sequoia Hall.
Description: Information theory was developed to solve
fundamental problems in the theory of communications, but its
connections to statistical estimation and inference date nearly
to the birth of the field. With their focus on fundamental
limits, information theoretic techniques have provided deep
insights into optimal procedures for a variety of inferential
tasks. In addition, the basic quantities of information
theory (entropy and relative entropy, and their generalizations
to other divergence measures such as f-divergences) are
central in many areas of mathematical statistics and
probability.
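As a concrete numerical sketch of the quantities named above (this is my own illustration, not part of the course materials; the function names are my own choices), here are the basic definitions for discrete distributions represented as probability vectors:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Relative entropy D(p || q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def f_divergence(p, q, f):
    """General f-divergence D_f(p || q) = sum_i q_i f(p_i / q_i),
    where f is convex with f(1) = 0; f(t) = t log t recovers KL."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(entropy(p))                                     # ln 2, about 0.693
print(kl_divergence(p, q))
print(f_divergence(p, q, lambda t: t * math.log(t)))  # equals the KL divergence
```

Note that taking f(t) = t log t in the f-divergence recovers the relative entropy exactly, which is the sense in which f-divergences generalize it.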
The applications of these tools are numerous. In mathematical
statistics, for example, they characterize optimal error
probabilities in hypothesis testing, determine minimax rates of
convergence for estimation problems, demonstrate equivalence
between (ostensibly) different estimation problems, and underlie
penalized estimators and the minimum description length principle. In
probability, they provide insights into the central limit theorem,
large deviations theory (via Sanov's theorem and other results), and
appear in empirical process theory and concentration of
measure. Information theoretic techniques also arise in game playing,
gambling, stochastic optimization and approximation, among other
areas.
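To make the large-deviations connection concrete, here is a small back-of-the-envelope check (my own, not from the course materials) of the rate in Sanov's theorem for coin flips: for a fair coin, P(fraction of heads >= a) decays like exp(-n D(Bern(a) || Bern(1/2))), and the exact binomial tail exhibits this rate:

```python
import math

def kl_bernoulli(a, b):
    """D(Bern(a) || Bern(b)) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def exact_rate(n, a):
    """-(1/n) log P(Binomial(n, 1/2) >= a*n), computed exactly."""
    k0 = math.ceil(a * n)
    tail = sum(math.comb(n, k) for k in range(k0, n + 1))
    return -(math.log(tail) - n * math.log(2)) / n

a = 0.7
print(kl_bernoulli(a, 0.5))     # about 0.0823, the Sanov rate
for n in (100, 1000):
    print(n, exact_rate(n, a))  # approaches the rate from above as n grows
```

The exact tail rate always exceeds the KL rate (this is the Chernoff bound) and converges to it as n grows, with a gap of order (log n)/n.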
In this course, we will study information theoretic quantities,
and their connections to estimation and statistics, in some
depth, showing applications to many of the areas above. Except
to provide background, we will not cover standard
information-theoretic topics such as source coding or
channel coding, focusing instead on the probabilistic and
statistical consequences of information theory.
Here is a more precise syllabus, including evaluation criteria and topics.
The readings listed below are supplementary to the lectures
and not required, but they should give additional
perspective on the material.
Cover and Thomas: Elements of Information Theory, Second Edition, Wiley 2006.
Tsybakov: Introduction to Nonparametric Estimation, Springer 2009.
Date | Topics | Reading | Assignments
September 23 [pdf] | Course overview | |
September 25 [pdf] | Review of information theory; divergences | Cover and Thomas, Chapter 2 | Problem Set 1 out [pdf]
September 30 | No class (travelling) | N/A | N/A
October 2 [pdf] | Minimax lower bounds: Le Cam's method | Tsybakov, Chapters 2.1–2.4 |
October 7 [pdf] | Minimax lower bounds: Fano's method and packing | Tsybakov, Chapters 2.5–2.7 | Problem Set 1 due. Solutions: [pdf]
October 10 (rescheduled) [pdf] | Minimax lower bounds: Fano's method, Assouad's method | | Problem Set 2 out [pdf]
October 14 [pdf] | Nonparametrics and minimax bounds: Assouad's method, nonparametric regression | Tsybakov, Chapters 1.1–1.2 and 1.5 |
October 16 [pdf] | Global Fano method | Yang and Barron 1999 |
October 21 [pdf] | Exponential families and maximum entropy | Wainwright and Jordan, Chapter 3 |
October 23 [pdf] | Maximum entropy and robustness | | Problem Set 2 due. Solutions: [pdf]
October 28 [pdf] | Fisher information and associated metrics | | Problem Set 3 out [pdf]
October 30 [pdf] | Regret and redundancy | |
November 4 [pdf] | Bayesian redundancy, Jeffreys prior, reference prior, redundancy capacity | |
November 6 [pdf] | Prediction games without the log loss | |
November 11 [pdf] | Finishing prediction games; surrogate risk consistency | | Problem Set 3 due. Solutions: [pdf]
November 13 [Draft outline] | Surrogate risk consistency; classification calibration | |
November 18 [Draft outline] | Risk and divergences; binary experiments and classification | | Problem Set 4 out [pdf]
November 20 [Draft outline] | Binary experiments; concentration inequality intro | |
December 2 [Draft outline] | Concentration inequalities; the entropy method | | Problem Set 4 due. Solutions: [pdf]