Universal Communication and Clustering (with T. Weissman)

Techniques from information theory can be used to analyze unsupervised learning. Specifically,

  • Universal channel decoding makes for a compelling model for clustering sequences of finite-alphabet data.

  • A universal clustering technique is shown to be asymptotically optimal.

  • We suggest a similarity metric that is both practical and, in a certain sense, optimal.

Network Functional Quantization (with V. Goyal, J. Sun, L. Varshney, and K. Viswanathan)

Network source coding is fundamental, practically important, and horrendously difficult to solve with traditional approaches.

  • We use techniques from high-resolution quantization theory. Instead of Shannon's blocklength asymptotic, we use a resolution asymptotic.

  • We allow more complex ‘‘functional’’ distortion metrics, wherein a user cares about a function of the data.

  • This machinery yields intuitively understandable closed-form answers for a variety of network topologies and arbitrary (smooth) source distributions and arbitrary (smooth) functions of the data.

Channel Porosity (with T. Weissman)

Information theory usually relies on probabilistic characterization of systems. Universal results typically ask for algorithms that achieve optimal performance for an unknown channel model.

  • We consider channels with arbirary noise sequences, and make no assumptions about the processes that may or may not have generated them.

  • We define the porosity as the best possible rate of communication across such a channel using finite-state machines.

  • We find a simple closed-form expression for porosity that summarizes a noise sequence's statistics and relates closely to Lempel and Ziv's compressibility.

Film recommendation