Asymptotically tight bounds on the depth of estimated context trees

Álvaro Martín
Instituto de Computación, Universidad de la República, Uruguay.
Date: Feb. 5th, 2016

Abstract

We study the maximum depth of context tree estimates, i.e., the maximum Markov order attainable by an estimated tree model given an input sequence of length n. We consider two classes of estimators:

1) Penalized maximum likelihood (PML) estimators, where a context tree T is obtained by minimizing a cost of the form -log P_T(x^n) + f(n)|S_T|, where P_T(x^n) is the ML probability of the input sequence x^n under the tree model T, S_T is the set of states defined by T, and f(n) is an increasing (penalization) function of n; the popular BIC estimator corresponds to f(n) = ((A-1)/2) log n, where A is the size of the input alphabet (an illustrative sketch of this cost is given below the list).

2) MDL estimators based on the Krichevsky-Trofimov (KT) probability assignment.

In each case we derive an asymptotic upper bound of n^{1/2 + o(1)} on the maximum depth, and we exhibit explicit input sequences showing that this bound is asymptotically tight up to the o(1) term in the exponent.
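As a concrete illustration of the cost in 1), the following is a minimal sketch (an illustration added to this summary, not code from the talk) of how the PML cost -log P_T(x^n) + f(n)|S_T| could be evaluated for a fixed candidate tree T over a binary alphabet, with T represented simply by its set of leaf contexts; the estimator itself minimizes this cost over all candidate trees. The function name pml_cost, the tree representation, and the base-2 logarithms are illustrative choices.

# Hypothetical sketch (illustrative only): evaluate the PML cost
#   -log2 P_T(x^n) + f(n) * |S_T|
# for one fixed context tree T, given here as a suffix-free set of
# context strings (the states S_T). A PML estimator would minimize
# this quantity over all candidate trees, not just evaluate one.
from math import log2
from collections import defaultdict

def pml_cost(x, states, f):
    """x: input string; states: suffix-free set of contexts (S_T);
    f: penalization function f(n)."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[s][a] = n_{s,a}
    for i in range(len(x)):
        # The state of symbol x[i] is the unique context in `states`
        # that is a suffix of the past x[:i]; initial symbols with no
        # matching context are simply skipped in this sketch.
        for s in states:
            if x[:i].endswith(s):
                counts[s][x[i]] += 1
                break
    # -log2 of the ML probability: sum over states of n_s times the
    # empirical entropy of the symbols emitted at that state.
    neg_log_ml = 0.0
    for cs in counts.values():
        n_s = sum(cs.values())
        neg_log_ml -= sum(n_sa * log2(n_sa / n_s) for n_sa in cs.values())
    return neg_log_ml + f(len(x)) * len(states)

# BIC penalization for a binary alphabet (A = 2): f(n) = ((A-1)/2) log2(n).
bic = lambda n: 0.5 * log2(n)
print(pml_cost("0110100110010110", {"0", "1"}, bic))  # depth-1 (order-1) tree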
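For 2), the following is a rough sketch (again an illustration, not material from the talk) of the Krichevsky-Trofimov (KT) sequential probability assignment on which the MDL estimators are based; in a context-tree setting, a KT probability of this form would be assigned to the subsequence of symbols emitted at each state of a candidate tree, and the resulting code lengths compared across trees. The function name kt_log2_prob is an illustrative choice.

# Hypothetical sketch: log2 of the KT probability of a string x over a
# given alphabet of size A, computed sequentially; each symbol a is
# predicted with probability (n_a + 1/2) / (t + A/2), where n_a is its
# count so far and t is the number of symbols already seen.
from math import log2

def kt_log2_prob(x, alphabet):
    A = len(alphabet)
    counts = {a: 0 for a in alphabet}
    logp = 0.0
    for t, a in enumerate(x):
        logp += log2((counts[a] + 0.5) / (t + A / 2))
        counts[a] += 1
    return logp

print(-kt_log2_prob("0110100110010110", "01"))  # KT code length in bits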

The talk is based on joint work with Gadiel Seroussi.

Bio

Álvaro Martín was born in Montevideo, Uruguay. He received the Computer Engineer degree and the Ph.D. degree in Informatics from the Universidad de la República, Uruguay, in 2001 and 2009, respectively.

From 1996 to 2002 he worked in the software development industry, and since 2000 he has been with the Instituto de Computación, Universidad de la República, Uruguay. In 2003 he visited the Information Theory Group at Hewlett-Packard Laboratories, Palo Alto, CA, and the Electrical and Computer Engineering Department, University of Minnesota, Minneapolis. His research interests include source coding, statistical modeling, and algorithms.