9:00 - 10:00 BREAKFAST/REGISTRATION 10:00 - 11:00 Tutorial: Ravi Kannan Department of Computer Science Yale University SAMPLING IN LARGE MATRICES In many application areas, the input data matrix is too large to be handled by traditional Linear Algebra algorithms. A number of recent results address this. One proves that a small random sample of columns and a small random sample of rows are sufficient to approximate any matrix provided the sampling probabilities are judiciously chosen. Also, from a good low-rank approximation (LRA) to the sampled sub-matrices, one can derive a good LRA to the whole matrix. These approximations are suitable for a host of numerical applications which go under the name of Principal Component Analysis. We also discuss applications of these methods to a broad class of combinatorial problems of which a typical one is to maximize a low-degree n-variable polynomial over the corners of the unit n-cube. We describe an efficient algorithm for finding a rough analog of LRA to a tensor which then helps us estimate the maximum. 11:00 - 11:30 Santosh Vempala Department of Mathematics Massachusetts Institute of Technology SAMPLING METHODS FOR LOW RANK APPROXIMATION It is well-known that the sum of the first k terms of the Singular Value Decomposition of a matrix gives its optimal rank-k approximation (under the 2-norm or the Frobenius norm). Is computing the SVD essential for low-rank approximation? It was shown in 1998 (FKV) that a small sample of rows chosen according to their squared lengths can be used to compute a low-rank approximation whose error is at most the best possible plus a term that depends on the sample size and the norm of the original matrix. This leads to a randomized algorithm to compute such an approximation in time linear in the number of non-zero entries of the matrix. In this talk, we discuss this approach and present two ways of generalizing it, called adaptive sampling and volume sampling. Together, they show that a sample of O(k/c + klogk) rows of any matrix can generate a rank-k matrix whose error is at most (1+c) times that of the optimum rank-k matirx. The algorithm based on this computes such a multiplicative low-rank approximation, also in linear time. 11:30 - 12:00 Petros Drineas Department of Computer Science Rensselaer Polytechnic Institute SUBSPACE SAMPLING AND RELATIVE ERROR MATRIX APPROXIMATION In this talk we will focus on low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or rows of a matrix, as opposed to, e.g., linear combinations of up to all the columns and/or rows of the matrix, such as provided by truncating the singular value decomposition (SVD). Motivations for studying such matrix decompositions include (i) the efficient decomposition of large low-rank matrices that posses additional structure such as sparsity or non-negativity, (ii) expressing Gram matrices in statistical learning theory in terms of a small number of actual data points, (iii) the improved data interpretability that such decompositions provide for datasets in the Internet domain, the social sciences, biology, chemistry, medicine, etc., and (iv) the efficient computation of low-rank matrix approximations in space-constrained settings. We shall discuss two such decompositions. Given an m-by-n matrix A, the first one approximates A by the product CX, where C consists of a few columns of A, and X is a coefficient matrix. The second one is of the form CUR, where C consists of a few columns of A, R consists of a few rows of A, and U is a carefully constructed, constant-sized matrix. Previous such matrix decompositions only guaranteed additive error approximations. Our algorithms for constructing C, X, U, and R take low polynomial time and return approximations that are almost the best possible in a relative error sense. They employ the subspace sampling technique; in particular, rows and columns of A are sampled with probabilities that depend on the lengths of the rows of the top few singular vectors of A. 12:00 - 1:30 LUNCH (on your own) 1:30 - 2:30 Tutorial Dianne O'Leary Department of Computer Science and Institute for Advanced Computer Studies University of Maryland, College Park MATRIX FACTORIZATIONS FOR INFORMATION RETRIEVAL This tutorial will survey a wide variety of matrix decompositions that have been proposed for use in information retrieval. Comparisons among the methods and their applicability to massive data sets will be emphasized. 2:30 - 3:00 G.W. Stewart Department of Computer Science University of Maryland, College Park SPARSE REDUCED-RANK APPROXIMATIONS TO SPARSE MATRICES The purpose of this talk is to describe an algorithm for producing a reduced rank approximation to a sparse mxn matrix A in which the factors are either small or sparse. When the sparseness of the factorization is not an issue, there are two decompositions that give reduced-rank decompositions: the singular value decomposition and the pivoted QR decomposition. This talk focuses on the latter. We will show how a variant of the pivoted Gram--Schmidt algorithm can produce approximations to A of the form XS, where where X is an mxp matrix consisting of columns of X and S is a pxn dense matrix. Thus if p is sufficiently small, this factorization is suitable for the case where m is much larger than n. The same algorithm applied to the transpose of A an approximation of the form TY, where Y is an pxn matrix matrix consisting of rows of A and T is an dense mxp matrix. If both approximations have been computed, they may be combined into a approximation of the form XUY, where U is a pxp matrix--an approximation suitable for the case where both m and n are large. The approximations can be computed a column (or row) at a time. At each stage the error in the approximation can be computed essentially for free, thus giving a rigorous criterion for stopping the process. 3:00 - 3:30 Haesun Park College of Computing Georgia Institute of Technology ADAPTIVE DISCRIMINANT ANALYSIS BY REGULARIZED MINIMUM SQUARED ERRORS Fisher's discriminant analysis for binary class dimension reduction and the binary classifier design based on the minimum squared error formulation (MSE) have been widely utilized for handling high-dimensional clustered data sets. As the data sets get modified from incorporating new data points and deleting obsolete data points, there is a need to develop efficient updating and downdating algorithms for these methods to avoid expensive recomputation of the solution from scratch. In this talk, an efficient algorithm for adaptive linear and nonlinear kernel discriminant analysis based on regularized MSE, called KDA/RMSE, is introduced. In adaptive KDA/RMSE, updating and downdating of the computationally expensive eigenvalue decomposition (EVD) or singular value decomposition (SVD) is approximated by updating and downdating of the QR decomposition achieving an order of magnitude speed up. This fast algorithm for adaptive kernelized discriminant analysis is designed by utilizing regularization and some relationship between linear and nonlinear discriminant analysis for dimension reduction and the MSE for classifier design. An efficient algorithm for computing leave-one-out cross validation is also introduced by utilizing downdating of KDA/RMSE. Based on a joint work with Hyunsoo Kim and Barry Drake. 3:30 - 4:00 COFFEE BREAK 4:00 - 4:30 Michael Mahoney Yahoo! Research CUR MATRIX DECOMPOSITIONS FOR IMPROVED DATA ANALYSIS Much recent work in theoretical computer science, numerical linear algebra, machine learning, and data analysis has considered low-rank matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few columns of A, R consists of a few rows of A, and U is a small carefully constructed matrix that guarantees that the product CUR is "close," either provably or empirically, to A. Applications of such decompositions include the computation of compressed "sketches" for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity-preservation in low-rank approximations, and improved interpretability of data analysis methods. We shall review the motivation for these decompositions, discuss various choices for the matrices C, U, and R that are appropriate in different application domains, and then discuss the application of CUR decompositions to three diverse application domains: human genetics, medical imaging, and internet recommendation systems. In each of these applications, the columns and rows have a natural interpretation in terms of the processes generating the data, and CUR decompositions can be used to solve reconstruction, clustering, classification, or prediction problems in each of these areas. 4:30 - 5:00 Daniel Spielman Department of Computer Science Yale University FAST ALGORITHMS FOR GRAPH PARTITIONING AND SPARSIFICATION, AND THE SOLUTION OF SYMMETRIC DIAGONALLY-DOMINANT LINEAR SYSTEMS Motivated by the problem of solving systems of linear equations, we develop nearly-linear time algorithms for partitioning graphs and for building strong sparsifiers. The graph partitioning algorithm is the first provably fast graph partitioning algorithm that finds sparse cuts of nearly optimal balance. It is based on an algorithm for finding small clusters in massive graphs that runs in time proportional to the size of the cluster it outputs. Using the graph partitioning algorithm and random sampling, we show how to spectrally spproximate any graph by a sparse subgraph. We then apply these algorithms, along with constructions of low-strech spanning trees, to optimize a preconditioner contruction introduced by Vaiday. We thereby obtain a nearly-linear time algorithm for approximately solving systems of linear equations of the form Ax=b, where A is symmetric and diagonally dominant. Joint work with Shang-Hua Teng. 5:00 - 5:30 Anna Gilbert/Martin Strauss Department of Mathematics University of Michigan, Ann Arbor LIST DECODING OF NOISY REED-MULLER-LIKE CODES Coding theory has played a central role in the development of computer science. One critical point of interaction is decoding error-correcting codes. First- and second-order Reed-Muller (RM(1) and RM(2), respectively) codes are two fundamental error-correcting codes which arise in communication as well as in probabilistically checkable proofs and learning. (The theory of first-order Reed-Muller codes is also known as Fourier analysis on the Boolean cube.) In this paper, we take the first steps toward extending the decoding tools of RM(1) into the realm of quadratic binary codes. We show how to recover a substantial subcode of RM(2) called a Kerdock code, in the presence of significant noise. The Kerdock codes are a well-studied family of codes for coding theory, radar signaling, and spread spectrum communication. Our result is a list-decoding result for Kerdock codes which is roughly analogous to that of RM(1). In addition, we present a new algorithmic characterization of Kerdock codes that we hope will be more useful to the algorithmic (and coding theory) community than the classic descriptions. Joint work with Robert Calderbank. 5:30 - 6:00 Bob Plemmons Department of Mathematics and Computer Science Wake Forest University LOW-RANK NONNEGATIVE FACTORIZATIONS FOR SPECTRAL IMAGING APPLICATIONS Nonnegative matrix factorization (NMF) is a technique with numerous applications that has been used extensively in recent years for approximating high dimensional data where the data are composed of nonnegative entries. The process involves low-rank nonnegative approximate matrix factorizations to allow the user to work with reduced dimensional models and to facilitate more efficient statistical classification of data. In spectral imaging, we will describe our use of NMF to obtain spectral signatures for space object identification using a point source, imaged at different spectral frequencies. In other applications, we are also concerned with multispectral data to be processed for extended object identification and analysis, and for processing stacks of correlated 2D images. We wish to find a nonnegative factorization of a multiway array, i.e., a tensor. The non-negative tensor factorization (NTF) problem has a unique set of difficulties for our applications that we will discuss. Joint work with Christos Boutsidis, Misha Kilmer, and Paul Pauca. 6:00 - 6:30 Art Owen Department of Statistics Stanford University A hybrid of multivariate regression and factor analysis Multivariate regression is a useful tool for identifying age related genes from microarray data. Introducing latent variables into the regression provides a diagnostic tool for spotting observations that are outliers and for identifying missing variables. An example of the former is a patient whose kidney appears, from gene expression, to be that of a much younger patient. The latent variables alone yield a factor analysis model while the measured variables alone are a multivariate regression. The criterion for the combined model is a Frobenious norm on the residual. The set of minimizers can be characterized in terms of a singular value decomposition. A power iteration appears to be faster the SVD in some real data. Based on a collaboration with Stuart Kim and Jacob Zahn. 6:30 - 8:00 WELCOME RECEPTION AT NEW GUINEA GARDEN

9:00 - 10:00 Tutorial: Prabhakar Raghavan Yahoo! Research THE CHANGING FACE OF WEB SEARCH Web search has come to dominate our consciousness as a convenience we take for granted, as a medium for connecting advertisers and buyers, and as a fast-growing revenue source for the companies that provide this service. Following a brief overview of the state of the art and how we got there, this talk covers a spectrum of technical challenges arising in web search - ranging from social search to auction design and incentive mechanisms. 10:00 - 10:30 Tong Zhang Yahoo! Research STATISTICAL RANKING PROBLEM Ranking has important applications in web-search, advertising, user recommender systems, etc, and is essential for internet companies such as Yahoo. In this talk, I will first go over some formulations and methods of ranking in the statistical literature. Then I will focus on a formulation suitable for web search, and talk about training relevance models based on DCG (discounted cumulated gain) optimization. Under this metric, the system output quality is naturally determined by the performance near the top of its rank-list. I will mainly discuss various theoretical issues in this learning problem. They reflect real issues encountered at Yahoo when building a machine learning based web-search ranking system. Joint work with David Cossock at Yahoo. 10:30 - 11:00 COFFEE BREAK 11:00 - 11:30 Michael Berry Department of Computer Science University of Tennessee, Knoxville TEXT MINING APPROACHES FOR EMAIL SURVEILLANCE Automated approaches for the identification and clustering of semantic features or topics are highly desired for text mining applications. Using a low rank non-negative matrix factorization algorithm to retain natural data non-negativity, we eliminate the need to use subtractive basis vector and encoding calculations present in techniques such as principal component analysis for semantic feature abstraction. Existing techniques for non-negative matrix factorization are briefly reviewed and a new approach for non-negative matrix factorization is presented. A demonstration of the use of this approach for topic (or discussion) detection and tracking is presented using the Enron Email Collection. Joint work with Murray Browne. 11:30 - 12:00 Hongyuan Zha Department of Computer Science and Engineering Pennsylvania State University INCORPORATING QUERY DIFFERENCE FOR LEARNING RETRIEVAL FUNCTIONS In this talk we discuss information retrieval methods that aim at serving a diverse stream of user queries. We propose methods that emphasize the importance of taking into consideration of query difference in learning effective retrieval functions. We formulate the problem as a multi-task learning problem using a risk minimization framework. In particular, we show how to calibrate the empirical risk to incorporate query difference in terms of introducing nuisance parameters in the statistical models, and we also propose an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. We work out the details for both L1 and L2 regularization cases. We illustrate the effectiveness of the proposed methods using modeling data extracted from a commercial search engine. Joint work with Zhaohui Zheng, Haoying Fu and Gordon Sun. 12:00 - 12:30 Trevor Hastie/Ping Li Department of Statistics Stanford University EFFICIENT L2 AND L1 DIMENSION REDUCTION IN MASSIVE DATABASES Sampling and sketching (including random projections) are two basic strategies for dimension reduction. For dimension reduction in L2 using random projections, we show that the accuracy can be improved by taking advantage of the marginal information; and the efficiency can be improved dramatically using a very sparse random projection matrix. For dimension reduction in L1, we prove an analog of the JL lemma using nonlinear estimators. Previous known results have proved that no analog of the JL lemma exists for L1 if restricted to linear estimators. As a competitor to random projections, a sketch-based sampling algorithm is very efficient and highly flexible in sparse data. This sampling algorithm generates random samples online from sketches. These various methods are useful for mining huge databases (e.g., association rules), distance-based clustering, nearest neighbor searching, as well as efficiently computing the SVM kernels. Joint work with Kenneth Church. 12:30 - 2:00 LUNCH (on your own) 2:00 - 3:00 Tutorial Muthu Muthukrishnan Google Inc. AN ALGORITHMICIST'S LOOK AT COMPRESSED SENSING PROBLEMS Recently Donoho introduced the problem of Compressed Sensing and proved an interesting result that there exists a set of roughly klog n vectors of dimension n each such that any n dimensional vector can be recovered to nearly best accuracy possible using k terms, from only the inner products of the vector with the klog n vectors. This result has partial precursors in algorithms research and many postcursors. I will provide an overview of Compressed Sensing, its pre- and postcursors, from an algorithmicist's point of view. 3:00 - 3:30 Inderjit Dhillon Department of Computer Sciences University of Texas, Austin KERNEL LEARNING WITH BREGMAN MATRIX DIVERGENCES Bregman divergences offer a rich variety of distortion measures that are applicable to varied applications in data mining and machine learning. Popular Bregman vector divergences are the squared Euclidean distance, relative entropy and the Itakura-Saito distance. These divergence measures can be extended to Hermitian matrices, yielding as special cases, the squared Frobenius distance, von Neumann divergence and the Burg divergence. In this talk, I will talk about the kernel learning problem that uses these Bregman matrix divergences in a natural way. A key advantage is in obtaining efficient update algorithms for learning low-rank kernel matrices. Joint work with Brian Kulis, Matyas Sustik and Joel Tropp. 3:30 - 4:00 Bruce Hendrickson Sandia National Laboratories LATENT SEMANTIC ANALYSIS AND FIEDLER RETRIEVAL Latent semantic analysis (LSA) is a method for information retrieval and processing which is based upon the singular value decomposition. It has a geometric interpretation in which objects (e.g. documents and keywords) are placed in a low-dimensional geometric space. In this talk, we derive a new algebraic/geometric method for placing objects in space to facilitate information analysis. Our approach uses an alternative motivation, and reduces to the computation of eigenvectors of Laplacian matrices. We show that our method, which we call Fiedler retrieval, is closely related to LSA, and essentially equivalent for particular choices of scaling parameters. We then show that Fiedler retrieval supports a number of generalizations and extensions that existing LSA approaches cannot handle, including unified text and link analysis. 4:00 - 4:30 COFFEE BREAK 4:30 - 5:00 Piotr Indyk Computer Science and Artificial Intelligence Lab Massachusetts Institute of Technology NEAR-OPTIMAL HASHING ALGORITHM FOR THE APPROXIMATE NEAREST NEIGHBOR PROBLEM The nearest neighbor problem is defined as follows: given a set of n data points in a d-dimensional space, construct a data structure which, given a query point q, returns the data point closest to the query. The problem is of importance in multiple areas of computer science. Nearest neighbor algorithms can be used for: classification (in machine learning), similarity search (in biological, text or visual databases), data quantization (in signal processing and compression), etc. Most of these applications involve high-dimensional data. Unfortunately, classic algorithms for geometric search problems, such as kd-trees, do not scale well as the dimension increases. In recent years, there has been extensive research done on *approximate* variants of the nearest neighbor problem. One of the algorithms, based on the idea of Locality-Sensitive Hashing (LSH), has been successfully used in several applied scenarios. In this talk I will review that work, as well as describe recent developments in this area. In particular, I will show a new LSH-based algorithm, whose running time significantly improves over the earlier bounds. The running time of the new algorithm is provably close to the best possible in the class of hashing-based algorithms. 5:00 - 5:30 Moses Charikar Department of Computer Science Princeton University COMPACT REPRESENTATIONS FOR DATA Several algorithmic techniques have been devised recently to deal with large volumes of data. At the heart of many of these techniques are ingenious schemes to represent data compactly. This talk will present several examples of such compact representation schemes. Some of these are inspired by techniques devised in the field of approximation algorithms for "rounding" solutions of LP and SDP relaxations. We will also see how such compact representation schemes lead to efficient, one-pass algorithms for processing large volumes of data (streaming algorithms). 5:30 - 6:00 Sudipto Guha Department of Computer and Information Science University of Pennsylvania AT THE CONFLUENCE OF STREAMS; ORDER, INFORMATION AND SIGNALS In this talk we will explore several new directions in data streams, particularly at the intersection of streams, information and signal processing. One of the most important feature of a data stream is the meaning or the semantics of the elements streaming by. They typically either define a distribution over a domain or represent a signal. We will investigate a few problems corresponding to these two scenarios. The first set of problems will consider modeling a distribution as a stream of updates, and investigate space efficient algorithms for computing various information theoretic properties, divergences etc. The second set of problems would consider wavelet processing on data streams. 6:00 - 6:30 Frank McSherry Microsoft Research PRESERVING PRIVACY IN LARGE-SCALE DATA ANALYSIS We discuss privacy issues associated with the analysis of sensitive data sets, motivated by the general inaccessibility of sensitive data due to privacy concerns. We present a definition of "differential privacy", in which each user is assured that no conclusion reached by the data analyst is much more likely than if than user had not participated in the study. The unequivocal nature of the statement, quantified over all possible conclusions and all prior knowledge of the analyst, gives very strong privacy guarantees. Additionally, we show detail how many common analyses can be faithfully implemented in a framework that yields differential privacy, and discuss the opportunities for incorporating new analyses. Joint work with Cynthia Dwork.

9:00 - 10:00 Tutorial Dimitris Achlioptas Department of Computer Science University of California, Santa Cruz APPLICATIONS OF RANDOM MATRICES IN SPECTRAL COMPUTATIONS AND MACHINE LEARNING We will see how carefully crafted random matrices can be used to enhance a wide variety of computational tasks, including: dimensionality reduction, spectral computations, and kernel methods in machine learning. Several examples will be considered, including the following two. Imagine that we want to compute the top few eigenvectors of a matrix A, hoping to extract the "important" features in A. We will prove that either this is not worth doing, or that we can begin by randomly throwing away a large fraction of A's entries. A famous result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded in k- dimensional Euclidean space, where k=O(log n), such that all pairwise distances are preserved with arbitrarily good accuracy. We prove that to construct such an embedding it suffices to multiply the n x d matrix of points with a random d x k matrix, whose entries are set to ±1 independently, with equal probability. The emphasis of the talk will be on Euclidean intuition. In particular, no prior knowledge on random matrices and/or linear algebra will be assumed. 10:00 - 10:30 Tomaso Poggio Center for Biological and Computational Learning, Computer Science and Artificial Intelligence Laboratory, and McGovern Institute for Brain Research Massachusetts Institute of Technology LEARNING FROM DATA: THEORY, ENGINEERING APPLICATIONS AND (A BIT OF) NEUROSCIENCE The problem of learning is one of the main gateways to making intelligent machines and to understanding how the brain works. In this talk I will give a brief overview of recent work on learning theory and on general conditions for predictivity of an algorithm and ultimately for science itself. I will then show a few examples of our efforts in developing machines that learn for applications such as visual recognition and graphics and speech synthesis. Finally, I will sketch a new theory of the ventral stream of the visual cortex, describing how the brain may learn to recognize objects, and show that the resulting model is capable of performing recognition on datasets of complex images at the level of human performance in rapid categorization tasks. The model performs as well or better than most state-of-art computer vision systems in recognition of complex images. Relevant papers can be downloaded from http://www.ai.mit.edu/projects/cbcl/publications/all-year.html 10:30 - 11:00 COFFEE BREAK 11:00 - 11:30 Stephen Smale Department of Mathematics University of California, Berkeley and Toyota Technological Institute University of Chicago GEOMETRY AND TOPOLOGY OF DATA The following questions are discussed. Suppose points are drawn at random from a submanifold M of Euclidean space. How may one reconstruct the topology and the geometry of M? and with what confidence? 11:30 - 12:00 Gunnar Carlsson Department of Mathematics and Institute for Computational and Mathematical Engineering Stanford University ALGEBRAIC TOPOLOGY AND ANALYSIS OF HIGH DIMENSIONAL DATA In recent years techniques have been developed for evaluating Betti numbers of spaces given only a finite but large set of points (a "point cloud") sampled from the space. Such techniques can be applied in statistical settings where the data being considered is high dimensional (and hence cannot be visualized) and non-linear in nature. This talk will describe the techniques required to obtain the Betti numbers, as well as discuss some actual examples where they have been applied. 12:00 - 12:30 Vin de Silva Department of Mathematics and Computer Science Pomona College POINT-CLOUD TOPOLOGY VIA HARMONIC FORMS Point-cloud data sets are discrete objects which vary continuously, whereas topological spaces are continuous objects which vary discretely. It follows that the discrete invariants of classical algebraic topology (such as homology) are not immediately useful for point-cloud topology. It is always necessary to make those invariants continuous in some way. One very successful paradigm for this is the theory of persistent homology. I will discuss an alternative paradigm which uses harmonic forms defined by discrete Laplacian operators. 12:30 - 2:00 LUNCH (on your own) 2:00 - 2:30 Daniel Boley Department of Computer Science University of Minnesota, Twin Cities FAST CLUSTERING LEADS TO FAST SUPPORT VECTOR MACHINE TRAINING AND MORE We discuss how a fast deterministic clustering algorithm enables a variety of applications, including the training of support vector machines and the computation of a low memory factored representation which admits the application of many standard algorithms on massive datasets. Some theoretical and experimental properties of the clustering methods and of the resulting support vector machines will be provided. We will show that the clustering method compares favorably with k-means and that the resulting SVM compares favorably with that computed in the usual way directly from the data. 2:30 - 3:00 Chris Ding Lawrence Berkeley National Laboratory ON THE EQUIVALENCE OF (SEMI-)NONNEGATIVE MATRIX FACTORIZATION (NMF) AND K-MEANS/SPECTRAL CLUSTERING We show that the objective of NMF X=FG' is equivalent to K-means clustering: G is cluster indicator and F contains cluster centroids. This can be generalized to semi-NMF where X, F contain mixed-sign data, while G being nonnegative. We further propose convex-NMF by restricting F be convex combinations of data points, ensuring F to be meaningful cluster centroids. We also show that the symmetric NMF W=HH' is equivalent to Kernel \Km clustering and the Laplacian-based spectral clustering. All these follow by rewriting K-means objective as a trace of quadratic function whose continuous relaxation solution are given by PCA components. We emphasize orthogonality and nonnegativity in matrix based clustering. We derive the updating algorithms for semi-NMF, convex-NMF, symmetric-NMF and prove their convergence. We present experiments on face images, newgroups, web log and text data to show the effectiveness of these NMF based clustering. Based on joint work with Xiaofeng He, Horst Simon, Tao Li and Michael Jordan. 3:00 - 3:30 Al Inselberg School of Mathematical Sciences Tel Aviv University PARALLEL COORDINATES: VISUALIZATION & DATA MINING FOR HIGH DIMENSIONAL DATASETS A dataset with M items has 2M subsets anyone of which may be the one we really want. With a good data display our fantastic pattern-recognition ability can not only cut great swaths searching through this combinatorial explosion but also extract insights from the visual patterns. These are the core reasons for data visualization. With Parallel Coordinates (abbr. kcoords) the search for multivariate relations in high dimensional datasets is transformed into a 2-D pattern recognition problem. After a short overview of k-coords, guidelines and strategies for knowledge discovery are illustrated on different real datasets, one with 400 variables from a manufacturing process. A geometric classification algorithm based on k-coords is presented and applied to complex datasets. It has low computational complexity providing the classification rule explicitly and visually. The minimal set of variables required to state the rule is found and ordered by their predictive value. A visual economic model of a real country is constructed and analyzed to illustrate how multivariate relations can be modeled by means of hypersurfaces, understanding trade-offs, sensitivities and interelations. Parallel coordinates is also used in collision avoidance algorithms for air traffic control. P.S. Do not be intimidated by this formalistic description. The speaker is also well known for his numerological anecdotes and palindromic digressions! 3:30 - 4:00 Joel Tropp Department of Mathematics University of Michigan ONE SKETCH FOR ALL: A SUBLINEAR APPROXIMATION SCHEME FOR HEAVY HITTERS The heavy hitters problem elicits a list of the m largest-magnitude components in a signal of length d. Although this problem is easy when the signal is presented explicitly, it becomes much more challenging in the setting of streaming data, where the signal is presented implicitly as a sequence of additive updates. One approach maintains a small sketch of the data that can be used to approximate the heavy hitters quickly. In previous work, this sketch is essentially a random linear projection of the data that fails with small probability for each signal. It is often desirable that the sketch succeed simultaneously for ALL signals from a given class, a requirement that may be called uniform heavy hitters. It arises, for example, when the signal is queried a large number of times or when the signal updates are stochastically dependent. This talk describes a random linear sketch for uniform heavy hitters that succeeds with high probability. The recovery algorithm produces a list of heavy hitters that approximates the input signal with an ell_2 error that is optimal, except for an additive term that depends on the optimal ell_1 error and a controllable parameter eps. The recovery algorithm requires space m*poly(log(d)/eps) and time m^2*poly(log(d)/eps) to produce the list of heavy hitters. In the turnstile model, the time per update is m*poly(log(d)/eps). Up to logarithmic factors, the performance of this algorithm is the best possible with respect to several resources. Joint work with Anna Gilbert, Martin Strauss, and Roman Vershynin. 4:00 - 4:30 COFFEE BREAK 4:30 - 5:00 David Donoho Department of Statistics and Institute for Computational and Mathematical Engineering Stanford University NEEDLES IN HAYSTACKS: FINDING THE RELEVANT VARIABLES AMONG MANY USELESS ONES Many modern datasets have more variables than observations. Among those variables it is thought that only a small number are really useful for linear prediction, but we don't know which. We could proceed by combinatorial search to try all possible subsets and build prediction models with those, but that's computationally very heavy. In my talk I will describe some fun results showing that often one can get just as good results without all-subsets search. For example, often one can penalize models by the l1-norm on the coefficients and find the same model that combinatorial optimization would have found. There are maybe 100 papers related to this over the last 18 months, and I'll try to give a flavor of what's being done and why. 5:00 - 5:30 Robert Tibshirani Department of Statistics Stanford University PREDICTION BY SUPERVISED PRINCIPAL COMPONENTS In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors that are selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer. Joint work with Eric Bair, Trevor Hastie, and Debashis Paul. 5:30 - 6:00 Tao Yang Ask.com and Department of Computer Science University of California, Santa Barbara PAGE RANKING FOR LARGE-SCALE INTERNET SEARCH: ASK.COM'S EXPERIENCES Ask.com has been developing a comprehensive suite of search and question-answering technology and differentiated products to help users to find what they are looking for. This talk gives an overview of Ask.com's search engine and describes many of challenges faced in seeking relevant answers from billions of documents for tens of millions of users everyday. Particularly, this talk will discuss our ExpertRank algorithm which provides relevant search results by identifying topics and authoritative sites on the Web through query-specific clustering and expert analysis. Joint work with Apostolos Gerasoulis, Wei Wang, and other Ask.com engineers. 6:00 - 8:00 POSTER SESSION/BANQUET

9:00 - 10:00 BREAKFAST 10:00 - 11:00 Tutorial Lek-Heng Lim Institute for Computational and Mathematical Engineering Stanford University MULTILINEAR ALGEBRA IN DATA ANALYSIS: TENSORS, SYMMETRIC TENSORS, AND NONNEGATIVE TENSORS A simple recipe for creating multilinear models is to take a bilinear model, say, a term-by-document matrix, and include one or more additional "modes" to get, say, a term-by-document-by-URL tensor (which may come from, for example, a collection of term-by-document matrices across different URLs). Now, in analogy with matrix methods, one finds a low-rank approximation to the tensor in question and hope that it reveals interesting features about the data. Here, rank may mean the outer-product rank or the multilinear rank (the respective models are sometimes known as CANDECOMP/PARAFAC and the Tucker model) or indeed any other notions of tensor rank. An immediate question is: why should this work? Many problems arise once we go from order-2 (matrices) to higher orders (tensors). Not least among which is the fact that the problem of finding a best rank-r approximation for tensors often do not even have a solution. Furthermore, there seems to be no proper statistical foundation for such models --- what is one really doing when decomposing a tensor of data values into a sum of outer product of vectors? why should these vectors tell us anything about the data? In this talk, we will discuss these and other related questions. We will also address similar issues for two classes of tensors that are important in data-analytic applications: symmetric tensors (Independent Component Analysis) and nonnegative tensors (Nonnegative Tensor Factorization). We will show how these topics have recently been enriched by ideas from the ancient (eg. Cayley's formula for the hyperdeterminant of a 2x2x2 tensor) to the present (eg. Donoho's work in \ell^0-approximations). Furthermore, we shall see how some existing computational, mathematical, and statistical tools could be employed to study multilinear models. Examples include: using Matroid Theory to prove uniqueness of tensor decompositions; using properties of higher secant varities of Segre/Veronese variety to shed light on the aforementioned non-existence of best rank-r approximations; using SDP to solve relaxed convex versions of the best rank-r tensor approximation problem; using the Log Linear Model, or more generally, Graphical Models and Bayesian Networks to provide a statistical foundation. Time permitting, we will also discuss Multilinear Spectral Theory and show how the eigenvalues of symmetric tensors may be used to obtain basic results in Spectral Hypergraph Theory and how one could prove a Perron-Frobenius Theorem for irreducible non-negative tensors. 11:00 - 11:30 Eugene Tyrtyshnikov Institute of Numerical Mathematics Russian Academy of Sciences TENSOR COMPRESSION OF PETABYTE-SIZE DATA In some applications we are interested to work with huge amounts of data for which standard methods of numerical analysis cannot be applied. For instance, consider a 3-dimensional array of size of several thousand in every dimension. Is it possible to work with this size array any efficiently? In the talk, we are going to show that the answer is "yes" - however, under certain assumptions on the data. The main assumption is that the data admits a sufficiently accurate low-tensor-rank approximation. Then, we present a special cross-approximation technique using a relatively small amount of the entries of the given array and providing the so-called Tucker decomposition with the complexity linear in one-dimensional size of the array. The technique generalizes the low-rank approximation technique for matrices to 3-dimensional arrays. The Tucker core can be further approximated by a trilinear (canonical) decomposition. Development of good algorithms solving the latter problem is still topical. To this end, we present a new algorithm that seems to be competitive with the best known algorithms. Joint work with I. Oseledets and D. Savostianov 11:30 - 12:00 Lieven De Lathauwer Lab ETIS CNRS, Ecole Nationale Superieure d'Electronique et de ses Applications, and University of Cergy-Pontoise THE DECOMPOSITION OF A TENSOR IN A SUM OF RANK-(R1,R2,R3) TERMS The Parallel Factor decomposition (PARAFAC) in multilinear algebra decomposes a higher-order tensor in a sum of rank-1 terms. The decomposition can be essentially unique without imposing orthogonality constraints on the rank-1 terms. Tucker's decomposition is a second way to generalize the Singular Value Decomposition (SVD) of matrices. Here, the transformation matrices are orthogonal. Their columns correspond to the directions of extremal oriented energy in column space, row space, etc. However, the decomposition does in general not reduce the tensor to diagonal form. In this talk we introduce a new tensor decomposition, of which Tucker's decomposition and PARAFAC are special cases. The decomposition thus generalizes the SVD of matrices in two ways. The result sheds new light on very fundamental aspects of tensor algebra, including tensor rank. 12:00 - 1:30 LUNCH 1:30 - 2:00 Orly Alter Department of Biomedical Engineering and Institute for Cellular and Molecular Biology University of Texas, Austin MATRIX AND TENSOR COMPUTATIONS FOR RECONSTRUCTING THE PATHWAYS OF A CELLULAR SYSTEM FROM GENOME-SCALE SIGNALS DNA microarrays make it possible to record, for the first time, the complete genomic signals, such as mRNA expression and DNA-bound proteins' occupancy levels, that are generated and sensed by cellular systems. The underlying genome-scale networks of relations among all genes of the cellular systems can be computed from these signals. These relations among the activities of genes, not only the activities of the genes alone, are known to be pathway-dependent, i.e., conditioned by the biological and experimental settings in which they are observed. I will describe the use of the matrix eigenvalue decomposition (EVD) and pseudoinverse projection and a tensor higher-order EVD (HOEVD) in reconstructing the pathways, or genome-scale pathway-dependent relations among the genes of a cellular system, from nondirectional networks of correlations, which are computed from measured genomic signals and tabulated in symmetric matrices [Alter & Golub, PNAS 2005]. EVD formulates a genes $\times$ genes network, which is computed from a ``data'' signal, as a linear superposition of genes $\times$ genes decorrelated and decoupled rank-1 subnetworks. Significant EVD subnetworks might represent functionally independent pathways that are manifest in the data signal. The integrative pseudoinverse projection of a network, computed from a data signal, onto a designated ``basis'' signal approximates the network as a linear superposition of only the subnetworks that are common to both signals, i.e., pseudoinverse projection filters off the network the subnetworks that are exclusive to the data signal. The pseudoinverse-projected network simulates observation of only the pathways that are manifest under both sets of the biological and experimental conditions where the data and basis signals are measured. I will define a comparative HOEVD, that formulates a series of networks computed from a series of signals as linear superpositions of decorrelated rank-1 subnetworks and the rank-2 couplings among these subnetworks. Significant HOEVD subnetworks and couplings might represent independent pathways or transitions among them common to all or exclusive to a subset of the signals. I will illustrate the EVD, pseudoinverse projection and HOEVD of genome- scale networks with analyses of mRNA expression data from the yeast {\it Saccharomyces cerevisiae} during its cell cycle and DNA-binding data of yeast transcription factors that are involved in cell cycle, development and biosynthesis programs. Boolean functions of the discretized subnetworks and couplings highlight known and novel differential, i.e., pathway- dependent relations between genes. 2:00 - 2:30 Shmuel Friedland Department of Mathematics, Statistics and Computer Science University of Illinois, Chicago TENSORS: RANKS AND APPROXIMATIONS The theory of $k(\ge 3)$ tensors became of great interest in recent applications. In particular, the rank of a tensor and an approximation of a given tensor by low rank tensors, emerged as the most outstanding problems in data processing with many parameters. In this talk I will point out some directions of study of these problems using analytical methods: as algebraic geometry combined with matrix theory, and numerical methods: as SVD for matrices combined with Newton algorithms. Some very preliminary results can be found in Chapter 5 of my notes: http://www2.math.uic.edu/~friedlan/matrlecsum.pdf 2:30 - 3:00 Tammy Kolda Sandia National Laboratories MULTILINEAR ALGEBRA FOR ANALYZING DATA WITH MULTIPLE LINKAGES Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multi-link analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD decomposition of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author^Rs papers), find related authors using linkages other than explicit co-authorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published. Joint work with Daniel M. Dunlavy and W. Philip Kegelmeyer. 3:00 - 3:30 Lars Eld\'en Department of Mathematics Link\"oping University COMPUTING THE BEST RANK-($R_1,R_2,R_3$) APPROXIMATION OF A TENSOR We investigate various properties of the best rank-($R_1,R_2,R_3$) approximation of a tensor, and their implications in the development of algorithms for computing the approximation. In particular we discuss the Grassmann-Newton method for solving the problem, and its relation to the Grassmann-Rayleigh quotient iteration of de Lathauwer et al. Joint work with Berkant Savas. 3:30 - 4:00 COFFEE BREAK 4:00 - 4:30 Liqun Qi Department of Mathematics City University of Hong Kong EIGENVALUES OF TENSORS AND THEIR APPLICATIONS Motivated by the positive definiteness issue in automatic control, I defined eigenvalues, E-eigenvalues, H-eigenvalues and Z-eigenvalues for a real completely symmetric tensor. An mth order n-dimensional real completely symmetric tensor has n(m 1) n 1 eigenvalues and at most P n 1 k=0(m 1) k E-eigenvalues. They are roots of two one-dimensional polynomials, which are called the characteristic polynomial and the E-characteristic polynomial respectively. The sum of the eigenvalues is equal to the trace of the tensor, multi- plied with (m 1) n 1 . The eigenvalues obey some Gerschgorin-type theorems, while the E-eigenvalues are invariant under orthogonal transformation. The latter property indicates that E-eigenvalues are invariants of practical tensors in physics and mechanics. Real eigenvalues (E-eigenvalues) with real eigenvectors (E-eigenvectors) are called H- eigenvalues (Z-eigenvalues). Z-eigenvalues always exist. When m is even, H-eigenvalues always exist. A real completely symmetric tensor is positive definite if and only if all of its H-eigenvalues (Z-eigenvalues) are positive. Based on the resultant theory, H-eigenvalue and Z-eigenvalue methods are proposed for judging positive definiteness of an even degree multivariate form when m and n are small. Independently, Lek-Heng Lim has also explored the definitions and applications of H- eigenvalues and Z-eigenvalues. In particular, he proposed a multilinear generalization of the Perron-Frobenius theorem based upon H-eigenvalues. 4:30 - 5:00 Brett Bader Sandia National Laboratories ANALYSIS OF LATENT RELATIONSHIPS IN SEMANTIC GRAPHS USING DEDICOM This presentation introduces the DEDICOM (DEcomposition into DIrectional COMponents) family of models from the psychometrics literature for analyzing asymmetric relationships in data and applies it to new applications in data mining, in particular social network analysis. In this work we present an improved algorithm for computing the 3-way DEDICOM model with modifications that make it possible to handle large, sparse data sets. We demonstrate the capabilities of DEDICOM as a new tool for reporting latent relationships in large semantic graphs, which we represent by an adjacency tensor. For an application we consider the email communications of former Enron employees that were made public, and posted to the web, by the U.S. Federal Energy Regulatory Commission during its investigation of Enron. We represent the Enron email network as a directed graph with edges labeled by time. Using the three-way DEDICOM model on this data, we show unique latent relationships that exist between types of employees and study their communication patterns over time. Joint work with Richard Harshman and Tamara Kolda. 5:00 - 5:30 Alex Vasilescu Media Laboratory Massachusetts Institute of Technology MULTILINEAR (TENSOR) ALGEBRAIC FRAMEWORK FOR COMPUTER VISION AND GRAPHICS Principal components analysis (PCA) is one of the most valuable results from applied linear algebra. It is used ubiquitously in all forms of data analysis -- in data mining, biometrics, psychometrics, chemometrics, bioinformatics, computer vision, computer graphics, etc. This is because it is a simple, non-parametric method for extracting relevant information through the demensionality reduction of high-dimensional datasets in order to reveal hidden underlying variables. PCA is a linear method, however, and as such it has severe limitations when applied to real world data. We are addressing this shortcoming via multilinear algebra, the algebra of higher order tensors. In the context of computer vision and graphics, we deal with natural images which are the consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra offers a potent mathematical framework for explicitly dealing with multifactor image datasets. I will present two multilinear models that learn (nonlinear) manifold representations of image ensembles in which the multiple constituent factors (or modes) are disentangled and analyzed explicitly. Our nonlinear models are computed via a tensor decomposition, known as the M-mode SVD, which is an extension to tensors of the conventional matrix singular value decomposition (SVD), or through a generalization of conventional (linear) ICA called Multilinear Independent Components Analysis (MICA). I will demonstrate the potency of our novel statistical learning approach in the context of facial image biometrics, where the relevant factors include different facial geometries, expressions, lighting conditions, and viewpoints. When applied to the difficult problem of automated face recognition, our multilinear representations, called TensorFaces (M-mode PCA) and Independent TensorFaces (MICA), yields significantly improved recognition rates relative to the standard PCA and ICA approaches. Recognition is achieved with a novel Multilinear Projection Operator. In computer graphics, our image-based rendering technique, called TensorTextures, is a multilinear generative model that, from a sparse set of example images of a surface, learns the interaction between viewpoint, illumination and geometry, which determines surface appearance, including complex details such as self-occlusion and self shadowing. Our tensor algebraic framework is also applicable to human motion data in order to extract "human motion signatures" that are useful in graphical animation synthesis and motion recognition. 5:30 - 6:00 Rasmus Bro Chemometrics Group, Department of Dairy and Food Science The Royal Veterinary and Agricultural University MULTI-WAY ANALYSIS OF BIOINFORMATIC DATA In many metabonomic (bioinformatic) investigations, the data suffer from specific problems such as baseline issues, shifts in time etc. Additionally, such data are often characterized by being overwhelmingly information rich which is a problem in mathematical and statistical terms but even more so from a scientific point of view because the desire is often to obtain valid causal explanations for complex biological problems. We will show how tailored multi-way analysis tools can help in analysis of such complex data and show several examples of different origin. 6:00 - 6:30 (cancelled with apologies) Pierre Comon Laboratoire I3S CNRS and University of Nice, Sophia-Antipolis INDEPENDENT COMPONENT ANALYSIS VIEWED AS A TENSOR DECOMPOSITION The problem of identifying linear mixtures of independent random variables only from outputs can be traced back to 1953 with the works of Darmois or Skitovich. They pointed out that when data are non Gaussian, a lot more can be said about the mixture. In practice, Blind Identification of linear mixtures is useful especially in Factor Analysis, in addition to many other application areas (including Signal & Image Processing, Digital Communications, Biomedical, or Complexity Theory). Harshman and Carroll provided independently in the seventies numerical algorithms to decompose a data record stored in a 3-way array into elementary arrays, each representing the contribution of a single underlying factor. The main difference with the well known Principal Component Analysis is that the mixture is not imposed to be a unitary matrix. This is very relevant because the actual mixture often has no reason to have orthogonal columns. The Parafac ALS algorithm, widely used since that time, theoretically does not converge for topological reasons, but yields very usable results after a finite number of iterations, under rank conditions however. Independently, the problem of Blind Source Separation (BSS) arose around 1985 and was solved -explicitly or implicitly- with the help of High-Order Statistics (HOS), which are actually tensors. It gave rapidly birth to the more general problem of Independent Component Anlalysis (ICA) in 1991. ICA is a tool that can be used to extract Factors when the physical diversity does not allow to store directly and efficiently the data in tensor format, in other words when the data cannot be uniquely decomposed directly. The problem then consists of decomposing a data cumulant tensor, of arbitrarily chosen order, into a linear combination of rank-one symmetric tensors. For this purpose, several algorithms can be thought of. 6:30 - 8:00 CLOSING RECEPTION