## Algorithms, Incentives, and Optimization Techniques for Social Data

### Summary of Outocmes

This project resulted in innovation along four fronts:- Network algorithms at scale. We developed algorithms that were deployed at multiple companies, including Twitter. For example, see the following blog post at Twitter. We developed the first sub-linear algorithms for Personalized PageRank, a measure of importance for ranking search results based on an individual user's social network.
- Crowdsourced Democracy: We did foundational work in algorithms and mechanisms for decision making at scale. We also deployed our work in the form of a platform for participatory budgeting that was used by over a dozen cities and municipalities.
- Fundamental Algorithms: We discovered fundamental properties of graphs and networks. For example, we established new bounds on the efficiency of the perfect and maximum matching problems. We also studied connectivity properties of random sub-graphs of a large graph, and used that to prove that a certain kind of distributed currency will provide high liquidity in practice.
- We studied the privacy of social networks, including identifying features of Facebook's ad targeting that could lead to the disclosure of sensitive user information. This was communicated to and fixed by Facebook.

#### A technical overview, and a list of selected publications, is presented below.

### Overview

In recent years, individuals have become mass producers of content, generating articles, blogs, opinions, reviews, and recommendations, in a decentralized manner. This content is then discovered and consumed by other users, and any centralized review is rendered infeasible by the sheer magnitude of the available content. Consequently, there is a need to utilize user feedback, both explicit and implicit, in order to provide optimum rankings and recommendations to users on the Internet. The same broad problem occurs in determining appropriate advertisements for a given search keyword, automatic moderation of discussion boards, and automated deductions of user preference on social networks.User activity data on the Internet tends to be very large, with hundreds of millions of users performing billions of actions every day. At the same time, this data is also very sparse, since each user only performs a very small share of possible actions on the Internet (eg. searches for a small fraction of possible keywords, reviews or purchases a small fraction of all products, etc.). Our goal is to design algorithms and optimization techniques to optimally utilize such data. We treat the sparse data as a ``prior belief'' on user preferences. We are also designing economic incentives to obtain useful and rich data, robust to manipulation. Further, we are interested in the field of reputation and recommendation systems, broadly defined.

This project is funded primarily by collaborative NSF grants 0904314 and 0904325. Research related to this project is often discussed at the RAIN seminar (Research on Algorithms for the INternet).

### Project Participants

**PIs**: Ashish Goel (Stanford) and Sanjeev Khanna (Penn)

**Senior Personnel:**N. Shivakumar (Stanford)

**Graduate Students:**THEORY: Anand Bhalgat, Tanmoy Chakraborty, Pranav Dandekar, Zhiyi Huang, Michael Kapralov, Aleksandra Korolova, David Lee, Ian Post, Sudeepa Roy. TOOLS: Noah Burbank, Eli Marschner, Eric Mibuari.

### Research Activities

Liquidity in Credit Networks: A Little Trust Goes a Long Way.

P. Dandekar, A. Goel, R. Govindan, and I. Post.

*Preliminary version presented at Proc. Workshop on the Economics of Networks, Systems, and Computation (NetEcon)*(NetEcon), 2010.

*Full version appeared in ACM Conference on Electronic Commerse (EC)*, 2011.

Strategic Formation of Credit Networks: Preliminary Report.

Pranav Dandekar, Bryce Wiedenbeck, Ashish Goel and Michael Wellman.

Preliminary report appeared in

*TADA 2011.*Send email to Ashish Goel if you want to receive a copy.

Connectivity in Random Forests and Credit Networks.

A. Goel, S. Khanna, S. Raghvendra and H. Zhang.

*ACM-SIAM Sympoisum on Discrete Algorithms, 2015*

Our approach also yields an algorithm for finding a perfect matching in a regular bipartite graph that runs in $O(n \log n)$ time, a well-studied special case of the BVN decomposition. This improves upon the previous best running time of $O(n^{1.5})$.

Perfect matchings in O(n log n) time in regular bipartite graphs.

A. Goel, M. Kapralov, and S. Khanna.

*Proc. 42nd ACM Symposium on Theory of Computing*(STOC), 2010.

FAST-PPR: scaling personalized pagerank estimation for large graphs

P. Lofgren, S. Banerjee, A. Goel, C. Seshadhri

Appeared in KDD 2015.

Title: Fast Incremental and Personalized PageRank

B. Bahmani, A. Chowdhury, and A. Goel.

*Appeared in VLDB 2011.*

Privacy Violations Using Microtargeted Ads: A Case Study,

A. Korolova.

*Appeared in PADM*2010.

Personalized Social Recommendations - Accurate or Private?

A. Machanavajjhala, A. Korolova, and A. D. Sarma

*Appeared in VLDB*, 2011.

Improved Approximation Results for Stochastic Knapsack Problems

A. Bhalgat, A. Goel, and S. Khanna.

*Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms*(SODA), 2011.

In more recent work, we have further improved the (8/3 + t)-approximation to 2+t.

A $(2+\epsilon)$-Approximation Algorithm for the Stochastic Knapsack Problem

Anand Bhalgat

*Under submission*

Many applications use probabilistic databases as a way to model uncertain and imprecise data. For instance, a probabilistic relational database associates a probability value with each tuple, indicating the likelihood of its presence. We initiated a study of computational tractability of answering queries on probabilistic databases that use the difference operation.

Queries with Difference on Probabilistic Databases

S. Khanna, S. Roy, and V. Tannen.

*37th International Conference on Very Large Data Bases (VLDB)*, 2011.

**Under development; please see the beta version.**

Telescopes and microscopes help you zoom in and focus on a specific spot. The goal of this project is to help users and societies see the bigger picture: focus not just on individual variables and constraints but also on the overall objective function and feasibility. We provide users with data to help apply the widescope philosophy to the problem of reducing federal and state budget deficits, and we are developing social networking primitives and systems to facilitate users to tackle these problems in a collaborative fashion. We are also developing general purpose theoretical constructs and systems to allow an application of the widescope philosophy to other important problems.

A Wall Street Journal Report about out work.

Compression Without a Common Prior: An Information-theoretic Justification for Ambiguity in Language

B. Juba, A. Kalai, S. Khanna, and M. Sudan

*2nd Symposium on Innovations in Computer Science (ICS),*2011.

Any physical channel of communication offers two potential reasons why its capacity (the number of bits it can transmit in a unit of time) might be unbounded: (1) (Uncountably) infinitely many choices of signal strength at any given instant of time, and (2) (Uncountably) infinitely many instances of time at which signals may be sent. However channel noise cancels out the potential unboundedness of the first aspect, leaving typical channels with only a finite capacity per instant of time. We show the surprising result that under a natural model of noise and delay errors, this latter source of infinity is indeed realizable, and the capacity of the associated physical channel is infinite.

Delays and the Capacity of Continuous-time Channels

S. Khanna and M. Sudan.

*To appear in 52nd IEEE Symposium on Foundations of Computer (FOCS),*2011.

Social Welfare in One-Sided Matching Markets without Money

A. Bhalgat, D. Chakrabarty, and S. Khanna.

*14th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX)*, 2011.

We model the economics of producing content in online social networks such as Facebook and Twitter using an

*attention market*. We treat attention as a mechanism for sharing the profit from consuming information and introduce two main utility models: incremental utility and Shapley utility. We prove a bi-criteria bound on the price of anarchy; in particular we show that the social welfare from central control over level of contribution by users is no larger than the social welfare from selfish agents with twice as large consumption utilities.

*Under submission*

Algorithms for the Generalized Sorting Problem

Z. Huang, S. Kannan, and S. Khanna.

*To appear in 52nd IEEE Symposium on Foundations of Computer (FOCS),*2011.