CIF: Small: Foundations of Decentralized Data Science: Optimizing Utility, Privacy and Communication Efficiency

The deluge of data generated daily across mobile devices, sensors, and servers holds the promise of unprecedented inferential power to revolutionize numerous industries and scientific domains, from medicine to engineering to infrastructure. Traditional data science, which pools this data in a single location, is becoming increasingly unrealistic due to bandwidth limitations in communication networks. Legal, administrative, and ethical constraints on sharing proprietary, personal, or sensitive data pose further challenges on the path to realizing this promise. This project asks the following question: Can one extract value from data generated across an entire network without having to collect and process it in a single location?
The broad goal of the project is to harness the inferential power of distributed data without the systemic privacy risks and costs of traditional data collection. The project pursues decentralized schemes for a wide range of canonical data science tasks in which users send narrowly scoped messages to querying servers to complete the desired task. These messages are designed to preserve the privacy of the user data and its sensitive characteristics while minimizing the total communication cost, so that the resulting schemes achieve optimal trade-offs among accuracy for the desired task, privacy for the user data, and communication efficiency. The schemes adapt to the structure of the underlying data and network and, when available, can leverage low intrinsic dimensionality of the data and multi-round interactions over the network. The project also develops information-theoretic performance benchmarks that delineate what is impossible under privacy and communication constraints and establish the optimality of the proposed schemes under various criteria. As such, the project delivers a rigorous and comprehensive theoretical foundation for decentralized data science that allows many canonical tasks to be implemented efficiently and privately on distributed data.
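
As a rough illustration of what a "narrowly scoped message" can look like, the Python sketch below implements a textbook one-bit scheme for mean estimation. It is not taken from the project's papers. Assumptions: each user holds a single value in [-1, 1], privacy means epsilon-local differential privacy, and the names privatize_one_bit and estimate_mean are hypothetical. Each user applies stochastic rounding followed by randomized response and sends one bit, and the server debiases the average of the bits it receives.

```python
import math
import random


def privatize_one_bit(x: float, epsilon: float, rng: random.Random) -> int:
    """Hypothetical client step: encode x in [-1, 1] as one epsilon-LDP bit in {-1, +1}."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    # Stochastic rounding: P(b = +1) = (1 + x) / 2, so E[b] = x.
    b = 1 if rng.random() < (1.0 + x) / 2.0 else -1
    # Randomized response: keep b with probability e^eps / (1 + e^eps), else flip it.
    # The output distributions for any two inputs differ by at most a factor e^eps.
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return b if rng.random() < keep_prob else -b


def estimate_mean(bits: list[int], epsilon: float) -> float:
    """Hypothetical server step: unbiased estimate of the users' mean from their bits."""
    # Randomized response shrinks E[bit] by (e^eps - 1) / (e^eps + 1); undo that factor.
    scale = (math.exp(epsilon) + 1.0) / (math.exp(epsilon) - 1.0)
    return scale * sum(bits) / len(bits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic user values in [-0.3, 0.7], for illustration only.
    data = [0.5 * rng.uniform(-1.0, 1.0) + 0.2 for _ in range(100_000)]
    bits = [privatize_one_bit(x, epsilon=1.0, rng=rng) for x in data]
    print("true mean:", sum(data) / len(data))
    print("estimate: ", estimate_mean(bits, epsilon=1.0))
```

Even in this toy form the three-way tension is visible. Communication is fixed at one bit per user. Privacy is set by epsilon. Accuracy degrades as epsilon shrinks because the debiasing factor (e^eps + 1) / (e^eps - 1) grows, which inflates the variance of the estimate.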

Project Participants:
  • Ayfer Ozgur, Principal Investigator
  • Surin Ahn, Graduate Student
  • Wei-Ning Chen, Graduate Student
  • Daria Reshetova, Graduate Student
  • Dan Song, Graduate Student


Collaborators:
  • Peter Kairouz, Google
  • Graham Cormode, Meta
  • Akash Bharadwaj, Meta

Papers:
  • Wei-Ning Chen, Dan Song, Ayfer Ozgur, Peter Kairouz, Privacy Amplification via Compression: Achieving the Optimal Privacy-Accuracy-Communication Trade-off in Distributed Mean Estimation, 2023. [pdf]
  • Daria Reshetova, Wei-Ning Chen, Ayfer Ozgur, Training Generative Models from Privatized Data, 2023. [arXiv]
  • Wei-Ning Chen, Ayfer Ozgur, Graham Cormode, Akash Bharadwaj, The Communication Cost of Security and Privacy in Federated Frequency Estimation, AISTATS 2023. [pdf]
  • Yikun Bai, Xiugang Wu, Ayfer Ozgur, Information Constrained Optimal Transport: From Talagrand, to Marton, to Cover, IEEE Transactions on Information Theory 2023. [pdf]
  • Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur, Breaking the Communication-Privacy-Accuracy Trilemma, IEEE Transactions on Information Theory 2023. [pdf]
  • Wei-Ning Chen, Christopher A. Choquette-Choo, Peter Kairouz, Ananda Theertha Suresh, The Fundamental Price of Secure Aggregation in Differentially Private Federated Learning, ICML 2022. [pdf]
  • Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur, The Poisson Binomial Mechanism for Unbiased Federated Learning with Secure Aggregation, ICML 2022. [pdf]
  • Surin Ahn, Wei-Ning Chen, Ayfer Ozgur, Estimating Sparse Distributions Under Joint Communication and Privacy Constraints, IEEE International Symposium on Information Theory (ISIT), 2022. [pdf]
