CIF: Small: Foundations of Decentralized Data Science: Optimizing Utility, Privacy and Communication Efficiency

The deluge of data generated daily across mobile devices, sensors, and servers holds the promise of unprecedented inferential power to revolutionize numerous industries and scientific domains, from medicine, to engineering, to infrastructure. Traditional data science that pools this data to a single location is becoming increasingly unrealistic due to bandwidth limitations in communication networks. Legal, administrative, and ethical constraints in sharing proprietary, personal, or sensitive data pose further challenges on the path to realizing this promise.

This project asks the following question: Can one extract value from data generated across an entire network without having to collect and process it in a single location? The broad goal of the project is to harness the inferential power of distributed data without the systemic privacy risks and costs resulting from traditional data collection.

The project pursues decentralized schemes for a wide range of canonical data science tasks where users send narrowly scoped messages to querying servers to complete the desired task. These messages are designed to preserve the privacy of the user data and its sensitive characteristics while minimizing the total communication cost. As a result, they provide optimal trade-offs between accuracy for the desired task, privacy for the user data, and communication efficiency. The schemes adapt to the structure of the underlying data and network and, when available, can leverage low intrinsic dimensionality of the data and multi-round interactions over the network.

This project also develops information-theoretic performance benchmarks that delineate what is impossible under privacy and communication constraints and establish optimality of the proposed schemes under various criteria. As such, the project delivers a rigorous and comprehensive theoretical foundation for decentralized data science that allows many canonical tasks to be efficiently and privately implemented on distributed data.

Project Participants:
Collaborators:
Papers:
Teaching:
Outreach: