Reading list

Required reading list

  1. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. IEEE/ACM Transactions on Networking (TON), 2003.
    PDF

  2. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004.
    PDF

  3. Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD, 2008.
    PDF

  4. Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, Volume 21, Issue 7 (July 1978), pages 558-565.
    PDF

Optional reading list

  • Distributed database design

    • Vertical fragmentation

      • S. Navathe, S. Ceri, G. Wiederhold, J. Dou. Vertical Partitioning Algorithms for Database Design. ACM Transactions on Database Systems (TODS), volume 9, issue 4, 1984.
        PDF

      • B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, R. Yerneni. PNUTS: Yahoo!’s Hosted Data Serving Platform. PVLDB, 2008.
        PDF

  • Query processing and optimization in distributed databases

    • Privacy-preserving join

      • R. Agrawal, A. Evfimievski, R. Srikant. Information Sharing Across Private Databases. ACM SIGMOD International Conference on Management of Data, San Diego, California, 2003.
        PDF

  • Data Replication

    • Adaptive scheme for replicating data

      • S. Kadambi, J. Chen, B. Cooper, D. Lomax, R. Ramakrishnan, A. Silberstein, E. Tam, and H. Garcia-Molina. Where in the World is My Data? VLDB, 2011.
        PDF

    • Paxos

      • Jonathan Kirsch and Yair Amir. Paxos for System Builders. CNDS, March 2008.
        PDF

    • Zab

      • Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini. Zab: High-performance broadcast for primary-backup systems. DSN-DCCS, 2011.
        PDF

  • P2P

    • Designing a Super-peer Network

      • Beverly Yang and Hector Garcia-Molina. Designing a Super-peer Network. IEEE International Conference on Data Engineering (ICDE), March 5-8, 2003, Bangalore, India.
        PDF

  • Distributed information retrieval

    • Crawling

      • P. Boldi, B. Codenotti, M. Santini, S. Vigna. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice & Experience 34(8): 711-726, 2004.
        PDF

      • A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan. Searching the Web. ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, pages 2–43.
        PDF

    • Caching

      • R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, F. Silvestri. The Impact of Caching on Search Engines. ACM SIGIR International Conference on Information Retrieval, Amsterdam, The Netherlands, 2007.
        PDF

  • Open Source Systems

    • S4

      • Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari. S4: Distributed Stream Computing Platform. 2010 IEEE International Conference on Data Mining Workshops.
        PDF

    • Hyracks

      • Vinayak Borkar, Michael Carey, Raman Grover, Nicola Onose, Rares Vernica. Hyracks: A Flexible and Extensible Foundation for Data-Intensive Computing. ICDE, 2011.
        PDF

    • BigTable

      • Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006.
        PDF

    • Pregel

      • Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: A System for Large-Scale Graph Processing. SIGMOD, 2010.
        PDF