- M. Stonebraker. What Goes Around Comes Around. Readings in Database Systems. 2004.
- M. Stonebraker et al. "One Size Fits All": An Idea Whose Time Has Come and Gone, 2005.
- A. Halevy et al. The Unreasonable Effectiveness Of Data, IEEE Intelligent Systems, 2009.
|
- L. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264 (1986)
- P. Selinger et al.: Access Path Selection in a Relational Database Management System. SIGMOD 1979: 23-34
- S. Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
- Y. Ioannidis et al. Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD 1995.
- AHV Chapter 6.4 and Yannakakis's Algorithm (Acyclic Joins)
- H. Ngo et al.: Skew Strikes Back: New Developments in the Theory of Join Algorithms, Manuscript, 2013
|
- J. Gray et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. DMKD 1(1): 29-53 (1997).
- S. Abiteboul et al. Complexity of Answering Queries Using Materialized Views, PODS 1998.
- M. Stonebraker et al. C-Store: A Column-oriented DBMS. VLDB 2005: 553-564.
- Fagin's Algorithm
|
- J. Dean et al. MapReduce: simplified data processing on large clusters. Commun. ACM 51(1): 107-113 (2008).
- Parallel DBMS versus MapReduce
- Theory. Ullman and Suciu's papers about MapReduce and Joins
|
- C. Olston et al. Pig Latin: a not-so-foreign language for data processing. SIGMOD Conference 2008: 1099-1110
- A. Gates et al. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience. PVLDB 2(2): 1414-1425 (2009)
- A. Thusoo et al., Hive: A Warehousing Solution Over A MapReduce Framework, VLDB, 2009.
- S. Melnik et al., Dremel: Interactive Analysis Of Web-Scale Datasets, VLDB, 2010.
|
- Y. Zhang: RIOT: I/O-Efficient Numerical Computing without SQL, CIDR 2009.
- ArrayStore.
- F. Niu et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. NIPS, 2011.
- J. Canny. Big data analytics with small footprint: squaring the cloud, KDD 2013.
- Notes: Simple Analysis of First-order Methods and QR decomposition.
|
- Y. Low, et al., Distributed GraphLab: A Framework For Machine Learning And Data Mining In The Cloud, VLDB, 2012
- M. Zaharia, et al., Resilient Distributed Datasets: A Fault-Tolerant Abstraction For In-Memory Cluster Computing, NSDI, 2012
- J. Hellerstein. The MADlib Analytics Library or MAD Skills, the SQL. PVLDB 2012
- Y. Bu et al. HaLoop: Efficient Iterative Data Processing on Large Clusters. VLDB 2010.
|
- D. Ferrucci et al. Building Watson: An Overview of the DeepQA Project. AI Magazine, 2010.
- Google's Knowledge Graph.
- Kasneci et al. The YAGO-NAGA approach to knowledge discovery, 2009
- Niu et al. Elementary: Large-scale Knowledge-base Construction via Machine Learning and Statistical Inference, 2012.
- A. Carlson. Toward an Architecture for Never-Ending Language Learning, AAAI 2010.
- O. Deshpande et al. Building, maintaining, and using knowledge bases: a report from the trenches. SIGMOD 2013.
- Probabilistic Inference in Large Factor Graphs
|
- J. Gray: Granularity of Locks and Degrees of Consistency in a Shared Data Base, 1976.
- P. Lehman et al. Efficient Locking for Concurrent Operations on B-Trees. TODS 6(4): 650-670 (1981)
- C. Mohan et al.: ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. TODS 17(1): 94-162 (1992)
- C. Mohan et al.: Transaction Management in the R* Distributed Database Management System. TODS 11(4): 378-396 (1986)
- L. Lamport, Paxos Made Simple, ACM SIGACT News, 2001.
- CAP Theorem.
|
- W. Vogels. Eventually consistent. Commun. ACM 52(1): 40-44 (2009).
- F. Chang et al. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst. 26(2) (2008).
- G. DeCandia et al., Dynamo: Amazon's Highly Available Key-Value Store, SOSP, 2007
- B. Cooper et al., PNUTS: Yahoo!'s Hosted Data Serving Platform, VLDB, 2008
- P. Alvaro et al. Consistency Analysis in Bloom: a CALM and Collected Approach. CIDR 2011.
- CALM Conjecture: a proof and a refutation.
|
- VoltDB and HStore (Main Memory Systems)
- J. Lee, et al., High-Performance Transaction Processing In SAP HANA, ICDE Bulletin, 2013
- J.C. Corbett, et al., Spanner: Google's Globally-Distributed Database, OSDI, 2012
- J. Shute, et al., F1: A Distributed SQL Database That Scales, VLDB, 2013
- M. Demirbas. An Overview Of Spanner, Online, 2013
|
- A. Parameswaran, et al., Crowdscreen: Algorithms For Filtering Data With Humans, SIGMOD, 2012
- M. Stonebraker, et al., Data Curation At Scale: The Data Tamer System, CIDR, 2013
- M. Franklin, et al., CrowdDB: Answering Queries With Crowdsourcing, SIGMOD, 2011
|
- R. Agrawal. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994.
- H. T. Kung: On Optimistic Methods for Concurrency Control. TODS 6(2): 213-226 (1981).
- J. Gray and L. Lamport. Consensus on Transaction Commit. MSR-TR-2003-96.
|
- Chou: An Evaluation of Buffer Management Strategies for Relational Database Systems. Algorithmica 1(3): 311-336 (1986).
- J. Gray. The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD 1987.
- E. O'Neil: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD 1993: 297-306
- G. Graefe: The five-minute rule 20 years later (and how flash memory changes the rules). Commun. ACM 52(7): 48-59 (2009)
|
- T. Imielinski et al. Incomplete Information in Relational Databases, JACM 1984.
- Chapter 19 in AHV
- A. Das Sarma et al. Representing Uncertain Data: Uniqueness, Equivalence, Minimization, and Approximation, 2005.
- Suciu et al. Probabilistic Databases, Synthesis Lectures, 2011.
|
- M.J. Hanson. Efficient Reading of Papers in Science and Technology, 1989.
- P. Valduriez, Some Hints to Improve Writing of Technical Papers, 1994.
|