CS 236: Advanced Databases


Course Description:

In this course, we will discuss various issues arising in the context of data management. The course will begin with a review of file systems, the architecture of database management systems, data models, and relational databases. We will also examine the logical and physical design of databases, hardware and software implementations of database systems, and distributed databases. The bulk of the class will consist of reading papers drawn from the research literature.

Prerequisites:

Students must have taken a course in databases.

Class times:

Mondays and Wednesdays, 3:30pm - 4:50pm. The class meets in Bourns A125.

Office hours:

5pm - 6pm, MW, or by appointment. Tel: 827-2451. E-mail: ravi@cs.ucr.edu.

Grading:

Class participation: 15%, project: 50%, exams: 35%.

Project or Research Paper

You will need to complete a research paper or a systems project for the class. Please see the "Assignments" section in Canvas for details.

Books Useful to this Class

The bulk of the readings are expected to be from the research literature. A list of readings from the literature will be made available. No textbook is specifically required, but the following books are likely to be useful:

Database Conferences

Here is a list of conferences with papers of relevance to this class.

The conferences have been ranked as "Tier-1" (highest prestige), "Tier-2", etc. Database conferences are prefixed with (DB). However, one often finds relevant papers in conferences on Data Mining (DM), Machine Learning (AI), Information Retrieval (IR), the World-Wide Web (W3), etc.

These conferences will give you a good idea of the nature of current research in the field of databases.

Paper Readings in the Class

Here is a preliminary list of papers we will read in this class.

Indexing

R-tree indices:

  • Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57, R-tree.pdf
  • N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method For Points and Rectangles. SIGMOD Conference 1990, rstar.pdf
  • The Grid File: J. Nievergelt, H. Hinterberger, K.C. Sevcik. The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71 (1984), grid-file.pdf; see also this summary.

Space Filling Curves:
  • H.V. Jagadish. Linear clustering of objects with multiple attributes. SIGMOD Conference 1990, hilbert-curve.pdf
    Atinder's slides on R-Trees: rtree-slides
    Here are slides on R-Trees, grid-file and space filling curves from G. Kollios: Kollios-NTUA-structures-slides
    You can find a framework (implemented by Marios Hadjieleftheriou) to create spatial indices here.

Spatial Queries

Join Processing:
  • Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. TODS 11(3): 239-264, join.pdf
    Donghui's slides on join processing: join-slides

Spatial Joins:
  • T. Brinkhoff, H-P Kriegel, B. Seeger: Efficient Processing of Spatial Joins using R-trees. Proc. SIGMOD, 1993, r-tree-join.pdf
  • Ming-Ling Lo, Chinya V. Ravishankar: Spatial Joins using Seeded Trees. SIGMOD Conference 1994: 209-220, seeded.trees.pdf
  • Ming-Ling Lo, Chinya V. Ravishankar: Spatial Hash-Joins. SIGMOD Conference 1996: 247-258, shj.pdf
  • Nick Koudas, Kenneth C. Sevcik: Size Separation Spatial Join. SIGMOD Conference 1997: 324-335, ssj.pdf
    Donghui's slides on spatial joins: spatial-join-slides
    Ravi's slides on seeded-tree joins: seeded-trees-join slides

Nearest Neighbors:
  • N. Roussopoulos, S. Kelley, F. Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79, roussopoulosNN95.pdf
  • G.R. Hjaltason, H. Samet: Ranking in Spatial Databases. SSD 1995: 83-95, hjaltason95ranking.pdf
    NN slides from G. Kollios: slides1 and from Y. Tao: slides2

Skyline Queries:
  • Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001: 421-430, skyline-operator.pdf
  • Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang: Skyline with Presorting. ICDE 2003: 717-719, skyline-presorting.pdf
  • Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger: An Optimal and Progressive Algorithm for Skyline Queries. SIGMOD Conference 2003: 467-478, skyline-bbs.pdf
  • Skyline slides from Y. Tao: skyline slides

Data Intensive Applications
  • Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107-113, MapReduce.pdf
    The map-reduce slides from Cloudera.

Aggregation for Data Intensive Applications:
  • Jian Wen, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras: Revisiting Aggregation for Data Intensive Applications: A Performance Study. CoRR abs/1311.0059 (2013), aggregation.pdf
    Here are the slides on aggregation, aggregation-slides

Top-K Queries
  • R. Fagin. "Combining fuzzy information: an overview." SIGMOD Record, Vol. 31, No. 2, June 2002, pp. 109-118, fagin-sigrec02.pdf
    Here are the Top-k slides

Temporal Databases And Indexing
  • Slides on Temporal DBs and Indexing: temporal databases, snapshot index, MVB-Tree.
  • B. Salzberg and V.J. Tsotras: Comparison of Access Methods for Time-Evolving Data. ACM Comput. Surv. 31(2): 158-221 (1999), tempDB-survey.
  • V.J. Tsotras, N. Kangelaris: The Snapshot Index: An I/O-optimal access method for timeslice queries. Inf. Syst. 20(3): 237-260 (1995), SI-index.
  • B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer: An Asymptotically Optimal Multiversion B-Tree. VLDB J. 5(4): 264-275 (1996), MVB-Tree

Data Outsourcing and Security
  • H. Hacigümüs, B. Iyer, C. Li, and S. Mehrotra. Executing SQL over encrypted data in the database-service-provider model. In Proc. ACM SIGMOD, pages 216-227, 2002.
  • B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu. Secure multidimensional range queries over outsourced data. The VLDB Journal, pages 1-26, 2011.
  • Jonathan L. Dautrich and Chinya V. Ravishankar, "Compromising Privacy in Precise Query Protocols", Proc. of the 16th International Conference on Extending Database Technology (EDBT 2013), Genoa, Italy, March 2013.
  • Peng Wang and Chinya V. Ravishankar, "Secure and Efficient Range Queries on Outsourced Databases Using #-trees", Proc. 29th International Conference on Data Engineering (ICDE 2013), Brisbane, Australia, April 2013.