CS 236: Advanced Databases
In this course, we will discuss various issues arising in the context of
data management. The course will begin with a review of such issues as
file systems, architecture of database management systems,
data models, and relational databases. We will also examine logical and
physical design of databases, hardware and software implementation of database
systems, and distributed databases. The bulk of the class will consist of
reading papers drawn from the research literature.
Students must have taken a course in databases.
Mondays, Wednesdays, & Fridays 9:10pm - 10:00pm. The class meets in MSE 003.
By appointment. Tel: 827-5318
Class participation: 15%, project: 50%, exams: 35%.
About the project or research paper
You have the choice of doing either a project or a research paper.
The project or research paper is a major part of the class grade, and you
should therefore expect to spend quite a bit of effort on it. You have the
choice of doing either the systems project that is assigned, or working on a
Ideally, a research paper should be publishable. However, a project that lays
the groundwork for what may be publishable would also be acceptable. The
project may take several forms, but in all cases, its value depends on the new
contributions it makes. A project could be a software (or hardware) system
that implements and examines a new idea. Alternatively, it could be a
theoretical contribution that combines or extends existing ideas in novel or
To give you a sense of what to shoot for, take a look at
Research paper progress
Since projects are open-ended, you need to conform to these deadlines to make
sure you will be able to finish on time.
- Week 2: Initial half-page description of interest area.
- Week 4: Specifics of the topic to be researched, with a list of
- Week 5: Initial detailed report on the state of the art in the
and outline of initial results.
- Week 8: Updated report on results obtained.
- Week 10: Final version of project report due.
Please look at the project
description for details on the
The bulk of the readings are expected to be from the research literature. A
list of readings from the literature will be made available. No textbook is
specifically required, but the following books are likely to be useful:
``Database Management Systems'', R. Ramakrishnan and J. Gehrke, McGraw Hill
``Fundamentals of Database Systems'', R. Elmasri and S. Navathe, Pearson
Some conferences with papers of relevance to this class.
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching.
SIGMOD Conference 1984: 47-57, R-tree.pdf
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An
Efficient and Robust Access Method For Points and Rectangles. SIGMOD Conference
The Grid File:
J. Nievergelt, H. Hinterberger, K.C. Sevcik. The Grid File: An Adaptable,
Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71
see also this summary.
Space Filling Curves:
H.V. Jagadish. Linear clustering of objects with multiple attributes. SIGMOD
Conference 1990, hilbert-curve.pdf
Atinder's slides on R-Trees: rtree-slides
Here are slides on R-Trees, grid-file and space filling curves from G.
You can find a framework (implemented by Marios Hadjieleftheriou) to create
spatial indices here.
Leonard D. Shapiro: Join Processing in Database Systems with Large Main
Memories. TODS 11(3): 239-264, join.pdf
Donghui's slides on join processing: join-slides
T. Brinkhoff, H-P Kriegel, B. Seeger: Efficient Processing of Spatial Joins
using R-trees. Proc. SIGMOD, 1993, r-tree-join.pdf
Ming-Ling Lo, Chinya V. Ravishankar: Spatial Joins using Seeded Trees.
SIGMOD Conference 1994: 209-220, seeded.trees.pdf
Ming-Ling Lo, Chinya V. Ravishankar: Spatial Hash-Joins. SIGMOD Conference
1996: 247-258, shj.pdf
Nick Koudas, Kenneth C. Sevcik: Size Separation Spatial Join. SIGMOD
Conference 1997: 324-335, ssj.pdf
Donghui's slides on spatial joins: spatial-join-slides
Ravi's slides on seeded-tree joins: seeded-trees-join slides
N. Roussopoulos, S. Kelley, F. Vincent: Nearest Neighbor Queries. SIGMOD
Conference 1995: 71-79, roussopoulosNN95.pdf
G.R. Hjaltason, H. Samet: Ranking in Spatial Databases. SSD 1995: 83-95,
NN slides from G. Kollios: slides1 and from Y. Tao: slides2
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE
2001: 421-430, skyline-operator.pdf
Jan Chomicki, Parke Godfrey, Jarek Gryz, Dongming Liang: Skyline with
Presorting. ICDE 2003: 717-719, skyline-presorting.pdf
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger: An Optimal and
Progressive Algorithm for Skyline Queries. SIGMOD Conference 2003:
Skyline slides from Y. Tao: skyline slides
Data Intensive Applications
Dean, J. and Ghemawat, S. 2008. MapReduce: simplified data processing
on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107-113,
The map-reduce slides from Cloudera.
Aggregation for Data Intensive Applications:
Jian Wen, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras:
Revisiting Aggregation for Data Intensive Applications: A Performance
Study. CoRR abs/1311.0059 (2013), aggregation.pdf
Here are the slides on aggregation, aggregation-slides
R. Fagin. "Combining fuzzy information: an overview." SIGMOD
Record, Vol. 31, No. 2, June 2002, pp. 109-118, fagin-sigrec02.pdf
Here are the Top-k slides
Temporal Databases And Indexing
Slides on Temporal DBs and Indexing: temporal databases, snapshot
B. Salzberg and V.J. Tsotras: Comparison of Access Methods for
Time-Evolving Data. ACM Comput. Surv. 31(2): 158-221 (1999),
V.J. Tsotras, N. Kangelaris: The Snapshot Index: An I/O-optimal
access method for timeslice queries. Inf. Syst. 20(3): 237-260
B. Becker, S. Gschwind, T. Ohler, B. Seeger, P. Widmayer: An
Asymptotically Optimal Multiversion B-Tree. VLDB J. 5(4): 264-275
Data Outsourcing and Security
H. Hacigümüs, B. Iyer, C. Li, and S. Mehrotra. Executing
SQL over encrypted data in the database-service-provider
model. In Proc. ACM SIGMOD, pages 216-227, 2002.
B. Hore, S. Mehrotra, M. Canim, and M. Kantarcioglu.
Secure multidimensional range queries over outsourced data.
The VLDB Journal, pages 1-26, 2011.
Jonathan L. Dautrich and Chinya V. Ravishankar, ``Compromising Privacy in
Precise Query Protocols'', Proc. of the 16th International Conference on
Extending Database Technology (EDBT 2013), Genoa, Italy, March 2013.
Peng Wang and Chinya V. Ravishankar, ``Secure and Efficient Range Queries on
Outsourced Databases Using #-trees'', Proc. 29th International Conference on
Data Engineering (ICDE 2013), Brisbane, Australia, April 2013.