Textbook: Learning Spark: Lightning-Fast Data Analytics (2nd Edition) by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee.
CS 167 covers the data management and systems aspects of big data platforms such as Hadoop, Spark, and AsterixDB. In this course, you will learn how data is stored in a distributed file system and how queries run in parallel. The course covers the following topics:
- An overview of big data management systems
- Distributed big-data storage
- Programming models in big data (e.g., MapReduce and RDD)
- Column-based storage and analytics on big data
- Big spatial data
- Document databases
- Machine learning on big data
- Big-data visualization
Grades are weighted as follows:
- (10%) Active class participation (quizzes and activities)
- (15%) Assignments
- (30%) Labs
- (15%) Mid-term 1
- (15%) Mid-term 2
- (15%) Mid-term 3