CS345 - Topics in Data Warehousing
Autumn 2004

  • [09/15/2004] CS 345 will be offered in Autumn Quarter 2004. This quarter's course title will be Topics in Data Warehousing.
  • [10/05/2004] A list of online resources has been added (see the "Resources" link). This includes optional supplemental readings such as research papers and web sites, arranged by topic.
  • [10/26/2004] The Resources page has been updated with links to query processing resources, and the Assignments page has been updated with support information for Assignment #2.
  • [11/15/2004] The due date for Assignment #3 has been extended to Friday, November 19. Late assignments will be accepted without penalty until Monday, November 22, 11:59 PM. The Assignments page has been updated with support information for Assignment #3.
  • [11/23/2004] Assignment #3 turned out to be significantly more difficult than intended. As a result, everyone who made a good-faith effort to complete the assignment will receive full credit, even if they weren't entirely successful. For Assignment #4, you will have the option of either finishing Assignment #3 or doing an alternate assignment. Thus, if you have already successfully completed Assignment #3, you have nothing to do for Assignment #4.
  • [12/06/2004] The final exam will be held on Wednesday, December 8, from 7:00 PM to 10:00 PM, in the regular classroom (Building 200, Room 200-034).
Time and Place
The course will meet Tuesdays and Thursdays, 1:15 PM - 2:30 PM, in Building 200, Room 200-034.

Building 200 is the Lane History Corner in the Main Quad.

Course Abstract

In the modern world, information technology is ubiquitous, resulting in the collection of massive quantities of data. Many organizations hope that, by analyzing the data they have collected, they can gain new insights that will help them accomplish their goals more effectively. However, the analysis of large datasets poses many challenges: How can data from disparate sources be organized in a coherent manner? How can datasets that are many gigabytes in size be queried efficiently? What are the right questions to ask about the data in order to yield useful insights?

The goal of this course is to introduce students to the challenges involved in managing and querying large datasets. Students will be exposed to various approaches to addressing these challenges, from the complementary viewpoints of academic research and industrial practice.

Topics to be covered include: differences between transaction processing and data analysis applications; the process of data warehouse design and implementation; star schemas and dimensional modeling; query processing strategies for data analysis queries; physical database design; database tuning principles; and data mining.