CS345 - Topics in Data Warehousing
Autumn 2004

Project Overview
Each student is responsible for completing a course project. The goal of the project is to allow students to select an aspect of the course material that they are particularly interested in and explore it in greater depth. The project can be any of three types:
  1. Research project

    This type of project attempts to explore new approaches to a data warehousing problem whose solution is not yet well understood. There are two types of research projects you might undertake:

    1. Propose a new approach to a research problem, and study its strengths and weaknesses via an experimental and/or analytical comparison with alternative approaches.
    2. Select a technique from the research literature and perform a careful experimental evaluation of its effectiveness. The evaluation should go beyond what was done by the researchers who proposed the technique (e.g., varying some parameters that were held constant in previous evaluations, or extending the scope and scale of the evaluation). Alternatively, compare multiple competing techniques to understand the advantages and disadvantages of each in various circumstances.

  2. Survey of research literature

    This type of project involves picking an area of active data warehousing research, reading several recent research papers in that area, and writing a report summarizing the papers, comparing and contrasting the techniques used, and critiquing the results.

    Some possible research areas to survey include:

    • Approximate Record Matching and Data Cleaning
    • Selection of Indexes and Materialized Views
    • Data Provenance / Data Lineage
    • Indexing and Querying of Geospatial Data
    • Integration of Text and Relational Data
    • Specialized Indexing Techniques for Data Warehouses
    • Incremental Maintenance of Materialized Views
    • Data Visualization

    Here are some rough guidelines for an appropriate scope:
    Number of research papers to be read closely and discussed in depth: 2-4
    Number of additional research papers to be skimmed and briefly summarized (to provide context and background for the research problem being studied): 2-4
    Length of the report: 10-20 pages

  3. Programming project

    This type of project involves a significant programming effort and produces an interesting software artifact. Here are some examples of possible programming projects:

    • Implement a graphical extraction tool to help automate the data warehouse load process. The tool should let users select data sources, specify the mapping between source system tables and data warehouse tables, and perform simple transformations such as renaming columns or concatenating two columns. Once the mapping from source systems to data warehouse staging tables has been set up, the tool should allow the user to perform the load.
    • Build a GUI that allows data analysts to conveniently access a data warehouse. The interface should allow the user to easily specify an aggregation query by selecting the grouping columns and measures to be aggregated, and it should present the results in a user-friendly manner (either in report form or using some sort of data visualization technique).
    • Extend an existing database system (such as RedBase, Berkeley DB, PostgreSQL, or MySQL) to add a query processing technique such as bitmap indexes, join indexes, or star queries via semi-joins.
    • Build an advisory tool that analyzes a query workload and recommends a good set of aggregate tables to construct, based on the data characteristics and the given workload. The tool should report its recommendations to the user, and then construct the tables if the user so chooses.
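
    To give a rough sense of one technique named in the examples above, here is a minimal sketch of a bitmap index in Python. It is a toy illustration only, not a required design: the class name BitmapIndex, the helper rows_from_bitmap, and the sample region and product columns are all hypothetical, and a real project would implement the technique inside the chosen database system rather than over in-memory lists.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one Python int per distinct column value, used as a bit vector."""

    def __init__(self, values):
        self.num_rows = 0
        self.bitmaps = defaultdict(int)
        for value in values:
            # Set bit number `num_rows` (the current row id) in this value's bitmap.
            self.bitmaps[value] |= 1 << self.num_rows
            self.num_rows += 1

    def lookup(self, value):
        """Return the bitmap (as an int) of rows where the column equals `value`."""
        return self.bitmaps.get(value, 0)


def rows_from_bitmap(bitmap):
    """Decode a bitmap into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical fact-table columns, one entry per row.
region = ["east", "west", "east", "north", "west", "east"]
product = ["tv", "tv", "radio", "tv", "radio", "tv"]

region_index = BitmapIndex(region)
product_index = BitmapIndex(product)

# WHERE region = 'east' AND product = 'tv': bitwise AND of two bitmaps.
east_tv = rows_from_bitmap(region_index.lookup("east") & product_index.lookup("tv"))

# WHERE region = 'west' OR region = 'north': bitwise OR of two bitmaps.
west_or_north = rows_from_bitmap(
    region_index.lookup("west") | region_index.lookup("north")
)

print(east_tv)        # rows matching both predicates
print(west_or_north)  # rows matching either predicate
```

    The point of the sketch is that multi-column predicates reduce to bitwise operations on per-value bitmaps, which is why the technique suits low-cardinality warehouse columns. A project-scale implementation would also need compressed bitmaps, persistence, and integration with the query optimizer.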