Each student is responsible for completing a course project. The goal
of the project is to allow students to select an aspect of the
course material that they are particularly interested in and explore
it in greater depth. The project can be any of three types:
-
Research project
This type of project attempts to explore new approaches to a data
warehousing problem whose solution is not yet well understood. There
are two types of research projects you might undertake:
- Propose an innovative new approach to a research problem, and
study the strengths and weaknesses of the proposed approach via an
experimental and/or analytical comparison with alternative approaches.
- Select a technique from the research literature and perform a
careful experimental evaluation of its effectiveness. The evaluation
should go beyond what was done by the researchers who proposed the
technique (e.g., varying some parameters that were held constant in
previous evaluations, or extending the scope and scale of the evaluation).
Alternatively, compare multiple competing techniques to understand the
advantages and disadvantages of each in various circumstances.
-
Survey of research literature
This type of project involves picking an area of active data
warehousing research, reading several recent research papers in that
area, and writing a report summarizing the papers, comparing and
contrasting the techniques used, and critiquing the results.
Some possible research areas to survey include:
-
Approximate Record Matching and Data Cleaning
-
Selection of Indexes and Materialized Views
-
Data Provenance / Data Lineage
-
Indexing and Querying of Geospatial Data
-
Integration of Text and Relational Data
-
Specialized Indexing Techniques for Data Warehouses
-
Incremental Maintenance of Materialized Views
-
Data Visualization
Here are some rough guidelines for an appropriate scope:
- Number of research papers to be read closely and discussed in depth: 2-4
- Number of additional research papers to be skimmed and briefly
summarized (to provide context and background for the research
problem being studied): 2-4
- Length of the report: 10-20 pages
-
Programming project
This type of project involves a significant programming effort and
produces an interesting software artifact. Here are some examples of
possible programming projects:
-
Implement a graphical extraction tool to help automate the data warehouse
load process. The tool should let users select data sources,
specify the mapping between source system tables and data warehouse
tables, and perform simple transformations such as renaming columns
or concatenating two columns. Once the mapping from source systems
to data warehouse staging tables has been set up, the tool should
allow the user to perform the load.
-
Build a GUI that allows data analysts to conveniently access a data
warehouse. The interface should allow the user to easily specify
an aggregation query by selecting the grouping columns and measures
to be aggregated, and it should present the results in a
user-friendly manner (either in report form or using some sort of
data visualization technique).
-
Extend an existing database system (such as RedBase, Berkeley DB,
PostgreSQL, or MySQL) to add a query processing technique such as
bitmap indexes, join indexes, or star queries via semi-joins.
-
Build an advisory tool that analyzes a query workload and recommends a good
set of aggregate tables to construct, based on the data characteristics and
the given workload. The tool should report its recommendations to the
user, and then construct the tables if the user so chooses.
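To make the second example above more concrete, here is a minimal Python
sketch of the query-generation step behind such an interface: turning a
user's choice of grouping columns and measures into a SQL aggregation
query. The function name build_aggregation_query, the sales table, and its
columns are hypothetical and chosen only for illustration. SQLite is used
simply because it ships with Python, not because the project requires it.

```python
import sqlite3

# Aggregate functions this sketch accepts (an assumption, not a project requirement).
ALLOWED_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def build_aggregation_query(table, group_by, measures):
    """Build a SQL aggregation query from the user's selections.

    group_by: list of grouping column names.
    measures: dict mapping an output alias to an
              (aggregate function, source column) pair.
    """
    select_parts = list(group_by)
    for alias, (func, column) in measures.items():
        func = func.upper()
        if func not in ALLOWED_AGGREGATES:
            raise ValueError(f"unsupported aggregate: {func}")
        select_parts.append(f"{func}({column}) AS {alias}")

    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if group_by:
        columns = ", ".join(group_by)
        sql += f" GROUP BY {columns} ORDER BY {columns}"
    return sql


def run_demo():
    """Run the generated query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("East", "widget", 10.0),
            ("East", "gadget", 5.0),
            ("West", "widget", 7.5),
        ],
    )
    sql = build_aggregation_query(
        "sales", ["region"], {"total_amount": ("SUM", "amount")}
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return sql, rows


if __name__ == "__main__":
    query, result = run_demo()
    print(query)
    for row in result:
        print(row)
```

This sketch interpolates table and column names directly into the SQL
text, so an actual tool would need to take those names from the warehouse
catalog rather than from free-form user input.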