Data Warehousing from the Web
Chris Fernandes (cfernand@union.edu) and Michael Whalen (whalenm@vu.union.edu)
Department of Computer Science, Union College, Schenectady, NY 12308
Summary
Warehouse projects provide students with robust capstone experiences and produce interesting results.
Introduction
Data warehousing is the process of collecting information from various repositories and combining it into a single structured repository that can be queried for new information, such as performance trends. Many web sites contain useful but unstructured data, which makes them ideal sources for student projects on data warehouses. We describe one such project, developed from the registration web pages at Union College, which gives faculty and students on-line access to course enrollment trends, classroom availability, and other pertinent information. The project exposed so much information that the administration became concerned about student privacy, which led the student to extend his work into the area of warehouse security.
Procedure
HTML data is automatically parsed nightly for content and transferred to the warehouse back end, called SCOUR (Search Contents Of Unions Registry).
Results
Query results can be displayed in a variety of formats, including histograms, and can be exported to a spreadsheet.
[Figure labels: Meeting scheduler, Course analysis, Enrollment data, Room availability]
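The two output formats mentioned in the results, a histogram and a spreadsheet export, can be sketched roughly as follows. This is a minimal illustration, not SCOUR's actual code: the function names `to_csv` and `text_histogram` and the sample enrollment figures are invented here, and CSV is assumed as the spreadsheet interchange format.

```python
import csv
import io


def to_csv(rows, header):
    """Serialize query rows as CSV text that a spreadsheet can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def text_histogram(rows, width=40):
    """Render (label, count) pairs as a plain-text bar chart."""
    peak = max((count for _, count in rows), default=0)
    lines = []
    for label, count in rows:
        # Scale each bar relative to the largest count.
        bar = "#" * (round(width * count / peak) if peak else 0)
        lines.append(f"{label:<10} {bar} {count}")
    return "\n".join(lines)


# Hypothetical query result: enrollment per term for one course.
enrollment = [("Fall", 40), ("Winter", 20), ("Spring", 30)]
print(to_csv(enrollment, ["term", "enrolled"]))
print(text_histogram(enrollment))
```

Keeping the formatters separate from the query means the same result rows can feed either view, which fits a front end that offers several display formats.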
Conclusions
  • Data warehousing projects yield many pedagogical
    benefits. They allow students to build bridges
    between many areas of computer science, including:
      • database and data warehouse theory
      • GUI design
      • security and authorization
      • interface usability
      • privacy and ethics
  • Projects can be diverse. Raw data abounds on the
    Web in many fields.
  • Projects can have flexible scope to meet time
    constraints. A warehouse can easily be extended
    with security controls that restrict access to
    sensitive queries.

Unlike traditional databases, the SCOUR warehouse contains historical and summarized data for use in statistical queries.
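As a rough illustration of a statistical query over historical data, the sketch below stores one enrollment snapshot per term and reports the trend for a course. The table name `enrollment_snapshot`, its columns, the course codes, and all figures are assumptions made for this example; the poster does not describe SCOUR's schema or its database engine, and SQLite is used here only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment_snapshot ("
    " term TEXT, course TEXT, enrolled INTEGER, capacity INTEGER)"
)
# Made-up historical rows: one snapshot per course per term.
conn.executemany(
    "INSERT INTO enrollment_snapshot VALUES (?, ?, ?, ?)",
    [
        ("2001-Fall", "CSC-010", 28, 30),
        ("2002-Fall", "CSC-010", 30, 30),
        ("2003-Fall", "CSC-010", 24, 30),
        ("2003-Fall", "CSC-150", 12, 20),
    ],
)


def enrollment_trend(conn, course):
    """Return (term, enrolled, fill rate in percent) per stored term, oldest first."""
    cur = conn.execute(
        "SELECT term, enrolled, ROUND(100.0 * enrolled / capacity, 1)"
        " FROM enrollment_snapshot WHERE course = ? ORDER BY term",
        (course,),
    )
    return cur.fetchall()


for row in enrollment_trend(conn, "CSC-010"):
    print(row)
```

A live registration database would typically hold only the current term; keeping every term's snapshot is what makes a trend query like this possible.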

The Union College registrar web pages contain semistructured data.
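One way to pull records out of semistructured pages like these is to walk the HTML and keep only the table cells. The sketch below uses Python's standard `html.parser` module. The class name `TableRowParser` and the sample markup are invented for illustration, since the poster does not show the real registrar pages or the parser SCOUR used.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td> cells, such as header rows
                self.rows.append(self._row)
            self._row = None


# Made-up markup; the real registrar pages are not shown in the source.
page = """
<table>
  <tr><th>Course</th><th>Room</th><th>Enrolled</th></tr>
  <tr><td>CSC-010</td><td>Room 101</td><td> 28 </td></tr>
  <tr><td>CSC-150</td><td>Room 202</td><td>12</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(page)
parser.close()
records = [(course, room, int(n)) for course, room, n in parser.rows]
print(records)
```

Real pages are often messier than this sample, for example with unclosed tags or layout tables, so a production parser would need validation before loading rows into the warehouse.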
Dynamic web-based front ends were created for a variety of queries. It was essential to maintain ease of use by non-technical operators.