Title: INTRODUCTION TO THE PENN STATE DATA WAREHOUSE
1INTRODUCTION TO THE PENN STATE DATA WAREHOUSE
2Where Have We Been?
- Transactions processed on the mainframe -- ISIS,
IBIS, ADIS - Many years of historical data, millions of
mainframe records - Accessing mainframe data requires knowledge and
skills in Natural programming language
3Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Complete form. Specify purpose of request, what
data is required. Signed by Access and Security
Representative (ASR). Must be approved by all
data stewards (may be several).
4Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Natural is a complex language, requires
programming skills and extensive training. File
structures are complex.
5Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Write the JCL
- Use Roscoe system to write Job Control Language.
6Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Write the JCL
- Submit the job
- Goes into a queue with other similar jobs first
in, first out.
7Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Write the JCL
- Submit the job
- Wait at least overnight
- Jobs in queue run only at night. Queues can be
quite long. Some jobs will wait several days in
the queue until they reach the top.
8Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Write the JCL
- Submit the job
- Wait at least overnight
- Job failed? Repeat steps
- Check the status. If anything failed, fix
problems and repeat the steps.
9Steps Required to get Data from the Mainframe
- Security forms to get access to files
- Write a Natural program
- Write the JCL
- Submit the job
- Wait at least overnight
- Job failed? Repeat steps
- Days or weeks to complete
- Experienced Natural programmer writing a simple
job will take several days of coding, testing,
running. Anything complicated can take longer.
10Solution to a problem -- AIDA
- Administrative Information Decision Aid (AIDA)
- Available for ISIS (registration, admissions,
enrollment) and IBIS (human resources) data - Run on the mainframe against extract (i.e. not
live data) files - Create reports or data files for downloading
-
11Solution to a problem -- AIDA
- AIDAs were developed to fulfill ad-hoc reporting
needs - ADVANTAGES
DISADVANTAGES - No programming required
Inflexible - Easy to use
Outdated technology - Available to everyone, Long
development time - even from terminals for a
new AIDA - Eliminated a lot of ad-hoc
Difficult to change or to - programming add
new features
12What is a Data Warehouse?
- The consolidation of data from mainframe legacy
systems into subject-oriented tables that are
accessible through desktop tools
13What is a Data Warehouse?
- Move the data from the mainframe to a server
where it can be accessed from the users PC - Snapshot data, NOT live
- Use for data analysis, NOT operational
Data extracted periodically. Changes on the
mainframe may not be reflected on the warehouse
for a week or more.
Transcripts are on the warehouse, but official
transcripts are only available through ISIS.
14AIS Three-Tier Data Structure
Enterprise Information System extracts data from
the warehouse and summarizes it
EIS summary data
On-line transactions update the mainframe
systems immediately
Data Warehouse extracts detail data from the
mainframe on a periodic schedule
Data Warehouse detail data extracted periodically
ISIS, IBIS, ADIS
detail transactions processed immediately
15Data Transformation
- Data goes through a series of steps as
- it is moved to the warehouse
- Extract programs
- Write Natural programs to extract data from the
mainframe data base
16Data Transformation
- Data goes through a series of steps as
- it is moved to the warehouse
- Extract programs
- Verify data
- Verify accuracy and consistency of data --
ensure data legibility
17Data Transformation
- Data goes through a series of steps as
- it is moved to the warehouse
- Extract programs
- Verify data
- Create tables
- Create normalized tables on the warehouse --
eliminate data redundancy (i.e. address appears
in one place only)
18Data Transformation
- Data goes through a series of steps as
- it is moved to the warehouse
- Extract programs
- Verify data
- Create tables
- Load tables
- Load warehouse tables with extracted data
19Data Transformation
- Data goes through a series of steps as it is
moved - to the warehouse
- Extract programs
- Verify data
- Create tables
- Load tables
- Refresh data
- Establish a schedule to refresh the data.
Frequency depends on volatility of the data.
Some refreshed weekly, some once per semester
20How is the Warehouse Accessed?
Query Tool
to retrieve data
Tools are off-the-shelf software that run on the
desktop. Users can purchase whatever package
they want based on platform, price, preference.
End Users
Data Warehouse
21Features of Query Tools
- Easy access to data
- Programming ability not required. Tools have
Graphical User Interface -- point and click.
22Features of Query Tools
- Easy access to data
- Quick results
- Results are returned quickly, often within
minutes.
23Features of Query Tools
- Easy access to data
- Quick results
- Interactive approach to creating reports
- Query criteria can be easily and quickly
adjusted to modify results or obtain additional
data
24Features of Query Tools
- Easy access to data
- Quick results
- Interactive approach to creating reports
- Data available on the users desktop
- Results returned directly to desktop
25Features of Query Tools
- Easy access to data
- Quick results
- Interactive approach to creating reports
- Data available on the users desktop
- Manipulation of data for customized report layout
- Data can be exported to other desktop software
for formatting and analysis
26Features of Query Tools
- Easy access to data
- Quick results
- Interactive approach to creating reports
- Data available on the users desktop
- Manipulation of data for customized report layout
- Unlimited retrieval of data
- Amount of data retrieved limited only by query
tool and size of PC hard drive
27Query Tool Requirements
- To access Penn States data warehouse, you can
purchase any query tool that satisfies the
following two criteria - Uses Structured Query Language (SQL)
- Supports Open Data Base Connectivity (ODBC)
28Readily Available Query Tools
- Microsoft Access
- Microsoft Excel
- Microsoft Query
If you have Microsoft Office, you already have
three query tools on your desktop that will
work with Penn States data warehouse.
29Who Can Access the Warehouse?
- Users in departments, colleges, campuses
- Hundreds of Penn State data warehouse users in
departments, colleges, campuses.
30Who Can Access the Warehouse?
- Users in departments, colleges, campuses
- Two requirements for warehouse users
- Understand the data
- This understanding is critical and not simple to
acquire. If youve used AIDA or the mainframe
systems, you already have some of this knowledge.
Takes time and training.
31Who Can Access the Warehouse?
- Users in departments, colleges, campuses
- Two requirements for warehouse users
- Understand the data
- Familiar with query tools
- Hands-on training available
32How to Get Connected
- A PC or Macintosh connected to the Penn State
- Backbone with TCP/IP communications software
- SQL Client License or MS BackOffice license
- (Purchased from the Microcomputer Order
Center) - For Windows, SQL Client Tools (provided by the
- MOC when the SQL Client License is
purchased) - must be installed on the PC. Not required
for Macintosh. - Open Data Base Connectivity (ODBC) compliant
- query tool
33 First Steps for New Users
- Be authorized as a warehouse user
- Request access for each database separately.
Requests submitted via e-mail. Instructions on
the web site.
34 First Steps for New Users
- Be authorized as a warehouse user
- Set up your workstation
- Purchase client license. Install client tools
for Windows. Instructions on the web site.
35 First Steps for New Users
- Be authorized as a warehouse user
- Set up your workstation
- Select a query tool
- ODBC-compliant. Install on your PC.
36 First Steps for New Users
- Be authorized as a warehouse user
- Set up your workstation
- Select a query tool
- Get training
- Training offered through HRDC and TLT hands-on
and web-based
37 First Steps for New Users
- Be authorized as a warehouse user
- Set up your workstation
- Select a query tool
- Get training
- Learn the data
- Work with it documentation on the web site
help available from steward offices.
38 First Steps for New Users
- Be authorized as a warehouse user
- Set up your workstation
- Select a query tool
- Get training
- Learn the data
- Get help
- User groups, listservs, web site
39More Detailed Information ... on our Web Site
http//ais.its.psu.edu/dataware/
Web site contains data documentation, instructions
for setting up your PC to access the warehouse,
sample queries, schedules for training and user
group meetings.
40Benefits of the Warehouse
- Improves access to administrative information for
faculty and staff - Can get data quickly and easily to do analysis.
We can work with better information, make
decisions based on data.
41Benefits of the Warehouse
- Improves access to administrative information for
faculty and staff - Reduces cost
- Mainframe can be used for transaction
processing. Programmers can use skills to
enhance the mainframe systems instead of writing
ad-hoc reports.
42Benefits of the Warehouse
- Improves access to administrative information for
faculty and staff - Reduces cost
- Timely retrieval of data
- Can retrieve data in minutes rather than days.
43Benefits of the Warehouse
- Improves access to administrative information for
faculty and staff - Reduces cost
- Timely retrieval of data
- Flexibility
- Data can be retrieved with limitless
combinations of criteria and can be exported into
other desktop tools for analysis and manipulation.
44What Does the Future Hold?
- More data will be added to the warehouse
- Mainframe will primarily be used for transaction
processing