Title: Creating the Warehouse Environment
1Chapter 2
- Creating the Warehouse Environment
2Section 2.1
- Supporting the Warehouse Environment
3Objectives
- Peruse the physical directory structures that
support the Warehouse Environment. - Define Warehouse Environment metadata items.
4Business Scenario
- The structure of the metadata reflects the
business process(es) being modeled.
5Platform Considerations
- SAS/Warehouse Administrator software executes on
Windows or UNIX platforms. - Data structures are hierarchical and may require
multiple levels in directory structures.
6Platform Considerations
- The Sterling Airlines case study uses a
three-level directory structure. - DW root level
- Support level
- Data Warehouse Environment level.
7Portability Considerations
- The multi-level directory structure aides in
porting the data warehouse from development to
production. - Relative path specifications reduce editing of
the metadata when switching from development to
production.
8Infrastructure of a Data Warehouse
- Directory structures are used to store
- metadata
- data tables
- specialty data stores
- source code libraries.
- Most directories contain SAS files and are
accessed as SAS libraries.
9Metadata Repository
Data Warehouse Metadata
Warehouse Environment Metadata
10Data Tables
DW Foundation Layer Data Tables
Source Data Tables
11Specialty Data Stores
Information Mart Items and Files
Summary Tables and MDDBs
12Addin Tools, Scripts, and Process Entries
Control Scripts
Source Code Library
13Navigating the Data Warehouse Environment
- SAS/Warehouse Administrator has two primary user
interfaces - Data Warehouse Environment (referred to as WA
Explorer window)- display and edit metadata for
warehouse elements - Process Editor- diagram and edit metadata for
warehouse processes.
14Navigating the Data Warehouse Environment
- Start an interactive SAS session.
- Invoke the Data Warehouse application.
- Open the Sterling Airline Warehouse Environment.
15Creating a Warehouse Environment
16Creating the Environment Metadata Repository
- The metadata that you enter in the Environment
Property window is used to - 1. generate a LIBNAME statement for the
environment - 2. assign a name and description to the
environment. - After entering metadata in this window, verify
your field entries before you select OK. After
selecting OK, you CANNOT change these values,
except for the description.
17Getting Started
- Step 1 Create a subdirectory structure.
- Step 2 Start a SAS interactive session.
- Step 3 Invoke the SAS/WA software application.
- Step 4 Create a Warehouse Environment object on
the SAS/WA desktop.
18Creating a Warehouse Environment
- Start an interactive SAS session.
- Invoke the Data Warehouse application.
- Define a new Data Warehouse Environment.
19Section 2.2
20Environment Global Metadata
- There are five types of global metadata
- SAS libraries
- hosts
- DBMS connections
- contacts
- scheduling servers.
21Global Metadata in a Data Warehouse Environment
- Global information is shared among metadata
objects in the warehouse environment. - Data object locations are defined with hosts, SAS
libraries, and DBMS connections - All objects have contacts, that is, an owner and
an administrator. - Jobs execute on hosts and may be executed by
scheduling servers.
22Global Items for the Case Study
- 1. On which hosts
- does the source data reside?
- will the warehouse tables be stored?
- Are connections to a DBMS needed?
- Are SAS librefs needed to reference the
- source data?
- warehouse tables?
- source code?
- metadata?
continued...
23Global Items for the Case Study
- 4. Who are the contacts (owner and administrator)
associated with the various data stores and
warehouse processes? - 5. What scheduling servers will be used to
schedule Warehouse Administrator jobs? (For the
case study, this item is deferred to a later
time.)
24Define Items Used Globally Window
- To open this window, select File ? Setup from the
pull-down menu. - Select the type of item to be defined.
- Select Add or Edit to define or modify a
definition. Select Remove to delete a definition.
- Global metadata can also be updated by accessing
its property sheet from any attribute selector
that requires the metadata.
25Add Global Metadata
- This demonstration illustrates the use of the
Define Items Used Globally window and its
property sheets. - It also explains the attributes of each type of
global item.
26Exercises
These exercises reinforce the concepts discussed
previously.
27The Next Step
- The environment with global metadata is now
ready to supporta data warehouse. - Next, you add the operational data definitions
(source data).
Environment
Global Metadata
28Section 2.3
- Defining theOperational Data Sources
29Objectives
- Define Operational Data Definition Groups to
organize Operational Data Definitions. - Define Operational Data Definitions.
- Use SAS/Warehouse Administrator software to add
and modify metadata attributes.
30ODD Groups
Environment
- Create the grouping elements to contain the
Operational Data Definitions.
Global metadata
ODD Group
ODD Group
ODD Group
ODD Group
31ODD Groups
- One or more Operational Data Definition (ODD)
Groups are contained within a Warehouse
Environment. - An Operational Data Definition Group organizes
Operational Data Definitions (data sources). - ODD Groups may also contain Information Marts.
32The Process
- Step 1 Add an Operational Data Definition Group
to the warehouse environment. - STEP 2 Complete the ODD Group Properties window.
33 Creating an ODD Group
- Add an initial ODD group for data sources
maintained by the Operations Department. - In the process, work with two interfaces
- the Metadata Explorer
- the property sheet editor
34Current Status
Environment
- The Environment contains
- global metadata
- an ODD group.
global metadata
ODD group
35Exercises
- Add three additional ODD group
- Customer Relations
- IT Technology
- Reservations Department
36Current Status
Environment
- The environment contains
- global metadata
- four ODD groups.
- Next, define the source data.
Global metadata
ODD Group
ODD Group
ODD Group
ODD Group
37Data Sources for Data Warehouses within the
Warehouse Environment
- Operations ODD Group
- Flight Information ODD (Oracle)
- Actual Flight Times ODD (OSHost)
- Customer Relations ODD Group
- Passenger Delay Compensation ODD (OSHost)
- IT ODD Group
- Mileage ODD (Local Host)
- Reservations ODD Group
- Holiday Calendar ODD (Local Host)
- Flight Schedule ODD (OSHost)
38Operational Data Definition (ODD)
- An Operational Data Definition identifies a data
source. - a SAS data file, or
- one of several SAS views.
- The first step is to locate or create the SAS
data file or SAS view.
A DBMS table that is accessed via a SAS/ACCESS
libname engine is considered a SAS data file.
39Adding an ODD Metadata Object
- In the upcoming demo, we
- add an ODD to an ODD Group
- complete the properties of the ODD.
40ODD Properties
41Importing Column Metadata
- Column attributes can be imported
- existing table or view
- DDL file created with data modeling software.
- Cobol file description
- PROC CONTENTS output data set
- and they can be
- specified manually.
42Add an Operational Data Definition
- Create an ODD for the actual flight times data
source owned by the Operations Department. - Import column metadata
43Current Status
Environment
- The first Operational Data Definition has been
added. - One data source is now fully defined in the
metadata.
Global metadata
Operations ODD Group
Actual Flight Times
44Exercises
Add four additional ODDs.
- Operations ODD Group
- Flight Information ODD
- Actual Flight Times ODD (Added in Demo)
- Customer Relations ODD Group
- Passenger Delay Compensation ODD
- IT ODD Group
- Mileage ODD
- Reservations ODD Group
- Flight Schedule ODD
45Current Status
Environment
- Four ODD groups are defined.
- Five ODDs each reference a data source.
- One additional data source is an external file.
Global metadata
ODD Group
ODD Group
ODD Group
ODD Group
46Section 2.4
47Objectives
- Name examples of ODD input sources.
- Define an external file as an input source.
48Introduction
- Input sources can be
- one or more external files
- one or more data files (SAS data files)
- a combination of data and external files.
49ODD Input Source
- When an ODD is
- a DATA step view with a fileref on the INFILE
statement, or - an SQL view with a libref on the FROM clause
- the fileref or libref must be defined before the
view can be used. - This does not include pass-through SQL views or
SQL views with an embedded libname.
50External File Input Source
- data cargo/viewcargo infile freight
input _at_1 id 6. _at_9 weight 9.2 run - In this example, the cargo DATA step view
represents an ODD. Metadata must be defined for
the Warehouse Administrator to assign the freight
fileref.
51SAS File Input Source
- proc sql create view planes as
select manuf, name, model from
sterling.aircraft quit - In this example, the planes SQL view represents
an ODD. Metadata must be defined for the
Warehouse Administrator to assign the sterling
libref.
52Add an ODD to Extract Data from an External File
- 1. Analyze the columns in the input data to be
included. - 2. Determine the host and library for the ODD
and the user-written code to generate the Data
step view. - 3. Create the code to generate the SAS view or
table. Methods included the DATA step, SQL
procedure, Query window, External File
Interface, SAS/ASSIST, SAS/TOOLKIT software.
53The External File DateCodes.dat
- Record Layout
- columns 1-10 Date
- column 12 Holiday (Y/N)
- column 13 Hflag
- HHoliday
- WWeekend
- Aday after holiday/weekend
- column 14 PriceWar (Y/N)
- column 15 BlackOut (Y/N)
54DATA Step to Create a SAS View
Fileref DATES
Input 3 fields
55 Add an ODD to Extract Data from an External File
- Add the Holiday Calendar ODD to the Reservations
ODD group. - Use the Process Editor to create a view of the
external file.
56Current Status
Environment
- Four ODD Groups have been added.
- Six data sources are now defined as ODDs in the
metadata.
Global metadata
ODD Group
ODD Group
ODD Group
ODD Group