Title: Copy of 35
1. a presentation by W H Inmon
2. no wholesale movement of data into the data warehouse
reading directly from the native operational DBMS
3. movement of incremental changes of data into the EDW in a snapshot format
4. use the log tape to find the delta data, then move only that data
5. the log tape processing can be run on a processor separate from the OLTP processor
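
The log-based delta selection described above can be sketched as follows. This is a minimal illustration, not the method of any particular DBMS or vendor tool: the `LogRecord` layout, its field names, and the `last_seq` checkpoint are all assumptions made for the example, and the log is assumed to be already parsed into records.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class LogRecord:
    """One entry from a hypothetical, already-parsed DBMS log."""
    seq: int               # log sequence number (assumed to increase over time)
    op: str                # "insert", "update", or "delete"
    key: str               # primary key of the affected row
    after: Optional[dict]  # row image after the change; None for deletes


def extract_deltas(log: Iterable[LogRecord], last_seq: int) -> Dict[str, LogRecord]:
    """Return the latest change per key among records newer than last_seq."""
    deltas: Dict[str, LogRecord] = {}
    for rec in sorted(log, key=lambda r: r.seq):
        if rec.seq <= last_seq:
            continue  # already moved to the warehouse in an earlier run
        deltas[rec.key] = rec  # a later change to the same key replaces earlier ones
    return deltas
```

Because the function reads only the log, it could run on a machine other than the OLTP processor, which is the point of the slide above. Only the rows in the returned dictionary would then need to be moved.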
6. there is no impact on the online window
7. another approach is to capture changes as they occur through the BMC approach
8. another opportunity for a performance gain is to move the data during physical I/O operations
9. note that ETL processing still needs to be done after physical disk movement
10. make sure that ETL supports:
- record level processing
- parallel streaming
- mainframe or server based transformation
- robust transformation
- native DBMS record selection
- incremental selection of OLTP data
11. DBMS selection throughout the architecture is very important
12. it is HIGHLY unlikely that any one DBMS will be optimal for all processing
13. project/ad hoc warehouses
project or temporary warehouses can save a lot of development and analytical effort
14. [diagram labels: exploration, project, sample]
using sampling techniques for initial analysis can save HUGE resources
15. [diagram labels: iteration 1, iteration 2, iteration 3, ..., final analysis]
doing iterative analysis against samples of data, then doing the final analysis against the large database, can save LOTS of resources
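
The iterate-on-samples, finish-on-the-full-data pattern above can be sketched as follows. This is only an illustration under assumptions: the function names, the fixed sample size, and the use of simple random sampling are choices made for the example and are not taken from the presentation.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iterate_on_samples(
    data: Sequence[T],
    analysis: Callable[[Sequence[T]], R],
    sample_size: int,
    iterations: int,
    seed: int = 0,
) -> List[R]:
    """Run the analysis several times, each time on a small random sample."""
    rng = random.Random(seed)          # seeded so the iterations are repeatable
    size = min(sample_size, len(data)) # never ask for more rows than exist
    return [analysis(rng.sample(data, size)) for _ in range(iterations)]


def final_analysis(data: Sequence[T], analysis: Callable[[Sequence[T]], R]) -> R:
    """Run the settled analysis once against the full data set."""
    return analysis(data)
```

Each exploratory iteration touches only `sample_size` rows, so the cost of a full pass is paid once, after the analysis itself has been settled.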
16. end user education can save huge amounts of resources and is simple to do
17. metadata can help performance
if an analyst knows what has already been created, there is no need to recreate it
18. but how does an analyst know what has already been created?
an analyst knows by looking at the metadata
19. do not use referential integrity infrastructure that was designed for the operational environment
20. if relationships are defined, enforce them through audit programs
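
An audit program of the kind mentioned above can be as simple as a batch check for child rows whose parent is missing. The sketch below is an assumption about what such a program might look like; the function name, the dictionary-based row layout, and the field names in the usage example are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List


def audit_orphans(
    child_rows: Iterable[Dict[str, Hashable]],
    fk_field: str,
    parent_keys: Iterable[Hashable],
) -> List[Dict[str, Hashable]]:
    """Return child rows whose foreign key matches no parent key."""
    parent_set = set(parent_keys)  # set lookup keeps the audit a single pass
    return [row for row in child_rows if row.get(fk_field) not in parent_set]


# Usage with made-up data: order 3 points at a customer that does not exist.
orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c2"},
    {"order_id": 3, "customer_id": "c9"},
]
violations = audit_orphans(orders, "customer_id", ["c1", "c2"])
```

Running the check after the load, instead of having the DBMS validate every insert, keeps relationship checking out of the load path.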
21. [diagram labels: hourly, daily, weekly, monthly, yearly]
a rolling summary structure of data can save huge resources
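
One way to picture a rolling summary structure is as a chain of levels in which a full level is summed and rolled into the next coarser one. The sketch below is an assumption about the mechanism, not taken from the presentation: the class name, the use of sums as the summary, and the level capacities (including the approximation of four weeks per month) are all illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Assumed capacities: how many entries a level holds before it rolls up.
# None marks the coarsest level, which never rolls up.
DEFAULT_LEVELS: List[Tuple[str, Optional[int]]] = [
    ("hourly", 24),
    ("daily", 7),
    ("weekly", 4),    # simplification: a month is treated as four weeks
    ("monthly", 12),
    ("yearly", None),
]


class RollingSummary:
    def __init__(self, levels: List[Tuple[str, Optional[int]]] = DEFAULT_LEVELS):
        self.names = [name for name, _ in levels]
        self.caps = [cap for _, cap in levels]
        self.slots: List[List[float]] = [[] for _ in levels]

    def add(self, value: float, level: int = 0) -> None:
        """Add a value at a level; roll the level up when it becomes full."""
        self.slots[level].append(value)
        cap = self.caps[level]
        if cap is not None and len(self.slots[level]) == cap:
            total = sum(self.slots[level])
            self.slots[level].clear()    # detail is discarded once summarized
            self.add(total, level + 1)   # the summary moves to the coarser level

    def snapshot(self) -> Dict[str, List[float]]:
        """Return a copy of the current contents of every level."""
        return {name: list(vals) for name, vals in zip(self.names, self.slots)}
```

With the assumed default capacities, no level other than the coarsest ever holds more than 23 entries at rest, however long the history, which is where the resource saving comes from.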
22. condensing data gets the most out of each I/O and increases buffer hits
23. the physical colocation of data can optimize performance
24. perform inserts and deletes during off hours
25. use 3rd party utilities for standard database operations:
- backup
- recovery
- indexing
- relationship management