Title: Trends In Data Warehousing
1Chapter 3
Data Warehouse Fundamentals
- Trends In Data Warehousing
Paul K Chen
1
2Data Warehousing is Becoming Mainstream
- In the early stages, four significant factors
drove many - companies to move into data warehousing
- Fierce competition
- Government deregulation
- Need to revamp internal processes
- Imperative for customized marketing
3Walmart vs. Amazon.com
- Walmart is the US company most quoted for the
successful application deployment of Data
Warehousing technology. - Walmart filed lawsuit against Amazon.com for its
unlawful way of pirating its DW technology by
hiring away its DA personnel by offering hefty
stock option to these people.
4Significant Factors
- These significant factors reflect the new trends
in data - warehousing
- Multiple Data Types
- Data Visualization
- Parallel Processing
- Query Tools
- Browser Tools
- Data Fusion
- Multidimensional Analysis
- Agent Technology
- E-Business- ERP, KM, CRM
5Decision Making and Data Warehousing
- A data warehouse is the data, processes, tools,
and facilities to manage and deliver complete,
timely, accurate, and understandable business
information to authorized individuals for
effective decision making.
- Structured Data
- Includes traditional relational databases
- Typically internal and enterprise-owned
- Predetermined
- Unstructured Data
- Includes articles, reports, images, and videos
- Utilizes external data and expert opinion
- Ad hoc
3
6Decision Making and Data Warehousing
- Management Systems
- Extend relational databases to store and support
multimedia - User-defined types (UDT) and functions (UDF) in
SQL-3 - Specialized Servers
- Used for data which is incompatible with
relational databases (e.g., Streaming video
servers) - Objects may be linked to a relational database
- Search Engines
- Query by Image Content (shape, color, texture,
etc) - Text retrieval on free-text documents
- Audio and video searching
7 Decision Making and Data Warehousing
- The trend is toward unstructured data and ad
hoc warehouses.
Trend toward multimedia.
4
8Types of Decision Support Tools
- Data Inquiry
- A request for a set of data based on some search
criteria - Data Interpretation
- Manipulation and visualization of a set of data
(statistical analysis) - Multidimensional Analysis (OLAP)
- n-dimensional spreadsheet analysis
- Information Discovery
- Pattern recognition, trends
- Browsers
- Search metadata catalogs
- Search information object lists
- Launch analysis tools
5
9File-based Processing
6
10Types of Decision Support Tools
- Trend toward utilization of the Web,
facilitated by Java.
7
11Data Warehouse Architectures
- Single Level
- Decision support tools access operational data
directly - Feasible only with clean data
- Valid for unstructured data
- Two Level Reconciled
- Scrubbed operational data supporting ad hoc
queries - Two Level Derived
- Summarized data
- Three Level
- Maintains both scrubbed operational data, and
summarized data.
10
12Data Warehouse Architectures
- Trend toward multidimensional data.
11
13Data Stores and Access Enablers
- Specialized Multidimensional Databases
- Data is peregrinated and loaded into
multidimensional databases - Long loading times but quick response
- Relational-like Stores
- Indexing is used to proved pseudo-multidimensional
functionality - Relational Data Stores
- An extra semantic layer generates
multidimensional data on the fly - Hybrids
- Details are stored in a traditional relational
format - A subset is cached in a multidimensional data
structure
12
14Database Management System (DBMS)
13
15Data Stores and Access Enablers
- Trend toward multidimensional data.
14
16Metadata
- Integrated Components
- All components (sources, stores, etc) use a
common metadata repository to maintain their
metadata - Standardized Metadata Interchange
- Components keep their own metadata
- Components use a common interchange information
model and syntax to share metadata - Synchronized Metadata Interchange
- Metadata changes are updated automatically across
all components - Building of Business Metadata
- Manually entered, free-text, plain language
descriptions
17Metadata
- Trend toward better metadata, exchanged between
systems.
18Middleware - Gluing the Warehouse Together
- Definition software that shields users and
developers from differences in services and
resources used by applications - Data warehouses often have heterogeneous
databases, operating systems, networks, hardware,
applications
19Business Issues for Middleware
- Role of middleware
- Assist developer in data extraction/transformation
and populating DW - Assist business user in accessing DW
- Therefore needed at different points in life
cycle - Types
- Copy management data extraction,
transformation, replication, and propagation - Gateways DB and independent gateways
- Program-to program RPCs, TP monitors, ORBs
- Message-oriented
20Data Quality
- Preprocessing Ownership
- Source application owners know their data
- Warehouse owners still must integrate the entire
system - Automated Preprocessing Tools
- Specialized packages
- Generalized tools using pattern processing,
lexical analysis, and statistical matching to
reconcile a wide range of data sources - Custom programming
- Reliability and Credibility of External Data
- Quality ratings
- Posted statistical meta-information (sample size,
randomness, etc)
15
21Data Quality
Trend toward better understanding of data
quality.
16
22Significant Trends- Multiple Data Types
Image
Spatial
Structured Numeric
Video
Data Warehouse Repository
Structured Text
17
Audio
Unstructured Documents
23Significant Trends- Data Visualization
- More Chart Types-Pie chart, scatter plot
- Interactive Visualization
- Chart Manipulation
- Drill Down
24Significant Trends- Parallel Processing
- Aims to solve decision-support problems using
multiple nodes working on the same problem. - Performs many database operations simultaneously,
splitting individual tasks into smaller parts so
that tasks can be spread across multiple
processors. - Parallel DBMSs must be capable of running
parallel queries, parallel data loading, table
scanning, and data archiving, and back up.
25Significant Trends- Parallel Processing
- Shared memory architecture (SMP)
- All the servers share all the data
- Shared nothing architecture (MPP)
- Each server has its own partition of data
26Significant Trends- Query Tools, Browse Tools
- Flexible Presentation online results and report
generator - Aggregate Awareness
- Crossing Subject Areas
- Multiple Heterogeneous Sources
- Integration
- Overcoming SQL Limitations
- Data Fusion
27Significant Trends- Integrating ERP and Data
Warehouse
- Option 1 Companies implement the data warehouse
solutions of the ERP vendor with the currently
available functionality and await the
enhancements. - Option 2 Companies implement customized data
warehouse and use third-party tools to extract
data from the ERP datasets. Retrieving and
loading data from the proprietary ERP datasets
is not easy. - Option 3 It is a hybrid approach that combines
the functionalities provided by the vendors data
warehouse with additional functionalities from
third-party tools.
28Significant Trends- Integrating KM and Data
Warehouse
Whats KM?
- It is a systematic process for capturing,
integrating, organizing, and communicating
knowledge accumulated by employees. - It is a vehicle to share corporate knowledge so
that employees may be more more effective and be
productive in their work. - A knowledge management system must store all such
knowledge in a knowledge repository.
29Significant Trends- Integrating KM and Data
Warehouse
A specific corporate scenario
- Sales have dropped in the South region.
- Your marketing VP is able to discern this from
your data warehouse by running some queries and
doing some preliminary analysis. If he or she has
access to a document prepared by an analyst
explaining why the sales are low and suggesting
remedial action. - Knowledge must be linked to the sales result to
provide context to the sales numbers from the
data warehouse.
30Significant Trends- Integrating KM and Data
Warehouse
An airplane sales scenario The following
information is essential For a successful pitch
for airplane sales.
- Model configuration
- Production schedule (Delivery schedule)
- Part replacement
- Warranty
- Knowledge obtained from the knowledge
management - system can provide context to the information
received from - the data warehouse to understand the story
behind the above - information.
31Summary of Trends
- Ad Hoc Questions
- Multidimensional Analysis (OLAP)
- Web-Enabled Data Warehouse
- Multimedia
- Middleware
- Metadata Interchange
- Integrating ERP with Data Warehouse
- Integrating KM with Data Warehouse
32Complete E-Business Suite A Review
ERP
EAI
Marketing
Sales
Projects
Financial Services
One Database
Order Mgt
Procurement
Human Resources
Customer Relationship(CRM)
Manufacturing
Supply Chain (SCM)
33Information System Categories
34Information System Categories
35Data Warehouse ERP
- ERP Enterprise Resource Planning
- A software solution that addresses
enterprise needs taking the process view of an
organization to meet the - organization goals tightly integrating all
the functions - of an organization.
- -- It integrates all the departments and
functions across - a company into a single computer system
that can serve all those different
departments particular - needs.
36WHY ERP?
- Business
- Customer satisfaction
- Business development new areas, products
and services - Ability to face competition
- Efficient processes required for companys
growth - IT
- Present software does not met business needs.
- Legacy systems difficult to maintain
- Obsolete hardware/software difficult to
maintain -
37How ERP?
- ERP Combines various department systems into a
single, integrated software program that runs
off a single database so that the various
departments can more easily share information and
communicate with each other. - The best part of ERP is the way in which it
improves the order fulfillment process that is
taking the customer order and process it into an
invoice and revenue. - It doesnt handle the front-end that is handled
by CRM (Customer Relationship Management).
38How ERP?(contd)
- When a customer service representative enters a
customer order to an ERP system, he has all the
information necessary to complete the order such
as customers credit rating and order history
from the finance module, the companys inventory
levels from the warehouse module and the shipping
docks trucking schedule from the logistics
module. - How its being done It integrates the financial
information and customer order information . It
does so by integrating the following -
- Database
- Application
- Interfaces
- Tools
- BPR
39How ERP? (contd)
- It standardizes and speeds up the manufacturing
process. This saves time, increases productivity
and reduces head count. - It reduces the inventory. Due to the information
available about all the orders it helps to
maintain the right level of stock and smoothes
the manufacturing process.
40Data Warehouse EAI
- What is EAI? EAI refers to Enterprise Application
Integration. EAI is the merging of applications
and data from various new and legacy systems
within a business. Various means are employed to
accomplish EAI, including middleware, in order to
unify IT resources, maximize new ERP investments,
diminish errors and get everyone on the same
page. EAI enables companies to link their
existing software applications with each other
and with portals. EAI provide the ability to get
their applications to exchange critical data. EAI
is usually close to the top of any CIO's list of
concerns. There are different approaches to EAI.
Some rely on linking specific applications with
tailored code, but most rely on generic
solutions, typically called middleware. XML,
combined with SOAP and UDDI is a kind of
middleware.
41E-Business
- Trend toward better understanding as well as
consolidation of internal processes and data -
- Trend toward web-enabled data warehouse.