Title: Chapter 7: Databases and Data Warehouses
1Chapter 7 Databases and Data Warehouses
2Ideas From the First Part of Chapter 7
- Problems with the traditional file approach (pre
database) - Data redundancy
- Data integrity
- Data security
- Program data dependence
- Programmers are required to access data
- Advantages of the database approach
- Reductions in data redundancy
- Application-data independence
- Better control better security
- Flexibility
3More Ideas
- Object-oriented database model
- Successor to the relational model
- Integration of data and programs
- Handles wider variety of field types
- Entity-relationship (ER) diagrams
- Graphical method of displaying relationships
between tables - An ER diagram is an example of a schema
(conceptual model of the database) - Tool for IS professionals
4CREATING A DATABASE ENVIRONMENT
An Entity-Relationship Diagram
5Physical versus Logical Views
- In managing information, physical deals with the
structure of information as it resides on various
storage media. - Logical deals with how knowledge workers view
their information needs, and includes such terms
as - CHARACTER - our smallest unit of information.
- FIELD - group of related characters.
- RECORD - group of related fields.
- FILE - group of related records.
- DATABASE - group of logically associated files.
- DATA WAREHOUSE - information from many databases.
6Other Logical Structures in a Database
- DATA DICTIONARY - contains the logical structure
of information in a database. - Definitions of all fields, records, and tables
- Relationships between tables
- Who is responsible for maintaining data in the
database - Descriptions of who is authorized to access
different parts of the database - Data dictionary contains meta data (data about
the data)
7Sample Data Dictionary Report
8Components of a DBMS
- Data definition subsystem (language DDL)
- Defines the structure of the database tables
(design view in Access) - Creates and maintains the data dictionary
- Defines the relationships between tables
- Add, delete, or modify field properties
9More Components of a DBMS
- Data manipulation subsystem (language DML)
- Add, delete, and modify data in the database
- Contains the query languages (QBE or SQL) for the
database. SQL is both a DML and DDL - Contains report generation capability
- Data administration subsystem
- Manage the overall database environment by
providing facilities for - Backup and recovery
- Security management
10Data Warehouse
- Definition- a database with tools that stores
current and historical data that is designed to
support business analysis activities and
decision-making tasks of managers typically a
relational database model is used - Benefits
- improved access
- improved information
- isolation from operational systems
- tools permit advanced data analysis
- Users
- Data marts
11Building a Data Warehouse (ETL)
- Extraction phase create files on the computer
that will store the data warehouse and move
transaction data to this machine data may come
from many sources or parts of the organization - Transformation phase cleanse and standardize
the data. Why is this necessary? - Load phase transfer the data from the
transformation phase into the data warehouse - The ETL process becomes automated to make regular
transfers of transaction data into the data
warehouse
12Comparison of Data in a Data Warehouse and
Operational Data
- Operational Data
- Data is on many systems
- Current operational data
- Inconsistent data definitions
- Functionally organized data
- Data are constantly changing
- Support OLTP
- Warehouse Data
- Integrated in one enterprise-wide system
- Recent and historical data
- Consistent data definitions
- Data are organized around business entities
- Data are stabilized
- Support OLAP
13Data-Mining and Data-Mining Tools
- Data-mining is the process of selecting,
exploring, and modeling large amounts of data to
discover previously unknown relationships that
support decision making. - Traditional data mining tools answer questions
about variables that we think are related - Query languages (QBE or SQL)
- Report generators
- Multidimensional analysis tools (OLAP or pivot
tables) - Standard statistical procedures (regression,
ANOVA) - Knowledge discovery tools are data-mining tools
for finding relationships that are not
discernable to the human eye (see next slide)
14Data-Mining as Knowledge DiscoverySelected
Examples
15Multidimensionality
- Multidimensional data analysis (or OLAP) enables
users to view data using various dimensions,
measures and time frames (i. e., OLAP) - dimensions products, business units, country,
industry (e.g., categories) - measures money, unit sales, head count,
variances - time daily, weekly, monthly, quarterly, yearly)
- This type of analysis also provides the ability
to view data in different ways (tables, charts,
3-D, geographically) - OLAP tools provide for this
- Pivot tables in Excel or Access
16Examples of OLAP Tools
- Go to www.fedscope.opm.gov
- Under data cubes on entry page click on
employment - Demonstrate drill down and adding charts
- Data for this example comes from the Central
Personnel Data File (CPDF) of the federal
government - The OLAP tool used to build this site is from a
company named Cognos (PowerPlay) - OLAP tools based on Excel
- http//wLCubed.com
- http//www.cubularity.com
17Multidimensionality
18Database Architecture The Physical and Logical
Layout of the Hardware, Data, and Applications
- Centralized databases with remote access
- Distributed Databases
- With replication a full copy of the entire
database is stored at all sites - With fragmentation the database is partitioned
- Parts of database are stored where they are most
often accessed
19Web Databases
- The ease of use of Web browsers enables firms to
link their databases to the Web - Ease of use enables users to
- Access and retrieve information from a database
- Enter information into the database
- The user requires no special training in a DBMS
to perform the above activities prior to the
browser and the Web this would not have been
feasible - What does this mean?
20Federal Trade Commissions Fair Information
Practice Principles (1973)
- Notice/awareness disclosure of practices before
collecting data - Choice/consent opt in/opt out for consumers
- Access/participation consumers can review and
contest data for accuracy and completeness - Security data collectors must take steps to
secure data for accuracy and unauthorized use - Enforcement there must be a mechanism in place
to enforce FIP principles - Laws enforce these principles for data collected
by federal agencies not so in the private sector
21Spreadsheets Versus DBMS
- Linkage between elements
- spreadsheet - between cells in same table
- DBMS - between elements in different tables
- Orientation
- spreadsheet is toward calculations
- DBMS is tilted toward organization and linkage of
data elements in different tables - Capabilities
- DBMS has extensive querying and reporting power
- spreadsheet is limited
- Memory requirements
- entire spreadsheet table must be in memory
- not true for the database table