Title: 11. Building Information Systems
1DATA RESOURCE MANAGEMENT
2Data Hierarchy in a Computer System
3Entitities and Attributes
4Problems with the Traditional File Environment
- Data redundancy
- Program-Data dependence
- Lack of flexibility
- Poor security
- Lack of data-sharing and availability
5Traditional File Processing
6Database Management System (DBMS)
- Creates and maintains databases
- Eliminates requirement for data definition
statements - Acts as interface between application programs
and physical data files - Separates logical and physical views of data
7The Contemporary Database Environment
8Components of DBMS
- Data definition language Specifies content and
structure of database and defines each data
element - Data manipulation language
- Manipulates data in a database
- Data dictionary Stores definitions of data
elements, and data characteristics
9Sample Data Dictionary Report
10Relational Data Model
11Three Basic Operations in a Relational Database
- Select Creates subset of rows that meet specific
criteria - Join Combines relational tables to provide users
with information - Project Enables users to create new tables
containing only relevant information
12Three Basic Operations in a Relational Database
13FLAT FILE NOT NORMALIZED
14A Normalized Relation of ORDER
15Ensuring Database Integrity
- Database integrity involves the maintenance of
the logical and business rules of the database. - There are two kinds of DB Integrity that must
be addressed - Entity Integrity
- Referential Integrity
16Entity Integrity
- Entity integrity deals with within-entity rules.
- These rules deal with ranges and the permission
of null values in attributes or possibly between
records
17Examples of Entity Integrity
- Data Type Integrity very common and most basic.
Checks only for data type compatibility with DB
Schema, such as numeric, character, logical,
date format, etc. - Commonly referred to in GIS manuals as
- Range and List domains
- Ranges - acceptable Numeric ranges for input
- List - acceptable text entries or drop-down lists.
18Enforcing Integrity
- Not a trivial task!
- Not all database management systems or GIS
software enable users to enforce data integrity
during attribute entry or edit sessions. - Therefore, the programmer or the Database
Administrator must enforce and/or check for
Integrity.
19Referential Integrity
- Referential integrity concerns two or more tables
that are related. - Example IF table A contains a foreign key that
matches the primary key of table B THEN
values of this foreign key either match the value
of the primary key for a row in table B or must
be null. - Necessary to avoid Update anomaly, Delete
anomaly.
20Querying Databases Elements of SQL
- Basic SQL Commands
- SELECT Specifies columns
-
- FROM Identifies tables or views
- WHERE Specifies conditions
21Using SQL- Structured Query Language
- SQL is a standard database protocol, adopted by
most relational databases - Provides syntax for data
- Definition
- Retrieval
- Functions (COUNT, SUM, MIN, MAX, etc)
- Updates and Deletes
22SQL Examples
- CREATE TABLE SALESREP
- Item definition expression(s)
- item, type, (width)
- DELETE table
- WHERE expression
23Data Retrieval
- SELECT list FROM table WHERE condition
- list - a list of items or for all items
- WHERE - a logical expression limiting the number
of records selected - can be combined with Boolean logic AND, OR, NOT
- ORDER may be used to format results
24UPDATE tables
- SET item expression
- WHERE expression
- INSERT INTO table
- VALUES ..
25Database Normalization
- Normalization The process of structuring data to
minimize duplication and inconsistencies. - The process usually involves breaking down a
single Table into two or more tables and defining
relationships between those tables. - Normalization is usually done in stages, with
each stage applying more rigorous rules to the
types of information which can be stored in a
table.
26Normalization
- Normalization a process for analyzing the design
of a relational database - Database Design - Arrangement of attributes into
entities - It permits the identification of potential
problems in your database design - Concepts related to Normalization
- KEYS and FUNCTIONAL DEPENDENCE
27Ex Database Normalization (1)
- Sample Student Activities DB Table
- Poorly Designed
- Non-unique records
- John Smith
- Test the Design by developing sample reports and
queries
28Ex Database Normalization (2)
- Created a unique ID for each Record in the
Activities Table - Required the creation of an ID look-up table
for reporting (Students Table) - Converted the Flat-File into a Relational
Database
29Ex Database Normalization (3)
- Wasted Space
- Redundant data entry
- What about taking a 3rd Activity?
- Query Difficulties - trying to find all swimmers
- Data Inconsistencies - conflicting prices
30Ex Database Normalization (4)
- Students table is fine
- Elimination of two columns and an Activities
Table restructuring, Simplifies the Table - BUT, we still have Redundant data (activity fees)
and data insertion anomalies.
Problem If student 219 transfers we lose all
references to Golf and its price.
31Ex Database Normalization (5)
- Modify the Design to ensure that every non-key
field is dependent on the whole key - Creation of the Participants Table, corrects our
problems and forms a union between 2 tables.
This is a Better Design!
32The Normal Forms
- A series of logical steps to take to normalize
data tables - First Normal Form
- Second
- Third
- Boyce Codd
- Theres more, but beyond scope of this
33First Normal Form (1NF)
- All columns (fields) must be atomic
- Means no repeating items in columns
Solution make a separate table for each set of
attributes with a primary key (parser, append
query)
Customers CustomerID Name
Orders OrderID Item CustomerID OrderDate
34Second Normal Form (2NF)
- In 1NF and every non-key column is fully
dependent on the (entire) primary key - Means Do(es) the key field(s) imply the rest of
the fields? Do we need to know both OrderID and
Item to know the Customer and Date? Clue
repeating fields
Solution Remove to a separate table (Make Table)
OrderDetails OrderID Item
Orders OrderID CustomerID OrderDate
35Third Normal Form (3NF)
- In 2NF and every non-key column is mutually
independent - means Calculations
- Solution Put calculations in queries and forms
OrderDetails OrderID Item Quantity Price
Put expression in text control or in
query Quantity Price
36Data Warehousing and Datamining
- Data warehouse
- Supports reporting and query tools
- Stores current and historical data
- Consolidates data for management analysis and
decision making
37What is a Data Warehouse?
- "A warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision making
process". - Bill Inmon (1990)
- "A Data Warehouse is a repository of integrated
information, available for queries and analysis.
Data and information are extracted from
heterogeneous sources as they are generated. - Anonymous
38Components of a Data Warehouse
39 Data Mining
- ON-LINE ANALYTICAL PROCESSING (OLAP) ability to
manipulate, analyze large volumes of data from
multiple perspectives - MINING Seeking relationships that are not known
in advance. A function of the software and data
organization.
40 DW Characteristics
- Subject OrientedData that gives information
about a particular subject instead of about a
company's ongoing operations. - Integrated Data that is gathered into the data
warehouse from a variety of sources and merged
into a coherent whole. - Time Variant All data in the data warehouse is
identified with a particular time period.
41Data Acquisition
- The process of moving company data from the
source systems into the warehouse. - Often the most time-consuming and costly effort.
- Performed with software products known as ETL
(Extract/Transform/Load) tools. - Over 50 ETL tools on market.
42Data Cleansing
- Typically performed in conjunction with data
acquisition. - A complicated process that validates and, if
necessary, corrects the data before it is
inserted. - AKA "data scrubbing" or "data quality assurance".