Title: Introduction to Database Modeling and Design
1Chapter 1
- Introduction to Database Modeling and Design
1
2Introduction to Database Modeling and Design
- Portions of the Materials at this website
subject- - Database Modeling and Design -are drawn from the
- Textbooks below
-
- Database Modeling and Design, 4th Edition, 2006
- Authors Toby Teorey, Sam Lightstone, and Tom
Nadeau - Publisher Morgan Kaufmann
- Database Systems, 2nd Edition
- Authors Thomas Connolly and Carolyn Begg
- Publisher Addison Wesley
- Â
-
3Road Map for Learning By Subject
Introduction
Chapters 1
Chapter 2
The Entity_Relationship Model
Chapters 3
The Unified Modeling Language
Chapters 6
Chapter 4
Reqts Analysis and Conceptual DM
Normalization
Chapters 5
Transforming the Conceptual Data Model to SQL
Chapter 8
Chapters 7
Business Intelligence
Logical data Modeling Physical Data Modeling
Chapter 9
CASE Tools for Logical Database Design
4Chapter 1 Topics
- Data and Database Management
- The Database Life Cycle
- Conceptual Data Modeling
5Data and Information
- Were told we live in the information age.
- People often talk about data and information as
if there were the same. They are, in many
regards, opposite. - A datum is just a factyour name is a fact, your
phone number is a fact. - Information is data that is presented in a
meaningful, understandable. beneficial format.
Information is data that has been organized ,
sequenced, correlated and summarized, such as a
phone book.
6Data and Information
- A phone book is information. It not only contains
names and phone numbers, but it correctly
associates each persons phone number with their
names. It presents this list of correlated names
and phone numbers in alphabetical sequence, so
that we find the phone number from the name. In
addition, it divides the phone numbers into two
types personal and business. - It is the function of the computer to convert
data to information.
7Definitions
- Database The database is a place where you put
your data data that you wish to convert to
information at some future time. - Database Management System A DBMS is the
software that converts the data in your database
to information. It is the DBMS that provides you
the capability for cross-referencing,
correlating, sorting, summarizing, etc.
8Information as A Competitive Weapon
- Information technology and quality information
are not - the goals, but merely to support organizations to
reach - goals of
- Superior products and services
- Greater productivity
- Eventually success
9Data, Information, and Decision
- Data
- Information (Data Process)
- Knowledge
- Decision (Information
- Knowledge)
- Data/Information/Decision
- Data Resource Management (DRM)
- MIS (OLTP) OOAD
- KM (Knowledge Mgt), KWS (Knowledge Work Systems)
- DSS ESS, EIS (Executive Level)
-
- Data Warehousing/Data Mart/Data Mining/OLAP
(Executive, Collaborative and individual levels)
10Data, Information, and Decision
- Data Data
processing - Processing
System Analysis/Design - Information MIS,
Database Systems - Object (DataProcessing) Object-Oriented
SD/DA - Knowledge
Artificial Intelligence - Information
Expert system - Decision (executive level) DSS, EIS
- Decision (all levels, sophisticated) Data
warehousing -
Data Mining
11Databases Modeling
Databases Modeling
Type of Database
New Trend
Constructs
Characteristics
Relational Database
ERD EER
Row/ Column
Dimensional Modeling
Multi-Dimensional Database
OLAP DW
Cube
Distributed Component Object Model
Distributed Database
Client Object (DCOM)
XML
Object-Oriented Database
UML
Object
Class Diagram
Object Data Operations(Services)
Entity Data only
12Topic 1
Data and Database Management
13Relational Model Terminology
- A relation is a table with columns and rows.
- Only applies to logical structure of the
database, not the physical structure. - Attribute is a named column of a relation.
- Domain is a set of allowable values for one or
more attributes.
14Relational Model Terminology
- Tuple is a row of a relation.
- Degree is a number of attributes in a relation.
- Cardinality is a number of tuples in a relation.
- Relational Database is a collection of normalized
relations. (?)
15Alternative Terminology for Relational Model
16Instances of Branch and Staff (part) Relations
17Examples of Attribute Domains
18Data and Database Management
- The characteristics of file-based systems and the
problems with the file-based approach. - The meaning of the term Database Management
System (DBMS). - The typical functions of a DBMS and the major
components of the DBMS environment. - Multi-User DBMS Architectures.
- The personnel involved in the DBMS environment.
- The history of the development of DBMSs.
- The advantages and disadvantages of DBMSs
19File-based Systems
- A collection of application programs that perform
services for the end users (e.g. reports). - Each program defines and manages its own data.
5
20File-based Processing
6
21Limitations of File-based Approach
- Separation and isolation of data
- Each program maintains its own set of data. Users
of one program may be unaware of potentially
useful data held by other programs. - Duplication of data
- Same data is held by different programs. Wasted
space and potentially different values and/or
different formats for the same item.
7
22Limitations of File-based Approach
- Data dependence
- File structure is defined in the program code.
- Incompatible file formats
- Programs are written in different languages, and
so cannot easily access each others files.
8
23Limitations of File-based Approach
- Fixed Queries/Proliferation of application
programs - Programs are written to satisfy particular
functions. Any new requirement needs a new
program.
9
24Database Approach
- Arose because
- Definition of data was embedded in application
programs, rather than being stored separately and
independently. - No control over access and manipulation of data
beyond that imposed by application programs. - Result - the database and Database Management
System (DBMS).
10
25Database
- A shared collection of logically related data
(and a description of this data), designed to
meet the information needs of an organization. - System catalog (data dictionary or metadata)
provides the description of the data to enable
programdata independence. - Logically related data comprises entities,
attributes, and relationships of an
organization's information.
11
26Database Management System (DBMS)
- A software system that enables users to define,
create, and maintain the database and which
provides controlled access to this database.
12
27Database Management System (DBMS)
13
28Database Approach
- Data definition language (DDL).
- Permits specification of data types, structures
and any data constraints. All specifications are
stored in the database. - Data manipulation language (DML).
- General enquiry facility (query language) of the
data.
14
29DDL Example
- ALTER TABLE STAFF
- ADD FOREIGN KEY (Branch_No)
- REFERENCES Branch
30DML Example
- Select
- From property_for_rent
- Where type Flat and rent gt 400
31Database Approach
- Controlled access to database may include
- A security system.
- An integrity system.
- A concurrency control system.
- A recovery control system.
- A user-accessible catalog.
- A view mechanism.
- Provides users with only the data they want or
need to use.
15
32Components of DBMS Environment
16
33Components of DBMS Environment
- Hardware
- Can range from a PC to a network of computers.
- Software
- DBMS, operating system, network software (if
necessary) and also the application programs. - Data
- Used by the organization and a description of
this data called the schema.
17
34Objectives of Three-Level Architecture
- All users should be able to access same data.
- A user's view is immune to changes made in other
views. - Users should not need to know physical database
storage details.
35Objectives of Three-Level Architecture Proposed
by ANSI-SPARC
- DBA should be able to change database storage
structures without affecting the users' views. - Internal structure of database should be
unaffected by changes to physical aspects of
storage. - DBA should be able to change conceptual structure
of database without affecting all users.
ANSI-SPARC ( American National Standards
Institute Standards Planning and Requirements
Committee)
36ANSI-SPARC Three-level Architecture
37ANSI-SPARC Three-level Architecture
- External Level
- Users' view of the database. Describes that part
of database that is relevant to a particular
user. - Conceptual Level
- Community view of the database. Describes what
data is stored in database and relationships
among the data.
38ANSI-SPARC Three-level Architecture
- Internal Level
- Physical representation of the database on the
computer. Describes how the data is stored in
the database.
39An example of the three levels
- External level
- 01 employee,
- 02 employee_city pic
x(20), - 02 employee_state
pic x(2). -
- conceptual level
- employee
- employee_city
varchar(20) - employee_state
char(2) - Internal Level
- employee
- employee_city varchar(20) NULl
- employee_state
char(2) NOT NULL
40Data Independence
- Logical Data Independence
- Refers to immunity of external schemas to changes
in conceptual schema. - Conceptual schema changes e.g. addition/removal
of entities. - Should not require changes to external schema or
rewrites of application programs.
41Data Independence
- Physical Data Independence
- Refers to immunity of conceptual schema to
changes in the internal schema. - Internal schema changes e.g. using different file
organizations, storage structures/devices. - Should not require change to conceptual or
external schemas.
42Data Independence and the ANSI-SPARC Three-level
Architecture
43Functions of a DBMS
- Data Storage, Retrieval and Update.
- Must furnish users with the ability to store,
retrieve, and update data in the database. - A User-Accessible Catalog.
- Must furnish a catalog in which descriptions of
data items are stored and which is accessible to
users.
44Functions of a DBMS
- Transaction Support
- Must furnish a mechanism to ensure that either
all the updates corresponding to a given
transaction are made or that none of them are
made. - Concurrency Control Services
- Must furnish a mechanism to ensure that database
is updated correctly when multiple users are
updating the database concurrently.
45Functions of a DBMS
- Recovery Services
- Must furnish a mechanism for recovering the
database in the event that the database is
damaged in any way. - Authorization Services
- Must furnish a mechanism to ensure that only
authorized users can access the database.
46Functions of a DBMS
- Support for Data Communication
- Must be capable of integrating with communication
software. - Integrity Services
- Must furnish a means to ensure that both the data
in the database and changes to the data follow
certain rules.
47Functions of a DBMS
- Services to Promote Data Independence
- Must include facilities to support the
independence of programs from the actual
structure of the database. - Utility Services
- Should provide a set of utility services.
48Components of a DBMS
49Components of a DBMS
- Query processor
- Database manager (DM)
- File manager
- DML preprocessor
- DDL compiler
- Catalog manager
50Components of Database Manager (DM)
51Components of Database Manager (DM)
- Authorization control
- Command processor
- Integrity checker
- Query optimizer
- Transaction manager
- Scheduler
- Recovery manager
- Buffer manager
52Catalog Manager (System Catalog)
- A repository of information (metadata) describing
the data in the database. - Typically stores
- Names of authorized users.
- Names of data items in the database.
- Constraints on each data item.
- Data items accessible by a user and the type of
access.
53Catalog Manager (System Catalog)
- It is used by modules such as
- Authorization Control.
- Integrity Checker.
54Multi-User DBMS Architectures
- Teleprocessing
- File-server
- Client-server
55Teleprocessing
- Traditional architecture.
- Single mainframe with a number of terminals
attached. - Trend is now towards downsizing.
56Teleprocessing Topology
57File-server
- File-server is connected to several workstations
across a network. - Database resides on file-server.
- DBMS and applications run on each workstation.
58File-server
- Disadvantages include
- Significant network traffic.
- Copy of DBMS on each workstation.
- Concurrency, recovery and integrity control more
complex.
59File-server Architecture
60Client-server
- Server holds the database and the DBMS.
- Client manages the user interface and runs
applications.
61Client-server
- Advantages include
- Wider access to existing databases.
- Increased performance.
- Possible reduction in hardware costs.
- Reduction in communication costs.
- Increased consistency.
62Roles in the Database Environment
- Data Administrator (DA)
- Database Administrator (DBA)
- Database Designers (Logical and Physical)
- Application Programmers
- End Users (native and sophisticated)
19
63History of Database Systems
- First-generation
- Hierarchical (IMS) and Network (Nomad)
- Second generation
- Relational
- Third generation
- Extended Relational
- Object-Oriented
20
64Advantages of DBMS
- Control of data redundancy.
- Data consistency
- More information from the same amount of data.
- Sharing of data.
- Improved data integrity.
- Improved security.
- Enforcement of standards.
- Economy of scale.
21
65Advantages of DBMS
- Balanced conflicting requirements
- Improved data accessibility and responsiveness
- Increased productivity
- Improved maintenance through data independence
- Increased concurrency
- Improved backup and recovery services
22
66Disadvantages of DBMS
- Complexity
- Size
- Cost of DBMS
- Additional hardware costs
- Cost of conversion
- Performance
- Higher impact of a failure
23
67Topic 2
The Database Life Cycle
68System Development Life Cycle A brief overview
- It is a systematic approach to solving business
- problem. Its divided into seven phases
- Identifying problems, opportunities, and
objectives - Determining system requirements
- Analyzing system needs
- Designing the recommended systems
- Developing and documenting software
- Testing and maintaining the system
- Implementing and evaluating the systems
-
69System Development Life Cycle A brief overview
70System Development Life Cycle A brief overview
71The Database Life Cycle
- The database life cycle incorporates the basic
steps involved in - designing a global schema of the logical
database, allocating data - across a computer network, and defining local
DBMS-specific - schemas
- Requirements Analysis (chapter 4)
- Conceptual data modeling (chapter 4 )
- Logical Design (chapters 4, 5 and 6 )
- View integration (chapter 4 )
- Transformation of the CDM to SQL tables
(chapter 5) - Normalization of tables (chapter 6)
- Physical Design (chapter 7 )
- Database Implementation, Monitoring, and
Modification -
72Definition of The Business Requirements
- The definition of requirements is the users
statement of how he or she wants to do business,
and the information required to support his or
her new methods of operations.
73Definition of The Business Requirements
- The requirements can be broadly divided into two
areas - 1 Data requirementsthe data required for
processing, and the natural data relationships. - 2 The software platform for the database
implementation these are the limitations and
demands imposed upon the computing solutions
such as architectural plan, data storage
specifications and information system performance
expectations.
74Data Requirements
Retail Salesperson View
Customer View
Customer
Salespersons
Customers
Order Unit
Sale Unit
Orders
Products
Orders
Airline Company
Supermarket Chain
Supermarket Chain
Customer
Time
Promotion
Time
Sale Unit
Flight
Frequent Flyer Flights
Store
Product
Fare Class
Airport
Status
75Data Model
- Collection of concepts for describing data,
relationships between data and constraints on the
data in an organization. - Data Model comprises
- A structural part (a set of rules according to
- which databases can be constructed)
- A manipulative part (defining the types of
operations that are allowed on the data) - Possibly a set of integrity rules
76Data Model
- Purpose
- To represent the data in an understandable way.
- Categories of data models include
- Conceptual
- Logical (Functional)
- Physical
77Levels of Modeling
Descriptive The dealer sold 200 cars last month.
Operational
Primarily Two Dimensional Database System
(OLTP)
Explanatory For every increase in 1 in the
interest, auto sales decrease by 5 .
Star Schema Cube
Traditional DW
(OLAP)
Predictive predictions about future buyer
behavior.
Cube sophisticated analytical tools
Data Mining
78Levels of Modeling
Explanatory WHAT IF PROCESSING ANALYZE
WHAT HAS PREVIOUSLY OCCURRED TO BRING ABOUT
THE CURRENT STATE OF THE DATA
Predictive
Descriptive SIMPLE QUERIES REPORTS
DETERMINE IF ANY PATTERNS EXIST BY
REVIEWING DATA RELATIONSHIPS
Statistical Analysis/Expert System/ Artificial
Intelligence
Normalized Tables
Dimensional Tables
Classification Value Prediction
Roll-up Drill Down
Query
79Descriptive Modeling
- Conceptual Data Model (Analysis - Requirements
Gathering Whats it?) - Logical Data Model (Design-How is it?)
- Physical Data Model (Implementation)
80Explanatory Modeling
- Also called Dimensional Modelling (to be
discussed in chapter 8) - Ways to derive the database component of a data
warehouse - Every dimensional model (DM) is composed of one
table with a composite primary key, called the
fact table, and a set of smaller tables called
dimension tables.
81Predictive Modeling
- Similar to the human learning experience
- Uses observations to form a model of the
important characteristics of some phenomenon. - Uses generalizations of real world and ability
to fit new data into a general framework. - Can analyze a database to determine essential
characteristics (model) about the data set.
82Conceptual Data Modeling
Retail Salesperson View
Product
Customer
Orders
Product_No
Product_Name Product_Qty
Customer_no Customer_name Customer_address
N
N
N
N
1
Served-by
Sold-by
N
Salesperson_no Salesperson_name
Salesperson
83Conceptual Data Modeling
Customer View
Order
Customer
Places
Order_no
Oredr_date Order_Qty
Customer_no Customer_name Customer_address
1
N
84Use Hotel Case For Illustration-Customer View
Room
Guest
Room_no Type Price
Guest_no Guest_name Guest_address
Books
85Use Hotel Case For Illustration-Hotel Front-Desk
Clerk View
Room
Guest
Room_no
Guest_no Guest_name Guest_address
Registered-by
Assigned-by
Clerk_No
Hotel Clerk
86Use Hotel Case For Illustration-View Integration
Booking
Guest
Hotel_no Guest_no Date_from Date_to Room_no
Guest_no Guest_name Guest_address
Hotel
Room
Hotel_No Hotel_name City
Room_no Hotel_no Type Price
11
1M
Identifying Relationship
Dependent Entity (Attribute Entity)
87View Integration From Concept of
Specialization/Generalization
- Generalization
- The process of minimizing the differences between
- entities by identifying their common features.
- Specialization
- the reverse of generalization, is an
inversion of the same concept.
88Specialization/Generalization
Customers
Order
Aircraft
Generalization
Specialization
Commercial
Military
B52
B-1B
747
777
89View Integration From Concept of Subtype and
Super-type
Faculty
Full Time Faculty
Part Time Faculty
Part (day) Time Faculty
Part (Night)Time Faculty
90View Integration Objectives
- How to merge data models based on specific user
views into a global logical data model of the
enterprise. - How to ensure that the resultant global model is
a true and accurate representation of the part of
the enterprise we are attempting to model.
91Transformation of the CDM to SQL Tables
- Objective of this step is to build a logical data
model based on the conceptual data model of the
users view of the enterprise and then to
validate this model using the technique of
normalization and against the required
transactions.
92Transformation of the CDM to SQL Tables
- To refine the local conceptual data model to
remove undesirable features and to map this
model to a local logical data model. This
involves - (1) Remove MN relationships.
- (2) Remove complex relationships.
- (3) Remove recursive relationships.
- (4) Remove relationships with attributes.
- (5) Remove multi-valued attributes.
- (6) Re-examine 11 relationships.
- (7) Remove redundant relationships.
93Removing Advertises MN Relationship
94Removing Leases Complex Relationship
95Normalization
- A technique for producing a set of relations with
desirable properties, given the data requirements
of an enterprise. - Developed by E.F. Codd (1972).
- Often performed as a series of tests on a
relation to determine whether it satisfies or
violates the requirements of a given normal form.
96Normalization
- Four most commonly used normal forms are first
(1NF), second (2NF), third (3NF) and Boyce-Codd
(BCNF) normal forms. - Based on functional dependencies among the
attributes of a relation. - A relation can be normalized to a specific form
to prevent the possible occurrence of update
anomalies.
97Data Redundancy
- Major aim of relational database design is to
group attributes into relations to minimize data
redundancy and reduce file storage space required
by base relations. - Problems associated with data redundancy are
illustrated by comparing the Staff and Branch
relations with the Staff_Branch relation.
98Data Redundancy
11
99Data Redundancy
- Staff_Branch relation has redundant data the
details of a branch are repeated for every member
of staff. - In contrast, the branch information appears only
once for each branch in the Branch relation and
only the branch number (Branch_No) is repeated in
the Staff relation, to represent where each
member of staff is located.
100Update Anomalies
- Relations that contain redundant information may
potentially suffer from update anomalies. - Types of update anomalies include
- Insertion
- Deletion
- Modification
101Functional Dependency
- Main concept associated with normalization.
- Functional Dependency
- Describes the relationship between attributes in
a relation. - For example, if A and B are attributes of
relation R, B is functionally dependent on A
(denoted A Ù B), if each value of A in R is
associated with exactly one value of B in R.
102The Process of Normalization
- Formal technique for analyzing a relation based
on its primary key and the functional
dependencies between the attributes of that
relation. - Often executed as a series of steps. Each step
corresponds to a specific normal form, which has
known properties.
103The Process of Normalization
- As normalization proceeds, the relations become
progressively more restricted (stronger) in
format and also less vulnerable to update
anomalies.
104Topic 3
Conceptual Data Modeling
105Conceptual Data Modeling
- The process of developing a conceptual data model
that is - a complete and accurate representation of an
organization's data requirements. - independent of implementation details.
106Conceptual Data Modeling What?
- What is it?
- 1. Â It is a conceptual representation of data
without concern for its logical (functional) or
physical aspects. - 2. It is a set of high-level business data
models which provides a framework for the data
modeling activities at the next level.
107Conceptual Data Modeling When?
- When should it be done?
- 1. In support of the data requirements of a
process model under development at the
corresponding level. - or
- 2. Outside the system application lifecycle
on a department, division, or company wide basis.
108Conceptual Data Modeling Who?
- Who should do it?
- 1. The group responsible for assuring that
data structure reflects business policies and
rules. -
- 2. It should be a joint effort between the
owners and custodians of the data, the users of
the data, and the analysts.
109Conceptual Data Modeling Why?
- Documents the type of data (information) which
must be represented in a system independent of
specific application, organizations, or
technology. - Maximizes data sharing minimizing redundancy.
- Provides foundations for physical database
design. - Describes the unique business enterprise
specifically.
110Conceptual Data Modeling Why?
- Outside of application life cycle on a
company-wide basis. - Data modeling expresses inherent associations
which are the most part, independent of anyone
one application. - Data entities change very little even through the
way they are used can change for each
application. - A complete maintained conceptual data model
should shorten the requirements definition phases
of system development life cycle.
111Data Modeling Approach
- Data partitioning
- Use a top-down approach to define the data
requirements of a system. The purpose is to
divide and conquer (from subject to entity), and
to evolve from the conceptual level to logical
level until physical database is derived. - Standard deliverables
- For each of the levels, there is a set of
standard deliverables that must be produced. The
documentation items must be well defined so that
the data at each level is well understood.
112Data Partitioning Via Modeling
(How)
(What, Why, Who, Where)
Subjects
Conceptual Level
Technical considerations
Entities
Relationships
Logical Level
Data Elements
Frequencies
Physical Level
Data Definition Language -DDL(create, Alter, drop
tables)
Data Manipulation Language (select, insert,
delete, update)