Title: Databases, Schemas and Data Models
1 Databases, Schemas and Data Models
Subschema User A
Subschema User B
Subschema User C
External Level
Glenn Ted Eric
YOU ARE HERE
Conceptual Level
Inventory Record Item integer(5),
non-null Description char(15) ...
Sales Record Invoice integer(6)
Customer Record Customer integer(6) Name
char(30) ...
Internal Level
2Modeling the Data
- Entity Relationship Diagram (ERD)
- Entity thing about which we wish to store
information - people, places, objects, or event
- Relationships among entities
- Attributes of the entity
- Multivalued attribute
3Entities
- Things
- Distinguishable based on attributes
- Represents a class (gt1) of things
- Entity Type vs. Occurrence (Instance)
- Student vs. You
- Distinguishable based on value of attributes
- Where are the entities?
- REA Data Model
- Resources acquired and used
- Events engaged in
- Agents participating in events
- E-R-A may be better ordering
4Attributes
- Characteristics of Entity
- Name, ID, Major
- A compound attribute is one that actually
consists of more primitive attributes. aka
concatenated attribute, composite attribute, and
data structure. - Attribute properties
- The data type for an attribute defines what class
of data can be stored in that attribute - The domain of an attribute defines what values an
attribute can legitimately take on - The default value for an attribute is that value
which will be recorded if not specified by the
user
5Special Attributes
- We need to uniquely identify each entity instance
based on the data value of one or more attributes - A key is an attribute, or a group of attributes,
which assumes a unique value for each entity
instance. It is sometimes called an identifier. - A group of attributes that uniquely identifies an
instance of an entity is called a concatenated
key. aka composite key and compound key. - A candidate key is a candidate to become the
primary identifier of instances of an entity. It
is sometimes called a candidate identifier. - A primary key is that candidate key which will
most commonly be used to uniquely identify a
single entity instance.
6Special Attributes
- Any candidate key that is not selected to become
the primary key is called an alternate key - A subsetting criteria is a attribute (or
concatenated attribute) whose finite values
divide all entity instances into useful subsets.
aka inversion entry - Multi-valued Attributes
- Can take on multiple-values for a single entity
- major
- OK in initial stages but must be normalized
later
7 Relationships
- Association between one or more entities
- Events use Resources and may require Agents
- Bi-directional
- Relationships are defined by Cardinality,
Optionality and Degree
8 Chicken Feet
- Cardinality is the number of entities involved in
relationship - Optionality is whether relationship is mandatory
(minimum)
Cardinality
Minimum
Maximum
Graphic Notation
Interpretation
Instances
Instances
Exactly one
1
1
Zero or one
0
1
One or more
1
many ( gt 1 )
Zero, one, or more
0
many ( gt 1 )
More than one
gt 1
gt 1
9 Examples
11
M1
MN
10Relationship Degree
Degree 1 Unary (recursive)
Degree 3 Ternary
Unary (recursive)
STUDENT
ADVISOR
MAJOR
11Associative Entity
Which instance of passenger and flight are we
interested in?
12What Data
13Logical Data Modeling
- Take the initial component elements and ..
- Formally structure data into a stable form
- Normalization
- Allow the actual needs of organization to be
reflected - Enable construction of a physical data model
- Characteristics of a good data model
- Pictorial
- Rigorous and Specific
- Top-Down Decomposable
- Provide Focus
- Minimally Redundant
- Transparent
- Easily Navigated
14Relational Data Model
- Relations represented as two-dimensional tables
- Named columns that represent the attributes and
unnamed rows representing unique instances of the
entity.
15Data Base Design Options One Uniform Table
Student ID 281521344 281521344 281521344 12345678
9 123456789
Course Unique 00071 00150 00179 00071 00153
Student Name Bill Bixby Bill Bixby Bill
Bixby Lana Turner Lana Turner
Course Description AMIS 531 ENGL 100 HIST
203 AMIS 531 MATH 102
- data redundancy
- lack of flexibility (must be a student)
16Data Base Design OptionsVary the Number of
Columns
Course Unique 00071 00071
Course Unique 00150 00153
Course Unique 00179
Student ID 281521343 123456789
Student Name Bill Bixby Lana Turner
- how many columns?
- wasted space or shortage of space
17Data Base Design Options Use Multiple Tables
Enrollment Table
Course Table
Student Table
Course Unique 00071 00150 00153 00179
Course Description AMIS 531 ENGL 100 MATH
102 HIST 203
Student ID 281521343 281521343 281521343 1234567
89 123456789
Course Unique 00071 00150 00179 00071 00153
Student ID 281521343 123456789
Student Name Bill Bixby Lana Turner
- Best Solution
- normalization
- avoids anomalies
- update, insert, delete
- allows use of simple yet powerful query language
18Data Normalization
- Three step process to a stable data form
- Functional Dependency
- For any relation R attribute B is functionally
dependent on attribute A if, for every valid
instance of A, that value A uniquely determines B - Normalization is designed to create a controlled
set of dependencies that conform to constraints - First Normal Form No repeating data elements
- Second Normal Form 1NF and no partial functional
dependencies - Third Normal Form 1 and 2 NF and no transitive
dependencies
19First Normal Form
- Get rid of multi-valued attributes
New entity
20Second Normal Form
- One or more non-key attributes can be identified
by less than the primary key
Identifier Product ID?
Focus on entities with concatenated keys
21Third Normal Form
- One or more non-key attributes can be derived
from other non-key attribute(s)
Data depend only on the key
No dependence on Order but Customer
Can be calculated but costly
Can be calculated
22 De-normalization
- Performance tradeoff YTD Sales Calculation
23Fully Normalized Data Model
24Minimum Cardinality Control
Resources
Agents
Events
(1,)
(0,)
(1,1)
(0,)
(0,)
(1,)
A sale must have a customer, but we track
potential customers
Some inventory may just sit there!
(1,1)
(1,1)
All sales are not paid for immediately
(0,)
(0,)
(1,1)
(1,1)
(0,)
(0,)
25Data Storage Databases
- File based systems focus on applications
- Database(DB) systems focus on entire organization
File1 Item A Item B
Data Base Item A Item B Item C Item D
Data base management system (DBMS)
File2 Item C Item D
26 Data Storage Data Warehouse
Data Sources
85 queries
15 queries
Users
Enter/Input