Title: Prof. Ray Larson
1Lecture 12 Database Design
SIMS 202 Information Organization and Retrieval
- Prof. Ray Larson Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 1030 am - 1200 pm
- Fall 2003
- http//www.sims.berkeley.edu/academics/courses/is2
02/f03/
2Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
3Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
4Models (1)
5Database System Life Cycle
6Another View of the Life Cycle
Integration 4
Operations 5
Design 1
Physical Creation 2
Conversion 3
Growth, Change 6
7Database Design Process
8Entity
- An Entity is an object in the real world (or even
imaginary worlds) about which we want or need to
maintain information - Persons (e.g. customers in a business,
employees, authors) - Things (e.g. purchase orders, meetings, parts,
companies)
Employee
9Attributes
- Attributes are the significant properties or
characteristics of an entity that help identify
it and provide the information needed to interact
with it or use it (This is the Metadata for the
entities)
10Relationships
- Relationships are the associations between
entities - They can involve one or more entities and belong
to particular relationship types - One to One
- One to Many
- Many to Many
11Relationships
12Types of Relationships
- Concerned only with cardinality of relationship
1
1
n
1
n
m
Chen ER notation
13More Complex Relationships
1/1/1
1/n/n
n/n/1
SSN
Project
Date
1
4(2-10)
Manages
1
Is Managed By
n
14Weak Entities
- Owe existence entirely to another entity
15Supertype and Subtype Entities
16Many to Many Relationships
Employee
17Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
18Database Design Process
19Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
20Requirements Analysis
- Conceptual Requirements
- Systems Analysis Process
- Examine all of the information sources used in
existing applications - Identify the characteristics of each data element
- Numeric
- Text
- Date/time
- Etc.
- Examine the tasks carried out using the
information - Examine results or reports created using the
information
21Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
22Conceptual Design
- Conceptual Model
- Merge the collective needs of all applications
- Determine what Entities are being used
- Some object about which information is to
maintained - What are the Attributes of those entities?
- Properties or characteristics of the entity
- What attributes uniquely identify the entity
- What are the Relationships between entities
- How the entities interact with each other?
23Developing a Conceptual Model
- Overall view of the database that integrates all
the needed information discovered during the
requirements analysis - Elements of the Conceptual Model are represented
by diagrams, Entity-Relationship or ER Diagrams,
that show the meanings and relationships of those
elements independent of any particular database
systems or implementation details - Can also be represented using other modeling
tools (such as UML)
24Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
25Logical Design
- Logical Model
- How is each entity and relationship represented
in the Data Model of the DBMS - Hierarchic?
- Network?
- Relational?
- Object-Oriented?
26Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
27Physical Design
- Internal Model
- Choices of index file structure
- Choices of data storage formats
- Choices of disk layout
28Database Design Process
Application 1
Application 2
Application 3
Application 4
External Model
External Model
External Model
External Model
Application 1
Conceptual requirements
Application 2
Conceptual Model
Logical Model
Conceptual requirements
Internal Model
Application 3
Conceptual requirements
Application 4
Conceptual requirements
29Database Application Design
- External Model
- User views of the integrated database
- Making the old (or updated) applications work
with the new database design
30Terms and Concepts
- Key
- An attribute or set of attributes used to
identify or locate records in a file - Primary Key
- An attribute or set of attributes that uniquely
identifies each record in a file - Candidate Key
- An attribute or set of attributes that might be
used as a primary key
31Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
32Normalization
- Normalization theory is based on the observation
that relations with certain properties are more
effective in inserting, updating and deleting
data than other sets of relations containing the
same data - Normalization is a multi-step process beginning
with an unnormalized relation
33Normal Forms
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
34Normalization
35Unnormalized Relations
- First step in normalization is to convert the
data into a two-dimensional table - In unnormalized relations data can repeat within
a column - (The following is a highly contrived example that
actually bears only a slight resemblance to the
current implementation of the Phone/Photo project
database)
36Unnormalized Relations
37First Normal Form
- To move to First Normal Form a relation must
contain only atomic values at each row and column - No repeating groups
- A column or set of columns is called a Candidate
Key when its values can uniquely identify the row
in the relation
38First Normal Form
391NF Storage Anomalies
- Insertion A new person has not yet taken a
picture -- hence no Picture -- Since Picture
is part of the key we cant insert - Insertion If People is are known and likely to
be photographed, but havent been yet -- there is
be no way to include that person in the database - Update If a Person changes status (e.g. Mary
Jones becomes a Student) we have to change
multiple rows in the database - Deletion (type 1) Deleting a Person record may
also delete all info about People in the pictures - Deletion (type 2) When there are functional
dependencies (like Object and Object_features)
changing one item eliminates other information
40Second Normal Form
- A relation is said to be in Second Normal Form
when every nonkey attribute is fully functionally
dependent on the primary key - That is, every nonkey attribute needs the full
primary key for unique identification
41Second Normal Form
Person Table
42Second Normal Form
People Table
43Second Normal Form
Picture Table
441NF Storage Anomalies Removed
- Insertion Can now enter new Persons who havent
yet taken pictures - Insertion Can now enter People who havent been
photographed - Deletion (type 1) If Charles Brown withdraws his
photos the corresponding tuples from Person and
Picture tables can be deleted without losing
information on David Rosen - Update If John White takes a third picture, and
has changed status (e.g., graduate), we only need
to change the Person table in one place
452NF Storage Anomalies
- Insertion Cannot enter the fact that a
particular object has a particular feature unless
it is associated with a particular picture - Deletion If John White describes some other
object that Beth Little has while shopping, we
lose the fact that the bookbag is blue - Update If the features of an object change
change we have to update multiple occurrences of
object features
46Third Normal Form
- A relation is said to be in Third Normal Form if
there are no transitive functional dependencies
between nonkey attributes - When one nonkey attribute can be determined with
one or more nonkey attributes there is said to be
a transitive functional dependency - The Obect_Feature column in the Picture table is
determined by the Object - Object_Feature is transitively functionally
dependent on Object so Picture is not 3NF
47Third Normal Form
Picture Table
48Third Normal Form
Object Table
492NF Storage Anomalies Removed
- Insertion We can now enter the fact that an
object has a particular feature - Deletion If John White describes some other
object that Beth Little has while shopping, we
dont lose the fact that the bookbag is blue - Update The features for each object appear only
once
50Boyce-Codd Normal Form
- Most 3NF relations are also BCNF relations
- A 3NF relation is NOT in BCNF if
- Candidate keys in the relation are composite keys
(they are not single attributes) - There is more than one candidate key in the
relation, and - The keys are not disjoint, that is, some
attributes in the keys are common
51Most 3NF Relations Are Also BCNF Is This One?
52BCNF Relations
53Additional Issues
- Why separate Person and People?
- They are really all People/Persons in different
roles - Shouldnt a picture have a unique ID regardless
of Who is in it? - Cant we have multiple people in the same
picture, multiple objects, etc.? - Cant objects have multiple characteristics?
54BCNF Relations
55BCNF Added Capabilities
- Can now have a picture with no (identified)
people in it - Can have multiple objects, activities, and people
associated with each picture
56Fourth Normal Form
- Any relation is in Fourth Normal Form if it is
BCNF and any multivalued dependencies are trivial - Eliminate non-trivial multivalued dependencies by
projecting into simpler tables
57Fifth Normal Form
- A relation is in 5NF if every join dependency in
the relation is implied by the keys of the
relation - Implies that relations that have been decomposed
in previous NF can be recombined via natural
joins to recreate the original relation
58Fifth Normal Form Relations
People Table
59Normalizing to Death
- Normalization splits database information across
multiple tables - To retrieve complete information from a
normalized database, the JOIN operation must be
used - JOIN tends to be expensive in terms of processing
time, and very large joins are very expensive
60Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
61Questions Brooke Maury
- Discussion Questions on Hoffer McFadden
- If the goal of the relational database model is
to encode a conceptual design into a logical
design, is it possible that improved technology
and the development of new modeling techniques
will supplant the RDBMS? Specifically, what
impact will XML and the development of document
engineering have on organizing information in
multiple normalized tables? - Conversely, what does the relational model have
that would be lost if a conceptual design was
encoded in another model?
62Questions Brooke Maury
- The drive to develop the RDBM was in part
motivated by a need to minimize the space
required and improve the performance of database
systems by removing redundancies. What impact
will very inexpensive data storage and computing
power have on the relational database model and
the third normal form especially?
63Questions Shane Ahern
- Discussion Questions for "Logical Database Design
and the Relational Model" - Is the normalization process described really
necessary? When I design a database schema, I
find that by thinking of tables in terms of they
entities they represent (employees, sales,
events), I avoid most of the problems of
normalization that the process seeks to address
(i.e. salesperson and region in Sales table,
salesperson is clearly a distinct entity from
sales). If the formal process described in the
article is not followed, are there potential
pitfalls that might lead to problems with your
database schema?
64Questions Shane Ahern
- The article points out that "the relational model
does not yet directly support supertype/subtype
relationships." Once the tables in a relational
database have been decomposed to third normal
form, the database is efficient from systems
point-of-view, but the tables no longer represent
a representation of the data that is intuitive to
humans. The object-oriented model more accurately
mirrors the way we think about the concepts that
we wish to store in databases. So perhaps
object-oriented database systems are worth
considering. What about XML databases?
65Questions Arthur Law
- The three models that we have been presented
with, Entity Relationship Model, NIAM Model, and
Object Oriented Model all enforce a specific
thought process in the organization and
relationship between items in a database. With
all of our recent discussion of computers
understanding natural language are these methods
now out of date with how we should be organizing
information? Should we use artificial
intelligence or learning algorithms to
statistically determine the relationship between
entities or is there still value in using these
models?
66Questions Arthur Law
- Each model is approximately one decade apart in
development and a quick Google search shows that
companies are using databases with one of the
three models. However, as new models arise there
doesn't seem too much interest in migrating from
one data model to another. Which makes sense
given that an organization using a given model
probably finds that it works. Now with the
proliferation of XML, we see more information
being shared between organizations, so are we
fated for an expensive and lengthy translation
process between databases? Or should all DB
administrators be responsible for upgrading to
the latest model?
67Lecture Overview
- Review
- Databases and Database Design
- Database Life Cycle
- ER Diagrams
- Database Design
- Normalization
- Discussion Questions
- Next Time/Readings
68Next Time
- Guest Lecture Bob Glushko on XML and Document
Engineering - Readings on Class website
- No assigned discussion questions (but bring your
questions on the readings)