Title: LOGICAL SCHEMA DESIGN
1COMP 332PRINCIPLES OFDATABASE DESIGN
2LOGICAL SCHEMA DESIGN OUTLINE
- Approach to Logical Schema Design
- Mapping Generalization Hierarchies
- Relational Model
- ER to Relational Mapping
- Reverse Engineering for Database Design
3INTRODUCTION 4 BCN 11
- What? Translate the conceptual schema into a
logical schema tailored to a specific target
DBMS - source data model entity-relationship
- target data models relational, network,
hierarchical,
- Goal to obtain a representation that uses as
efficiently as possible the facilities for
structuring data and modelling constraints
available in the target data model - Common characteristics (deficiencies) of most
target data models - lack of abstraction features no generalization
supported - lack of enforcing some constraints existence
often not enforced - reducing schema transformations
4ONE-PHASE APPROACH TO LOGICAL DESIGN BCN 11.1
conceptualschema
targetdata model
databaseload
performancecriteria
- populationof the DB
- queries typefrequency
DBMS-dependent Logical Design
- response time
- storage used
- CPU utilization
- I/O time
logical schema intarget data model
5TWO-PHASE APPROACH TO LOGICAL DESIGN
conceptualschema
targetdata model
databaseload
performancecriteria
- populationof the DB
- queries typefrequency
DBMS-independent Logical Design
conceptual-to-logicalschema design choices
designerpreferences
DBMS-dependent Logical Design
- response time
- storage used
- CPU utilization
- I/O time
logical schema intarget data model
6TWO-PHASE APPROACH TO LOGICAL DESIGN
- DBMS-independent logical design
- common to all DBMSs
- conceptual schema is simplified to make
subsequent phase easier - essentially we remove constructs not
availablein the target data model(s) - DBMS-dependent logical design
- governed by modelling features available in the
target data model(s) - both phases have room for design choices
7DATABASE LOAD
- How can we deal with database load considerations
in the DBMS-independent phase?
HARD!
- wish to account for the main load on the database
- measure effect of the most important database
operations - for each database operation we consider
- their structure (i.e., navigation schemas)
- their frequency
- the number of instances of entities and
relationships visited - perform some obvious transformations that will
help the subsequent DBMS-dependent phase
8GENERALIZATION HIERARCHIES BCN 11.4
Esuper
U
U
U
. . .
Esub2
Esub1
Esubn
- Cannot be represented in most current commercial
DBMSs - Need to use just entities and relationships
- Considerations
- attribute inheritance from super-entity to
sub-entities must be accounted for - implicit IS_A relationship must be captured
9GENERALIZATION HIERARCHIES (contd)
- Option 1 retain super-entity delete
sub-entities - Algorithm
- Propagate attributes of Esub1, Esub2, , Esubn
into Esuper - Add a new attribute (discriminator attribute) da
- for disjoint generalization, da is single-valued
- for overlapping generalization, da is
multi-valued - value for da indicates to which sub-entity an
instance belongs - Adjust any relationship that related to Esubi
to now relate to Esuper with min-card0
Esuper U Esub1 U Esub2 U U Esubn U da
10GENERALIZATION HIERARCHIES (contd)
- Option 1 retain super-entity delete sub-entities
da
11GENERALIZATION HIERARCHIES (contd)
- Option 1 retain super-entity delete
sub-entities - Advantages
- simplest solution no relationships are needed
- applies to all types of generalizations
(complete, incomplete, disjoint, overlapping) - Disadvantages
- may generate a large number of null values for
attributes that apply only to sub-entities - operations that access only sub-entities now need
to search entire set of instances for all
super/sub-entities
12GENERALIZATION HIERARCHIES (contd)
- Option 2 retain sub-entities delete
super-entity - only for disjoint, complete generalizations!
- Algorithm
- Propagate attributes of Esuper into each of
Esub1, Esub2, , Esubn - Adjust relationship types that related to Esuper
to now relate to each Esubi - Delete Esuper
13GENERALIZATION HIERARCHIES (contd)
- Option 2 retain sub-entities delete
super-entity - only for disjoint, complete generalizations!
14GENERALIZATION HIERARCHIES (contd)
- Option 2 retain sub-entities delete
super-entity - only for disjoint, complete generalizations!
- Advantages
- common concept of super-entity is not needed in
the logical design - Disadvantages
- only practical for disjoint, complete
generalizations - sub-entity relationship is lost
- cannot tell that sub-entities are a
specialization of a super-entity - operations that access only super-entity must now
access all sub-entities - relationships that relate to super-entity must be
repeated for all sub-entities
15GENERALIZATION HIERARCHIES (contd)
- Option 3 establish relationships among
super-entity and sub-entities - Algorithm
- Change each super-entity/sub-entity relationship
into a relationship that relates Esuper to each
Esubn - Connectivity for super-entity is (0,1) for all
relationships with sub-entities - Connectivity for each sub-entity is (1,1)
- Make a sub-entity a weak entity if it does not
have a key of its own - Delete the generalization hierarchy
Esuper
1
1
1
Is-Esub1
Is-Esub2
Is-Esubn
1
1
1
. . .
Esub1
Esub2
Esubn
16GENERALIZATION HIERARCHIES (contd)
- Option 3 establish relationships among
super-entity and sub-entities
17GENERALIZATION HIERARCHIES (contd)
- Option 3 establish relationships among
super-entity and sub-entities - Advantages
- most general solution always feasible
- it can model all four types of coverage of
generalizations - Disadvantages
- resulting schema is quite complex
- inserting a new instance of a sub-entity requires
inserting a super-entity instance (maybe) and a
relationship instance with the super-entity - there is redundancy (at the conceptual level) in
using a relationship to represent each IS_A
relationship
18GENERALIZATION MAPPING STRATEGY
- Consider retrieval operations on super-entity and
sub-entities - Set 1 operations that use Esuper regardless of
how instances of Esuper divide into Esub1,
Esub2, , Esubn - Select Option 1 or Option 3
- Set 2 operations that use a combination of some
Esuper attributes and attributes from only Esub1
or Esub2 or Esubn - Select Option 2 (if disjoint coverage)
- If Set 1 predominant, further consider update
operations - Operations that use attributes of both Esuper and
some (one or more ) Esub1, Esub2, , Esubn
together - Select Option 1
- Operations that use either attributes of both
Esuper or Esub1, Esub2, , Esubn (but never
together) - Select Option 3
19OTHER LOGICAL DESIGN CONSIDERATIONS
- Derived attributes
- store versus calculate
- retrieval operations benefit from storing it
- update operations benefit from calculating it
- Need to balance these two conflicting needs!
writes more expensive than reads!
- Primary key selection
- Select as primary key the candidate key that is
used for direct access by the maximum number of
operations - Prefer
- simple keys to multi-attribute keys
- internal key to external key
- Primary keys are kept minimal in size and
simple in structure
20RELATIONAL DATA MODEL Appendix A BCN 12
- Based on the mathematical theory of relations
- R D1 X D2 X X Dn
- Di are sets and are called the domains of R
- (database) relation over domains D1, D2, , Dn
is a time-varying, finite set of n-tuples where
the i-th component of each tuple is a value in
domain Di - (n-)tuple an ordered sequence of n values (v1,
v2, , vn) such that vi Î Di or null - degree of a relation number of domains
- cardinality of a relation number of tuples
21RELATIONAL DATA MODEL (contd)
- relation intension relation scheme/relational
schema - WARD(ward, name, -of-beds)
- STAFF(employee, name, address, HKID, duty,
ward) - relation extensiontables(s)
relation scheme
relational schema
22RELATIONAL DATA MODEL CONSTRAINTS
- key
- candidate key/primary key
- entity integrity
- If X is a primary key of R, then X cannot contain
null values. - referential integrity
- foreign key given two relations R, S, relation R
may reference S via a set of attributes that
forms the primary key of S - The attributes in R are called a foreign key.
- The value of the foreign key in a tuple R must
either be equal to the primary key of a tuple in
S or be entirely null.
23ER TO RELATIONAL TRANSFORMATION
- Need to transform
- entities regular (strong)
- weak (note discriminator required if
max-cardgt1) - relationships degree
- connectivity
- Notation
- delete(R) - delete all matching occurrences
(cascading delete) according to
primary/foreign key value - - only delete matching tuples if this is
the last tuple being deleted - exists(R) - there exists an instance of R with a
matching primary/ foreign key value or such an
instance must be inserted - set null(FKT(R)) - set the corresponding values
of the foreign key of T in R to null - null/not null(FKT(R)) - the foreign key of T in R
allows/does not allow null values
24ER TO RELATIONAL TRANSFORMATION EXISTENCE
CONSTRAINTS
- mandatory existence
- On delete delete(R)
- On insert exists(R)
- not null(FKT(R))
- optional existence o
- On delete no action
- On insert no action
- null(FKT(R)
25EXAMPLE HOSPITAL ER SCHEMA
26EXAMPLE HOSPITAL ER SCHEMA (PRE-PROCESSED)
27GENERAL RESULT OF TRANSFORMATION
- 1. an entity relation with the same information
content as the original entity - entities with 11, 1N (on the one (parent) side)
or NM binary relationships - entities with NM binary recursive relationships
- entities with ternary/higher degree relationships
or a generalization hierarchy - 2. an entity relation with the embedded foreign
key of the parent entity - entities with 1N binary relationship (on the N
(child) side) - entities with 11 binary relationships for one of
the entities - entities with 11 or 1N binary recursive
relationships - 3. a relationship relation with the foreign
keys of all the entities in the relationship - binary NM, binary recursive NM or
ternary/higher degree relationships
28ER TO RELATIONAL TRANSFORMATION OVERVIEW
- STEP 1 Complex Attributes
- STEP 2 Regular (Strong) Entities
- STEP 3 11 Relationships
- Option 1 One relation
- Option 2 Two relations
- Option 3 Three relations
- STEP 4 1N Relationships
- Option 1 Two relations
- Option 2 Three relations
- STEP 5 NM Relationships
- STEP 6 N-ary Relationships
- STEP 7 Weak Entities
- STEP 8 Recursive Relationships
- STEP 9 Multi-valued Attributes
29STEP 1 COMPLEX ATTRIBUTES
- For each complex attribute A in entity S or
relationship R - Option 1 multiple attributes (consider all
components as individual attributes of S or R)
lose relatedness of attributes
Option 2 single attribute (consider the entire
complex attribute as a single attribute of S or
R)
lose parts of attribute
30STEP 1 COMPLEX ATTRIBUTES (contd)
- Option 3 create a new entity (consider the
complex attribute as a new entity)
can be strong or weak
can be optional or mandatory
- new entity is strong or weak depending on whether
it has an internal key or an external key - existence with original entity is optional or
mandatory depending on whether the attribute can
be null or not
31STEP 2 REGULAR (STRONG) ENTITIES
- For each regular (strong) entity S
- create a relation RS with all attributes of S
- the primary key of S becomes the primary key of
RS
Hospital example
Person(HKID, name, address)
Person
Staff(employee, duty)
Staff
Ward(ward, name, -of-beds)
Ward
Patient(registration, birthdate, sex)
Patient
Doctor(doctor, specialty)
Doctor
32STEP 3 BINARY 11 RELATIONSHIPS
- For each binary 11 relationship R
- identify the relations RS and RT that correspond
to the entities S and T participating in R - Option 1 one relation (mandatory existence both
sides only) - merge RS, RT and attributes of R into one
relation RST - delete RS and RT
- if RS and RT have the same primary key, then this
becomes the primary key of RST - if RS and RT have different primary keys, then
choose one as the primary key of RST the other
primary key cannot be null
RS(KS, ...)
A
RST( )
KS, ,
A
KT, ,
RT(KT, ...)
33STEP 3 BINARY 11 RELATIONSHIPS (contd)
- Option 2 two relations (using null values)
- include attributes of R in either RS or RT
- include the primary key of RS in RT, RT in RS, or
in each other as foreign key attributes - Note If we choose to include the primary key in
only one relation, then we choose (a) the
relation whose entitys participation is
mandatory, if any, as this minimizes null
values - (b) the relation with the fewest tuples.
RS(KS , )
KT
A,
RT(KT , )
KS
A,
34STEP 3 BINARY 11 RELATIONSHIPS (contd)
- Integrity constraints
- Mandatory (both sides)
- On delete RS delete(RT)
- RT delete(RS)
- On insert RS exists(RT)
- RT exists(RS)
- not null(FKS(RT))
Optional (one side) On delete RS
delete(RT) RT no action On insert RS no
action RT exists(RS) not null(FKS(RT))
35STEP 3 BINARY 11 RELATIONSHIPS (contd)
Optional (both sides T has fewest tuples) On
delete RS set null(FKS(RT)) RT no
action On insert RS no action RT no
action null(FKS(RT))
36STEP 3 BINARY 11 RELATIONSHIPS (contd)
- Option 3 three relations (minimizes null values
optional existence only) - create a new relation RR
- include attributes of R, if any, as attributes of
RR - include as foreign key attributes in RR the
primary keys of relations RS and RT that
represent the participating entities S and T in R - the primary key of RR is the primary key of
either RS or RT
RS(KS, )
RR( )
KT,
KS,
KT,
KS,
A
RT(KT, )
37STEP 3 BINARY 11 RELATIONSHIPS (contd)
On delete RS delete(RR) RT
delete(RR) RR no action On insert RS no
action RT no action RR exists(RS) and
exists(RT) not null(FKT(RR))
38STEP 3 BINARY 11 RELATIONSHIPS (contd)
39STEP 3 BINARY 11 RELATIONSHIPS (contd)
Hospital example
40STEP 4 BINARY 1N RELATIONSHIPS
- For each binary 1N relationship R
- identify the relation RS that corresponds to the
entity S that participates on the N-side of the
relationship - Option 1 two relations
- include attributes of R in RS
- include as foreign key in RS the primary key of
the relation RT that represents the entity T that
participates on the 1-side of the relationship
RS(KS, ... )
, A,
KT
RT(KT, ...)
41STEP 4 BINARY 1N RELATIONSHIPS (contd)
- Integrity constraints
- Mandatory (both sides)
- On delete RS delete(RT)
- RT delete(RS)
- On insert RS exists(RT)
- RT exists(RS)
- not null(FKT(RS))
Optional (N-side) On delete RS no
action RT delete(RS) On insert RS
exists(RT) RT no action not null(FKT(RS))
42STEP 4 BINARY 1N RELATIONSHIPS (contd)
- Optional (1-side)
- On delete RS delete(RT)
- RT set null(FKT(RS))
- On insert RS no action
- RT exists(RS)
- null(FKT(RS))
Optional (both sides) On delete RS no
action RT set null(FKT(RS)) On insert RS
no action RT no action null(FKT(RS))
43STEP 4 BINARY 1N RELATIONSHIPS (contd)
- Option 3 three relations (minimizes null values
optional existence only) - create a new relation RR
- include attributes of R, if any, as attributes of
RR - include as foreign key attributes in RR the
primary keys of relations RS and RT that
represent the participating entities S and T in R - the primary key of RR is the primary key of the
relation that participates on the N-side of R
RS(KS, )
RR( )
KS,
KT,
A
RT(KT, )
44STEP 4 BINARY 1N RELATIONSHIPS (contd)
On delete RS delete(RR) RT
delete(RR) RR no action On insert RS no
action RT no action RR exists(RS) and
exists(RT) not null(FKT(RR))
45STEP 4 BINARY 1N RELATIONSHIPS (contd)
Hospital example
46STEP 5 BINARY NM RELATIONSHIPS
- For each binary NM relationship R
- create a new relation RR
- include attributes of R, if any, as attributes of
RR - include as foreign key attributes in RR the
primary keys of relations RS and RT that
represent the participating entities S and T in R - the primary key of RR is the combination of the
foreign key attributes of both RS and RT
RS(KS, )
A
RR( )
KS, KT,
KS,
RT(KT, )
47STEP 5 BINARY NM RELATIONSHIPS (contd)
- Integrity constraints
- Mandatory (both sides)
- On delete RS delete(RR)
- RT delete(RR)
- RR delete(RS) and delete(RT)
- On insert RS exists(RR)
- RT exists(RR)
- RR exists(RS) and exists(RT)
Optional (N-side) On delete RS
delete(RR) RT delete(RR) RR
delete(RS) On insert RS exists(RR) RT
no action RR exists(RS) and exists(RT)
48STEP 5 BINARY NM RELATIONSHIPS (contd)
- Optional (M-side)
- On delete RS delete(RR)
- RT delete(RR)
- RR delete(RT)
- On insert RS no action
- RT exists(RR)
- RR exists(RS) and exists(RT)
RS(KS )
RR(KS, KT, A)
RT(KT )
Optional (both sides) On delete RS
delete(RR) RT delete(RR) RR no
action On insert RS no action RT no
action RR exists(RS) and exists(RT)
49STEP 5 BINARY NM RELATIONSHIPS (contd)
Hospital example
50STEP 6 N-ARY RELATIONSHIPS
- For each n-ary relationship R, ngt2
- create a new relation RR
- include attributes of R, if any, as attributes of
RR - include as foreign key attributes in RR the
primary keys of the relations that represent the
participating entities in R - the primary key of RR is the combination of the
foreign key attributes except if one (or more) of
the connectivity constraints is max-card1, then
the foreign key of the entity that participates
in a max-card1 relationship can be the key of RR
51STEP 6 N-ARY RELATIONSHIPS (contd)
RS(KS )
RT(KT )
RU(KU )
RR( )
KT,
KS,
A
KU,
- The primary key of RR depends on the
connectivity of each entity
- Integrity constraints
- The general rules apply to each participating
entity.
52STEP 7 WEAK ENTITIES
- For each weak entity T with strong entity S
- create a relation RT with attributes of T
- include attributes of R in RT
- include as foreign key attributes the primary key
attributes of S - the primary key of RT is a combination of the
primary key of S and the discriminator da of T,
if any (note discriminator is required if
max-card gt 1) - integrity constraints depend on the connectivity
of the relationship R except that the existence
of T in R is always mandatory
RS(KS )
RT( . . ., da, )
KS,
A
53STEP 7 WEAK ENTITIES (contd)
54STEP 8 RECURSIVE RELATIONSHIPS
- For each recursive relationship R of entity S
- create a new relation RR
- include attributes of R, if any, as attributes of
RR - include as foreign key attributes in RR two
copies of the primary key of RS - the names of these two attributes correspond to
the roles of S in R - the primary key of RR is either one or both of
the foreign key attributes depending on the
connectivity of the relationship
role 1
role 2
RS1(KS1 )
RR( )
KS1,
A
KS2,
RS2(KS2 )
55STEP 9 MULTI-VALUED ATTRIBUTES
- For each multi-valued attribute A in relation RS
- create a new relation RA
- include an attribute corresponding to A
- include the primary key of RS as foreign key
attribute(s) - the primary key of RA is the combination of A and
the foreign key from RS - Integrity constraints
- Nulls not allowed Nulls allowed
- On delete RA not allowed RA no action
- RS delete(RA) RS delete(RA)
- On insert RA exists(RS) RA exists(RS)
- RS exists(RA) RS no action
56FINAL HOSPITAL EXAMPLE RELATIONAL SCHEMA
Person(HKID, name, address) On delete Person
delete(Staff) and delete(Patient) and
delete(Doctor) Staff(employee, duty, HKID,
ward) On insert Staff exists(Person) and
exists(Ward) not null(HKID) not
null(ward) Ward(ward, name, -of-beds) On
delete Ward delete(Staff) and set
null(ward) Patient(registration, birthdate,
sex, HKID, ward, bed) On delete Patient
delete(Attends) and delete(Diagnosis) On
insert Patient exists(Person) and
exists(Attends) and exists(Diagnosis) not
null(HKID) null(ward)
57FINAL HOSPITAL EXAMPLE RELATIONAL SCHEMA
Doctor(doctor, specialty, HKID) On
delete Doctor delete(Attends) On
insert Doctor exists(Person) not
null(HKID) Attends(registration, doctor) On
delete Attends delete(Patient) On
insert Attends exists(Patient) and
exists(Doctor) Diagnosis(registration,
diagnosis-type, complications) On
delete Diagnosis delete(Patient) On
insert Diagnosis exists(Patient)
58EXAMPLE ER TO RELATIONAL TRANSFORMATION SOS
SCHEMA
1
Takes
Sales-clerk
N
1
N
N
1
Order
Customer
Order-item
Places
A
N
d
U
U
Is-for
Phone-order
Fax-order
Order(date, time) Customer(customer-id, name,
billing-address, delivery-address, telephone,
fax) Item(item, description, price) Warehouse(wa
rehouse-id, address, telephone,
fax) Order-item(quantity, price) Stocked-in(quant
ity-on-hand) Country(name, size,
volume) Sales-clerk(employee-id, name)
1
Item
N
Stocked-in
M
N
1
Warehouse
Country
Located-in
59REVERSE ENGINEERING FOR DATABASE DESIGN BCN 12.5
- What?
- Extraction of the abstract (data structure) from
the concrete (data instances)
- Why?
- many databases implemented according tofile
systems or older DBMS - DBMSs often designed in an ad hoc manner
- 1. To get a good conceptual understanding of the
data stored by existing applications - 2. To redo the logical and physical design for a
database - 3. To convert from older DBMS technology
(hierarchical, network) to newer technology
(relational, object-oriented)
60RELATIONAL TO ER TRANSFORMATION
- Assumptions
- Relations are normalized (3NF or BCNF)
- some 2NF relations are O.K. if they do not cause
update anomalies - Primary key and foreign key attributes are
(re-)named to make referential integrity explicit - Attributes with the same name have identical
domains and meaning - All relations have primary keys
- candidate keys may also be specified
61CLASSIFCATION OF RELATIONS
- Primary relation
- The primary key contains no other key of another
relation
- Weak primary relation
- The primary key K1 of a relation R1 contains the
primary key K2 of a relation R2 - R1 is the weak primary relation
- Secondary relation
- The primary key is a concatenation of the primary
keys of other relations - Not an exhaustive list just most commonly
occurring
62RELATIONAL TO ER TRANSFORMATION STEPS
- Step 1 Preprocess and classify relations
- Step 2 Interchange primary and candidate keys,
if necessary - If the primary key of a relation R1 matches the
candidate key(s) of one or more relations R2, R3,
, Rn, we designate those candidate keys as the
new primary keys - Allows relationships that depend on primary
keys to be identified - Step 3 Assign appropriate names to parts of keys
that occur in secondary relations to remove
ambiguity - Allows relationships to be more easily
identified - Step 4 Transform primary relations into entities
63RELATIONAL TO ER TRANSFORMATION STEPS (contd)
- Step 5 Transform weak primary relations into
weak entities - Step 6 Detect generalization relationships among
entities - On the basis of external knowledge about entities
- Step 7 Transform secondary relations into
relationships - Extra attributes become relationship attributes
- Connectivities needs to be specified by the
database designer - Step 8 Transform referential integrity
constraints of non-key attributes into
relationships - Relationship is either 11 or 1N
- Step 9 Transform unclassified relations
- Need to handled on a case by case basis
64HOSPITAL EXAMPLE RELATIONAL SCHEMA
Person(HKID, name, address) Staff(employee,
duty, HKID, ward) Ward(ward, name,
-of-beds) Patient(registration, birthdate, sex,
HKID, ward, bed) Doctor(doctor, specialty,
HKID) Diagnosis(patient, diagnosis-type,
complications) Attends(patient, doctor)
primary primary primary primary primary
weak primary secondary
65REVERSE ENGINEERED HOSPITAL ER SCHEMA