Title: Gitte Christensen
1Managing external data, Part 1: Design of Databases
- Gitte Christensen
- Dyalog Ltd
2Purpose
- To give you a crash course in data analysis and databases.
- After part 1, Design of Databases, you will be able to analyse and organise data based on a requirement spec or use case.
- After part 2, Database Programming, you will be able to use relational data in your APL applications.
- After part 3, Database Implementation, you will be able to choose between different storage methods based on the structure and use of the data and on performance considerations.
3Agenda
- The Relational Model
- Entity/Relation model
- Convert E/R to table structure
- Relational Algebra
- Semistructured data
- Multidimensional data
4Data Models
- A database models some portion of the real world.
- The data model is the link between the user's view of the world and the bits stored in the computer.
- We will concentrate on the relational model.
5Data Models
- A data model is a collection of concepts for describing data.
- A database schema is a description of a particular collection of data, using a given data model.
- The relational model of data is the most widely used model today.
- Main concept: relation, basically a table with rows and columns.
- Every relation has a schema, which describes the columns, or fields.
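As a minimal sketch of the relation/schema distinction, the following Python models a schema as a named tuple type and a relation as a set of rows of that type. The class name Beer and the sample rows are illustrative assumptions, not part of the slides.

```python
from typing import NamedTuple

class Beer(NamedTuple):
    """Schema of a hypothetical Beers relation: column names and types."""
    name: str
    manf: str

# The relation itself is a set of rows (tuples) that conform to the schema.
beers = {
    Beer("Bud", "Anheuser-Busch"),
    Beer("Bud Lite", "Anheuser-Busch"),
}

print(Beer._fields)  # the columns described by the schema
print(len(beers))    # the number of rows
```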
6Levels of Abstraction
Users
- Views describe how users see the data.
- The conceptual schema defines the logical structure.
- The physical schema describes the files and indexes used.
- (Sometimes called the ANSI/SPARC model.)
7Data Independence
- A simple idea: applications should be insulated from how data is structured and stored.
- Logical data independence: protection from changes in the logical structure of data.
- Physical data independence: protection from changes in the physical structure of data.
8Entity-Relationship Model
9Purpose of E/R Model
- The E/R model allows us to sketch database designs.
- Kinds of data and how they connect.
- Not how data changes.
- Designs are pictures called entity-relationship diagrams.
- Later: convert E/R designs to relational DB designs.
10Entity Sets
- Entity = a thing or object.
- Entity set = a collection of similar entities.
- Similar to a class in object-oriented languages.
- Attribute = a property of (the entities of) an entity set.
- Attributes are simple values, e.g. integers or character strings.
11E/R Diagrams
- In an entity-relationship diagram:
- Entity set = rectangle.
- Attribute = oval, with a line to the rectangle representing its entity set.
12Example
- Entity set Beers has two attributes, name and manf (manufacturer).
- Each Beers entity has values for these two attributes, e.g. (Bud, Anheuser-Busch).
13Relationships
- A relationship connects two or more entity sets.
- It is represented by a diamond, with lines to
each of the entity sets involved.
14Example
15Relationship Set
- The current "value" of an entity set is the set of entities that belong to it.
- Example: the set of all bars in our database.
- The "value" of a relationship is a set of lists of currently related entities, one from each of the related entity sets.
16Example
- For the relationship Sells, we might have a relationship set like:
Bar Beer
Joe's Bar Bud
Joe's Bar Miller
Sue's Bar Bud
Sue's Bar Pete's Ale
Sue's Bar Bud Lite
17Case Movie Database
- We want to create a movie database which will allow our users to find information about movies.
- Each movie has a title, a production year, a length in minutes, whether it is in color or b/w, and an owner (a studio).
- We have addresses for the studios and the actors.
18EntityName
Draw a model of the Movies database using these
symbols
Relationship
Attribute
AttributeName
19Multiway Relationships
- Sometimes, we need a relationship that connects more than two entity sets.
- Suppose that drinkers will only drink certain beers at certain bars.
- Our three binary relationships Likes, Sells, and Frequents do not allow us to make this distinction.
- But a 3-way relationship would.
20Example
[Diagram: 3-way relationship Preferences connecting Bars (name, addr, license), Beers (name, manf), and Drinkers (name, addr).]
21A Typical Relationship Set
Bar Drinker Beer
Joe's Bar Ann Miller
Sue's Bar Ann Bud
Sue's Bar Ann Pete's Ale
Joe's Bar Bob Bud
Joe's Bar Bob Miller
Joe's Bar Cal Miller
Sue's Bar Cal Bud Lite
22Case Movie Database
- In each movie there are actors who are contracted by the studios.
- Add this relationship to your model.
23Many-Many Relationships
- Focus: binary relationships, such as Sells between Bars and Beers.
- In a many-many relationship, an entity of either set can be connected to many entities of the other set.
- E.g., a bar sells many beers; a beer is sold by many bars.
24In Pictures
many-many
25Many-One Relationships
- Some binary relationships are many-one from one entity set to another.
- Each entity of the first set is connected to at most one entity of the second set.
- But an entity of the second set can be connected to zero, one, or many entities of the first set.
26In Pictures
many-one
27Example
- Favorite, from Drinkers to Beers is many-one.
- A drinker has at most one favorite beer.
- But a beer can be the favorite of any number of
drinkers, including zero.
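The many-one condition can be checked mechanically on a relationship set. The sketch below assumes relationship sets are stored as sets of pairs; the function name is_many_one and the sample data are hypothetical.

```python
def is_many_one(pairs):
    """Return True if each left-hand entity is related to at most one right-hand entity."""
    seen = {}
    for left, right in pairs:
        # setdefault records the first partner and returns the recorded one afterwards
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical relationship sets as sets of (entity, entity) pairs.
favorite = {("Ann", "Bud"), ("Bob", "Bud"), ("Cal", "Miller")}
sells = {("Joe's Bar", "Bud"), ("Joe's Bar", "Miller"), ("Sue's Bar", "Bud")}

print(is_many_one(favorite))  # Favorite: each drinker has one favorite beer
print(is_many_one(sells))     # Sells: Joe's Bar sells two beers
```

Note that such a check only inspects the current data; whether a relationship is many-one is a design decision about all data the database may ever hold.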
28One-One Relationships
- In a one-one relationship, each entity of either entity set is related to at most one entity of the other set.
- Example: relationship Best-seller between entity sets Manfs (manufacturer) and Beers.
- A beer cannot be made by more than one manufacturer, and no manufacturer can have more than one best-seller (assume no ties).
29In Pictures
one-one
30Representing Multiplicity
- Show a many-one relationship by an arrow entering the "one" side.
- Show a one-one relationship by arrows entering both entity sets.
- Rounded arrow = "exactly one", i.e., each entity of the first set is related to exactly one entity of the target set.
31Example
[Diagram: entity sets Drinkers and Beers connected by the relationships Likes and Favorite.]
32Example
- Consider Best-seller between Manfs and Beers.
- Some beers are not the best-seller of any manufacturer, so a rounded arrow to Manfs would be inappropriate.
- But a beer manufacturer has to have a best-seller.
33In the E/R Diagram
[Diagram: relationship Best-seller between entity sets Manfs and Beers.]
34Case Movie Database
- Add arrows to your diagram so it reflects the
kind of relations between the entities
35Attributes on Relationships
- Sometimes it is useful to attach an attribute to a relationship.
- Think of this attribute as a property of tuples in the relationship set.
36Example
[Diagram: relationship Sells between Bars and Beers, with the attribute price attached to Sells.]
Price is a function of both the bar and the beer, not of one alone.
37Equivalent Diagrams Without Attributes on
Relationships
- Create an entity set representing values of the attribute.
- Make that entity set participate in the relationship.
38Example
[Diagram: relationship Sells connecting Bars, Beers, and Prices (attribute price).]
Note the convention: an arrow from a multiway relationship means that all other entity sets together determine a unique one of these.
39Roles
- Sometimes an entity set appears more than once in a relationship.
- Label the edges between the relationship and the entity set with names called roles.
40Example
41Example
[Diagram: relationship Buddies connected to the entity set Drinkers by two edges labeled with the roles 1 and 2.]
Relationship set:
Buddy1 Buddy2
Bob Ann
Joe Sue
Ann Bob
Joe Moe
42Case Movie Database
- The actors can be contracted either by the studio producing the movie or by another studio that rents the actor to the producing studio.
- We would like to record what the actor is paid for appearing in a movie.
- Update your model to reflect the new facts.
43Subclasses
- Subclass = special case = fewer entities = more properties.
- Example: ales are a kind of beer.
- Not every beer is an ale, but some are.
- Let us suppose that in addition to all the properties (attributes and relationships) of beers, ales also have the attribute color.
44Subclasses in E/R Diagrams
- Assume subclasses form a tree.
- I.e., no multiple inheritance.
- Isa triangles indicate the subclass relationship.
- Point to the superclass.
45Example
[Diagram: Ales (attribute color) isa Beers (attributes name, manf).]
46Case Movie Database
- For some movies, like cartoons, we have a different kind of actor: voices.
- Design a subclass to reflect this fact.
ISA
47E/R Vs. Object-Oriented Subclasses
- In OO, objects are in one class only.
- Subclasses inherit from superclasses.
- In contrast, E/R entities have representatives in all subclasses to which they belong.
- Rule: if entity e is represented in a subclass, then e is represented in the superclass.
48Example
[Diagram: Ales (attribute color) isa Beers (attributes name, manf).]
49Keys
- A key is a set of attributes for one entity set such that no two entities in this set agree on all the attributes of the key.
- It is allowed for two entities to agree on some, but not all, of the key attributes.
- We must designate a key for every entity set.
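The definition of a key can be illustrated with a small check over sample entities. This is only a sketch: the function is_key and the Courses rows are hypothetical, and a real key is a constraint on all possible data, not just the rows present today.

```python
def is_key(entities, key_attrs):
    """Return True if no two entities agree on all attributes in key_attrs."""
    seen = set()
    for entity in entities:
        key_value = tuple(entity[attr] for attr in key_attrs)
        if key_value in seen:
            return False
        seen.add(key_value)
    return True

# Hypothetical Courses entities.
courses = [
    {"dept": "CS", "number": 101, "hours": "Mon 9", "room": "A1"},
    {"dept": "CS", "number": 102, "hours": "Mon 10", "room": "A1"},
    {"dept": "EE", "number": 101, "hours": "Mon 9", "room": "B2"},
]

print(is_key(courses, ["dept"]))            # two CS courses agree on dept
print(is_key(courses, ["dept", "number"]))  # entities may agree on some key attributes, not all
print(is_key(courses, ["hours", "room"]))   # an alternative candidate key
```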
50Keys in E/R Diagrams
- Underline the key attribute(s).
- In an Isa hierarchy, only the root entity set has
a key, and it must serve as the key for all
entities in the hierarchy.
51Example name is Key for Beers
[Diagram: Beers with attributes name (underlined) and manf; Ales (attribute color) isa Beers.]
52Example a Multi-attribute Key
[Diagram: entity set Courses with attributes dept, number, hours, and room.]
- Note that hours and room could also serve as a key, but we must select only one key.
53Case Movie Database
54Weak Entity Sets
- Occasionally, entities of an entity set need "help" to identify them uniquely.
- Entity set E is said to be weak if, in order to identify entities of E uniquely, we need to follow one or more many-one relationships from E and include the key of the related entities from the connected entity sets.
55Example
- name is almost a key for football players, but there might be two with the same name.
- number is certainly not a key, since players on two teams could have the same number.
- But number, together with the team name related to the player by Plays-on, should be unique.
56In E/R Diagrams
[Diagram: weak entity set Players (name, number) connected to Teams (name) by the supporting relationship Plays-on.]
- Double diamond for the supporting many-one relationship.
- Double rectangle for the weak entity set.
57Weak Entity-Set Rules
- A weak entity set has one or more many-one relationships to other (supporting) entity sets.
- Not every many-one relationship from a weak entity set need be supporting.
- The key for a weak entity set is its own underlined attributes and the keys for the supporting entity sets.
- E.g., (player) number and (team) name is a key for Players in the previous example.
58Case Movie Database
- We would like to record which camera crews shot a particular movie.
- Camera crews are numbered within each studio.
- Add these facts to your diagram.
59Design Techniques
- Avoid redundancy.
- Limit the use of weak entity sets.
- Don't use an entity set when an attribute will do.
60Avoiding Redundancy
- Redundancy occurs when we say the same thing in two or more different ways.
- Redundancy wastes space and (more importantly) encourages inconsistency.
- The two instances of the same fact may become inconsistent if we change one and forget to change the other.
61Example Good
[Diagram: Beers (name) connected by ManfBy to Manfs (name, addr).]
This design gives the address of each manufacturer exactly once.
62Example Bad
[Diagram: Beers (name, manf) connected by ManfBy to Manfs (name, addr).]
This design states the manufacturer of a beer twice: as an attribute and as a related entity.
63Example Bad
[Diagram: Beers with attributes name, manf, and manfAddr.]
This design repeats the manufacturer's address once for each beer and loses the address if there are temporarily no beers for a manufacturer.
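To make the inconsistency risk concrete, here is a small sketch contrasting the redundant design with the one that stores each address once. All data values (including the addresses) are made up for illustration.

```python
# Bad design (hypothetical data): the address is stored once per beer.
beers_flat = [
    {"name": "Bud", "manf": "Anheuser-Busch", "manfAddr": "St. Louis"},
    {"name": "Bud Lite", "manf": "Anheuser-Busch", "manfAddr": "St. Louis"},
]
beers_flat[0]["manfAddr"] = "New York"  # one copy changed, the other forgotten
flat_addresses = {row["manfAddr"] for row in beers_flat}
print(len(flat_addresses))  # more than one address: the copies now disagree

# Good design: Beers refers to Manfs, and each address is stored exactly once.
manfs = {"Anheuser-Busch": {"addr": "St. Louis"}}
beers = [
    {"name": "Bud", "manf": "Anheuser-Busch"},
    {"name": "Bud Lite", "manf": "Anheuser-Busch"},
]
manfs["Anheuser-Busch"]["addr"] = "New York"  # a single update reaches every beer
good_addresses = {manfs[beer["manf"]]["addr"] for beer in beers}
print(len(good_addresses))
```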
64Entity Sets Versus Attributes
- An entity set should satisfy at least one of the following conditions:
- It is more than the name of something; it has at least one nonkey attribute.
- or
- It is the "many" in a many-one or many-many relationship.
65Example Good
[Diagram: Beers (name) connected by ManfBy to Manfs (name, addr).]
- Manfs deserves to be an entity set because of the nonkey attribute addr.
- Beers deserves to be an entity set because it is the "many" of the many-one relationship ManfBy.
66Example Good
[Diagram: Beers with attributes name and manf.]
There is no need to make the manufacturer an
entity set, because we record nothing about
manufacturers besides their name.
67Example Bad
[Diagram: Beers (name) connected by ManfBy to Manfs (name).]
Since the manufacturer is nothing but a name, and
is not at the many end of any relationship, it
should not be an entity set.
68Don't Overuse Weak Entity Sets
- Beginning database designers often doubt that anything could be a key by itself.
- They make all entity sets weak, supported by all other entity sets to which they are linked.
- In reality, we usually create unique IDs for entity sets.
- Examples include social-security numbers, automobile VINs, etc.
69When Do We Need Weak Entity Sets?
- The usual reason is that there is no global authority capable of creating unique IDs.
- Example: it is unlikely that there could be an agreement to assign unique player numbers across all football teams in the world.
70Break
71How to translate ER Model to Relational Model
72Concepts
- The relational model is made up of tables.
- A row of a table = a relational instance/tuple
- A column of a table = an attribute
- A table = a schema/relation
- Cardinality = number of rows
- Degree = number of columns
73Example
SID Name Major GPA
1234 John CS 2.8
5678 Mary EE 3.6
- Each column is an attribute; each row is a tuple/relational instance.
- Cardinality = 2 (rows); degree = 4 (columns).
- The table as a whole is a schema/relation.
74From ER Model to Relational Model
- So how do we convert an ER diagram into a table? Simple!
- Basic ideas:
- Build a table for each entity set.
- Build a table for each relationship set if necessary (more on this later).
- Make a column in the table for each attribute in the entity set.
- Indivisibility Rule and Ordering Rule
- Primary key
75Example Strong Entity Set
[Diagram: entity sets Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept) connected by the relationship Advisor.]
Student:
SID Name Major GPA
1234 John CS 2.8
5678 Mary EE 3.6
Professor:
SSN Name Dept
9999 Smith Math
8888 Lee CS
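The two strong entity sets above could be declared as tables roughly as follows. This is a sketch using Python's built-in sqlite3 module with an in-memory database; the column types are assumptions, since the slides give only names and sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per entity set, one column per attribute.
    CREATE TABLE Student (
        SID   INTEGER PRIMARY KEY,
        Name  TEXT,
        Major TEXT,
        GPA   REAL
    );
    CREATE TABLE Professor (
        SSN  INTEGER PRIMARY KEY,
        Name TEXT,
        Dept TEXT
    );
""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                 [(1234, "John", "CS", 2.8), (5678, "Mary", "EE", 3.6)])
conn.executemany("INSERT INTO Professor VALUES (?, ?, ?)",
                 [(9999, "Smith", "Math"), (8888, "Lee", "CS")])

students = conn.execute("SELECT SID, Name FROM Student ORDER BY SID").fetchall()
print(students)
```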
76Representation of Weak Entity Set
- Weak entity set: cannot exist alone.
- To build a table/schema for a weak entity set:
- Construct a table with one column for each attribute in the weak entity set.
- Remember to include the discriminator.
- Add one extra column on the right side of the table, and put in it the primary key of the strong entity set (the entity set that the weak entity set depends on).
- Primary key of the weak entity set = discriminator + foreign key
77Example Weak Entity Set
[Diagram: weak entity set Children (Name, Age) connected by the relationship owns to Student (SID, Name, Major, GPA).]
Age Name Parent_SID
10 Bart 1234
8 Lisa 5678
Primary key of Children is Parent_SID + Name.
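A possible SQL rendering of this weak entity set is sketched below with Python's sqlite3 module. The composite primary key combines the discriminator (Name) with the foreign key (Parent_SID). The helper try_insert and the extra test rows are hypothetical; the Student table is reduced to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Student (
        SID  INTEGER PRIMARY KEY,
        Name TEXT
    );
    CREATE TABLE Children (
        Age        INTEGER,
        Name       TEXT,
        Parent_SID INTEGER REFERENCES Student(SID),
        PRIMARY KEY (Parent_SID, Name)  -- foreign key + discriminator
    );
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1234, "John"), (5678, "Mary")])
conn.executemany("INSERT INTO Children VALUES (?, ?, ?)",
                 [(10, "Bart", 1234), (8, "Lisa", 5678)])

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Children VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

same_name_other_parent = try_insert((7, "Bart", 5678))   # allowed: the parent differs
duplicate_key = try_insert((11, "Bart", 1234))           # rejected: same (Parent_SID, Name)
missing_parent = try_insert((5, "Maggie", 4321))         # rejected: no such student
print(same_name_other_parent, duplicate_key, missing_parent)
```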
78Representation of Relationship Set
- This is a little more complicated.
- Unary/Binary Relationship set
- Depends on the cardinality and participation of the relationship
- Two possible approaches
- N-ary (multiple) Relationship set
- Primary Key Issue
- Identifying Relationship
- No relational model representation necessary
79Representing Relationship Set: Unary/Binary Relationship
- For a one-to-one relationship without total participation:
- Build a table with two columns, one column for each participating entity set's primary key. Add successive columns, one for each descriptive attribute of the relationship set (if any).
- For a one-to-one relationship with one entity set having total participation:
- Add one extra column on the right side of the table of the entity set with total participation, and put in it the primary key of the entity set without complete participation, as per the relationship.
80Example One-to-One Relationship Set
[Diagram: relationship study, with attribute Degree, between Student (SID, Name, Major, GPA) and Major (ID Code).]
SID Maj_ID_Co S_Degree
9999 07 1234
8888 05 5678
Primary key can be either SID or Maj_ID_Co.
81Example One-to-One Relationship Set
[Diagram: 1:1 relationship Have, with attribute Condition, between Student (SID, Name, Major, GPA) and Laptop (S/N, Brand).]
SID Name Major GPA LP_S/N Hav_Cond
9999 Bart Economy -4.0 123-456 Own
8888 Lisa Physics 4.0 567-890 Loan
Primary key can be either SID or LP_S/N.
82Representing Relationship Set: Unary/Binary Relationship
- For a one-to-many relationship without total participation:
- Same thing as one-to-one.
- For a one-to-many/many-to-one relationship with one entity set having total participation on the "many" side:
- Add one extra column on the right side of the table of the entity set on the "many" side, and put in it the primary key of the entity set on the "one" side, as per the relationship.
83Example Many-to-One Relationship Set
[Diagram: N:1 relationship Advisor, with attribute Semester, between Student (SID, Name, Major, GPA) and Professor (SSN, Name, Dept).]
SID Name Major GPA Pro_SSN Ad_Sem
9999 Bart Economy -4.0 123-456 Fall 2006
8888 Lisa Physics 4.0 567-890 Fall 2005
Primary key of this table is SID.
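The augmentation approach for a many-to-one relationship might look like this in SQL, sketched with Python's sqlite3 module. The pairing of professor names with these SSN values is an assumption made for illustration, and NOT NULL is used here to express total participation of Student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Professor (
        SSN  TEXT PRIMARY KEY,
        Name TEXT
    );
    -- Student is on the "many" side, so it carries the extra columns.
    CREATE TABLE Student (
        SID     INTEGER PRIMARY KEY,
        Name    TEXT,
        Pro_SSN TEXT NOT NULL REFERENCES Professor(SSN),
        Ad_Sem  TEXT  -- descriptive attribute of the Advisor relationship
    );
""")
conn.executemany("INSERT INTO Professor VALUES (?, ?)",
                 [("123-456", "Smith"), ("567-890", "Lee")])
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                 [(9999, "Bart", "123-456", "Fall 2006"),
                  (8888, "Lisa", "567-890", "Fall 2005")])

# Follow the foreign key to find each student's advisor.
advisors = conn.execute("""
    SELECT s.Name, p.Name
    FROM Student AS s JOIN Professor AS p ON p.SSN = s.Pro_SSN
    ORDER BY s.SID
""").fetchall()
print(advisors)
```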
84Representing Relationship Set: Unary/Binary Relationship
- For a many-to-many relationship:
- Same thing as a one-to-one relationship without total participation.
- Primary key of this new schema is the union of the foreign keys of both entity sets.
- No augmentation approach possible.
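As an illustration of the many-to-many case, the Sells relationship between Bars and Beers from earlier in the deck can be given its own table whose primary key is the union of the two foreign keys. This is a sketch with Python's sqlite3 module; the column names and prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bars  (name TEXT PRIMARY KEY);
    CREATE TABLE Beers (name TEXT PRIMARY KEY);
    CREATE TABLE Sells (
        bar   TEXT REFERENCES Bars(name),
        beer  TEXT REFERENCES Beers(name),
        price REAL,              -- descriptive attribute of the relationship
        PRIMARY KEY (bar, beer)  -- union of the two foreign keys
    );
""")
conn.executemany("INSERT INTO Bars VALUES (?)", [("Joe's Bar",), ("Sue's Bar",)])
conn.executemany("INSERT INTO Beers VALUES (?)", [("Bud",), ("Miller",)])
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)",
                 [("Joe's Bar", "Bud", 2.50),
                  ("Joe's Bar", "Miller", 2.75),
                  ("Sue's Bar", "Bud", 3.00)])

# A beer is sold by many bars, and a bar sells many beers.
bud_bars = [row[0] for row in conn.execute(
    "SELECT bar FROM Sells WHERE beer = ? ORDER BY bar", ("Bud",))]
print(bud_bars)
```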
85Representing Relationship Set: N-ary Relationship
- Intuitively simple
- Build a new table with as many columns as there are attributes in the union of the primary keys of all participating entity sets.
- Add additional columns for descriptive attributes of the relationship set (if necessary).
- The primary key of this table is the union of all primary keys of the entity sets that are on the "many" side.
- That is it, we are done.
86Example N-ary Relationship Set
[Diagram: a relationship with descriptive attribute D-Attribute connecting E-Set 1 (P-Key1), E-Set 2 (P-Key2), E-Set 3 (P-Key3), and Another Set (A-Key).]
P-Key1 P-Key2 P-Key3 A-Key D-Attribute
9999 8888 7777 6666 Yes
1234 5678 9012 3456 No
Primary key of this table is P-Key1 + P-Key2 + P-Key3.
87Representing Relationship Set: Identifying Relationship
- This is what you have to know:
- You DON'T have to build a table/schema for the identifying relationship set once you have built a table/schema for the corresponding weak entity set.
- Reason:
- A special case of one-to-many with total participation
- Reduces redundancy
88Representing Composite Attribute
- Relational Model Indivisibility Rule Applies
- One column for each component attribute
- NO column for the composite attribute itself
[Diagram: Professor with attributes SSN, Name, and the composite attribute Address (Street, City).]
SSN Name Street City
9999 Dr. Smith 50 1st St. Fake City
8888 Dr. Lee 1 B St. San Jose
89Representing Multivalue Attribute
- For each multivalue attribute in an entity set/relationship set:
- Build a new relation schema with two columns.
- One column for the primary keys of the entity set/relationship set that has the multivalue attribute.
- Another column for the multivalue attributes. Each cell of this column holds only one value, so each value is represented as a unique tuple.
- Primary key for this schema is the union of all attributes.
90Example Multivalue attribute
[Diagram: Student with attributes SID, Name, Major, GPA, and the multivalue attribute Children.]
Stud_SID Children
1234 Johnson
1234 Mary
5678 Bart
5678 Lisa
5678 Maggie
The primary key for this table is Stud_SID + Children, the union of all attributes.
SID Name Major GPA
1234 John CS 2.8
5678 Homer EE 3.6
91Representing Class Hierarchy
- Two general approaches, depending on disjointness and completeness.
- For a non-disjoint and/or non-complete class hierarchy:
- Create a table for each superclass entity set according to the normal entity set translation method.
- Create a table for each subclass entity set with a column for each of the attributes of that entity set plus one for each attribute of the primary key of the superclass entity set.
- This primary key from the superclass entity set is also used as the primary key for this new table.
92Example
[Diagram: Student (SID, Status, Major, GPA) ISA Person (SSN, Name, Gender).]
Person:
SSN Name Gender
1234 Homer Male
5678 Marge Female
Student:
SSN SID Status Major GPA
1234 9999 Full CS 2.8
5678 8888 Part EE 3.6
93Case Movie Database
- Convert your E/R diagram to relational tables
94Relational Algebra
95Relational Algebra
- Relational algebra is:
- the formal description of how a relational database operates
- the mathematics which underpin SQL operations.
- Operators in relational algebra are not necessarily the same as SQL operators, even if they have the same name.
96Terminology
- Relation - a set of tuples.
- Tuple - a collection of attributes which describe some real-world entity.
- Attribute - a real-world role played by a named domain.
- Domain - a set of atomic values.
- Set - a mathematical definition for a collection
of objects which contains no duplicates.
97Operators - Write
- INSERT - provides a list of attribute values for a new tuple in a relation. This operator is the same as in SQL.
- DELETE - provides a condition on the attributes of a relation to determine which tuple(s) to remove from the relation. This operator is the same as in SQL.
- MODIFY - changes the values of one or more attributes in one or more tuples of a relation, as identified by a condition operating on the attributes of the relation. This is equivalent to SQL UPDATE.
98Operators - Retrieval
- There are two groups of operations:
- Mathematical set theory based relations: UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
- Special database operations: SELECT (not the same as SQL SELECT), PROJECT, and JOIN.
99Relational SELECT
- SELECT is used to obtain a subset of the tuples of a relation that satisfy a select condition.
- For example, find all employees born after 1st Jan 1950:
- SELECT dob > 01/JAN/1950 (employee)
100Relational PROJECT
- The PROJECT operation is used to select a subset of the attributes of a relation by specifying the names of the required attributes.
- For example, to get a list of all employees' surnames and employee numbers:
- PROJECT surname,empno (employee)
101SELECT and PROJECT
SELECT and PROJECT can be combined together. For
example, to get a list of employee numbers for
employees in department number 1
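A minimal Python sketch of the two operators and their combination follows. Relations are modelled as lists of dicts; the functions select and project, the sample rows, and the attribute name depno for the department number are all assumptions made for illustration.

```python
def select(relation, condition):
    """Relational SELECT: keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """Relational PROJECT: keep only the named attributes, dropping duplicate tuples."""
    result = []
    for row in relation:
        projected = {attr: row[attr] for attr in attributes}
        if projected not in result:
            result.append(projected)
    return result

# Hypothetical employee relation.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 1},
]

# PROJECT empno (SELECT depno = 1 (employee))
result = project(select(employee, lambda row: row["depno"] == 1), ["empno"])
print(result)
```

The inner SELECT runs first and narrows the rows; the outer PROJECT then narrows the columns.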
102Set Operations - semantics
- Consider two relations R and S.
- UNION of R and S: the union of two relations is a relation that includes all the tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
- INTERSECTION of R and S: the intersection of R and S is a relation that includes all tuples that are both in R and S.
- DIFFERENCE of R and S: the difference of R and S is the relation that contains all the tuples that are in R but that are not in S.
103SET Operations - requirements
- For set operations to function correctly the relations R and S must be union compatible. Two relations are union compatible if:
- they have the same number of attributes
- the domain of each attribute in column order is the same in both R and S.
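Because relations are sets of tuples, Python's built-in set type gives a direct sketch of these operations. The relations R and S below are hypothetical, and the union_compatible helper is only a rough stand-in for a domain check: it compares the Python types of the values actually present.

```python
# Two hypothetical union-compatible relations: same number of attributes,
# with matching domains in column order (name: str, age: int).
R = {("Ann", 30), ("Bob", 25), ("Cal", 41)}
S = {("Bob", 25), ("Dee", 38)}

def union_compatible(r, s):
    """Check arity and per-column value types on the tuples actually present."""
    shapes = {tuple(type(value) for value in row) for row in r | s}
    return len(shapes) <= 1

assert union_compatible(R, S)

print(sorted(R | S))  # UNION: duplicates such as ("Bob", 25) appear once
print(sorted(R & S))  # INTERSECTION
print(sorted(R - S))  # DIFFERENCE: in R but not in S
```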
104UNION Example
105INTERSECTION Example
106DIFFERENCE Example
107CARTESIAN PRODUCT
- The Cartesian Product is also an operator which works on two sets. It is sometimes called the CROSS PRODUCT or CROSS JOIN.
- It combines the tuples of one relation with all the tuples of the other relation.
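A short sketch of the Cartesian product using itertools.product from the Python standard library; the two relations are hypothetical.

```python
from itertools import product

# Hypothetical relations as lists of tuples.
R = [("Ann",), ("Bob",)]
S = [("Bud", 2.50), ("Miller", 2.75)]

# Each tuple of R is concatenated with every tuple of S.
cross = [r + s for r, s in product(R, S)]
print(len(cross))  # len(R) * len(S)
for row in cross:
    print(row)
```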
108CARTESIAN PRODUCT Example
109JOIN Operator
- JOIN is used to combine related tuples from two relations.
- In its simplest form the JOIN operator is just the cross product of the two relations.
- As the join becomes more complex, tuples are removed within the cross product to make the result of the join more meaningful.
- JOIN allows you to evaluate a join condition between the attributes of the relations on which the join is undertaken.
- The notation used is:
- R JOIN join condition S
110JOIN Example
111Natural Join
- Invariably the JOIN involves an equality test, and thus is often described as an equi-join. Such joins result in two attributes in the resulting relation having exactly the same value. A natural join will remove the duplicate attribute(s).
- In most systems a natural join will require that the attributes have the same name to identify the attribute(s) to be used in the join. This may require a renaming mechanism.
- If you do use natural joins, make sure that the relations do not have two attributes with the same name by accident.
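The natural join can be sketched in a few lines of Python, with relations as lists of dicts. The function natural_join and the employee/department data are hypothetical; the join attributes are found purely by matching names, which is exactly why accidental name clashes are dangerous.

```python
def natural_join(r, s):
    """Join rows (dicts) that agree on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    joined = []
    for left in r:
        for right in s:
            if all(left[attr] == right[attr] for attr in common):
                joined.append({**left, **right})  # shared attributes appear once
    return joined

# Hypothetical relations sharing the attribute depno.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 3},
]
department = [
    {"depno": 1, "dname": "Sales"},
    {"depno": 2, "dname": "Research"},
]

joined_rows = natural_join(employee, department)
for row in joined_rows:
    print(row)
```

Brown does not appear in the result because no department has depno 3.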
112OUTER JOINs
- Notice that much of the data is lost when applying a join to two relations. In some cases this lost data might hold useful information. An outer join retains the information that would have been lost from the tables, replacing missing data with nulls.
- There are three forms of the outer join, depending on which data is to be kept.
- LEFT OUTER JOIN - keep data from the left-hand table
- RIGHT OUTER JOIN - keep data from the right-hand table
- FULL OUTER JOIN - keep data from both tables
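The difference between an ordinary join and a LEFT OUTER JOIN can be seen with a small SQL sketch run through Python's sqlite3 module. The tables and rows are hypothetical. Only the left form is shown, since RIGHT and FULL OUTER JOIN are available only in newer SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee   (empno INTEGER PRIMARY KEY, surname TEXT, depno INTEGER);
    CREATE TABLE department (depno INTEGER PRIMARY KEY, dname TEXT);
    INSERT INTO employee   VALUES (1, 'Smith', 1), (2, 'Jones', 2), (3, 'Brown', NULL);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Research'), (3, 'Admin');
""")

# Ordinary join: Brown has no department and is lost.
inner = conn.execute("""
    SELECT e.surname, d.dname
    FROM employee AS e JOIN department AS d ON d.depno = e.depno
    ORDER BY e.empno
""").fetchall()

# Left outer join: every employee is kept; missing data becomes NULL (None in Python).
left = conn.execute("""
    SELECT e.surname, d.dname
    FROM employee AS e LEFT OUTER JOIN department AS d ON d.depno = e.depno
    ORDER BY e.empno
""").fetchall()

print(inner)
print(left)
```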
113OUTER JOIN Example 1
114OUTER JOIN Example 2
115Semistructured data
116title year length filmType studioName starName
Star Wars 1977 124 color Fox Carrie Fisher
Star Wars 1977 124 color Fox Mark Hamill
Star Wars 1977 124 color Fox Harrison Ford
Mighty Ducks 1991 104 color Disney Emilio Estevez
Wayne's World 1992 95 color Paramount Dana Carvey
Wayne's World 1992 95 color Paramount Mike Meyers
117Root
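[Diagram: semistructured-data graph with a Root node, star nodes cf (Carrie Fisher) and mh (Mark Hamill), and a movie node sw (Star Wars, 1977). Edges are labeled star, movie, starIn, starOf, name, title, year, address, street, and city. Leaf values include the streets Oak, Maple, and Locust and the cities Brentwood, Hollywood, and Malibu.]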
118Multidimensional data
120End of Part 1