Gitte Christensen - PowerPoint PPT Presentation

About This Presentation

Description: Managing external data, Part 1: Design of Databases. Gitte Christensen, Dyalog Ltd. PowerPoint PPT presentation.

Slides: 120
Provided by: Gitte4

Transcript and Presenter's Notes



1
Managing external data, Part 1: Design of Databases
  • Gitte Christensen
  • Dyalog Ltd

2
Purpose
  • To give you a crash course in data analysis and
    databases
  • After Part 1, Design of Databases, you will be
    able to analyse and organise data based on a
    requirements specification or use case.
  • After Part 2, Database Programming, you will be
    able to use relational data in your APL
    applications.
  • After Part 3, Database Implementation, you will
    be able to choose between different storage
    methods based on the structure and use of the
    data and on performance considerations.

3
Agenda
  • The Relational Model
  • Entity/Relation model
  • Convert E/R to table structure
  • Relational Algebra
  • Semistructured data
  • Multidimensional data

4
Data Models
  • A database models some portion of the real world.
  • A data model is the link between the user's view
    of the world and the bits stored in the computer.
  • We will concentrate on the Relational Model

5
Data Models
  • A data model is a collection of concepts for
    describing data.
  • A database schema is a description of a
    particular collection of data, using a given data
    model.
  • The relational model of data is the most widely
    used model today.
  • Main concept: the relation, basically a table
    with rows and columns.
  • Every relation has a schema, which describes the
    columns, or fields.

6
Levels of Abstraction
Users
  • Views describe how users see the data.
  • The conceptual schema defines the logical
    structure.
  • The physical schema describes the files and
    indexes used.
  • (sometimes called the ANSI/SPARC model)

7
Data Independence
  • A simple idea: applications should be insulated
    from how data is structured and stored.
  • Logical data independence: protection from
    changes in the logical structure of data.
  • Physical data independence: protection from
    changes in the physical structure of data.

8
Entity-Relationship Model
9
Purpose of E/R Model
  • The E/R model allows us to sketch database
    designs.
  • Kinds of data and how they connect.
  • Not how data changes.
  • Designs are pictures called entity-relationship
    diagrams.
  • Later, we convert E/R designs to relational DB
    designs.

10
Entity Sets
  • Entity: a thing or object.
  • Entity set: a collection of similar entities.
  • Similar to a class in object-oriented languages.
  • Attribute: a property of (the entities of) an
    entity set.
  • Attributes are simple values, e.g. integers or
    character strings.

11
E/R Diagrams
  • In an entity-relationship diagram:
  • Entity set = rectangle.
  • Attribute = oval, with a line to the rectangle
    representing its entity set.

12
Example
  • Entity set Beers has two attributes, name and
    manf (manufacturer).
  • Each Beers entity has values for these two
    attributes, e.g. (Bud, Anheuser-Busch).
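Because an entity set is similar to a class, the Beers example can be sketched in Python as a frozen dataclass whose fields are the attributes. This is only an illustration under the assumption that Python objects stand in for entities; the class name `Beer` is hypothetical, and the manufacturer of the second beer is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beer:
    """One entity of the (hypothetical) Beers entity set."""
    name: str  # attribute: a simple character-string value
    manf: str  # attribute: the manufacturer


# The entity set is a collection of similar entities. Frozen dataclasses
# are hashable, so a Python set can hold them.
beers = {
    Beer("Bud", "Anheuser-Busch"),
    Beer("Bud Lite", "Anheuser-Busch"),
}

print(Beer("Bud", "Anheuser-Busch") in beers)  # True
```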

13
Relationships
  • A relationship connects two or more entity sets.
  • It is represented by a diamond, with lines to
    each of the entity sets involved.

14
Example
15
Relationship Set
  • The current value of an entity set is the set
    of entities that belong to it.
  • Example: the set of all bars in our database.
  • The value of a relationship is a set of lists
    of currently related entities, one from each of
    the related entity sets.

16
Example
  • For the relationship Sells, we might have a
    relationship set like

Bar        Beer
Joe's Bar  Bud
Joe's Bar  Miller
Sue's Bar  Bud
Sue's Bar  Pete's Ale
Sue's Bar  Bud Lite
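A relationship set is just a set of lists (here, pairs) of related entities, so the Sells example above maps directly onto a Python set of tuples. A minimal sketch, assuming entities are identified by their names; the two helper functions are hypothetical and only show how to navigate the relationship in either direction.

```python
# The Sells relationship set from the slide: one (bar, beer) pair per row.
sells = {
    ("Joe's Bar", "Bud"),
    ("Joe's Bar", "Miller"),
    ("Sue's Bar", "Bud"),
    ("Sue's Bar", "Pete's Ale"),
    ("Sue's Bar", "Bud Lite"),
}


def beers_sold_at(bar):
    """Beers related to the given bar through Sells."""
    return {beer for (b, beer) in sells if b == bar}


def bars_selling(beer):
    """Bars related to the given beer through Sells."""
    return {bar for (bar, b) in sells if b == beer}


print(sorted(beers_sold_at("Joe's Bar")))  # ['Bud', 'Miller']
print(sorted(bars_selling("Bud")))          # ["Joe's Bar", "Sue's Bar"]
```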
17
Case Movie Database
  • We want to create a movie database which will
    allow our users to find information about movies
  • Each movie has a title, a production year, a
    length in minutes, whether it is color or b/w,
    and an owner, a studio.
  • We have addresses for the studios and the actors.

18
Draw a model of the Movies database using these
symbols: an entity set (rectangle labelled
EntityName), a relationship (diamond labelled
Relationship) and an attribute (oval labelled
AttributeName).
19
Multiway Relationships
  • Sometimes, we need a relationship that connects
    more than two entity sets.
  • Suppose that drinkers will only drink certain
    beers at certain bars.
  • Our three binary relationships Likes, Sells, and
    Frequents do not allow us to make this
    distinction.
  • But a 3-way relationship would.

20
Example
(Diagram: the 3-way relationship Preferences connects
Bars (name, addr, license), Beers (name, manf) and
Drinkers (name, addr).)
21
A Typical Relationship Set
Bar        Drinker  Beer
Joe's Bar  Ann      Miller
Sue's Bar  Ann      Bud
Sue's Bar  Ann      Pete's Ale
Joe's Bar  Bob      Bud
Joe's Bar  Bob      Miller
Joe's Bar  Cal      Miller
Sue's Bar  Cal      Bud Lite
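To see why the binary relationships Likes, Sells and Frequents are not enough, one can derive them from the Preferences relationship set above and try to rebuild the triples from the pairs alone. This is a sketch under the simplifying assumption that the binary relationships contain exactly the pairs implied by Preferences; even so, the rebuilt set contains a triple that was never a preference.

```python
# The 3-way Preferences relationship set (bar, drinker, beer).
preferences = {
    ("Joe's Bar", "Ann", "Miller"),
    ("Sue's Bar", "Ann", "Bud"),
    ("Sue's Bar", "Ann", "Pete's Ale"),
    ("Joe's Bar", "Bob", "Bud"),
    ("Joe's Bar", "Bob", "Miller"),
    ("Joe's Bar", "Cal", "Miller"),
    ("Sue's Bar", "Cal", "Bud Lite"),
}

# Derive the three binary relationships from it.
frequents = {(bar, drinker) for (bar, drinker, _) in preferences}
likes = {(drinker, beer) for (_, drinker, beer) in preferences}
sells = {(bar, beer) for (bar, _, beer) in preferences}

# Rebuild every triple that is consistent with all three binary sets.
rebuilt = {
    (bar, drinker, beer)
    for (bar, drinker) in frequents
    for (d, beer) in likes
    if d == drinker and (bar, beer) in sells
}

spurious = rebuilt - preferences
print(spurious)  # {("Joe's Bar", 'Ann', 'Bud')}
```

Ann frequents Joe's Bar, likes Bud, and Joe's Bar sells Bud, yet she does not drink Bud there; only the 3-way relationship records that distinction.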
22
Case Movie Database
  • In each movie there are actors who are contracted
    by the studios
  • Add this relationship to your model

23
Many-Many Relationships
  • Focus: binary relationships, such as Sells
    between Bars and Beers.
  • In a many-many relationship, an entity of either
    set can be connected to many entities of the
    other set.
  • E.g., a bar sells many beers; a beer is sold by
    many bars.

24
In Pictures
many-many
25
Many-One Relationships
  • Some binary relationships are many-one from one
    entity set to another.
  • Each entity of the first set is connected to at
    most one entity of the second set.
  • But an entity of the second set can be connected
    to zero, one, or many entities of the first set.

26
In Pictures
many-one
27
Example
  • Favorite, from Drinkers to Beers is many-one.
  • A drinker has at most one favorite beer.
  • But a beer can be the favorite of any number of
    drinkers, including zero.

28
One-One Relationships
  • In a one-one relationship, each entity of either
    entity set is related to at most one entity of
    the other set.
  • Example: the relationship Best-seller between
    entity sets Manfs (manufacturer) and Beers.
  • A beer cannot be made by more than one
    manufacturer, and no manufacturer can have more
    than one best-seller (assume no ties).
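The definitions of many-one and one-one can be checked mechanically against a given relationship set. A minimal sketch with hypothetical data; note that such a check only tests the current value of the relationship set, whereas the multiplicity in the design is a constraint on every value it may ever have.

```python
from collections import Counter


def is_many_one(pairs):
    """True if every entity on the left is related to at most one entity
    on the right (many-one from left to right)."""
    counts = Counter(left for (left, _) in pairs)
    return all(n <= 1 for n in counts.values())


def is_one_one(pairs):
    """True if the relationship is many-one in both directions."""
    swapped = {(right, left) for (left, right) in pairs}
    return is_many_one(pairs) and is_many_one(swapped)


# Hypothetical relationship sets.
favorite = {("Ann", "Bud"), ("Bob", "Bud"), ("Cal", "Miller")}       # Drinkers -> Beers
best_seller = {("Anheuser-Busch", "Bud"), ("Pete's", "Pete's Ale")}  # Manfs -> Beers

print(is_many_one(favorite))    # True: each drinker has at most one favorite
print(is_one_one(favorite))     # False: Bud is the favorite of two drinkers
print(is_one_one(best_seller))  # True
```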

29
In Pictures
one-one
30
Representing Multiplicity
  • Show a many-one relationship by an arrow entering
    the one side.
  • Show a one-one relationship by arrows entering
    both entity sets.
  • Rounded arrow = exactly one, i.e., each entity
    of the first set is related to exactly one entity
    of the target set.

31
Example
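(Diagram: Drinkers and Beers connected by the
many-many relationship Likes and by the many-one
relationship Favorite, with an arrow entering Beers.)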
32
Example
  • Consider Best-seller between Manfs and Beers.
  • Some beers are not the best-seller of any
    manufacturer, so a rounded arrow to Manfs would
    be inappropriate.
  • But a beer manufacturer has to have a best-seller.

33
In the E/R Diagram
(Diagram: Best-seller between Manfs and Beers, with
an ordinary arrow entering Manfs and a rounded
arrow entering Beers.)
34
Case Movie Database
  • Add arrows to your diagram so it reflects the
    kind of relationships between the entities.

35
Attributes on Relationships
  • Sometimes it is useful to attach an attribute to
    a relationship.
  • Think of this attribute as a property of tuples
    in the relationship set.

36
Example
(Diagram: the relationship Sells between Bars and
Beers, with the attribute price attached to Sells.)
Price is a function of both the bar and the
beer, not of one alone.
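One way to see that price belongs to the relationship is to store it keyed by the (bar, beer) pair. A minimal sketch; the prices are invented for illustration.

```python
# Hypothetical prices: the attribute value depends on the (bar, beer) pair.
price = {
    ("Joe's Bar", "Bud"): 2.50,
    ("Joe's Bar", "Miller"): 2.75,
    ("Sue's Bar", "Bud"): 3.00,
}

# Same beer, different bars -> different prices, so price is not a
# property of Beers alone ...
print(price[("Joe's Bar", "Bud")], price[("Sue's Bar", "Bud")])  # 2.5 3.0
# ... and same bar, different beers -> different prices, so it is not a
# property of Bars alone either.
print(price[("Joe's Bar", "Bud")], price[("Joe's Bar", "Miller")])  # 2.5 2.75
```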
37
Equivalent Diagrams Without Attributes on
Relationships
  • Create an entity set representing values of the
    attribute.
  • Make that entity set participate in the
    relationship.

38
Example
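(Diagram: Sells connects Bars, Beers and a new
entity set Prices (price), with an arrow entering
Prices.)
Note the convention: an arrow from a multiway
relationship means that all the other entity sets
together determine a unique one of these.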
39
Roles
  • Sometimes an entity set appears more than once in
    a relationship.
  • Label the edges between the relationship and the
    entity set with names called roles.

40
Example
41
Example
Relationship set:
Buddy1  Buddy2
Bob     Ann
Joe     Sue
Ann     Bob
Joe     Moe
(Diagram: the relationship Buddies connects
Drinkers to itself, with the two edges labelled by
the roles 1 and 2.)
42
Case Movie Database
  • The actors can be contracted either by the studio
    producing the movie or by another studio who
    rents the actor to the producing studio
  • We would like to record what the actor is paid
    for appearing in a movie
  • Update your model to reflect the new facts

43
Subclasses
  • Subclass = special case = fewer entities = more
    properties.
  • Example: ales are a kind of beer.
  • Not every beer is an ale, but some are.
  • Let us suppose that in addition to all the
    properties (attributes and relationships) of
    beers, ales also have the attribute color.

44
Subclasses in E/R Diagrams
  • Assume subclasses form a tree.
  • I.e., no multiple inheritance.
  • Isa triangles indicate the subclass relationship.
  • Point to the superclass.

45
Example
(Diagram: Beers (name, manf), with the subclass
Ales (color) connected by an isa triangle.)
46
Case Movie Database
  • For some movies like cartoons we have a different
    kind of actor, voices.
  • Design a subclass to reflect this fact

ISA
47
E/R Vs. Object-Oriented Subclasses
  • In OO, objects are in one class only.
  • Subclasses inherit from superclasses.
  • In contrast, E/R entities have representatives in
    all subclasses to which they belong.
  • Rule: if entity e is represented in a subclass,
    then e is represented in the superclass.
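In the E/R view an ale is not a separate object of class Ales; it has a representative in Beers (holding name and manf) and one in Ales (holding color), sharing the key name. A minimal Python sketch of that view using plain dicts keyed by name; the data and helper names are hypothetical.

```python
# Hypothetical data: name -> manf for Beers, name -> color for Ales.
beers = {"Bud": "Anheuser-Busch", "Pete's Ale": "Pete's"}
ales = {"Pete's Ale": "amber"}


def check_isa(sub, sup):
    """The E/R rule: every entity represented in the subclass is also
    represented in the superclass (compared by key)."""
    return set(sub) <= set(sup)


def full_ale(name):
    """Combine an ale's representatives in Beers and Ales."""
    return {"name": name, "manf": beers[name], "color": ales[name]}


print(check_isa(ales, beers))  # True
print(full_ale("Pete's Ale"))
# {'name': "Pete's Ale", 'manf': "Pete's", 'color': 'amber'}
```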

48
Example
(Diagram: Beers (name, manf), with the subclass
Ales (color) connected by an isa triangle.)
49
Keys
  • A key is a set of attributes for one entity set
    such that no two entities in this set agree on
    all the attributes of the key.
  • It is allowed for two entities to agree on some,
    but not all, of the key attributes.
  • We must designate a key for every entity set.
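The key definition translates into a simple check over a collection of entities. A minimal sketch with hypothetical Courses data (the entity set used in the multi-attribute key example). Keep in mind that data can only refute a key: a key is a design decision that must hold for every possible set of entities, not just the current one.

```python
def is_key(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical Courses entities.
courses = [
    {"dept": "CS", "number": 101, "hours": "MWF 9", "room": "B12"},
    {"dept": "CS", "number": 202, "hours": "MWF 10", "room": "B12"},
    {"dept": "EE", "number": 101, "hours": "MWF 9", "room": "C3"},
]

print(is_key(courses, ["dept"]))            # False: two CS courses
print(is_key(courses, ["dept", "number"]))  # True: rows agree on some, not all
```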

50
Keys in E/R Diagrams
  • Underline the key attribute(s).
  • In an Isa hierarchy, only the root entity set has
    a key, and it must serve as the key for all
    entities in the hierarchy.

51
Example: name is Key for Beers
(Diagram: Beers (name, manf), with name underlined,
and the subclass Ales (color) connected by an isa
triangle.)
52
Example a Multi-attribute Key
(Diagram: Courses with the attributes dept, number,
hours and room; dept and number are underlined.)
  • Note that hours and room could also serve as a
    key, but we must select only one key.

53
Case Movie Database
  • Add keys to your diagram

54
Weak Entity Sets
  • Occasionally, entities of an entity set need
    help to identify them uniquely.
  • Entity set E is said to be weak if in order to
    identify entities of E uniquely, we need to
    follow one or more many-one relationships from E
    and include the key of the related entities from
    the connected entity sets.

55
Example
  • name is almost a key for football players, but
    there might be two with the same name.
  • number is certainly not a key, since players on
    two teams could have the same number.
  • But number, together with the team name related
    to the player by Plays-on should be unique.

56
In E/R Diagrams
(Diagram: the weak entity set Players (name,
number) is linked by Plays-on to Teams (name).)
  • Double diamond for the supporting many-one
    relationship.
  • Double rectangle for the weak entity set.

57
Weak Entity-Set Rules
  • A weak entity set has one or more many-one
    relationships to other (supporting) entity sets.
  • Not every many-one relationship from a weak
    entity set need be supporting.
  • The key for a weak entity set is its own
    underlined attributes and the keys for the
    supporting entity sets.
  • E.g., (player) number and (team) name is a key
    for Players in the previous example.

58
Case Movie Database
  • We would like to record which camera crews shot a
    particular movie
  • Camera crews are numbered within each studio
  • Add these facts to your diagram

59
Design Techniques
  1. Avoid redundancy.
  2. Limit the use of weak entity sets.
  3. Don't use an entity set when an attribute will do.

60
Avoiding Redundancy
  • Redundancy occurs when we say the same thing in
    two or more different ways.
  • Redundancy wastes space and (more importantly)
    encourages inconsistency.
  • The two instances of the same fact may become
    inconsistent if we change one and forget to
    change the other.

61
Example Good
(Diagram: Beers (name) is related by ManfBy to
Manfs (name, addr).)
This design gives the address of each
manufacturer exactly once.
62
Example Bad
(Diagram: Beers (name, manf) is related by ManfBy
to Manfs (name, addr).)
This design states the manufacturer of a beer
twice: as an attribute and as a related entity.
63
Example Bad
(Diagram: Beers (name, manf, manfAddr).)
This design repeats the manufacturer's address
once for each beer and loses the address if there
are temporarily no beers for a manufacturer.
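Both problems of the Beers(name, manf, manfAddr) design can be reproduced with a few rows. A minimal sketch; the addresses are hypothetical placeholders.

```python
# Hypothetical rows for the bad design Beers(name, manf, manfAddr).
beers = [
    {"name": "Bud", "manf": "Anheuser-Busch", "manfAddr": "Old St 1"},
    {"name": "Bud Lite", "manf": "Anheuser-Busch", "manfAddr": "Old St 1"},
]

# Update anomaly: the address changes, but only one copy is updated.
beers[0]["manfAddr"] = "New St 2"
addrs = {b["manfAddr"] for b in beers if b["manf"] == "Anheuser-Busch"}
print(sorted(addrs))  # ['New St 2', 'Old St 1']: two addresses, one manufacturer

# Deletion anomaly: deleting the manufacturer's beers deletes its address.
beers = [b for b in beers if b["manf"] != "Anheuser-Busch"]
known_addrs = {b["manf"]: b["manfAddr"] for b in beers}
print("Anheuser-Busch" in known_addrs)  # False
```

In the "good" design the address lives once in Manfs, so neither anomaly can occur.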
64
Entity Sets Versus Attributes
  • An entity set should satisfy at least one of the
    following conditions:
  • It is more than the name of something; it has at
    least one nonkey attribute.
  • or
  • It is the "many" in a many-one or many-many
    relationship.

65
Example Good
(Diagram: Beers (name) is related by ManfBy to
Manfs (name, addr).)
  • Manfs deserves to be an entity set because of
    the nonkey attribute addr.
  • Beers deserves to be an entity set because it is
    the "many" of the many-one relationship ManfBy.

66
Example Good
(Diagram: Beers (name, manf).)
There is no need to make the manufacturer an
entity set, because we record nothing about
manufacturers besides their name.
67
Example Bad
(Diagram: Beers (name) is related by ManfBy to
Manfs (name).)
Since the manufacturer is nothing but a name, and
is not at the many end of any relationship, it
should not be an entity set.
68
Don't Overuse Weak Entity Sets
  • Beginning database designers often doubt that
    anything could be a key by itself.
  • They make all entity sets weak, supported by all
    other entity sets to which they are linked.
  • In reality, we usually create unique IDs for
    entity sets.
  • Examples include social-security numbers,
    automobile VINs etc.

69
When Do We Need Weak Entity Sets?
  • The usual reason is that there is no global
    authority capable of creating unique IDs.
  • Example: it is unlikely that there could be an
    agreement to assign unique player numbers across
    all football teams in the world.

70
Break
71
How to Translate the E/R Model to the Relational Model
72
Concepts
  • The relational model is made up of tables
  • A row of a table = a relational
    instance/tuple
  • A column of a table = an attribute
  • A table = a schema/relation
  • Cardinality = number of rows
  • Degree = number of columns

73
Example
SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6
(Labels on the slide: each column is an attribute,
each row is a tuple/relational instance, and the
whole table is a schema/relation; cardinality = 2,
degree = 4.)
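Cardinality and degree are simply the row and column counts of the table. A minimal sketch, assuming the relation is held as a column tuple plus a list of row tuples:

```python
# The example relation: column names plus one tuple per row.
columns = ("SID", "Name", "Major", "GPA")
rows = [
    (1234, "John", "CS", 2.8),
    (5678, "Mary", "EE", 3.6),
]

cardinality = len(rows)   # number of rows (tuples)
degree = len(columns)     # number of columns (attributes)
print(cardinality, degree)  # 2 4
```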
74
From ER Model to Relational Model
  • So how do we convert an E/R diagram into
    tables? Simple!
  • Basic Ideas
  • Build a table for each entity set
  • Build a table for each relationship set if
    necessary (more on this later)
  • Make a column in the table for each attribute in
    the entity set
  • Indivisibility Rule and Ordering Rule
  • Primary Key

75
Example Strong Entity Set
(Diagram: Student (SID, Name, Major, GPA) is related
by Advisor to Professor (SSN, Name, Dept).)
SID   Name  Major  GPA
1234  John  CS     2.8
5678  Mary  EE     3.6

SSN   Name   Dept
9999  Smith  Math
8888  Lee    CS
76
Representation of Weak Entity Set
  • A weak entity set cannot exist alone.
  • To build a table/schema for a weak entity set:
  • Construct a table with one column for each
    attribute in the weak entity set.
  • Remember to include the discriminator.
  • Augment one extra column on the right side of the
    table, and put in there the primary key of the
    strong entity set (the entity set that the weak
    entity set depends on).
  • Primary key of the weak entity set =
    discriminator + foreign key.

77
Example Weak Entity Set
(Diagram: the weak entity set Children (Name, Age)
is linked by owns to Student (SID, Name, Major,
GPA).)
Primary key of Children is Parent_SID + Name.
Age  Name  Parent_SID
10   Bart  1234
8    Lisa  5678
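The weak-entity translation can be tried out in SQLite through Python's standard `sqlite3` module. This is a sketch assuming SQLite as the target database; the student rows reuse the earlier John/Mary example, and the extra child rows are hypothetical. SQLite only enforces `REFERENCES` when `PRAGMA foreign_keys` is switched on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
con.executescript("""
CREATE TABLE Student (
    SID   INTEGER PRIMARY KEY,
    Name  TEXT,
    Major TEXT,
    GPA   REAL
);
-- Weak entity set: its own attributes, with the owner's key added
-- as an extra column on the right.
CREATE TABLE Children (
    Age        INTEGER,
    Name       TEXT,                                 -- discriminator
    Parent_SID INTEGER NOT NULL REFERENCES Student(SID),
    PRIMARY KEY (Parent_SID, Name)
);
INSERT INTO Student VALUES (1234, 'John', 'CS', 2.8), (5678, 'Mary', 'EE', 3.6);
INSERT INTO Children VALUES (10, 'Bart', 1234), (8, 'Lisa', 5678);
""")

# The discriminator only has to be unique per owner:
con.execute("INSERT INTO Children VALUES (3, 'Bart', 5678)")  # accepted

# ...but a second Bart for the same parent violates the primary key.
try:
    con.execute("INSERT INTO Children VALUES (5, 'Bart', 1234)")
except sqlite3.IntegrityError:
    print("duplicate (Parent_SID, Name) rejected")

# A child cannot exist without its owning student.
try:
    con.execute("INSERT INTO Children VALUES (1, 'Maggie', 4242)")
except sqlite3.IntegrityError:
    print("child without a parent rejected")
```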
78
Representation of Relationship Set
  • --This is a little more complicated--
  • Unary/Binary Relationship set
  • Depends on the cardinality and participation of
    the relationship
  • Two possible approaches
  • N-ary (multiple) Relationship set
  • Primary Key Issue
  • Identifying Relationship
  • No relational model representation necessary

79
Representing Relationship Set: Unary/Binary
Relationship
  • For a one-to-one relationship without total
    participation:
  • Build a table with two columns, one column for
    each participating entity set's primary key. Add
    successive columns, one for each descriptive
    attribute of the relationship set (if any).
  • For a one-to-one relationship with one entity set
    having total participation:
  • Augment one extra column on the right side of the
    table of the entity set with total participation,
    and put in there the primary key of the entity
    set without total participation, according to
    the relationship.

80
Example One-to-One Relationship Set
(Diagram: Student (SID, Name, Major, GPA) is related
by study, with the attribute Degree, to Major
(ID Code).)
Primary key can be either SID or Maj_ID_Co.
SID   Maj_ID_Co  S_Degree
9999  07         1234
8888  05         5678
81
Example One-to-One Relationship Set
(Diagram: the 1:1 relationship Have, with the
attribute Condition, connects Student (SID, Name,
Major, GPA) and Laptop (S/N, Brand).)
SID   Name  Major    GPA   LP_S/N   Hav_Cond
9999  Bart  Economy  -4.0  123-456  Own
8888  Lisa  Physics  4.0   567-890  Loan
Primary key can be either SID or LP_S/N.
82
Representing Relationship Set: Unary/Binary
Relationship
  • For a one-to-many relationship without total
    participation:
  • Same thing as one-to-one.
  • For a one-to-many/many-to-one relationship with
    one entity set having total participation on the
    "many" side:
  • Augment one extra column on the right side of the
    table of the entity set on the "many" side, and
    put in there the primary key of the entity set on
    the "one" side, according to the relationship.

83
Example Many-to-One Relationship Set
(Diagram: the N:1 relationship Advisor, with the
attribute Semester, connects Student (SID, Name,
Major, GPA) to Professor (SSN, Name, Dept).)
SID   Name  Major    GPA   Pro_SSN  Ad_Sem
9999  Bart  Economy  -4.0  123-456  Fall 2006
8888  Lisa  Physics  4.0   567-890  Fall 2005
Primary key of this table is SID.
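The augmentation approach for a many-to-one relationship can be sketched in SQLite (Python's `sqlite3` module). Assumptions: SQLite as the target, `NOT NULL` on `Pro_SSN` as one way to express total participation, and hypothetical professor names and departments; the student rows come from the table above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Professor (SSN TEXT PRIMARY KEY, Name TEXT, Dept TEXT);
-- Many side augmented with the one side's key (Pro_SSN) and the
-- relationship attribute (Ad_Sem); NOT NULL models total participation.
CREATE TABLE Student (
    SID     INTEGER PRIMARY KEY,
    Name    TEXT,
    Major   TEXT,
    GPA     REAL,
    Pro_SSN TEXT NOT NULL REFERENCES Professor(SSN),
    Ad_Sem  TEXT
);
-- Professor names and departments are hypothetical.
INSERT INTO Professor VALUES ('123-456', 'Smith', 'Math'),
                             ('567-890', 'Lee', 'CS');
INSERT INTO Student VALUES
    (9999, 'Bart', 'Economy', -4.0, '123-456', 'Fall 2006'),
    (8888, 'Lisa', 'Physics',  4.0, '567-890', 'Fall 2005');
""")

# Following the many-one relationship is a join on the foreign key.
advisor = con.execute("""
    SELECT p.Name, s.Ad_Sem
    FROM Student s JOIN Professor p ON s.Pro_SSN = p.SSN
    WHERE s.SID = 9999
""").fetchone()
print(advisor)  # ('Smith', 'Fall 2006')
```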
84
Representing Relationship Set: Unary/Binary
Relationship
  • For a many-to-many relationship:
  • Same thing as a one-to-one relationship without
    total participation.
  • The primary key of this new schema is the union
    of the foreign keys of both entity sets.
  • No augmentation approach is possible.
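For many-to-many, the relationship gets a table of its own whose primary key is the pair of foreign keys. A sketch in SQLite via Python's `sqlite3`, reusing the Bars/Beers/Sells example from the E/R part; the prices are hypothetical and the unknown manufacturer is stored as NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Bars  (name TEXT PRIMARY KEY, addr TEXT);
CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT);
-- The many-to-many relationship gets its own table; its primary key is
-- the union of the two foreign keys.
CREATE TABLE Sells (
    bar   TEXT REFERENCES Bars(name),
    beer  TEXT REFERENCES Beers(name),
    price REAL,                        -- descriptive attribute
    PRIMARY KEY (bar, beer)
);
INSERT INTO Bars  VALUES ('Joe''s Bar', NULL), ('Sue''s Bar', NULL);
INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch'), ('Miller', NULL);
INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 2.50),
                         ('Joe''s Bar', 'Miller', 2.75),
                         ('Sue''s Bar', 'Bud', 3.00);
""")

bars = con.execute(
    "SELECT bar FROM Sells WHERE beer = 'Bud' ORDER BY bar").fetchall()
print(bars)  # [("Joe's Bar",), ("Sue's Bar",)]
```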

85
Representing Relationship Set: N-ary Relationship
  • Intuitively simple:
  • Build a new table with as many columns as there
    are attributes in the union of the primary keys
    of all participating entity sets.
  • Augment additional columns for descriptive
    attributes of the relationship set (if
    necessary).
  • The primary key of this table is the union of all
    primary keys of the entity sets that are on the
    "many" side.
  • That is it, we are done.

86
Example N-ary Relationship Set
(Diagram: the relationship "A relationship", with
the descriptive attribute D-Attribute, connects
E-Set 1 (P-Key1), E-Set 2 (P-Key2), E-Set 3
(P-Key3) and Another Set (A-Key).)
P-Key1  P-Key2  P-Key3  A-Key  D-Attribute
9999    8888    7777    6666   Yes
1234    5678    9012    3456   No
Primary key of this table is P-Key1 + P-Key2 +
P-Key3.
87
Representing Relationship Set: Identifying
Relationship
  • This is what you have to know:
  • You DON'T have to build a table/schema for the
    identifying relationship set once you have built
    a table/schema for the corresponding weak entity
    set.
  • Reason:
  • It is a special case of one-to-many with total
    participation.
  • Reduce redundancy.

88
Representing Composite Attribute
  • Relational Model Indivisibility Rule Applies
  • One column for each component attribute
  • NO column for the composite attribute itself

(Diagram: Professor (SSN, Name) with the composite
attribute Address, made up of Street and City.)
SSN   Name       Street      City
9999  Dr. Smith  50 1st St.  Fake City
8888  Dr. Lee    1 B St.     San Jose
89
Representing Multivalue Attribute
  • For each multivalue attribute in an entity
    set/relationship set
  • Build a new relation schema with two columns
  • One column for the primary keys of the entity
    set/relationship set that has the multivalue
    attribute
  • Another column for the multivalue attribute.
    Each cell of this column holds only one value,
    so each value is represented as a unique tuple.
  • The primary key for this schema is the union of
    all attributes.

90
Example Multivalue attribute
(Diagram: Student (SID, Name, Major, GPA) with the
multivalue attribute Children.)
The primary key for this table is Stud_SID +
Children, the union of all attributes.
Stud_SID  Children
1234      Johnson
1234      Mary
5678      Bart
5678      Lisa
5678      Maggie

SID   Name   Major  GPA
1234  John   CS     2.8
5678  Homer  EE     3.6
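The same split can be written as a small transformation: single-valued attributes stay in the main table, and each value of the multivalue attribute becomes its own (Stud_SID, Children) tuple. A minimal sketch, assuming the input records are Python dicts holding the Children values as a list:

```python
# Student records with a multivalued Children attribute (data from the slide).
students = {
    1234: {"Name": "John", "Major": "CS", "GPA": 2.8,
           "Children": ["Johnson", "Mary"]},
    5678: {"Name": "Homer", "Major": "EE", "GPA": 3.6,
           "Children": ["Bart", "Lisa", "Maggie"]},
}

# Main table: the single-valued attributes only.
student_table = [(sid, s["Name"], s["Major"], s["GPA"])
                 for sid, s in students.items()]

# Multivalue table (Stud_SID, Children): one value per tuple.
children_table = [(sid, child)
                  for sid, s in students.items()
                  for child in s["Children"]]
print(children_table)
# [(1234, 'Johnson'), (1234, 'Mary'), (5678, 'Bart'), (5678, 'Lisa'), (5678, 'Maggie')]
```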
91
Representing Class Hierarchy
  • Two general approaches depending on disjointness
    and completeness
  • For a non-disjoint and/or non-complete class
    hierarchy:
  • Create a table for each superclass entity set
    according to the normal entity set translation
    method.
  • Create a table for each subclass entity set with
    a column for each of the attributes of that
    entity set plus one for each attribute of the
    primary key of the superclass entity set.
  • This primary key from the superclass entity set
    is also used as the primary key for this new
    table.

92
Example
(Diagram: Person (SSN, Name, Gender) ISA Student
(SID, Status, Major, GPA).)
SSN   Name   Gender
1234  Homer  Male
5678  Marge  Female

SSN   SID   Status  Major  GPA
1234  9999  Full    CS     2.8
5678  8888  Part    EE     3.6
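With this translation a complete Student entity is spread over two tables and is reassembled by joining on the shared superclass key. A sketch in SQLite via Python's `sqlite3`, using the data from the tables above; SQLite is an assumption, not something the method requires.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (SSN INTEGER PRIMARY KEY, Name TEXT, Gender TEXT);
-- Subclass table: its own attributes plus the superclass key, which is
-- also its primary key.
CREATE TABLE Student (
    SSN INTEGER PRIMARY KEY REFERENCES Person(SSN),
    SID INTEGER, Status TEXT, Major TEXT, GPA REAL
);
INSERT INTO Person VALUES (1234, 'Homer', 'Male'), (5678, 'Marge', 'Female');
INSERT INTO Student VALUES (1234, 9999, 'Full', 'CS', 2.8),
                           (5678, 8888, 'Part', 'EE', 3.6);
""")

# A full Student entity is recovered by joining on the shared key.
rows = con.execute("""
    SELECT p.SSN, p.Name, p.Gender, s.SID, s.Status, s.Major, s.GPA
    FROM Person p JOIN Student s ON p.SSN = s.SSN
    ORDER BY p.SSN
""").fetchall()
print(rows)
# [(1234, 'Homer', 'Male', 9999, 'Full', 'CS', 2.8),
#  (5678, 'Marge', 'Female', 8888, 'Part', 'EE', 3.6)]
```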
93
Case Movie Database
  • Convert your E/R diagram to relational tables

94
Relational Algebra
95
Relational Algebra
  • Relational Algebra is
  • the formal description of how a relational
    database operates
  • the mathematics which underpin SQL
    operations.
  • Operators in relational algebra are not
    necessarily the same as SQL operators, even if
    they have the same name.

96
Terminology
  • Relation - a set of tuples.
  • Tuple - a collection of attributes which describe
    some real world entity.
  • Attribute - a real world role played by a named
    domain.
  • Domain - a set of atomic values.
  • Set - a mathematical definition for a collection
    of objects which contains no duplicates.

97
Operators - Write
  • INSERT - provides a list of attribute values for
    a new tuple in a relation. This operator is the
    same as SQL.
  • DELETE - provides a condition on the attributes
    of a relation to determine which tuple(s) to
    remove from the relation. This operator is the
    same as SQL.
  • MODIFY - changes the values of one or more
    attributes in one or more tuples of a relation,
    as identified by a condition operating on the
    attributes of the relation. This is equivalent to
    SQL UPDATE.

98
Operators - Retrieval
  • There are two groups of operations:
  • Mathematical set-theory-based relations: UNION,
    INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
  • Special database operations: SELECT (not the
    same as SQL SELECT), PROJECT, and JOIN.

99
Relational SELECT
  • SELECT is used to obtain a subset of the tuples
    of a relation that satisfy a select condition.
  • For example, to find all employees born after
    1st Jan 1950:
  • SELECT dob > 01/JAN/1950 (employee)
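Relational SELECT is a filter over tuples. A minimal Python sketch, assuming a relation is a list of dicts (one per tuple) and the condition is a function of a tuple; the employee data are hypothetical.

```python
from datetime import date

# Hypothetical employee relation: one dict per tuple.
employee = [
    {"empno": 1, "surname": "Smith", "dob": date(1948, 5, 1), "depno": 1},
    {"empno": 2, "surname": "Jones", "dob": date(1952, 3, 9), "depno": 2},
    {"empno": 3, "surname": "Brown", "dob": date(1961, 7, 4), "depno": 1},
]


def select(condition, relation):
    """Relational SELECT: the subset of tuples satisfying the condition."""
    return [t for t in relation if condition(t)]


born_after_1950 = select(lambda t: t["dob"] > date(1950, 1, 1), employee)
print([t["empno"] for t in born_after_1950])  # [2, 3]
```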

100
Relational PROJECT
  • The PROJECT operation is used to select a subset
    of the attributes of a relation by specifying the
    names of the required attributes.
  • For example, to get a list of all employees'
    surnames and employee numbers:
  • PROJECT surname,empno (employee)
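PROJECT keeps only the named attributes. Because a relation is a set, the result must not contain duplicates, which matters when the projected attributes are not a key. A minimal sketch with the same hypothetical list-of-dicts representation:

```python
# Hypothetical employee relation: one dict per tuple.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
    {"empno": 3, "surname": "Brown", "depno": 1},
]


def project(attrs, relation):
    """Relational PROJECT: keep only the named attributes. A relation is a
    set, so duplicate result tuples are dropped (first occurrence kept)."""
    return list(dict.fromkeys(tuple(t[a] for a in attrs) for t in relation))


print(project(["surname", "empno"], employee))
# [('Smith', 1), ('Jones', 2), ('Brown', 3)]
print(project(["depno"], employee))  # [(1,), (2,)]
```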

101
SELECT and PROJECT
SELECT and PROJECT can be combined together. For
example, to get a list of employee numbers for
employees in department number 1
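One way to write this in relational algebra is PROJECT empno (SELECT depno = 1 (employee)). In SQL the relational SELECT becomes the WHERE clause and PROJECT becomes the column list, with DISTINCT giving set semantics. A sketch in SQLite via Python's `sqlite3`; the column name `depno` and the data are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (empno INTEGER PRIMARY KEY, surname TEXT, depno INTEGER);
INSERT INTO employee VALUES (1, 'Smith', 1), (2, 'Jones', 2), (3, 'Brown', 1);
""")

# PROJECT empno (SELECT depno = 1 (employee)):
# WHERE plays the role of relational SELECT, the column list of PROJECT.
rows = con.execute(
    "SELECT DISTINCT empno FROM employee WHERE depno = 1 ORDER BY empno"
).fetchall()
print(rows)  # [(1,), (3,)]
```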
102
Set Operations - semantics
  • Consider two relations R and S.
  • UNION of R and S: the union of two relations is a
    relation that includes all the tuples that are
    either in R or in S or in both R and S. Duplicate
    tuples are eliminated.
  • INTERSECTION of R and S: the intersection of R and
    S is a relation that includes all tuples that are
    in both R and S.
  • DIFFERENCE of R and S: the difference of R and S
    is the relation that contains all the tuples that
    are in R but that are not in S.

103
SET Operations - requirements
  • For set operations to function correctly the
    relations R and S must be union compatible. Two
    relations are union compatible if
  • they have the same number of attributes
  • the domain of each attribute in column order is
    the same in both R and S.
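With relations held as Python sets of tuples, the three set operations map directly onto `|`, `&` and `-`. The compatibility check below is a hypothetical helper that approximates a domain by a type name; note that attribute names may differ, since only the number of attributes and the domains in column order matter.

```python
def union_compatible(r_cols, s_cols):
    """Same number of attributes and, column by column, the same domain
    (approximated here by a type name)."""
    return len(r_cols) == len(s_cols) and all(
        a[1] == b[1] for a, b in zip(r_cols, s_cols))


# Hypothetical schemas (name, domain) and relations.
r_cols = [("name", "str"), ("age", "int")]
s_cols = [("person", "str"), ("years", "int")]
R = {("Ann", 30), ("Bob", 25)}
S = {("Bob", 25), ("Cal", 40)}

print(union_compatible(r_cols, s_cols))  # True
print(sorted(R | S))  # [('Ann', 30), ('Bob', 25), ('Cal', 40)]
print(sorted(R & S))  # [('Bob', 25)]
print(sorted(R - S))  # [('Ann', 30)]
```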

104
UNION Example
105
INTERSECTION Example
106
DIFFERENCE Example
107
CARTESIAN PRODUCT
  • The Cartesian Product is also an operator which
    works on two sets. It is sometimes called the
    CROSS PRODUCT or CROSS JOIN.
  • It combines the tuples of one relation with all
    the tuples of the other relation.

108
CARTESIAN PRODUCT Example
109
JOIN Operator
  • JOIN is used to combine related tuples from two
    relations
  • In its simplest form the JOIN operator is just
    the cross product of the two relations.
  • As the join becomes more complex, tuples are
    removed within the cross product to make the
    result of the join more meaningful.
  • JOIN allows you to evaluate a join condition
    between the attributes of the relations on which
    the join is undertaken.
  • The notation used is R JOIN(join condition) S.
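The description above suggests a direct sketch: build the Cartesian product of the two relations, then keep only the combined tuples that satisfy the join condition. Real systems use far more efficient algorithms; the data, the list-of-dicts representation and the "r."/"s." attribute prefixes are assumptions for illustration.

```python
from itertools import product

# Hypothetical relations as lists of dicts.
employee = [
    {"empno": 1, "surname": "Smith", "depno": 1},
    {"empno": 2, "surname": "Jones", "depno": 2},
]
department = [
    {"depno": 1, "dname": "Sales"},
    {"depno": 2, "dname": "IT"},
    {"depno": 3, "dname": "HR"},
]


def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s; attribute names are
    prefixed so that clashing names (depno) stay distinct."""
    return [{**{"r." + k: v for k, v in a.items()},
             **{"s." + k: v for k, v in b.items()}}
            for a, b in product(r, s)]


def join(r, condition, s):
    """R JOIN(condition) S: the cross product, filtered by the condition."""
    return [t for t in cartesian_product(r, s) if condition(t)]


print(len(cartesian_product(employee, department)))  # 6
matches = join(employee, lambda t: t["r.depno"] == t["s.depno"], department)
print([(t["r.surname"], t["s.dname"]) for t in matches])
# [('Smith', 'Sales'), ('Jones', 'IT')]
```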

110
JOIN Example
111
Natural Join
  • Most often the JOIN involves an equality test,
    and is then described as an equi-join. Such
    joins result in two attributes in the resulting
    relation having exactly the same value. A
    natural join removes the duplicate attribute(s).
  • In most systems a natural join will require that
    the attributes have the same name to identify the
    attribute(s) to be used in the join. This may
    require a renaming mechanism.
  • If you do use natural joins make sure that the
    relations do not have two attributes with the
    same name by accident.
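A natural join can be sketched as an equi-join on every attribute name the two relations share, keeping each shared attribute once. The second call shows the accidental-name pitfall: an unrelated `name` attribute silently becomes part of the join condition. The data and the list-of-dicts representation are hypothetical, and the sketch assumes all tuples in a relation have the same attributes.

```python
def natural_join(r, s):
    """Equi-join on every attribute name that r and s share, keeping each
    shared attribute once."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]


employee = [{"empno": 1, "name": "Smith", "depno": 1},
            {"empno": 2, "name": "Jones", "depno": 2}]
department = [{"depno": 1, "dname": "Sales"},
              {"depno": 2, "dname": "IT"}]

print(natural_join(employee, department))
# [{'empno': 1, 'name': 'Smith', 'depno': 1, 'dname': 'Sales'},
#  {'empno': 2, 'name': 'Jones', 'depno': 2, 'dname': 'IT'}]

# Pitfall: a department attribute also called "name" is joined on too,
# comparing employee names with department names.
department_bad = [{"depno": 1, "name": "Sales"},
                  {"depno": 2, "name": "IT"}]
print(natural_join(employee, department_bad))  # []
```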

112
OUTER JOINs
  • Notice that much of the data is lost when
    applying a join to two relations. In some cases
    this lost data might hold useful information. An
    outer join retains the information that would
    have been lost from the tables, replacing missing
    data with nulls.
  • There are three forms of the outer join,
    depending on which data is to be kept.
  • LEFT OUTER JOIN - keep data from the left-hand
    table
  • RIGHT OUTER JOIN - keep data from the right-hand
    table
  • FULL OUTER JOIN - keep data from both tables
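The left and full outer joins can be sketched on top of a simple equality join, with Python's `None` standing in for null. A RIGHT OUTER JOIN is then just the left outer join with the arguments swapped (up to attribute order). The data and helper names are hypothetical.

```python
def left_outer_join(r, s, key):
    """Keep every tuple of r; where s has no match on key, fill the
    attributes that come from s with None (standing in for null)."""
    s_attrs = [a for a in s[0] if a != key] if s else []
    result = []
    for a in r:
        matches = [b for b in s if b[key] == a[key]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{attr: None for attr in s_attrs}})
    return result


def full_outer_join(r, s, key):
    """Left outer join plus the tuples of s that matched nothing in r."""
    result = left_outer_join(r, s, key)
    r_attrs = [a for a in r[0] if a != key] if r else []
    r_keys = {a[key] for a in r}
    for b in s:
        if b[key] not in r_keys:
            result.append({**{attr: None for attr in r_attrs}, **b})
    return result


employee = [{"empno": 1, "depno": 1}, {"empno": 2, "depno": 4}]
department = [{"depno": 1, "dname": "Sales"}, {"depno": 3, "dname": "HR"}]

print(left_outer_join(employee, department, "depno"))
# [{'empno': 1, 'depno': 1, 'dname': 'Sales'},
#  {'empno': 2, 'depno': 4, 'dname': None}]
print(full_outer_join(employee, department, "depno"))
```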

113
OUTER JOIN Example 1
114
OUTER JOIN Example 2
115
Semistructured data
116
title          year  length  filmType  studioName  starName
Star Wars      1977  124     color     Fox         Carrie Fisher
Star Wars      1977  124     color     Fox         Mark Hamill
Star Wars      1977  124     color     Fox         Harrison Ford
Mighty Ducks   1991  104     color     Disney      Emilio Estevez
Wayne's World  1992  95      color     Paramount   Dana Carvey
Wayne's World  1992  95      color     Paramount   Mike Meyers
117
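(Diagram: semistructured data as a graph. From the
Root, star edges lead to the nodes cf and mh and a
movie edge leads to sw. Node cf has name Carrie
Fisher and two addresses (street Maple, city
Hollywood; street Locust, city Malibu); node mh has
name Mark Hamill and one address (street Oak, city
Brentwood); node sw has title Star Wars and year
1977. starIn edges lead from the stars to sw, and
starOf edges lead from sw back to the stars.)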
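The star/movie graph on this slide can be sketched as nested Python dicts, where nodes carry whatever labelled children they happen to have and starIn/starOf edges refer to other nodes by id, so cycles are allowed. Assumptions: node ids are the slide's labels, street numbers are left out because the slide text shows only street names, the pairing of streets with cities follows the reading of the diagram, and the helper functions are hypothetical.

```python
# Each node is keyed by an object id; starIn / starOf edges hold ids.
graph = {
    "cf": {"name": "Carrie Fisher",
           "address": [{"street": "Maple", "city": "Hollywood"},
                       {"street": "Locust", "city": "Malibu"}],
           "starIn": ["sw"]},
    "mh": {"name": "Mark Hamill",
           "address": [{"street": "Oak", "city": "Brentwood"}],
           "starIn": ["sw"]},
    "sw": {"title": "Star Wars", "year": 1977,
           "starOf": ["cf", "mh"]},
}
root = {"star": ["cf", "mh"], "movie": ["sw"]}


def stars_of(movie_id):
    """Follow the starOf edges from a movie node to the star names."""
    return [graph[s]["name"] for s in graph[movie_id]["starOf"]]


def cities_of(star_id):
    """A star may have any number of addresses; there is no fixed schema."""
    return [a["city"] for a in graph[star_id].get("address", [])]


print(stars_of("sw"))   # ['Carrie Fisher', 'Mark Hamill']
print(cities_of("cf"))  # ['Hollywood', 'Malibu']
```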
118
Multidimensional data
119
(No Transcript)
120
End of Part 1