Database Theory - PowerPoint PPT Presentation

1 / 186
About This Presentation
Title:

Database Theory

Description:

Database Theory Jason Fan – PowerPoint PPT presentation

Number of Views:223
Avg rating:3.0/5.0
Slides: 187
Provided by: Fan82
Category:

less

Transcript and Presenter's Notes

Title: Database Theory


1
Database Theory
  • Jason Fan

2
Outline
  • Basic Concepts
  • Database and Database Users (Chapter 1)
  • Database System Concepts and Architecture
    (Chapter 2)
  • Database Design
  • Database Design Process (Chapter 16)
  • Entity-Relationship (ER) Modeling (Chapter3)
  • Functional Dependencies and Normalization for
    Relational Database (Chapter 14)
  • Relational Design Algorithms (Chapter 15)
  • Relational Data Model Mapping (Chapter 9)
  • Relational Database
  • The Relational Data Model (Chapter 7)
  • Relational Algebra (Chapter 7)
  • SQL A Relational Database Language (Chapter 8)
  • Relational Calculus (Chapter 9)
  • Database Implementation
  • Transaction Processing (Chapter 19)
  • Concurrency Control (Chapter 20)
  • Database Recovery (Chapter 21)
  • Advanced Topics

3
Database and Database Users
  • Basic Concepts
  • Main Characteristics of Database Technology
  • Classes of Database Users
  • Additional Database Characteristics
  • When not to use a DBMS

4
Basic Concepts
  • Database
  • A collection of related data.
  • Data
  • Known facts that can be recorded and have
    implicit meaning.
  • Mini-world
  • Some part of the real world about which data is
    stored in database.
  • Database Management System (DBMS)
  • A software package to facilitate the creation and
    maintenance of a computerized database.
  • Database System
  • The DBMS software together with the data itself.

5
Main Characteristics of Database Technology
  • Self-contained nature of a database system
  • A DBMS catalog stores the description
    (meta-data) of the database. This allows the DBMS
    software to work with different databases.
  • Insulations between program and data
  • Data abstractions
  • A data model is used to hide storage details and
    present the user with a conceptual view of the
    database.
  • Program-data independence
  • Allows changing data storage structures without
    having to change the DBMS access programs.
  • Program-operation independence
  • Allows changing operation implementation without
    having to change the DBMS access programs.
  • Support of multiple views of data

6
Additional Characteristics of Database Technology
  • Controlling data redundancy
  • Restricting unauthorized access to data.
  • Providing persistent storage for program objects
    and data structure.
  • Providing multiple interfaces to different
    classes of users.
  • Representing complex relationships among data.
  • Enforcing integrity constraints on the database.
  • Providing backup and recovery services.
  • Potential for enforcing standards.
  • Flexibility to change data structures.
  • Reduced application development time.
  • Availability of up-to-date information.
  • Economies of scale.

7
Classes of Database Users
  • Workers on the scene Persons whose job involves
    daily use of a large database.
  • Database administrators (DBAs) Responsible for
    managing the database system.
  • Database designers Responsible for designing
    the database.
  • End users Access the database for querying ,
    updating , generating reports, etc.
  • Casual end users Occasional users.
  • Parametric (or naive) end users They use
    pre-programmed canned transactions to interact
    continuously with the database. For example, bank
    tellers or reservation clerks.
  • Sophisticated end users Use full DBMS
    capabilities for implementing complex
    applications.
  • System Analysts/Application programmers Design
    and implement canned transactions for Parametric
    users.
  • Workers behind the scene Persons whose job
    involves design , development , operation, and
    maintenance of the DBMS software and system
    environment.
  • DBMS designers and implementers Design and
    implement the DBMS software package itself.
  • Tool developers Design and implement tools that
    facilitate the use of DBMS software. Tools
    include design tools , performance tools ,
    special interfaces , etc.
  • Operators and maintenance personnel Work on
    running and maintaining the hardware and software
    environment for the database system.

8
When not to Use a DBMS
  • Main costs of using a DBMS
  • High initial investment and possible need for
    additional hardware.
  • Overhead for providing generality, security,
    recovery, integrity, and concurrency control.
  • When DBMS may be unnecessary
  • If the database and applications are simple, well
    defined and not expected to change.
  • If there are stringent real-time requirements
    that may not be met because of the DBMS overhead
  • If access to data by multiple users is not
    required.

9
Database System Concepts and Architecture
  • Data Models
  • Three-Schema Architecture
  • Data Independence
  • DBMS Languages
  • DBMS Interfaces
  • DBMS Architecture
  • Database System Utilities
  • Classification of DBMS

10
Data Models
  • Data Model
  • A set of concepts to describe the structure of a
    database, and certain constraints that the
    database should obey.
  • Data Model Operations
  • Operations for specifying database retrievals and
    updates by referring to the concepts of data
    model.
  • Categories of data models
  • Conceptual (high-level, semantic) data models
    Provide concepts that are close to the way many
    users perceive data. (Also called entity-based or
    object-based data models)
  • Physical (low-level, internal) data models
    Provide concepts that describe the details of how
    data is stored in the computer.
  • Implementation (record-oriented) data models
    Provide concepts that fall between above two,
    balancing user views with some computer storage
    details.

11
Data Models
  • Database Schema
  • The description of database. Includes description
    of database structure and the constraints that
    should hold on the database.
  • Database catalog
  • Stores database schema.
  • Schema Diagram
  • A diagrammatic display of ( some aspects of) a
    database schema.
  • Database Instance
  • The actual data stored in a database at a
    particular moment in time. Also called database
    state (or occurrence)
  • The database schema changes very infrequently.
    The database state changes every time the
    database is updated. Schema is also called
    intension, whereas the state is called extension.

12
Three Schema Architecture
  • Internal schema at the internal level to describe
    data storage structures and access paths.
    Typically uses a physical data model.
  • Conceptual schema at the conceptual level to
    describe the structure and constraints for the
    whole database. Uses a conceptual or an
    implementation data model .
  • External schemas at the external level to
    describe the various user views. Usually uses the
    same data model as the conceptual level.
  • Mappings transform requests and results between
    levels.

13
Database System Architecture
External View
External View
External Level
external/conceptual mapping
Conceptual Schema
Conceptual Level
conceptual/internal Mapping
Internal Schema
Internal Level
Stored Databases
14
Data Independence
  • Logical Data Independence
  • The capacity to change the conceptual schema
    without having to change the external schemas and
    their application programs.
  • Physical Data Independence
  • The capacity to change the internal schema
    without having to change the conceptual schema.
  • When a schema at a lower level is changed, only
    the mappings between this schema and higher level
    schemas need to be changed in a DBMS that fully
    supports data independence.
  • Mappings create overhead

15
Database System Languages
  • Data Definition Language (DDL)
  • Used by the DBA and database designers to
    specify the conceptual schema of a database. In
    many DBMSs, the DDL is also used to define
    internal and external schemas(views). In some
    DBMSs, separate storage definition language (SDL)
    and view definition language (VDL) are used to
    define internal and external schemas.
  • Data Manipulation Language (DML)
  • Used to specify database retrievals and updates.
  • High-level (nonprocedural) DML can be used on its
    own to specify database operations.
  • Low-level (procedural) DML retrieves a record at
    a time and must be embedded in a general-purpose
    programming language.
  • When DML is embedded in a general-purpose
    programming language (host language), it is
    called data sublanguage.
  • When DML is used in a stand-alone interactive
    manner, it is called query language

16
DBMS Interfaces
  • Stand-alone query language interfaces.
  • Programmer interfaces for embedding DML in
    programming languages
  • Pre-compiler Approach
  • Procedure (Subroutine) Call Approach
  • Menu-based
  • Graphic-based
  • Forms-based
  • Natural language
  • Combination of above
  • Parametric interfaces using function keys
  • Report generation languages
  • Interfaces for DBA
  • Creating accounts, granting authorizations
  • Setting system parameters
  • Changing schemas or access path

17
Database System Utilities
  • Loading data stored in files into a database.
  • Backing up the database periodically on tape.
  • Reorganizing database file structures.
  • Generating Report.
  • Performance monitoring.
  • Sorting files.
  • User monitoring.
  • Data compression.

18
Classification of DBMSs
  • Based on the data model used
  • Relational
  • Multidimensional
  • Network
  • Hierarchical.
  • Object-oriented
  • Semantic
  • Entity-Relationship
  • Other Classifications
  • Single-user vs. multi-user
  • Centralized vs. distributed
  • Homogeneous vs. Heterogeneous
  • OLTP vs. OLAP

19
Database Design
  • Goals of Database Design
  • Satisfy the information content requirements of
    the specified users and applications
  • Provide natural and easy-to-understand
    structuring of information
  • Support processing requirements and any
    performance objectives
  • Database Design Process
  • Requirement collection and analysis
  • Conceptual database design
  • Choice of DBMS
  • Data model mapping (Logical database design)
  • Physical database design
  • Database system implementation and tuning

20
Requirement Collection and Analysis
  • The major application areas and user groups that
    will use the database or whose work will be
    affected by it are identified. Key individuals
    and committees within each group are chosen to
    carry out subsequent steps of requirement
    analysis
  • Existing documentations concerning the
    applications is studied and analyzed.
  • The current operating environment and planned use
    of the information is studied.
  • Written responses to sets of questions are
    sometimes collected from potential users or user
    groups. Key individuals may be interviewed to
    help in assessing the worth of information and in
    setting up of priorities.

21
Conceptual Database Design
  • Conceptual Schema Design
  • Choice of high-level conceptual data model such
    as ER model and dimensional model
  • Approaches to conceptual schema design
  • centralized schema design approach
  • view integration approach
  • Strategies for conceptual schema design
  • top-down strategy
  • bottom-up strategy
  • inside-out strategy
  • mixed strategy
  • Transaction Design

22
Physical Database Design
  • Criteria for guiding the physical database design
  • Response time
  • Space utilization
  • Transaction throughput
  • Physical database design in relational database
  • Factors that influent the physical database
    design
  • Analyzing the database queries and transactions
  • Analyzing the expected frequencies of invocation
    of queries and transactions
  • Analyzing the time constraints for queries and
    transactions
  • Analyzing the expected frequencies of update
    operations
  • Analyzing the uniqueness constraints on
    attributes
  • Physical database design decisions
  • Indexing
  • De-normalization
  • Storage design

23
Database Tuning in Relational Database
  • Goals
  • Make application run fast
  • lower the response time of queries and
    transactions
  • improve the overall throughput of transactions
  • Tuning indexes
  • Some queries may take too long for lack of an
    index
  • Some indexes may not get utilized
  • Some indexes may causing excessive overhead
  • Tuning database design
  • De-normalization
  • Table partition
  • Duplicate attributes
  • Tuning queries

24
Automated Design Tools
  • Database Design Tools
  • Erwin
  • Rational Rose
  • Power Designer
  • Schema Diagram Notation
  • UML (Unified Modeling Language)
  • IDEF1X (Integration Definition for Information
    Modeling)
  • IE (Information Engineering)
  • CHEN's ERD Notation

25
Entity-Relationship (ER) Modeling
  • Example Database Application (COMPANY)
  • ER Model Concepts
  • Entities and Attributes
  • Entity Types, Value Sets, and Key Attributes
  • Relationships and Relationship Types
  • Structural Constraints and Roles
  • Weak Entity Types
  • ER Diagrams Notation
  • Relationships of Higher Degree
  • Enhanced ER Modeling

26
Example of COMPANY Database
  • Requirements for the COMPANY Database
  • The company is organized into departments. Each
    department has a name, number, and a employee who
    manages the department. We keep track of the
    start date of the department manager. A
    department may have several locations.
  • Each department controls a number of projects.
    Each project has a name, number, and is located
    at a single location.
  • We store each employee's social security number,
    address, salary, sex and birth date. Each
    employee works for one department but may work on
    several projects. We keep track of the number of
    hours per week that an employee currently works
    on each project. We also keep track of the direct
    supervisor of each employee.
  • each employee may have a number of dependents.
    For each dependent, we keep their name, sex,
    birth date, and relationship to the employee.

27
ER Model Concepts Entities and Attributes
  • Entities
  • Entities are specific objects or things in the
    mini-world that are represented in the database
    for example, the EMPLOYEE John Smith, the
    Research DEPARTMENT, the ProductX PROJECT.
  • Attributes
  • Attributes are properties used to describe an
    entity for example, an EMPLOYEE entity may have
    a Name, SSN, Address, Sex, BirthDate. A specific
    entity will have a value for each of its
    attributes for example a specific employee
    entity may have Name 'John Smith', SSN
    '123456789', Address '731 Fondren , Houston,
    TX', Sex 'M', BirthDate '09-JAN-55'.
  • Attribute Types
  • Simple each entity has a single atomic value for
    the attribute for example SSN or Sex.
  • Composite Attribute may be composed of several
    components for example Name(FirstName,
    MiddleName, LastName). Composition may form a
    hierarchy where some components are themselves
    composite.
  • Multi-Valued An entity may have multiple values
    for that attribute for example Color of a CAR or
    PreviousDegrees of a STUDENT. Denoted as Color
    or PreviousDegrees.

28
ER Model Concept Entity Types and Key Attributes
  • Entity Type
  • Entity type defines a set of entities that have
    the same attributes. For example, the EMPLOYEE
    entity type or the PROJECT entity type.
  • Key Attribute
  • An attribute of an entity type for which each
    entity must have a unique value is called a key
    attribute of the entity type. For example, SSN of
    EMPLOYEE.
  • A key attribute may be composite. For example,
    VehicleRegistrationNumber is a key of the CAR
    entity type with components(Number, State).
  • An entity type may have more than one key. For
    example, the CAR entity type may have two keys
    VehicleIdentificationNumber and
    VehicleRegistrationNumber(Number, State).
  • Domains (Value Sets) of Attributes
  • Each simple attribute of an entity type is
    associates with a domain, which specifies the set
    of values the may be assigned to that attribute
    for each individual entity.

29
ER Model Concepts Relationships and Relationship
Types
  • Relationship
  • A relationship relates two or more distinct
    entities with a specific meaning for example,
    EMPLOYEE John Smith works on the ProductX PROJECT
    or EMPLOYEE Franklin Wong manages the Research
    DEPARTMENT.
  • Relationship Type
  • Relationship of the same type are grouped or
    typed into a relationship type. For example, the
    WORKS_ON relationship type in which EMPLOYEEs and
    PROJECTs participate, or the MANAGEs relationship
    type in which EMPLOYEEs and DEPARTMENTs
    participate.
  • More than one relationship type can exist with
    the same participating entity types for example,
    MANAGES and WORKS_FOR are distinct relationships
    between EMPLOYEE and DEPARTMENT participate.
  • The degree of a relationship type
  • The degree of a relationship type is the number
    of participating entity types. binary
    relationships, ternary relationship, n-ary
    relationship

30
ER Model Concepts Structural Constraints and
roles
  • A relationship can relate two entities of the
    same entity type for example, a SUPERVISION
    relationship type relates one EMPLOYEE ( in the
    role of supervisee) to another EMPLOYEE ( in the
    role of supervisor). This is called a recursive
    relationship type.
  • A relationship type can have attributes for
    example, HoursPerWeek of WORKS_ON its value for
    each relationship instance describes the number
    of hours per week that an EMPLOYEE works on a
    PROJECT.
  • Structural constraints on relationships
  • Cardinality ratio ( of a binary relationship)
    11, 1N, N1, or MN.
  • Participation constraint (on each participating
    entity type) total (called existence dependency)
    or partial.

31
ER Model Concepts Weak Entity Types
  • An entity type that does not have a key attribute
  • A weak entity type must participate in an
    identifying relationship type with an owner or
    identifying entity type
  • Entities are identified by the combination of a
    partial key of the weak entity type and the key
    of the identifying entity type.
  • Example
  • Suppose that a DEPENDENT entity is identified by
    the dependents first name and birth date, and
    the specific EMPLOYEE that the dependent is
    related to. DEPENDENT is a weak entity type with
    EMPLOYEE as its identifying entity type via the
    identifying relationship type DEPENDENT_OF.

32
Conceptual Design of COMPANY Database
  • Entity types
  • DEPARTMENT
  • PROJECT
  • EMPLOYEE
  • DEPENDENT
  • Relationship types
  • Manage (11)
  • Work_for (1n)
  • Supervision (1n)
  • Controls (1n)
  • Works_on (mn)
  • Has_dependent (1n)

33
ER Diagram of COMPANY Database
34
Alternative Notation for Relationship Structural
Constraints
  • Specified on each participation of an entity type
    E in a relationship type R.
  • Specifies that each entity e in E participates in
    at least min and at most max relationship
    instances in R.
  • Default(no constraint) min 0, max n.
  • Must have min ? max, min ? 0, max ? 1.
  • Examples
  • A department has exactly one manager and an
    employee can manage at most one department.
  • Specify (1,1) for participation of DEPARTMENT in
    MANAGES
  • Specify (0,1) for participation of EMPLOYEE in
    MANAGES
  • An employee can work for exactly one department
    but a department can have any number of
    employees.
  • Specify (1,1) for participation of EMPLOYEE in
    WORKS_FOR
  • Specify (0,n) for participation of DEPARTMENT in
    WORKS_FOR

35
ER Diagram of COMPANY Database
Works_for
0N
11
Manages
11
01
StartDate
36
Enhanced Entity-Relationship and Object Modeling
  • Subclass, Superclass and Inheritance
  • Specialization and Generalization
  • Disjoin/Overlapping
  • Total/Partial
  • Union/Categories

37
The Relational Data Model
  • Relational Model Concepts
  • Characteristics of Relations
  • Relational Integrity Constraints
  • Domain Constraints
  • Key Constraints
  • Entity Integrity Constraints
  • Referential Integrity Constraints
  • Update Operations on Relations

38
Relational Model Concepts
39
Relational Model Concepts
  • Relation ( informally).
  • A table of values. Each column in the table has a
    column header called an attribute. Each row is
    called a tuple.
  • Formal relational concepts.
  • Domain A set of atomic (indivisible) values.
  • Attribute A name to suggest the meaning that a
    domain plays in a particular relation. Each
    attribute Ai has a domain Dom(Ai).
  • Relation schema A relation name R and a set of
    attributes Ai that define the relation. Denoted
    by R(A1, A2, ... ,an). For example student(name,
    SSN, BirthDate, Addr).
  • Relational Database Schema A set S of relation
    schemas that belong to the same database. S is
    the name of the database. S R1, R2, ...,Rn.
  • Degree of a relation its number of attributes n.
  • Tuple t of R(A1, A2,....,An) a (ordered) set of
    values t lt v1, v2, ..., vngt where each value vi
    is an element of Dom(Ai). Also called a n-tuple.
  • Relation instance r(r) A set of tuples r(r)
    t1, t2,...,Tm, or alternatively r(r) ?dom(a1) ?
    dom(a2) ? ... ? dom(an).

40
Characteristics of Relations
  • The tuples are not considered to be ordered, even
    though they appear to be in the tabular form.
  • We will consider the attributes in R(A1, A2,
    ...,An) and the values in t lt v1, v2, .., vngt
    to be orderd.( However, a more general
    alternative definition of relation does not
    require this ordering).
  • All values are considered atomic (indivisible). A
    special null value is used to represent values
    that are unknown or inapplicable to certain
    tuples.
  • Notation
  • We refer to component values of a tuple t by
    tAi vi (the value of attribute Ai for tuple
    t)
  • Similarly, tAu, Av, ..., Aw refer to the
    sub-tuple of t containing the values of
    attributes Au, Av, ..., Aw, respectively.

41
Relational Constraints
42
Relational Constraints
  • Constraints are conditions that must hold on all
    valid relation instances. There are three main
    types of constraints
  • Domain Constraints
  • Values of each attribute must be atomic.
  • Key Constraints
  • Superkey of R A set of attributes SK of R such
    that no two tuples in any valid relation instance
    r(R) will have the same value for SK. That is,
    for any distinct tuples t1 and t2 in r(R), t1SK
    ? t2SK
  • Key (candidate key) of R A minimal superkey
    that is, a superkey K such that removal of any
    attribute form K results in a set of attributes
    that is not a superkey.
  • Example The CAR relation schema
  • CAR(State, Reg, SerialNo, make, Model, Year) has
    two keys Key1 State, Reg, Key2 SerialNo
    which are also superkeys. SerialNo, Make is a
    superkey but not a key.
  • If a relation has several candidate keys, one is
    chosen arbitrarily to be the primary key. The
    primary key attributes are underlined.

43
Relational Constraints
  • Entity Integrity
  • The primary key attributes PK of each relation
    schema R in S can not have null values in any
    tuple of r(R). This is because primary key values
    are used to identify the individual tuples. tPK
    ? null for any tuple t in r(R).
  • Referential Integrity
  • Referential integrity constraint is used to
    specify a relationship among tuples in two
    relations the referencing relation and
    referenced relation. It involves two relations.
    Tuples in the referencing relation R1 have
    attributes FK (called foreign key attributes)
    that reference the primary key attributes PK of
    the referenced relation R2. A tuple t1 in R1 is
    said to reference a tuple t2 in R2 if t1FK t2
    PK. A referential integrity constraint can be
    displayed in a relational database schema as a
    directed arc from R1.FK to R2.

44
Operations
20.00
25.00
18.00
22.00
45
Update Operations on Relations
  • Update Operations
  • INSERT a tuple
  • DELETE a tuple
  • MODIFY a tuple
  • Integrity constraints should not be violated by
    the update operations.
  • Insert operation could violate any constraint.
  • Delete operation could violate referential
    constraints.
  • Modify a primary key or foreign key attribute is
    equivalent to delete one tuple and insert
    another. Modify other attributes cause no
    problems.
  • Several update operations may have to be grouped
    together.
  • Updates may propagate to cause other updates
    automatically. This may be necessary to maintain
    integrity constraints.
  • In case of integrity violation, several actions
    can be taken
  • cancel the operation that causes the violation
  • perform the operation but inform the user of
    violation
  • trigger additional updates so the violation is
    corrected
  • execute a user-specified error-correction routine

46
Data Model Mapping
  • ER-to-Relational Mapping
  • EER-to-Relational Mapping

47
Relational Model of COMPANY Database
48
ER-to-Relational Mapping
  • STEP 1 For each regular (strong) entity type E
    in the ER schema, create a relation R that
    includes all the simple attributes of E. Include
    only the simple component attributes of a
    composite attribute. Choose one of the key
    attributes of E as primary key for R. If the
    chosen key of E is composite, the set of simple
    attributes that form it will together form the
    primary key of R.
  • STEP 2 For each weak entity type W in the ER
    schema with owner entity type E, create a
    relation R, and include all simple attributes (or
    simple components of composite attributes) of W
    as attributes of R. In addition, include as
    foreign key attributes of R the primary key
    attribute(s) of the relation(s) that correspond
    to the owner entity type(s) this takes care of
    the identifying relationship type of W. The
    primary key of R is the combination of the
    primary key(s) of the owner(s) and the partial
    key of the weak entity type W, if any.
  • STEP 3 For each binary 11 relationship type R
    in the ER schema, identify the relations S and T
    that correspond to the entity types participating
    in R. Choose one of the relationsS, sayand
    include as foreign key in S the primary key of T.
    It is better to choose an entity type with total
    participation in R in the role of S. Include all
    the simple attributes (or simple components of
    composite attributes) of the 11 relationship
    type R as attributes of S.
  • STEP 4 For each regular binary 1N relationship
    type R, identify the relation S that represents
    the participating entity type at the N-side of
    the relationship type. Include as foreign key in
    S the primary key of the relation T that
    represents the other entity type participating in
    R. Include any simple attributes (or simple
    components of composite attributes) of the 1N
    relationship type as attributes of S.

49
ER-to-Relational Mapping
  • STEP 5 For each binary MN relationship type R,
    create a new relation S to represent R. Include
    as foreign key attributes in S the primary keys
    of the relations that represent the participating
    entity types their combination will form the
    primary key of S. Also include any simple
    attributes of the MN relationship type (or
    simple components of composite attributes) as
    attributes of S. Notice that we cannot represent
    an MN relationship type by a single foreign key
    attribute in one of the participating
    relationsas we did for 11 or 1N relationship
    typesbecause of the MN cardinality ratio.
  • STEP 6 For each multivalued attribute A, create
    a new relation R. This relation R will include an
    attribute corresponding to A, plus the primary
    key attribute Kas a foreign key in Rof the
    relation that represents the entity type or
    relationship type that has A as an attribute. The
    primary key of R is the combination of A and K.
    If the multivalued attribute is composite, we
    include its simple components.
  • STEP 7 For each n-ary relationship type R, where
    n gt 2, create a new relation S to represent R.
    Include as foreign key attributes in S the
    primary keys of the relations that represent the
    participating entity types. Also include any
    simple attributes of the n-ary relationship type
    (or simple components of composite attributes) as
    attributes of S. The primary key of S is usually
    a combination of all the foreign keys that
    reference the relations representing the
    participating entity types. However, if the
    cardinality constraints on any of the entity
    types E participating in R is 1, then the primary
    key of S should not include the foreign key
    attribute that references the relation E
    corresponding to E. This concludes the mapping
    procedure.

50
EER-to-Relational Mapping
  • STEP 8 Convert each specialization with m
    subclasses S1, S2, . . ., Sm and (generalized)
    superclass C, where the attributes of C are k,
    a1, . . ., an and k is the (primary) key, into
    relation schemas using one of the four following
    options
  • Option 8A Create a relation L for C with
    attributes Attrs(L) k, a1, . . ., an and
    PK(L) k. Create a relation Li for each subclass
    Si, 1 1 i 1 m, with the attributes Attrs(Li)
    kD attributes of Si and PK(Li) k.
  • Option 8B Create a relation Li for each subclass
    Si, 1 1 i 1 m, with the attributes Attrs(Li)
    attributes of SiD k, a1, . . ., an and PK(Li)
    k.
  • Option 8C Create a single relation L with
    attributes Attrs(L) k, a1, . . ., an D
    attributes of S1 D . . . D attributes of Sm D
    t and PK(L) k. This option is for a
    specialization whose subclasses are disjoint, and
    t is a type (or discriminating) attribute that
    indicates the subclass to which each tuple
    belongs, if any. This option has the potential
    for generating a large number of null values.
  • Option 8D Create a single relation schema L with
    attributes Attrs(L) k, a1, . . ., an D
    attributes of S1 D . . . D attributes of Sm D
    t1, t2, . . ., tm and PK(L) k. This option is
    for a specialization whose subclasses are
    overlapping (but will also work for a disjoint
    specialization), and each ti, 1 1 i 1 m, is a
    Boolean attribute indicating whether a tuple
    belongs to subclass Si.

51
ER-to-Relational Mapping
ER Model Relational Model
Entity Entity Relation
11 and 1N relationship type Foreign key
MN relationship type Relationship relation and two foreign keys
N-ary relationship type Relationship relation and n foreign keys
Simple attribute Attribute
Composite attribute Set of component attributes
Multi-valued attributes Relation and foreign keys
Value set (Domain) Domain
Key attribute Primary (or candidate) key
52
The Relational Algebra
  • Relational algebra is a collection of operations
    to manipulate relations
  • Query result is in the form of a relation
  • Relational Operations
  • SELECT ?
  • PROJECT ? operations
  • Sequences of operations and renaming of
    attributes
  • Set operations
  • UNION ?
  • INTERSECTION ?
  • DIFFERENCE ?
  • CARTESIAN PRODUCT ?
  • JOIN operations
  • Other relational operations
  • DIVISION
  • OUTER JOIN
  • AGGREGATE FUNCTIONS.

53
Relational Operations
  • SELECT operation (denoted by ?)
  • Selects the tuples (rows) from a relation R that
    satisfy a certain selection condition c
  • Form of the operation ?c(R)
  • The condition c is an arbitrary Boolean
    expression on the attributes of R
  • Resulting relation has the same attributes as R
  • Resulting relation includes each tuple in r(R)
    whose attribute values satisfy condition c
  • Examples
  • ? DNO 4 (EMPLOYEE)
  • ? SALARY gt 30333(EMPLOYEE)
  • ? (( DNO 4 AND SALARY gt 25000 ) OR DNO 5)
    (EMPLOYEE)

54
Relational Operations
  • Examples
  • DNO 4 (EMPLOYEE)
  • Jennifer S Wallace 987654321 1941-06-20
    Berry, Bellaire, TX F 43000.00 888665555 4
  • Ahmad V Jabbar 987987987 1969-03-29
    Dallas, Huston, TX M 25000.00 987654321 4
  • Alicia J Zelaya 999887777 1968-07-19
    Castle, Spring, TX F 25000.00 987654321 4
  • SALARY gt 30333(EMPLOYEE)
  • Franklin T Wong 333445555 1955-12-08 638
    Voss, Huston, TX M 40000.00 888665555 5
  • Ramesh K Narayan 666884444 1962-09-15 975 Fire
    Oak, Humble, TX M 38000.00 333445555 5
  • James E Borg 888665555 1937-11-10 450
    Stone, Huston, TX M 55000.00 null 1
  • Jennifer S Wallace 987654321 1941-06-20 291
    Berry, Bellaire, TX F 43000.00 888665555 4
  • (( DNO 4 AND SALARY gt 25000 ) OR DNO 5)
    (EMPLOYEE)
  • Franklin T Wong 333445555 1955-12-08 638
    Voss, Huston, TX M 40000.00 888665555 5
  • Ramesh K Narayan 666884444 1962-09-15 975 Fire
    Oak, Humble, TX M 38000.00 333445555 5
  • Jennifer S Wallace 987654321 1941-06-20 291
    Berry, Bellaire, TX F 43000.00 888665555 4

55
Relational Operations
  • PROJECT operation(denoted by ?)
  • Keeps only certain attributes (columns) from a
    relation R specified in an attribute list L
  • Form of operation ? L(R)
  • Resulting relation has only those attributes of R
    specified in L
  • The PROJECT operation eliminates duplicate tuples
    in the resulting relation so that it remains a
    mathematical set ( no duplicate elements)
  • Example
  • ? FNAME,LNAME, SALARY(EMPLOYEE)
  • ? SEX, SALARY(EMPLOYEE)

56
Relational Operations
  • Example
  • FNAME,LNAME, SALARY(EMPLOYEE)
  • John Smith 30000.00
  • Franklin Wong 40000.00
  • Joice English 25000.00
  • Ramesh Narayan 38000.00
  • James Borg 55000.00
  • Jennifer Wallace 43000.00
  • Ahmad Jabbar 25000.00
  • Alicia Zelaya 25000.00
  • SEX, SALARY(EMPLOYEE)
  • F 25000.00
  • F 43000.00
  • M 25000.00
  • M 30000.00
  • M 38000.00
  • M 40000.00
  • M 55000.00

57
Relational Operations
  • Sequence of operations
  • Several operations can be combined to form a
    relational algebra expression (query)
  • Example
  • Retrieve the names and salaries of employees who
    work in department 4.
  • FNAME, LNAME, SALARY(? DNO4 (EMPLOYEE))
  • Jennifer Wallace 43000.00
  • Ahmad Jabbar 25000.00
  • Alicia Zelaya 25000.00
  • Alternatively we specify explicit intermediate
    relations for each step
  • DEPT4_EMPS ? ?DNO 4 (EMPLOYEE)
  • R ? ? FNAME, LNAME, SALLRY(DEPT4_EMPS)
  • Attributes can optionally be renamed in the
    resulting left-hand side relation(this may be
    required for some operations that will be
    presented later)
  • DEPT4_MPS ? ? DNO 4 (EMPLOYEE)
  • R(FIRSTNAME,LASTNAME,SALARY) ?
  • ? FNAME,LNAME,SALARY(DEPT4_EMPS)

58
Relational Operations
  • Set Operations
  • UNION R1 ? R2,
  • INTERSECTION R1 ? R2
  • SET DIFFERENCE R1 ? R2
  • CARTESIAN PRODUCT R1 ? R2
  • For ?, ?, ?, the operand relations R1(A1, A2,
    ...,An) and R2(B1, B2, ...,Bn) must have the same
    number of attributes, and the domains of
    corresponding attributes must be compatible that
    is dom(Ai) dom(Bi) for i 1,2,..,n. This
    condition is called union compatibility.
  • The resulting relation for ?, ? or ? has the
    same attribute names as the first operand
    relation R1(by convention).

59
Relational Operations
  • Cartesian product
  • R(A1, A2,....,Am,B1,B2,...,Bn) ? R1(A1,
    A2,....,Am) ? R2(B1,B2,...,Bn)
  • A tuple t exists in R for each combination of
    tuples t1 from R1 and t2 from R2 such that tA1,
    A2,....,Am t1 and tB1,B2,...,Bn t2
  • If R1 has n1 tuples and R2 has n2 tuples, then R
    will have n1n2 tuples
  • CARTESIAN PRODUCT is a meaningless operation on
    its own. It can combine related tuples from two
    relations if followed y the appropriate SELECT
    operation.
  • Example
  • Combine each DEPARTMENT tuple with the EMPLOYEE
    tuple of the manager.
  • DEP_EMP ? DEPARTMENT ? EMPLOYEE
  • DEPT_MANAGER ? ?MGRSSN SSN(DEP_EMP)
  • James E Borg 888665555 Headquarters 1
    888665555 1981-06-19
  • Jennifer S Wallace 987654321 Administration 4
    987654321 1995-01-01
  • Franklin T Wong 333445555 Research 5
    333445555 1988-05-22

60
Relational Operations
  • JOIN operation
  • THETA JOIN
  • Similar to CARTESIAN PRODUCT followed by a
    SELECT. The condition c is called the join
    condition.
  • R(A1, A2,....,Am,B1,B2,...,Bn) ? R1(A1,
    A2,....,Am) c R2(B1,B2,...,Bn)
  • c is in the form of ltconditiongt AND ltconditiongt
    AND . . . AND ltconditiongt , where each condition
    is of the form Ai ? Bj, Ai is an attribute of R,
    Bj is an attribute of S, Ai and Bj have the same
    domain, and ? (theta) is one of the comparison
    operators , lt, ?, gt, ?, ?.
  • EQUIJOIN
  • The condition c uses only operator ''.
  • The attributes appear in condition c are called
    join attributes
  • Examples Retrieve each DEPARTMENTs name and its
    managers name
  • T ? DEPARTMENT MGRSSN SSN EMPLOYEE
  • RESULT ? ? DNAME,FNAME,LNAME (T)

61
Relational Operations
  • JOIN operations
  • NATURAL JOIN()
  • In an EQUIJOIN R ? R1 cR2, the join attributes
    of R2 appear redundantly in the result relation
    R. In a NATURAL JOIN, the redundant join
    attributes of R2 are eliminated from R. The
    equality condition is implied and need not be
    specified.
  • R ? R1 ( join attributes of R1), (join
    attributes of R2) R2
  • If the join attributes have the same names in
    both relations, they need not be specified and we
    can write R? R1R2.
  • Examples
  • Retrieve each EMPLOYEEs name and the name of the
    DEPARTMENT he/she works for
  • T ? EMPLOYEE (DNO),(DNUMBER)DEPARTMENT
  • RESULT ? ? FNAME, LNAME, DNAME (T)
  • retrieve each EMPLOYEEs name and the name of
    his/ser SUPERVISOR
  • SUPERVISOR (SUPERSSN,SFN,SLN) ? ? SSN,FNAME,LNAME
    (EMPLOYEE)
  • T ? EMPLOYEESUPERVISOR
  • RESULT ? ? FNAME,LNAME,SFN,SLN(T)

62
Relational Operations
  • Complete Set of Relational Algebra Operations
  • All the operations discussed so far can be
    described as a sequence of only the operations
    SELECT, PROJECT, UNION, SET DIFFERENCE, and
    CARTESIAN PRODUCT.
  • Hence, the set ?, ?, ?, -, ? is called a
    complete set of relational algebra operations.
    Any query language equivalent to these operations
    is called relationally complete.
  • For database applications, additional operations
    are needed that were not part of the original
    relational algebra. These include
  • Aggregate functions and grouping
  • OUTER JOIN and OUTER UNION.

63
Relational Operations
  • Aggregate Functions
  • Functions such as SUM, COUNT, AVERAGE, MIN, MAX
    are often applied to sets of values or sets of
    tuples in database applications
  • lt grouping attributesgt ? ltfunction listgt (R)
  • The grouping attributes are optional
  • Example 1 retrieve the average salary of all
    employees ( no grouping needed)
  • R(AVGSAL) ? ? AVERAGE SALARY (EMPLOYEE)
  • 35125.000000
  • Example 2 For each department, retrieve the
    department number, the number of employees , and
    the average salary ( in the department)
  • R(DNO,NUMEMPS,AVGSAL) ?
  • DNO ? COUNT SSN,AVERAGE SALARY (EMPLOYEE)
  • 1 1 55000.000000
  • 4 3 31000.000000
  • 5 4 33250.000000

64
Relational Operations
  • OUTER JOIN
  • In a regular EQUIJOIN or NATURAL JOIN operation,
    tuples in R1 or R2 that do not have matching
    tuples in other relation do not appear in the
    result. Some queries require all tuples in R1 (or
    R2 or both) to appear in the result. When no
    matching tuples are found, nulls are placed for
    the missing attributes
  • LEFT OUTER JOIN R1 R2 lets every tuple in R1
    appear in the result
  • RIGHT OUTER JOIN R1 R2 lets every tuple in
    R2 appear in the result
  • FULL OUTER JOIN R1 R2 lets every tuple in
    R1 or R2 appear in the result

65
SQL - A Relational Database Language
  • Basic Concepts
  • Data Definition in SQL
  • Retrieval Queries in SQL
  • Specifying Updates in SQL
  • Relational Views in SQL
  • Creating Indexes in SQL
  • Embedding SQL in a Programming Language
  • Recent Advances in SQL

66
Basic Concept
  • Catalog A collection of schemas
  • Schema A collections of tables and other
    constructs such as constraints.
  • Table Represents a relation. It includes base
    tables and views.
  • Column Represents an attribute.
  • Name Space Hierarchy catalog -gt schema -gt table
    -gt column
  • Qualified Name
  • catalog_name.schema_name.table_name.column_name

67
Data Definition in SQL
  • CREATE TABLE
  • Specifies a new base relation by giving it a
    name and specifying each of its attributes and
    their data types (INTEGER, FLOAT , DECIMAL (i,j),
    CHAR(n), VARCHAR(n)). A constraint NOT NULL may
    be specified on an attribute .
  • Example
  • CREATE TABLE DEPARTMENT
  • ( DNAME VARCHAR(15) NOT NULL ,
  • DNUMBER INT NOT NULL ENIQUE,
  • MGRSSN CHAR(9) NOT NULL,
  • MGRSTARTDATE DATETIME,
  • PRIMARY KEY(DNUMBER),
  • FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE
  • )

68
Data Definition in SQL
  • DROP TABLE
  • Used to remove a relation (base table) and its
    definition. The relation can no longer be used in
    queries , updates or any other commands since
    its description no longer exists.
  • Example
  • DROP TABLE DEPENDENT
  • ALTER TABLE
  • Used to add an attribute to one of the base
    relations. The new attribute will have NULLs in
    all the tuples of the relation right after the
    command is executed hence, the NOT NULL
    constraint is not allowed for such an attribute.
  • Example
  • ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12)
  • The database users must still enter a value for
    the new attribute JOB for each EMPLOYEE tuple.
    This can be done using the UPDATE command.

69
DDL for COMPANY Database
  • CREATE TABLE EMPLOYEE
  • ( FNAME VARCHAR(15) NOT NULL,
  • MINIT CHAR,
  • LNAME VARCHAR(15) NOT NULL,
  • SSN CHAR(9) NOT NULL,
  • BDATE DATETIME,
  • ADDRESS VARCHAR(30),
  • SEX CHAR,
  • SALARY DECIMAL(19,2),
  • SUPERSSN CHAR(9),
  • DNO INT NOT
    NULL,
  • PRIMARY KEY(SSN),
  • FOREIGN KEY (SUPERSSN) REFERENCES
    EMPLOYEE(SSN),
  • )

70
DDL for COMPANY Database
  • CREATE TABLE DEPARTMENT
  • ( DNAME VARCHAR(15) NOT
    NULL ,
  • DNUMBER INT
    NOT NULL UNIQUE,
  • MGRSSN CHAR(9)
    NOT NULL,
  • MGRSTARTDATE DATETIME,
  • PRIMARY KEY(DNUMBER),
  • FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE
  • )
  • ALTER TABLE EMPLOYEE
  • ADD FOREIGN KEY (DNO)
  • REFERENCES DEPARTMENT(DNUMBER)

71
DDL for COMPANY Database
  • CREATE TABLE DEPT_LOCATIONS
  • (DNUMBER INT NOT NULL,
  • DLOCATION VARCHAR(15) NOT NULL,
  • PRIMARY KEY (DNUMBER, DLOCATION),
  • FOREIGN KEY (DNUMBER) REFERENCES
    DEPARTMENT(DNUMBER)
  • )
  • CREATE TABLE PROJECT
  • (PNAME VARCHAR(15) NOT NULL,
  • PNUMBER INT NOT NULL,
  • PLOCATION VARCHAR(15),
  • DNUM INT NOT NULL,
  • PRIMARY KEY (PNUMBER),
  • FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER)
  • )

72
DDL for COMPANY Database
  • CREATE TABLE WORKS_ON
  • (ESSN CHAR(9) NOT NULL,
  • PNO INT NOT NULL,
  • HOURS DECIMAL(3,1) NOT NULL,
  • PRIMARY KEY (ESSN, PNO),
  • FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN),
  • FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER)
  • )
  • CREATE TABLE DEPENDENT
  • (ESSN CHAR(9)
    NOT NULL,
  • DEPENDENT_NAME VARCHAR(15) NOT NULL,
  • SEX CHAR,
  • BDATE DATETIME,
  • RELATIONSHIP VARCHAR(8),
  • PRIMARY KEY (ESSN,DEPENDENT_NAME),
  • FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN)
  • )

73
Basic Queries in SQL
  • SQL has one basic statement for retrieving
    information from a database the SELECT statement
  • This is not the same as the SELECT operation of
    the relational algebra
  • Important distinction between SQL and the formal
    relational model SQL allows a table (relation)
    to have two or more tuples that are identical in
    all their attribute values.
  • SQL relations can be constrained to be sets by a
    key constraint, or by using the DISTINCT option
    in the SELECT statement.

74
The SELECT Statement
  • Basic form of the SQL SELECT statement is called
    a mapping or a SELECT-FROM-WHERE block
  • SELECT ltattribute-listgt
  • FROM lttable listgt
  • WHERE ltconditiongt
  • where ltattribute listgt is a list of attribute
    names whose values are to be retrieved by the
    query. lt table listgt is al list of the relation
    names required to process the query. lt conditiongt
    is a conditional (Boolean) expression that
    identifies the tuples to be retrieved by the
    query
  • Basic SQL queries correspond to using the SELECT,
    PROJECT, and JOIN operations of the relational
    algebra.

75
Sample Basic Queries
  • Query 0 Retrieve the birth date and address of
    the employee whose name is 'John B. Smith'.
  • Q0 SELECT BDATE, ADDRESS
  • FROM EMPLOYEE
  • WHERE FNAME 'John' AND MINIT 'B' AND LNAME
    'Smith'
  • Query 1 Retrieve the name and address of all
    employees who work for the 'Research' department.
  • Q1 SELECT FNAME, LNAME, ADDRESS
  • FROM EMPLOYEE, DEPARTMENT
  • WHERE DNAME'Research' AND DNUMBER DNO
  • Query 2 For every project located in 'Stafford'
    , list the project number, the controlling
    department number, and the department manager's
    last name, address and birth date.
  • Q2 SELECT PNUMBER, DNUM,LNAME, BDATE, ADDRESS
  • FROM PROJECT, DEPARTMENT, EMPLOYEE
  • WHERE DNUMDNUMBER AND MGRSSN SSN AND
    PLOCATION 'Stafford'
  • Q2x SELECT PNUMBER, DNUM,LNAME, BDATE, ADDRESS
  • FROM DEPARTMENT JOIN PROJECT ON (DNUMDNUMBER)

  • JOIN EMPLOYEE ON (MGRSSN SSN)
  • WHERE PLOCATION 'Stafford'

76
The SELECT Statement Aliases
  • ALIASES
  • Some queries need to refer to the same relation
    twice. In this case, aliases are given to the
    relation name
  • Query 8 For each employee, retrieve the
    employees name, and the name of his/her
    immediate supervisor.
  • Q8 SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
  • FROM EMPLOYEE E, EMPLOYEE S
  • WHERE E.SUPERSSN S.SSN
  • Renaming attributes
  • Q8A SELECT E.FNAME as "Employee First Name",
    E.LNAME as "Employee Last Name",
  • S.FNAME as
    "Supervisor First Name" , S.LNAME as "Supervisor
    Last Name"
  • FROM EMPLOYEE E, EMPLOYEE S
  • WHERE E.SUPERSSN S.SSN
  • Q8B (SQL Server) SELECT E.LNAME ', '
    E.FNAME as "Employee Name",

  • S.LNAME ', ' S.FNAME as "Supervisor Name"
  • FROM EMPLOYEE E,
    EMPLOYEE S
  • WHERE E.SUPERSSN
    S.SSN
  • In Q8, the alternate relation names E and S are
    called aliases for the EMPLOYEE relation
  • We can think of E and S as two different copies
    of the EMPLOYEE relation E represents employee
    in the role of supervisees and S represents
    employees in the role of supervisors
  • Aliasing can also be used in any SQL query for
    convenience

77
The SELECT Statement Unspecified WHERE-clause
  • Unspecified WHERE-clause
  • A missing WHERE-clause indicates no condition
    hence, all tuples of the relations in the
    FROM-clause are selected. This is equivalent to
    the condition WHERE TRUE
  • Query 9 Retrieve the ssn values of all
    employees.
  • Q9 SELECT SSN
  • FROM EMPLOYEE
  • If more than one relation is specified in the
    FROM-clause and there is no join condition,
    then the CARTESIAN PRODUCT of tuples is selected
  • Q10 SELECT SSN, DNAME
  • FROM EMPLOYEE, DEPARTNEMT
  • It is extremely important not to overlook
    specifying any selection and join conditions in
    the WHERE-clause otherwise, incorrect and very
    large relations may result

78
The SELECT Statement DISTINCT and
  • Use of
  • To retrieve all the attribute values of the
    selected tuples, a is used, which stands for
    all the attributes.
  • Q1C SELECT
  • FROM EMPLOYEE
  • WHERE DNO 5
  • Q1D SELECT
  • FROM EMPLOYEE , DEPARTMENT
  • WHERE DNAME 'Research' AND DNO DNUMBER
  • Tables as Set
  • SQL does not treat a relation as a set duplicate
    tuples can appear. To eliminate duplicate tuples,
    the keyword DISTINCT is used.
  • Q11 SELECT SALARY
  • FROM EMPLOYEE
  • Q11A SELECT DISTINCT SALARY
  • FROM EMPLOYEE

79
The SELECT Statement Set Operations
  • Set Operations
  • SQL has directly incorporated some set
    operations. There is a union operation (UNION),
    and in some versions of SQL there are set
    difference (MINUS) and intersection (INTERSECT)
    operations
  • The resulting relations of these set operations
    are sets of tuples duplicate tuples are
    eliminated from tuples
  • The set operations apply only to union compatible
    relations the two relations must have the same
    attributes and the attributes must appear in the
    same order

80
The SELECT Statement Set Operations
  • Set Operations
  • Example
  • Query 4 Make a list of all project numbers for
    projects that involve an employee whose last name
    is 'Smith' as a worker or as a manager of the
    department that controls the project
  • Q4 (SELECT DISTINCT PNAME
  • FROM PROJECT, DEPARTMENT, EMPLOYEE
  • WHERE DNUMDNUMBER AND MGRSSN SSN AND LNAME
    'Smith')
  • UNION
  • (SELECT PNAME
  • FROM PROJECT, WORKS_ON, EMPLOYEE
  • WHERE PNUMBER PNO AND ESSN SSN AND
  • LNAME 'Smith')

81
The SELECT Statement Substring Comparision
  • Substring Comparison
  • The LIKE comparison operator is used to compare
    partial strings
  • Two reserved characters are used '' (
Write a Comment
User Comments (0)
About PowerShow.com