Class 4: Data Relatives - PowerPoint PPT Presentation

1 / 123
About This Presentation
Title:

Class 4: Data Relatives

Description:

Before a detailed examination of Relational Databases and relationship ... Can help eliminate homonyms and synonyms. A system catalog can be generated by some DMBS. ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 124
Provided by: glenndo
Category:

less

Transcript and Presenter's Notes

Title: Class 4: Data Relatives


1
Class 4Data Relatives
2
Overview of Class
  • Review of Current Knowledge
  • Review of Concepts for Project
  • Introduction to Relational Databases
  • Introduction to Mapping and Performance Tuning

3
Motivation for Studying Data Relatives
  • To provide an appreciation for how Relational
    Databases have influenced database design
  • To understand how relationship assignment or
    mapping is used in database design

4
Understanding Needed
  • Before a detailed examination of Relational
    Databases and relationship assignment is done, it
    is useful to review our acquired knowledge of
    databases thus far.
  • Moreover, a summary review will help in the
    evolution of the class project.
  • Accordingly, we begin with a summary of major
    concepts covered, along with a general
    introduction to relational databases.

5
Review of Key Database Concepts
  • The following concepts will be reviewed, to help
    us get a solid footing on our progress so far,
    facilitate development of the class project, and
    ease our introduction to relational databases
  • Database Models
  • Relationships, Types
  • Models Hierarchical, Network, Relational
    (introduction)

6
Further Database Concepts
  • Additional concepts also be reviewed
  • Tables
  • Keys
  • Operators
  • Data Dictionaries
  • E-R Modeling
  • Cardinality
  • Dependence

7
Database Models
  • A database model is a set of logical constructs
  • Database models represent a data stricture and
    relationships within the database
  • Two phases of database models are
  • Conceptual
  • Implementation

8
Conceptual Model
  • Conceptual models focus on the logical nature of
    data
  • Conceptual models are concerned with what is
    represented in the database
  • There are three representations of a conceptual
    model, each of which describes a relationship
    between entities

9
One to Many Relationship
  • The first representation is the one to many
    (labeled 1M)

10
Many to Many Relationship
  • The second representation is the one to many
    (labeled MN)

11
One to One Relationship
  • The third representation is the one to many
    (labeled 11)

12
Implementation Model
  • The other major type of database model is the
    Implementation Model
  • Implementation models emphasize how the data will
    be represented.
  • The implementation model illustrates what is
    modeled.

13
Database Architectures
  • Database architectures that can be easily
    represented with an implementation model, include
  • Hierarchical Model
  • Network Model
  • Relational Model

14
Hierarchical Model
  • The hierarchical model was developed in 1969.
    The fundamental characteristic of a hierarchical
    model is the one way, top down, parent child
    relationship
  • The hierarchical model can have 1M relationship,
    where one parent can have many children, but one
    child can have only one parent.

15
Hierarchical Example
  • Model Illustration

16
Network Model
  • The network model was developed in 1971 to
    address inconsistencies in the hierarchical
    model.
  • The network model can have a MN relationship,
    with many children per parent, and more than one
    parent possible for each child.

17
Network Model Components
  • Schema
  • conceptual view of the database from the DBAs
    perspective
  • Subschema
  • view of the database from the application
    software
  • DML (Data Manipulation Language)
  • Defines the data characteristics
  • Defines the data structure (record layout)

18
DML Components
  • Schema DDL (Data Definition Language)
  • Used to define schema
  • Sets up record types
  • Sets up attributes
  • Subschema DDL
  • Used by the development language to define
    database components
  • DML
  • Used to manipulate database contents

19
Network Structure
  • Two record types
  • owner (parent)
  • member (child)

20
Network Example
  • Model Illustration

21
Relational Model
  • Developed in 1970 by E.F. Codd
  • The relational model supported the development of
    a Relational Database Management System (RDBMS)
  • The multidimensional model developed by E.F. Codd
    preceded by many years the technology required to
    support the design.

22
Relational Definitions
  • Table
  • contain data
  • is a logical construct
  • Relation
  • is established by a common characteristic
  • characteristic must exist in each table
  • characteristic must be identical
  • usually called a primary key

23
Relational Database Model
  • Three major components of the relational database
    model include
  • Entities
  • Attributes
  • Entity Sets
  • A basic data set in Codds theory is the table.
    A table contains related entities. It can also
    be called a relation.

24
Table Characteristics
  • It is a logical construct
  • Tables consist of tuples and rows
  • Each tuple and attribute intersection contains a
    single value
  • Type of attributes support include
  • numeric
  • character
  • date
  • logical

25
More Table Characteristics
  • Each tuple must have a primary key
  • Each attribute must have a unique name
  • An attributes range of values is known as its
    domain

26
Table Details
  • Tables consist of columns and rows
  • Tuples are
  • the rows in a table
  • represent an entity
  • Entity attribute
  • is represented by the column
  • is consistent across the entities

27
Naming Conventions for Tables
  • Tables should
  • be 8 characters long
  • have meaningful names
  • Columns should
  • be 10 characters long
  • start with an alpha
  • be descriptive

28
Types of Keys
  • Superkey
  • is an attribute that uniquely identifies an
    entity
  • Candidate key
  • is a potential key

29
Other Types of Keys
  • Primary key
  • is an attribute that uniquely identifies an
    entity
  • is used to access data
  • Secondary Key
  • is also called an Alternate Key
  • is used to provide alternate access path
  • could be used to create a primary key

30
Other Types of Keys
  • Foreign Key
  • is an attribute used to provide a link to another
    table
  • Composite Key
  • contains more than one attribute
  • also known as a Concatenated Key

31
Relational Operators
  • Select
  • selects tuples (records) based on a specified
    value for an attribute
  • provides a horizontal subset of a table
  • Project
  • provides a list of all values for an attribute
    (field), and vertical subset of a table
  • Join
  • combines data from two or more tables

32
Relational Operators
  • Intersect
  • produces a listing that contains all tuples that
    are common to both tables
  • attribute characteristics must be identical
  • Union
  • combines all tuples from two tables
  • attribute characteristics must be identical
  • Difference
  • produces all tuples in one table that are not
    contained in another table

33
Relational Operators
  • Product
  • produces a list of all possible pairs from two
    tables
  • I.E. Table 1 100 Tuples
  • I.E. Table 2 5 Tuples
  • Divide
  • requires the use of one single-column table and
    one two-column table

34
Relational Database Classification
  • Fully Relational
  • supports all eight relational algebra-functions
  • enforces entity and referential integrity rules
  • Relational Complete
  • supports all eight relational algebra-functions
  • does not support the integrity rules

35
Relational Database Classification
  • Minimally Relational
  • Supports Select, Project, Join
  • Tabular
  • Supports ONLY Select, Project, Join
  • Requires all access paths be defined by the DBA

36
Data Dictionary
  • Used to provide a detailed description of all
    tables and attributes found in the database
  • Can help eliminate homonyms and synonyms
  • A system catalog can be generated by some DMBS.

37
E-R Model
  • Two types of ER Models
  • Global E-R Model
  • Application (local) E-R Model
  • Is used to develop a common vision of the
    requirements
  • Can define processing and constraint requirements
  • Can be used by the DBA to implement the database

38
E-R Components
  • Entities
  • represented by a rectangle

39
E-R Components
  • Attributes
  • represented by an oval with a line to the entity
  • Single value attribute represented by a line
  • Multi valued attribute represented by a double
    line
  • Derived attribute represented by a -- line

40
E-R Components
  • Relationship
  • An association between entities
  • Types of relationships include
  • Optional
  • Mandatory
  • Degree
  • The number of associated entities

41
E-R Model
  • The E-R model is a simplified picture of the
    relationship
  • between entities
  • between entities and the attributes

42
Relationships Weve Seen
  • 11 Relationship
  • 1M Relationship
  • MN Relationship

43
Unary Relationship
44
Binary Relationship
45
Ternary Relationship
46
Index
  • An index is composed of an index key and a series
    of pointers
  • An index key is a reference point

47
Data Model
  • A data model is a diagram of real world data
    structures
  • The modeling process takes us through a number of
    phases
  • Conceptual Data Models
  • Internal Data Models
  • External Data Models
  • Physical Data Models

48
Conceptual Data Model
  • Represents the global or enterprise view of data
  • Identifies and describes data objects
  • Can be thought of as a database blueprint
  • Is hardware independent
  • Is software independent

49
Internal Data Model
  • Adapts the conceptual model to the DBMS selected
    by the analyst
  • Is hardware independent
  • Is software dependent
  • changes in software will change database
    requirements and characteristics
  • This is a critical model for developers of a
    hierarchical or network DBMS

50
External Data Model
  • Is software development specialists view of the
    internal model
  • Is hardware independent
  • is DBMS dependent
  • Once the DBMS is selected, the external model can
    be developed

51
Physical Model
  • Describes the way the data will be stored
  • describes the tables
  • describes access paths
  • Is hardware dependent
  • The model can be dependent on the type of storage
    media used
  • Is software dependent
  • Is DBMS dependent

52
Cardinality
  • Cardinality is referred to in relationships as
    recording the number of entities occurrences.
  • Cardinality records the potential upper and lower
    limits of the entity occurrences

53
Cardinality Example
54
Existence Dependency
  • This occurs when the existence of one entity
    depends solely on the existence of another entity
  • For example, the existence of a child depends on
    whether or not a parent exists.

55
Subtypes and Supertypes
  • Within an entity, it is possible to break down
    and further categorize that entity into other
    sub-entities.
  • We refer to the all encompassing entity and the
    further categorized sub entities and Supertypes
    and Subtypes, respectively.
  • An example of a supertype is a parent, subtypes
    could be a mother, father, or stepparent.

56
Summary Review
  • Thus far, we have reviewed fundamentals of
    database design, including
  • Database Models, Relationships, Types
  • Database Types
  • Hierarchical, Network, Relational Databases
  • Other Database Concepts
  • Tables, Keys, Operators, Data Dictionaries
  • E-R Modeling, Cardinality, Dependence

57
Introduction to Data Relatives
  • At this point, we are familiar with fundamental
    design concepts required to understand the
    development of a database.
  • This prepares us for further exploration of the
    technology that has become increasingly dominant
    in database architecture, the relational database.

58
Introduction to Relational Databases
59
Relational Databases
  • Relational Databases, as weve seen, owe their
    development to the paper by E.F. Codd
  • A fundamental requirement of relational databases
    is that they use the Structure Query Language for
    Data Manipulation, Data Definition and Data
    Control Languages.

60
The Need for Relational Databases
  • As older, larger systems evolved, they became
    very cumbersome and resistant to database
    updates, mostly because of the many programs,
    written for a particular database, that would
    have to be changed along with them.
  • The workaround solution was to simply add more
    databases

61
Problems with Adding Databases
  • As new databases were continually added, all
    kinds of data redundancy crept into information
    systems, increasing the likelihood of errors.
  • Furthermore, large systems became increasingly
    difficult, and correspondingly expensive, to
    maintain.

62
The Relational Database Solution
  • By implementing relational databases,
    administrators were able to add to existing
    databases, while maintaining data integrity, and
    not needing to redesign programs which access the
    data.

63
Relational Database Benefits
  • Relational Databases serve a number of purposes
    including
  • provides greater structure and analytical
    capabilities for measuring data processing
    performance
  • aids in the design of databases by providing a
    design model
  • increases data storage and access efficiency for
    widespread applications

64
Relational Database Theory
  • Database theory concerns itself with three areas
    of data management
  • Data Structure
  • Data Integrity
  • Data Manipulation

65
Data Management - Structure
  • Data is viewed as a data element, known as atomic
    (not possible to break down further)
  • Data elements can be contained in a domain (range
    of valid values)
  • Data in a domain can be related with operators
    (mathematical comparisons)
  • Data element types are attributes
  • Related data attributes comprise a relation. One
    occurrence is a tuple.

66
Data Management - Integrity
  • A base table represents the stored relations of
    data, and use these integrity rules
  • Each relation has a series of candidate keys
    (attributes that ID tuples)
  • Each relation has one candidate key which serves
    as primary (unique) key
  • Any attribute of a relation can be a foreign key
    (primary key of another table)
  • Each relation can have many alternate keys (non
    primary candidate keys)

67
Data Management - Manipulation
  • Once tables representing relations are created,
    data must be accessed using operators.
  • A data access language used to run the operations
    using operators is Structured Query Language
    (SQL).

68
SQL
  • SQL is divided into
  • Data Definition Language
  • concerned with the creation of tables
  • Data Control Language
  • concerned with security and data access control
  • Data Manipulation Language
  • concerned with storing, modifying, and retrieving
    data
  • SQL is executed by simple commands

69
Key SQL Commands
  • Queries begin with SELECT, and must come FROM
    somewhere
  • WHERE adds conditions to refine search
  • To relate two or more tables, use the JOIN command

70
Specific Advantages of Relational Databases
  • Facilitates adhoc, dynamic information queries
  • Generally user friendly, consistent interface for
    data access
  • No navigation is needed to search data, simply
    select relationships for analysis
  • Suited for distributed systems due to data access
    independence.

71
Examples in Text
  • DB2
  • Highly popular system, runs primarily on
    mainframe platforms.
  • Very robust design, handles high rates of
    transaction processing.
  • Informix
  • Geared to UNIX operating systems
  • Allows for manipulation of graphical objects in a
    database environment.

72
Relational Database Review
  • Data Description
  • Tables, tuples, attributes
  • Organization
  • Collections of autonomous files, logically
    related via common field values
  • Data Constructs
  • Relational Data Definition Language

73
Relational Database Navigational Capabilities
  • Search Capability
  • Simple and complex search
  • Cross Reference
  • via the SQL JOIN
  • Mapping
  • Dynamic only
  • Processing
  • Set at a time

74
Relational Databases Summary
  • The dramatic improvements in system technology to
    support distributed, client server architecture,
    vastly greater storage, memory and processing
    capabilities, have made the arrival of relational
    databases on complex networks a reality.
  • Combined with the inherent advantages of
    relational design, it is clear why this
    technology has become so prominent.

75
Introduction to Mapping and Performance Tuning
76
Introduction to Mapping and Performance Tuning
  • Remember that mapping refers to relationship
    assignments and performance tuning relates to the
    physical stage of database implementation.
  • These are critical elements in the design process
    which we will break into 3 phases
  • Conceptual Design
  • Construct Design
  • Physical Design

77
Design Phases
  • Conceptual Design
  • Preliminary Design Phase, an abstract
    representation of a real system
  • Construct Design
  • Defining base constructs and mapping their
    relationships
  • Physical Design
  • Take results of construct design to do a real
    database implementation

78
CONCEPTUAL DESIGN
  • During this phase, we are concerned with
    understanding reality of an information system
    from the perspective of the users.
  • We will focus on
  • Input - user interviews
  • Output - data models
  • Other Uses for Conceptual Design

79
Conceptual Design - Inputs
  • Information gathered to prepare a conceptual
    design is derived through preliminary
    investigation and fact finding. This includes
    accumulation on inputs.

80
Conceptual Design - Input Gathering
  • Input needed for conceptual design is primarily
    accumulated through user interviews. This
    includes
  • Informal Interviews discussions with users and
    participants
  • JAD Sessions group represented from different
    functional areas affected by a system
  • Surveys questionnaires designed to solicit
    feedback on existing systems

81
Conceptual Design - Output
  • The results of these inputs are building blocks
    for completing the conceptual design. This is
    needed for the data models which represent that
    design. Methods we have seen are
  • Informal conceptual design
  • Entity / Relationship Modeling
  • Semantic Data Modeling
  • Object Oriented Modeling

82
Conceptual Design - Other Applications
  • Recall that conceptual design can be utilized for
    a number of other purposes in addition to basic
    design. This included
  • Data Administration
  • The use of CASE tools in automating design
  • Data Repositories

83
CONSTRUCT DESIGN
  • During this phase, we define the base constructs,
    called base construct assignment, and map the
    relationships between them, called relationship
    assignment.
  • Construct design is largely concerned with the
    architecture profile.

84
The Construct Design Process
  • To evaluate this process, we will look at
  • Inputs Needed for Construct Design
  • The Steps in the Construct Design Process
  • Outputs from Construct Design

85
Construct Design - Inputs
  • Input required to progress from the conceptual
    stage to the construct stage is primarily one of
    documentation. This includes
  • Conceptual design documents those documents
    generated from the conceptual design stage
  • Database architecture information parameters
    about the architecture used

86
Construct Design Process
  • The steps included in construct design are
  • Transforming models into constructs
  • Definition of base constructs (assignments)
  • Definition of mapping relationships (logical or
    physical)
  • Normalization (from atomization to 0th, 1st, 2nd,
    and 3rd normal form)

87
Transforming models into constructs
  • The first step is to turn the logically defind
    entities and objects into base construct
    definitions.
  • The second step is to determine how the
    relationships that the model shows are going to
    be capture within the structure of the database

88
Definition of base constructs (assignments)
  • Definition of constructs include the following
  • Names
  • Elements
  • Primary Keys

89
Definition of mapping relationships
  • Defining mappings can be done as either logical
    or physical mappings.
  • Logical mappings are done for the xbase, inverted
    list, and relational database architectures.
  • Physical mappings are done for the hierarchical
    and network architectures.

90
Logical Mappings
  • Examples of logical mappings
  • 1m Foreign key assignment
  • Mn Bridge constructs
  • We use this approach for our project

91
Physical Mappings
  • Examples of physical mappings
  • Hierarchy diagrams
  • Network diagrams

92
Normalization
  • Normalization is a formal approach to applying a
    set of rules used in associating attributes with
    entities. The set of rules is designed to help
    the designer convert the logical entities into a
    conceptual model for the database design.

93
Normalization Process
  • During normalization, the structure of the
    logical model that may translate into undesirable
    properties in the physical model is examined and
    resolved.

94
Why Normalization?
  • The normalization process is used to ensure that
    the conceptual model of the database will work.
    An un-normalized model can be implemented, but it
    will present problems in application development
    and data manipulation operations.

95
Benefits of Normalization
  • A normalized model is more flexible and better
    able to meet a wide range of end-user application
    requirements with minimal structural change. New
    applications are less likely to force a database
    design change.

96
Benefits of Normalization
  • Normalization reduces redundant data, minimizing
    the amount of disk storage required to store the
    data and making it easier to maintain accurate
    and consistent data.

97
Benefits of Normalization
  • A simple and logical design can result in
    increased programmer productivity. Normalization
    reduces the maintenance costs for an application
    because changes to the application are easier.

98
Summary Benefits
  • Greater flexibility
  • Ensures attributes are placed in proper tables
  • Reduces data redundancy
  • Increases programmer productivity
  • Decreases application maintenance costs
  • Maximizes stability of the data model

99
De-Normalizing
  • Although there are benefits, do not
    over-normalize. A normalized data model might not
    meet all design objectives.
  • You can use selective de-normalization in some
    areas of the data model to increase performance.

100
Rules
  • Normalization includes several rules for
    converting the logical design to a more effective
    physical design. These rules are referred to as
    the normal forms.

101
Normal Forms
  • There are several normal forms used to organize
    data. The first three forms
  • first normal form (1NF),
  • second normal form (2NF), and
  • third normal form (3NF)
  • these are the most commonly used.

102
Normalization in Order
  • Each normal form constrains the data to be more
    organized than the previous form.
  • For this reason, each normal form must be
    achieved in turn before the next normal form can
    be applied.

103
First Normal Form
  • An entity relationship is in first normal form if
    there are no repeating groups (domains).
  • Each entity must have a fixed number of
    single-valued attributes.
  • A table that is not in first normal form is less
    flexible and can also waste disk storage space.
    It can also make data searches more difficult.

104
1NF Process
  • To put this order entity into first normal form,
    separate the entity into two.
  • The first entity removes the repeating groups.
  • The second entity has a single copy of the
    attribute group, together with a new primary key
    and a foreign key reference to the first entity.
    Note Example

105
1NF Process
  • Notice that the restriction on the number of
    order items was removed. Only items that are
    ordered now use disk storage space. Also,
    searching for a particular item on order requires
    a search on only one attribute.

106
Second Normal Form
  • An entity relationship is in second normal form
    if it is in first normal form and all its
    attributes depend on the whole primary key.
  • Remember that the primary key is a minimal set of
    attributes that uniquely identify an entity
    instance.

107
Second Normal Form
  • Second normal form requires that every attribute
    must be fully functionally dependent on the
    primary key.
  • Functional dependence means there is a link
    between the values in the different attributes.

108
Third Normal Form
  • An entity relationship is in third normal form
    and
  • if it is in second normal form and all its
    attributes depend on the whole primary key and
    nothing but the primary key.

109
Transitive Dependence
  • This eliminates attributes that not only depend
    on the whole primary key but also on other
    non-key attributes, which in turn depend on the
    whole primary key.
  • This is known as transitive (indirect) dependence.

110
Outputs
  • Output in the construct design phase involves the
    production of a construct layout document.
    Diagrams which can be used to achieve this are
  • Studer Diagrams
  • Hierarchy Diagrams
  • Network Diagrams

111
PHYSICAL DESIGN
  • During this phase, we carry out the actual
    implementation of the design.
  • This also is refereed to as construct tuning
  • We are primarily concerned with the storage
    profile.
  • We do not concern ourselves with this stage in
    the project, but we need to be aware of it.

112
Physical Design - Inputs
  • Documentation needed to follow from the construct
    design phase to the physical design phase
    include
  • Construct Layouts
  • Database Product Information
  • Performance Statistics

113
Inputs
  • Construct Layouts are a deliverable from the
    construct phase and are needed to determine how
    the structure will drive storage requirements
  • Database Product Information will also include
    information on recommended storage requirements,
    minimum specifications, recommended processing
    capacity, and so forth.

114
Performance Statistics
  • Existing performance statistics are critical to
    optimize the database design to the current
    hardware and software environment. These
    statistics include
  • Complexity
  • Volume
  • Transactions

115
Analysis of Statistics
  • Complexity largely referring to the number of
    constructs and cross construct capability
  • Volume derived by multiplying number of records
    per construct, their widths, block size, and
    number of blocks, then adding the product for all
    the constructs
  • Transactions calculations based on transactions
    per given unit of time to determine access rates.
    (see p 489)

116
Physical Design - Processes
  • After gather the necessary information or input,
    a series of processes are required to implement
    the physical design. These are
  • General configuration
  • Direct access optimization
  • Scan access optimization
  • Nonkey search access optimization

117
Continuing Processes in Physical Design
  • Further processes include
  • Cross-construct optimization
  • Grouping constructs
  • Overall system optimization

118
Physical Design Validation
  • After completing the implementation of the
    physical database design, it is necessary to
    verify and validate the success of this design.
  • A series of tests and measures can be taken to
    achieve this, including
  • benchmarking
  • volume testing
  • pressure testing
  • prototyping

119
Benchmarking
  • This involves general testing under normal
    conditions to see what performance statistics can
    be reasonably expected in the future
  • This is useful to see if a system is performing
    in a less than optimal way.

120
Volume Testing
  • Exceeding the maximum designed specification for
    volume is a kind of stress test which tells the
    designer if the system can handle the job.

121
Pressure Testing
  • Similar to volume testing, though pressure
    testing is concerned with high transaction rates,
    rather than large amounts of data over time.
  • Both pressure and volume testing can reveal
    shortcomings in design before they become a
    problem.

122
Prototyping
  • This is merely the development of a small scale
    version of an entire system. This is often done
    before a full system roll out, and can even be
    initiated to test the concept of a system before
    deciding to implement it.

123
Mapping and Performance Tuning Summary
  • Clearly, a rigorous process is required to
    implement a database design successfully.
  • The steps required in the conceptual, construct,
    and physical stages reflect this complexity, and
    underline the need to take great care before
    declaring the implementation a success.
Write a Comment
User Comments (0)
About PowerShow.com