Title: Class 4: Data Relatives
Class 4: Data Relatives
Overview of Class
- Review of Current Knowledge
- Review of Concepts for Project
- Introduction to Relational Databases
- Introduction to Mapping and Performance Tuning
Motivation for Studying Data Relatives
- To provide an appreciation for how Relational Databases have influenced database design
- To understand how relationship assignment or mapping is used in database design
Understanding Needed
- Before a detailed examination of Relational Databases and relationship assignment is done, it is useful to review our acquired knowledge of databases thus far.
- Moreover, a summary review will help in the evolution of the class project.
- Accordingly, we begin with a summary of major concepts covered, along with a general introduction to relational databases.
Review of Key Database Concepts
- The following concepts will be reviewed, to help us get a solid footing on our progress so far, facilitate development of the class project, and ease our introduction to relational databases
- Database Models
- Relationships, Types
- Models: Hierarchical, Network, Relational (introduction)
Further Database Concepts
- Additional concepts will also be reviewed
- Tables
- Keys
- Operators
- Data Dictionaries
- E-R Modeling
- Cardinality
- Dependence
Database Models
- A database model is a set of logical constructs
- Database models represent a data structure and relationships within the database
- Two phases of database models are
- Conceptual
- Implementation
Conceptual Model
- Conceptual models focus on the logical nature of data
- Conceptual models are concerned with what is represented in the database
- There are three representations of a conceptual model, each of which describes a relationship between entities
One to Many Relationship
- The first representation is the one to many relationship (labeled 1:M)
Many to Many Relationship
- The second representation is the many to many relationship (labeled M:N)
One to One Relationship
- The third representation is the one to one relationship (labeled 1:1)
Implementation Model
- The other major type of database model is the Implementation Model
- Implementation models emphasize how the data will be represented.
- The implementation model illustrates what is modeled.
Database Architectures
- Database architectures that can be easily represented with an implementation model include
- Hierarchical Model
- Network Model
- Relational Model
Hierarchical Model
- The hierarchical model was developed in 1969. The fundamental characteristic of a hierarchical model is the one-way, top-down, parent-child relationship
- The hierarchical model can have a 1:M relationship, where one parent can have many children, but one child can have only one parent.
Hierarchical Example
Network Model
- The network model was developed in 1971 to address inconsistencies in the hierarchical model.
- The network model can have an M:N relationship, with many children per parent, and more than one parent possible for each child.
Network Model Components
- Schema
- conceptual view of the database from the DBA's perspective
- Subschema
- view of the database from the application software
- DML (Data Manipulation Language)
- Defines the data characteristics
- Defines the data structure (record layout)
DML Components
- Schema DDL (Data Definition Language)
- Used to define schema
- Sets up record types
- Sets up attributes
- Subschema DDL
- Used by the development language to define database components
- DML
- Used to manipulate database contents
Network Structure
- Two record types
- owner (parent)
- member (child)
Network Example
Relational Model
- Developed in 1970 by E.F. Codd
- The relational model supported the development of a Relational Database Management System (RDBMS)
- The multidimensional model developed by E.F. Codd preceded by many years the technology required to support the design.
Relational Definitions
- Table
- contains data
- is a logical construct
- Relation
- is established by a common characteristic
- characteristic must exist in each table
- characteristic must be identical
- usually called a primary key
Relational Database Model
- Three major components of the relational database model include
- Entities
- Attributes
- Entity Sets
- A basic data set in Codd's theory is the table. A table contains related entities. It can also be called a relation.
Table Characteristics
- It is a logical construct
- Tables consist of tuples and attributes
- Each tuple and attribute intersection contains a single value
- Types of attributes supported include
- numeric
- character
- date
- logical
More Table Characteristics
- Each tuple must have a primary key
- Each attribute must have a unique name
- An attribute's range of values is known as its domain
Table Details
- Tables consist of columns and rows
- Tuples are
- the rows in a table
- represent an entity
- Entity attribute
- is represented by the column
- is consistent across the entities
Naming Conventions for Tables
- Tables should
- be no more than 8 characters long
- have meaningful names
- Columns should
- be no more than 10 characters long
- start with an alphabetic character
- be descriptive
Types of Keys
- Superkey
- is an attribute (or combination of attributes) that uniquely identifies an entity
- Candidate key
- is a potential primary key
Other Types of Keys
- Primary key
- is an attribute that uniquely identifies an entity
- is used to access data
- Secondary Key
- is also called an Alternate Key
- is used to provide alternate access path
- could be used to create a primary key
Other Types of Keys
- Foreign Key
- is an attribute used to provide a link to another table
- Composite Key
- contains more than one attribute
- also known as a Concatenated Key
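- A rough SQL sketch of these key types (the CUSTOMER and ORDER_LINE tables and their columns are hypothetical examples, not from the text):

    -- Hypothetical tables illustrating primary, foreign, and composite keys
    CREATE TABLE CUSTOMER (
        CUST_ID   INTEGER     NOT NULL,   -- primary key: uniquely identifies each tuple
        CUST_NAME VARCHAR(30) NOT NULL,
        PHONE     VARCHAR(15),            -- could serve as a secondary (alternate) key
        PRIMARY KEY (CUST_ID)
    );

    CREATE TABLE ORDER_LINE (
        ORDER_NO INTEGER NOT NULL,
        LINE_NO  INTEGER NOT NULL,
        CUST_ID  INTEGER NOT NULL,
        PRIMARY KEY (ORDER_NO, LINE_NO),                      -- composite (concatenated) key
        FOREIGN KEY (CUST_ID) REFERENCES CUSTOMER (CUST_ID)   -- link to another table
    );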
Relational Operators
- Select
- selects tuples (records) based on a specified value for an attribute
- provides a horizontal subset of a table
- Project
- provides a list of all values for an attribute (field), and a vertical subset of a table
- Join
- combines data from two or more tables
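- In SQL terms, using the hypothetical CUSTOMER and ORDER_LINE tables sketched earlier, the three operators look roughly like this:

    -- Select (restrict): horizontal subset, tuples matching an attribute value
    SELECT * FROM CUSTOMER WHERE CUST_NAME = 'Smith';

    -- Project: vertical subset, the values of chosen attributes
    SELECT CUST_NAME, PHONE FROM CUSTOMER;

    -- Join: combine data from two tables over a common attribute
    SELECT C.CUST_NAME, O.ORDER_NO
    FROM CUSTOMER C
    JOIN ORDER_LINE O ON O.CUST_ID = C.CUST_ID;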
Relational Operators
- Intersect
- produces a listing that contains all tuples that are common to both tables
- attribute characteristics must be identical
- Union
- combines all tuples from two tables
- attribute characteristics must be identical
- Difference
- produces all tuples in one table that are not
contained in another table
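- Most SQL dialects expose these set operators directly; a minimal sketch, assuming two hypothetical, union-compatible tables CURRENT_CUSTOMER and FORMER_CUSTOMER (standard SQL spells Difference as EXCEPT, some products use MINUS):

    -- Union: all tuples from both tables (duplicates removed)
    SELECT CUST_ID FROM CURRENT_CUSTOMER
    UNION
    SELECT CUST_ID FROM FORMER_CUSTOMER;

    -- Intersect: only the tuples common to both tables
    SELECT CUST_ID FROM CURRENT_CUSTOMER
    INTERSECT
    SELECT CUST_ID FROM FORMER_CUSTOMER;

    -- Difference: tuples in the first table that are not in the second
    SELECT CUST_ID FROM CURRENT_CUSTOMER
    EXCEPT
    SELECT CUST_ID FROM FORMER_CUSTOMER;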
Relational Operators
- Product
- produces a list of all possible pairs from two tables
- e.g., Table 1 has 100 tuples and Table 2 has 5 tuples; their product has 100 x 5 = 500 tuples
- Divide
- requires the use of one single-column table and
one two-column table
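- Product corresponds to a CROSS JOIN in SQL; standard SQL has no single Divide operator, so division is normally rewritten with nested NOT EXISTS subqueries. A sketch of the product, with a hypothetical REGION table:

    -- Product: every possible pairing of tuples from the two tables
    -- (100 customers x 5 regions = 500 tuples in the result)
    SELECT C.CUST_ID, R.REGION_NAME
    FROM CUSTOMER C
    CROSS JOIN REGION R;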
Relational Database Classification
- Fully Relational
- supports all eight relational algebra functions
- enforces entity and referential integrity rules
- Relational Complete
- supports all eight relational algebra functions
- does not support the integrity rules
Relational Database Classification
- Minimally Relational
- Supports Select, Project, Join
- Tabular
- Supports ONLY Select, Project, Join
- Requires all access paths be defined by the DBA
Data Dictionary
- Used to provide a detailed description of all tables and attributes found in the database
- Can help eliminate homonyms and synonyms
- A system catalog can be generated by some DBMS.
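- Several (though not all) relational products expose the system catalog through queryable views; the standard INFORMATION_SCHEMA views shown here are one common form, so treat this as an illustrative sketch:

    -- List every table and its columns as recorded in the system catalog
    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    ORDER BY TABLE_NAME, ORDINAL_POSITION;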
E-R Model
- Two types of E-R Models
- Global E-R Model
- Application (local) E-R Model
- Is used to develop a common vision of the requirements
- Can define processing and constraint requirements
- Can be used by the DBA to implement the database
E-R Components
- Entities
- represented by a rectangle
E-R Components
- Attributes
- represented by an oval with a line to the entity
- Single-valued attribute represented by a line
- Multi-valued attribute represented by a double line
- Derived attribute represented by a dashed line
E-R Components
- Relationship
- An association between entities
- Types of relationships include
- Optional
- Mandatory
- Degree
- The number of associated entities
E-R Model
- The E-R model is a simplified picture of the relationship
- between entities
- between entities and the attributes
Relationships We've Seen
- 1:1 Relationship
- 1:M Relationship
- M:N Relationship
Unary Relationship
Binary Relationship
Ternary Relationship
Index
- An index is composed of an index key and a series of pointers
- An index key is a reference point
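- In SQL, an index is created on one or more attributes, and the DBMS maintains the index keys and pointers automatically; a brief sketch on the hypothetical CUSTOMER table:

    -- Secondary index to speed searches on a non-primary-key attribute
    CREATE INDEX IDX_CUSTOMER_NAME ON CUSTOMER (CUST_NAME);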
Data Model
- A data model is a diagram of real world data structures
- The modeling process takes us through a number of phases
- Conceptual Data Models
- Internal Data Models
- External Data Models
- Physical Data Models
Conceptual Data Model
- Represents the global or enterprise view of data
- Identifies and describes data objects
- Can be thought of as a database blueprint
- Is hardware independent
- Is software independent
Internal Data Model
- Adapts the conceptual model to the DBMS selected by the analyst
- Is hardware independent
- Is software dependent
- changes in software will change database requirements and characteristics
- This is a critical model for developers of a hierarchical or network DBMS
External Data Model
- Is the software development specialist's view of the internal model
- Is hardware independent
- Is DBMS dependent
- Once the DBMS is selected, the external model can
be developed
Physical Model
- Describes the way the data will be stored
- describes the tables
- describes access paths
- Is hardware dependent
- The model can be dependent on the type of storage media used
- Is software dependent
- Is DBMS dependent
Cardinality
- In relationships, cardinality refers to the number of entity occurrences.
- Cardinality records the potential upper and lower limits of the entity occurrences
Cardinality Example
Existence Dependency
- This occurs when the existence of one entity depends solely on the existence of another entity
- For example, the existence of a child depends on whether or not a parent exists.
Subtypes and Supertypes
- Within an entity, it is possible to break down and further categorize that entity into other sub-entities.
- We refer to the all-encompassing entity and the further categorized sub-entities as Supertypes and Subtypes, respectively.
- An example of a supertype is a parent; subtypes could be a mother, father, or stepparent.
Summary Review
- Thus far, we have reviewed fundamentals of database design, including
- Database Models, Relationships, Types
- Database Types
- Hierarchical, Network, Relational Databases
- Other Database Concepts
- Tables, Keys, Operators, Data Dictionaries
- E-R Modeling, Cardinality, Dependence
Introduction to Data Relatives
- At this point, we are familiar with fundamental design concepts required to understand the development of a database.
- This prepares us for further exploration of the technology that has become increasingly dominant in database architecture, the relational database.
Introduction to Relational Databases
Relational Databases
- Relational Databases, as we've seen, owe their development to the paper by E.F. Codd
- A fundamental requirement of relational databases is that they use the Structured Query Language for Data Manipulation, Data Definition and Data Control Languages.
The Need for Relational Databases
- As older, larger systems evolved, they became very cumbersome and resistant to database updates, mostly because of the many programs, written for a particular database, that would have to be changed along with them.
- The workaround solution was to simply add more databases
Problems with Adding Databases
- As new databases were continually added, all kinds of data redundancy crept into information systems, increasing the likelihood of errors.
- Furthermore, large systems became increasingly difficult, and correspondingly expensive, to maintain.
The Relational Database Solution
- By implementing relational databases,
administrators were able to add to existing
databases, while maintaining data integrity, and
not needing to redesign programs which access the
data.
Relational Database Benefits
- Relational Databases serve a number of purposes including
- provides greater structure and analytical capabilities for measuring data processing performance
- aids in the design of databases by providing a design model
- increases data storage and access efficiency for widespread applications
Relational Database Theory
- Database theory concerns itself with three areas of data management
- Data Structure
- Data Integrity
- Data Manipulation
Data Management - Structure
- Data is viewed as a data element, known as atomic (not possible to break down further)
- Data elements can be contained in a domain (range of valid values)
- Data in a domain can be related with operators (mathematical comparisons)
- Data element types are attributes
- Related data attributes comprise a relation. One occurrence is a tuple.
Data Management - Integrity
- A base table represents the stored relations of data, and uses these integrity rules
- Each relation has a series of candidate keys (attributes that identify tuples)
- Each relation has one candidate key which serves as the primary (unique) key
- Any attribute of a relation can be a foreign key (primary key of another table)
- Each relation can have many alternate keys (non-primary candidate keys)
Data Management - Manipulation
- Once tables representing relations are created, data must be accessed using operators.
- A data access language used to run the operations using operators is Structured Query Language (SQL).
SQL
- SQL is divided into
- Data Definition Language
- concerned with the creation of tables
- Data Control Language
- concerned with security and data access control
- Data Manipulation Language
- concerned with storing, modifying, and retrieving data
- SQL is executed by simple commands
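- One short statement from each sublanguage illustrates the division (the table, column, and role names are hypothetical; the CREATE TABLE form of the DDL appeared in the earlier key sketch):

    -- DDL: define or change table structure
    ALTER TABLE CUSTOMER ADD COLUMN CREDIT_LIMIT DECIMAL(9,2);

    -- DCL: control who may access the data
    GRANT SELECT ON CUSTOMER TO CLERK_ROLE;

    -- DML: store, modify, and retrieve data
    INSERT INTO CUSTOMER (CUST_ID, CUST_NAME) VALUES (1001, 'Smith');
    UPDATE CUSTOMER SET PHONE = '555-0100' WHERE CUST_ID = 1001;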
Key SQL Commands
- Queries begin with SELECT, and must come FROM somewhere
- WHERE adds conditions to refine the search
- To relate two or more tables, use the JOIN command
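- Putting the three commands together against the hypothetical tables from earlier:

    -- FROM names the data, JOIN relates two tables, WHERE refines the search
    SELECT C.CUST_NAME, O.ORDER_NO
    FROM CUSTOMER C
    JOIN ORDER_LINE O ON O.CUST_ID = C.CUST_ID
    WHERE O.ORDER_NO > 1000;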
Specific Advantages of Relational Databases
- Facilitates ad hoc, dynamic information queries
- Generally user friendly, consistent interface for data access
- No navigation is needed to search data, simply select relationships for analysis
- Suited for distributed systems due to data access independence.
Examples in Text
- DB2
- Highly popular system, runs primarily on mainframe platforms.
- Very robust design, handles high rates of transaction processing.
- Informix
- Geared to UNIX operating systems
- Allows for manipulation of graphical objects in a
database environment.
Relational Database Review
- Data Description
- Tables, tuples, attributes
- Organization
- Collections of autonomous files, logically related via common field values
- Data Constructs
- Relational Data Definition Language
Relational Database Navigational Capabilities
- Search Capability
- Simple and complex search
- Cross Reference
- via the SQL JOIN
- Mapping
- Dynamic only
- Processing
- Set at a time
Relational Databases Summary
- The dramatic improvements in system technology, supporting distributed client-server architecture with vastly greater storage, memory, and processing capabilities, have made the arrival of relational databases on complex networks a reality.
- Combined with the inherent advantages of relational design, it is clear why this technology has become so prominent.
Introduction to Mapping and Performance Tuning
- Remember that mapping refers to relationship assignments and performance tuning relates to the physical stage of database implementation.
- These are critical elements in the design process which we will break into 3 phases
- Conceptual Design
- Construct Design
- Physical Design
Design Phases
- Conceptual Design
- Preliminary Design Phase, an abstract representation of a real system
- Construct Design
- Defining base constructs and mapping their relationships
- Physical Design
- Takes the results of construct design into a real database implementation
CONCEPTUAL DESIGN
- During this phase, we are concerned with understanding the reality of an information system from the perspective of the users.
- We will focus on
- Input - user interviews
- Output - data models
- Other Uses for Conceptual Design
Conceptual Design - Inputs
- Information gathered to prepare a conceptual design is derived through preliminary investigation and fact finding. This includes the accumulation of inputs.
Conceptual Design - Input Gathering
- Input needed for conceptual design is primarily accumulated through user interviews. This includes
- Informal Interviews - discussions with users and participants
- JAD Sessions - groups representing the different functional areas affected by a system
- Surveys - questionnaires designed to solicit feedback on existing systems
Conceptual Design - Output
- The results of these inputs are building blocks for completing the conceptual design. This is needed for the data models which represent that design. Methods we have seen are
- Informal conceptual design
- Entity / Relationship Modeling
- Semantic Data Modeling
- Object Oriented Modeling
Conceptual Design - Other Applications
- Recall that conceptual design can be utilized for a number of other purposes in addition to basic design. These include
- Data Administration
- The use of CASE tools in automating design
- Data Repositories
CONSTRUCT DESIGN
- During this phase, we define the base constructs, called base construct assignment, and map the relationships between them, called relationship assignment.
- Construct design is largely concerned with the architecture profile.
The Construct Design Process
- To evaluate this process, we will look at
- Inputs Needed for Construct Design
- The Steps in the Construct Design Process
- Outputs from Construct Design
Construct Design - Inputs
- Input required to progress from the conceptual stage to the construct stage is primarily one of documentation. This includes
- Conceptual design documents - those documents generated from the conceptual design stage
- Database architecture information - parameters about the architecture used
Construct Design Process
- The steps included in construct design are
- Transforming models into constructs
- Definition of base constructs (assignments)
- Definition of mapping relationships (logical or physical)
- Normalization (from atomization to 0th, 1st, 2nd, and 3rd normal form)
Transforming models into constructs
- The first step is to turn the logically defined entities and objects into base construct definitions.
- The second step is to determine how the relationships that the model shows are going to be captured within the structure of the database
Definition of base constructs (assignments)
- Definition of constructs include the following
- Names
- Elements
- Primary Keys
Definition of mapping relationships
- Defining mappings can be done as either logical or physical mappings.
- Logical mappings are done for the xbase, inverted list, and relational database architectures.
- Physical mappings are done for the hierarchical and network architectures.
Logical Mappings
- Examples of logical mappings
- 1:M - Foreign key assignment
- M:N - Bridge constructs
- We use this approach for our project
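- A rough SQL sketch of both techniques, using hypothetical STUDENT, ADVISOR, and COURSE entities: the 1:M mapping places a foreign key in the "many" table, while the M:N mapping introduces a bridge (junction) construct whose composite key pairs the two parents.

    CREATE TABLE ADVISOR (ADVISOR_ID INTEGER PRIMARY KEY);
    CREATE TABLE COURSE  (COURSE_ID  INTEGER PRIMARY KEY);

    -- 1:M mapping: foreign key assignment (one advisor advises many students)
    CREATE TABLE STUDENT (
        STU_ID     INTEGER PRIMARY KEY,
        ADVISOR_ID INTEGER REFERENCES ADVISOR (ADVISOR_ID)
    );

    -- M:N mapping: bridge construct pairing students with courses
    CREATE TABLE ENROLLMENT (
        STU_ID    INTEGER REFERENCES STUDENT (STU_ID),
        COURSE_ID INTEGER REFERENCES COURSE (COURSE_ID),
        PRIMARY KEY (STU_ID, COURSE_ID)
    );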
Physical Mappings
- Examples of physical mappings
- Hierarchy diagrams
- Network diagrams
Normalization
- Normalization is a formal approach to applying a
set of rules used in associating attributes with
entities. The set of rules is designed to help
the designer convert the logical entities into a
conceptual model for the database design.
Normalization Process
- During normalization, the structure of the
logical model that may translate into undesirable
properties in the physical model is examined and
resolved.
Why Normalization?
- The normalization process is used to ensure that
the conceptual model of the database will work.
An un-normalized model can be implemented, but it
will present problems in application development
and data manipulation operations.
Benefits of Normalization
- A normalized model is more flexible and better
able to meet a wide range of end-user application
requirements with minimal structural change. New
applications are less likely to force a database
design change.
Benefits of Normalization
- Normalization reduces redundant data, minimizing
the amount of disk storage required to store the
data and making it easier to maintain accurate
and consistent data.
Benefits of Normalization
- A simple and logical design can result in
increased programmer productivity. Normalization
reduces the maintenance costs for an application
because changes to the application are easier.
Summary Benefits
- Greater flexibility
- Ensures attributes are placed in proper tables
- Reduces data redundancy
- Increases programmer productivity
- Decreases application maintenance costs
- Maximizes stability of the data model
De-Normalizing
- Although there are benefits, do not over-normalize. A normalized data model might not meet all design objectives.
- You can use selective de-normalization in some areas of the data model to increase performance.
Rules
- Normalization includes several rules for
converting the logical design to a more effective
physical design. These rules are referred to as
the normal forms.
Normal Forms
- There are several normal forms used to organize data. The first three forms
- first normal form (1NF),
- second normal form (2NF), and
- third normal form (3NF)
- these are the most commonly used.
Normalization in Order
- Each normal form constrains the data to be more organized than the previous form.
- For this reason, each normal form must be achieved in turn before the next normal form can be applied.
First Normal Form
- An entity relationship is in first normal form if there are no repeating groups (domains).
- Each entity must have a fixed number of single-valued attributes.
- A table that is not in first normal form is less flexible and can also waste disk storage space. It can also make data searches more difficult.
1NF Process
- To put this order entity into first normal form, separate the entity into two.
- The first entity removes the repeating groups.
- The second entity has a single copy of the attribute group, together with a new primary key and a foreign key reference to the first entity.
Note Example
1NF Process
- Notice that the restriction on the number of
order items was removed. Only items that are
ordered now use disk storage space. Also,
searching for a particular item on order requires
a search on only one attribute.
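- A minimal SQL sketch of that split, assuming a hypothetical un-normalized order entity that repeated ITEM and QTY columns: the repeating group moves into its own table keyed by order number and line number.

    -- After 1NF: the order header keeps only single-valued attributes
    CREATE TABLE CUST_ORDER (
        ORDER_NO   INTEGER PRIMARY KEY,
        ORDER_DATE DATE
    );

    -- The repeating item group becomes a second entity, one row per ordered item
    CREATE TABLE ORDER_ITEM (
        ORDER_NO INTEGER REFERENCES CUST_ORDER (ORDER_NO),  -- foreign key to the first entity
        LINE_NO  INTEGER,
        ITEM_ID  INTEGER,
        QTY      INTEGER,
        PRIMARY KEY (ORDER_NO, LINE_NO)                     -- new primary key
    );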
Second Normal Form
- An entity relationship is in second normal form if it is in first normal form and all its attributes depend on the whole primary key.
- Remember that the primary key is a minimal set of attributes that uniquely identify an entity instance.
Second Normal Form
- Second normal form requires that every attribute must be fully functionally dependent on the primary key.
- Functional dependence means there is a link between the values in the different attributes.
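- As a standalone sketch (a variant of the hypothetical order example, keyed by ORDER_NO and ITEM_ID): if the item description were stored in ORDER_ITEM it would depend on ITEM_ID alone, only part of the key, so second normal form moves it to its own table.

    -- Not in 2NF: ITEM_DESC depends on only part of the (ORDER_NO, ITEM_ID) key
    -- ORDER_ITEM (ORDER_NO, ITEM_ID, QTY, ITEM_DESC)

    -- In 2NF: the partially dependent attribute gets its own table
    CREATE TABLE ITEM (
        ITEM_ID   INTEGER PRIMARY KEY,
        ITEM_DESC VARCHAR(40)
    );

    CREATE TABLE ORDER_ITEM (
        ORDER_NO INTEGER,
        ITEM_ID  INTEGER REFERENCES ITEM (ITEM_ID),
        QTY      INTEGER,                 -- depends on the whole (ORDER_NO, ITEM_ID) key
        PRIMARY KEY (ORDER_NO, ITEM_ID)
    );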
Third Normal Form
- An entity relationship is in third normal form if it is in second normal form and all its attributes depend on the whole primary key and nothing but the primary key.
Transitive Dependence
- Third normal form eliminates attributes that depend not only on the whole primary key but also on other non-key attributes, which in turn depend on the whole primary key.
- This is known as transitive (indirect) dependence.
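- A sketch of removing a transitive dependence, again with hypothetical tables: in a STUDENT table keyed by STU_ID, DEPT_NAME depends on DEPT_ID, which in turn depends on the key, so third normal form moves it to a DEPARTMENT table.

    -- Not in 3NF: DEPT_NAME depends on the non-key attribute DEPT_ID (transitive)
    -- STUDENT (STU_ID, STU_NAME, DEPT_ID, DEPT_NAME)

    -- In 3NF: every non-key attribute depends on the key and nothing but the key
    CREATE TABLE DEPARTMENT (
        DEPT_ID   INTEGER PRIMARY KEY,
        DEPT_NAME VARCHAR(40)
    );

    CREATE TABLE STUDENT (
        STU_ID   INTEGER PRIMARY KEY,
        STU_NAME VARCHAR(30),
        DEPT_ID  INTEGER REFERENCES DEPARTMENT (DEPT_ID)
    );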
Outputs
- Output in the construct design phase involves the production of a construct layout document. Diagrams which can be used to achieve this are
- Studer Diagrams
- Hierarchy Diagrams
- Network Diagrams
PHYSICAL DESIGN
- During this phase, we carry out the actual implementation of the design.
- This is also referred to as construct tuning
- We are primarily concerned with the storage profile.
- We do not concern ourselves with this stage in the project, but we need to be aware of it.
Physical Design - Inputs
- Documentation needed to move from the construct design phase to the physical design phase includes
- Construct Layouts
- Database Product Information
- Performance Statistics
Inputs
- Construct Layouts are a deliverable from the construct phase and are needed to determine how the structure will drive storage requirements
- Database Product Information will also include information on recommended storage requirements, minimum specifications, recommended processing capacity, and so forth.
Performance Statistics
- Existing performance statistics are critical to optimize the database design to the current hardware and software environment. These statistics include
- Complexity
- Volume
- Transactions
Analysis of Statistics
- Complexity - largely referring to the number of constructs and cross construct capability
- Volume - derived by multiplying the number of records per construct, their widths, block size, and number of blocks, then adding the product for all the constructs
- Transactions - calculations based on transactions per given unit of time to determine access rates (see p. 489)
Physical Design - Processes
- After gathering the necessary information or input, a series of processes are required to implement the physical design. These are
- General configuration
- Direct access optimization
- Scan access optimization
- Nonkey search access optimization
Continuing Processes in Physical Design
- Further processes include
- Cross-construct optimization
- Grouping constructs
- Overall system optimization
Physical Design Validation
- After completing the implementation of the physical database design, it is necessary to verify and validate the success of this design.
- A series of tests and measures can be taken to achieve this, including
- benchmarking
- volume testing
- pressure testing
- prototyping
Benchmarking
- This involves general testing under normal conditions to see what performance statistics can be reasonably expected in the future
- This is useful to see if a system is performing in a less than optimal way.
Volume Testing
- Exceeding the maximum designed specification for
volume is a kind of stress test which tells the
designer if the system can handle the job.
Pressure Testing
- Similar to volume testing, though pressure testing is concerned with high transaction rates, rather than large amounts of data over time.
- Both pressure and volume testing can reveal shortcomings in design before they become a problem.
Prototyping
- This is merely the development of a small scale
version of an entire system. This is often done
before a full system roll out, and can even be
initiated to test the concept of a system before
deciding to implement it.
Mapping and Performance Tuning Summary
- Clearly, a rigorous process is required to implement a database design successfully.
- The steps required in the conceptual, construct, and physical stages reflect this complexity, and underline the need to take great care before declaring the implementation a success.