Title: Relational Data Analysis
1Relational Data Analysis
- Learning outcomes
- understand the process of normalisation
- perform Relational Data Analysis
- recognise the importance of normalised databases
- recognise first, second and third normal forms
- augment and formalise your understanding of
Logical Data Modelling - decide when in the development cycle to perform
normalisation - evaluate the integrity of a data structure.
2Relational Data AnalysisConcepts
- Relations, Tables and Entity Types
- Repeating Groups and Levels
- Functional Dependencies
- Advantages of Normalisation
- Unnormalised Data
- First, Second and Third Normal Forms
- Rationalising 3NF Tables
- Converting 3NF Tables into an LDS
3Relational Data AnalysisRelations and Tables
1970s Edgar Codd IBM
Attribute Names
Primary Key
Columns
Rows
4Relational Data AnalysisTables and Entity Types
- Supplier
- Supplier Number
- Supplier Name
- Supplier Address
- Supplier Tel. Number
- Supplier Contact Name
SUPPLIER (Supplier Number, Supplier Name,
Supplier Address,Supplier Tel. Number, Supplier
Contact Name)
5Relational Data AnalysisRepeating Groups and
Levels
Products Suppliers List Product Number
993201 Product Name Unbranded Blank 3hr Video
Tapes Product Type Code BV Product Type Name
Blanc Video Supplier Supplier
Suppliers Cost
Main Supplier Number Name
Product Ref No Price
Y/N 1463 SRW 3HVHS
54p Y 3628 Videos Are Us
438893 57p N 2327
Bella Sonic 3485VHS/3 53p
N
6Relational Data AnalysisRepeating Groups and
Levels
- PRODUCTS SUPPLIERS
- Product Number
- Product Name
- Product Type Code
- Product Type Name
- Supplier Number
- Supplier Name
- S/P Ref. Number
- Cost Price
- Main Supplier Y/N
level 1 1 1 1 2 2 2 2 2
7Relational Data AnalysisFunctional Dependencies
- An attribute X is said to be functionally
dependent on an attribute Y if each value of Y is
associated with only one value of X - For example, each product is of one product type.
This means that each product number is associated
with only one product type code. Product type
code is therefore functionally dependent on
product number. The opposite is not true as each
product type code may be associated with many
product numbers
8Relational Data AnalysisFunctional Dependencies
- Another way of phrasing functional dependency is
to say that the value of X can be determined from
the value of Y, or that Y functionally determines
X - So product number functionally determines the
value of product type code, or the value of
product type code can be determined from the
value of the product number, i.e. given the value
of a product number we can always establish the
value of the associated product type code.
9Relational Data AnalysisNormalisation
- The process of normalisation involves applying a
series of refinements to groups of data items in
order to produce tables that conform to specified
standards, known as normal forms - Unnormalised tables are converted to First Normal
Form by removing repeating groups into separate
tables. Second and Third Normal Forms are
achieved by reducing and splitting tables so that
the only functional dependencies which exist are
between the primary keys and the remaining
non-key attributes.
10Relational Data AnalysisAdvantages of
Normalisation
- Before describing normalisation in detail it is
worth mentioning some of its advantages briefly.
Data in Third Normal Form (3NF) consists of
tables of closely associated attributes which are
entirely dependent on the key, the whole key,
and nothing but the key - This has the effect of minimising data
duplication across different tables, thereby
resolving many of the problems associated with
data redundancy. In particular it should reduce
the incidence of update anomalies
11Relational Data AnalysisAdvantages of
Normalisation
- Update anomalies is the collective term for
problems with modifying, inserting and deleting
data from a database - These can be illustrated by considering the
unnormalised contents of the list of a
products suppliers again - If we were to implement this data structure as it
stands, and to use it as the only place in which
product details were stored we would encounter
the following problems
12Relational Data AnalysisAdvantages of
Normalisation
- Insertion Anomalies. No new suppliers could be
added to the system without adding a product - Deletion Anomalies. If the last remaining
product for a given supplier were deleted, then
all information on that supplier would be lost - Amendment Anomalies. Any change to a suppliers
details (e.g. to the telephone number) would mean
that every product for that supplier would need
amending to keep it in line.
13Relational Data Analysis UNF to 1NF
Choose data items
Identify keys
Split groups
1NF
UNF
level
- Product Number
- Product Name
- Product Type Code
- Product Type Name
- Supplier Number
- Supplier Name
- S/P Ref. Number
- Cost Price
- Main Supplier Y/N
1 1 1 1 2 2 2 2 2
Product Number Product Name Product Type
Code Product Type Name Product Number Supplier
Number Supplier Name S/P Ref. Number Cost
Price Main Supplier Y/N
Note key
14Relational Data Analysis 1NF to 2NF
Does this attribute depend on the whole of the
primary key?
2NF
1NF
Product Number Product Name Product Type
Code Product Type Name
Product Number Product Name Product Type
Code Product Type Name Product Number Supplier
Number Supplier Name S/P Ref. Number Cost
Price Main Supplier Y/N
Product Number Supplier Number S/P Ref.
Number Cost Price Main Supplier Y/N Supplier
Number Supplier Name
15Relational Data Analysis 2NF to 3NF
Is this attribute dependent on any other non-key
attribute(s)?
Note foreign key
3NF
2NF
Product Number Product Name Product Type
Code Product Type Code Product Type Name
Product Number Product Name Product Type
Code Product Type Name Product Number Supplier
Number S/P Ref. Number Cost Price Main Supplier
Y/N Supplier Number Supplier Name
Product Number Supplier Number S/P Ref.
Number Cost Price Main Supplier Y/N Supplier
Number Supplier Name
16Relational Data AnalysisNaming the 3NF tables
Table Names
3NF
Product Number Product Name Product Type
Code Product Type Code Product Type Name
Product Number Supplier Number S/P Ref.
Number Cost Price Main Supplier Y/N
Supplier Number Supplier Name
PRODUCT PRODUCT TYPE SUPPLIER
PRODUCT SUPPLIER
17Relational Data AnalysisRationalising 3NF tables
- Once we have carried out normalisation on a
number of Functions we will have several sets of
tables in 3NF, which we now rationalise into a
single, larger set - Any tables that share a primary key should be
merged, as should tables with matching candidate
keys - We will also look for attributes which now act as
foreign keys when compared with primary keys in
other 3NF sets. A little care is needed to ensure
that any synonyms or homonyms are identified, as
failure to do so could lead to missing or
spurious merges.
18Relational Data AnalysisConverting 3NF tables
into LDSs
3NF
Product Number Product Name Product Type
Code Product Type Code Product Type Name
Product Number Supplier Number S/P Ref.
Number Cost Price Main Supplier Y/N
Supplier Number Supplier Name
PRODUCT
Product Type Code
PRODUCT TYPE
SUPPLIER PRODUCT
Product Number Product Type Code
Supplier Number
SUPPLIER
Represent each table as an entity type box
List primary and foreign key attributes
Small Keys Grab Large Keys
Supplier Number Product Number
Crows Feet Grab Asterisks
19Relational Data AnalysisComparing LDSs
- We now compare our two data structures (the
extract we just produced using RDA with the
Required System LDM we have produced earlier in
the development and decide whether any
discrepancies are due to errors in Logical Data
Modelling or whether they represent redundant
information resulting from RDA - In practice there may be large numbers of
entities involved in the comparison, so a fair
amount of time is likely to be spent in
identifying corresponding entities in the two
models. Probably the best starting point is to
look for common attributes, in particular common
primary keys or candidate keys.
20Relational Data AnalysisSummary
- Relational Data Analysis (RDA) is based on
material published in the 1970s by Edgar Codd of
IBM, proposing the application of mathematical
set theory and algebra to the organisation of
data - RDA is used to create data model extracts from
collections of individual data items, which can
then be used to enhance or confirm the Required
System LDM and to provide the basis for database
design
21Relational Data AnalysisSummary
- The process of normalisation involves applying a
series of refinements to groups of data items in
order to produce tables that conform to specified
standards, known as normal forms - Data in Third Normal Form (3NF) consists of
tables of closely associated attributes which are
entirely dependent on the key, the whole key,
and nothing but the key. This has the effect of
minimising data duplication across different
tables, thereby resolving many of the problems
associated with data redundancy
22Relational Data AnalysisSummary
- Finally, the only way to learn normalisation is
to practise. - This is specially true in order to understand the
process of rationalisation and how to compare LDSs
23The Place of Relational Data Analysis
Investigation
BAM
RD
DFM
BSO
Specification
WPM
Conceptual Model
External Design
LDM
Decision Structure
Policies and Procedures
User Organisation
RDA
FD
Internal design
Construction