Title: Information Retrieval and Use
1Information Retrieval and Use
- Data Analysis Data Modeling, Relational Data
Analysis and Logical Data Modeling - Geoff Leese September 2009
2Relational Data Analysis
- Captures the detailed knowledge of the meaning of
the data. - Ensures that the data is logically easy to
maintain and extend. - Data inter-dependencies have been identified
- Ambiguities have been resolved.
- Eliminate unnecessary duplication of data.
- Forms the data into optimum groups.
- Validates the Logical Data Model (LDM).
3Logical Data Modelling
- Basic Rules for converting 3NF to a LDM
- Create an entity type for each data relation
- Mark qualifying foreign keys
- Check compound key relations
- Make foreign/primary key relations
4Guidelines for logical modelling
- Entity type names are singular nouns,
descriptive, concise and organisation specific. - Attribute names are unique descriptive nouns of
standard format. - Relationship names are descriptive, precise verb
phrases.
5Simple Master-Detail relationships
- Where a single foreign key of a relation
corresponds to the primary key of another
relation - See next slide for example.
6Simple Master-Detail relationships
Shows SINGLE primary key at MASTER entity
(Organisation) connected to SINGLE foreign key at
DETAIL entity (Contact people)
7Multiple level Master-Detail Relationships
8Identifying Recursive (Unary) Relationships
- Is a relation where a foreign key references the
same relation. - Example Employee
- Employee-number
- Employee-name
- Employee-manager-number
Employee
9Relationships Student/Module
- At this point we need to identify the data items
that describe or identify each entity - Entity attributes are also known as data items
- What are the data items associated with the
following LDS diagram?
10The Student
Entity Type Attribute Name Attribute
Student Student Name Jones Street
Address Leek Road Town Stoke-on-Trent Po
st Code ST4 2DE Telephone 294303
11The Module
- Entity Type Attribute Type Attribute
- Module Module Number CM5111-1
- Module Name SSAT
- Module Leader A Lecturer
- Level 1
- Cats Points 10
12The Data Items
13Identifying occurrences of entities
- Each occurrence of an entity must be uniquely
identified in some way - Imagine the British Gas data base that used only
surnames to identify account holders - There would be 100,000 account holders called
Jones in this country - Even if we used the given names there would still
be considerable duplication - It would be impossible to find the right account
by name alone
14Adding a Primary Key
15Relationships Getting it right
Is this right?
The real situation is surely
16Putting it right Intersection entity
We need a link entity - less ambiguity
17Normalisation - steps
- Start with a set of un-normalised tables
- Entity/attribute list
- Step 1 - remove ambiguity and repeating data
- Step 2 - remove shared data
18Normalisation - step 1
- Break down ALL attributes into smallest
meaningful parts - EG student name becomes student surname, student
firstname, student title - Remove REPEATED information to form a new table
- EG a course may be composed of MANY modules (but
assume that each module is only on one course!) -
so form a MODULE table
19Normalisation - step 2
- Remove SHARED data to form new tables
- EG modules may share tutors - so form a TUTORS
table.
20Normalisation
- FIRST NORMAL FORM - a relation (table) is in 1NF
if it contains atomic values and all repeating
groups have been removed
21Normalisation
- SECOND NORMAL FORM - a relation(table) is in 2NF
if it is in 1NF and every non-key attribute is
fully dependent on the primary key
22Normalisation
- THIRD NORMAL FORM - a relation(table) is in 3NF
if it is in 2NF and every non-key attribute is
not dependent on any other non-key attribute
23Relational Data Analysis Form
- Validates the LDM against the relations.
- Consists of
- Unnormalised Form
- attributes
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Relations
- Attributes
24RDA Form
25Data Dictionary
- lists, for every field in every table
- Tablename
- Fieldname
- Field Type
- Field size (if variable)
- Decimal places (if applicable)
- Description (if required)
- Other significant field properties
26Data Dictionary example
27The domain
- Is the set of items, and the definition thereof
to which an attribute belongs - Define domain once, saves time when defining
attributes belonging to it. - For example - Date of Birth, Course Start Date
and Enrolment Date all belong to the DATE domain
- data type is date/time, format dd/mm/yyyy,
non-unique, non-null.
28Further reading
- Rolland chapters 3 and 4
- Hoffer chapters 10 and 12
- Kendall Kendall chapter 17