Title: Data Modeling and Relational Databases
1Data Modeling and Relational Databases
2File versus Database
3File versus Database
- File a collection of similar records.
- Files are unrelated to each other than for their
use by a common application program - Data storage is built around the applications
that use the files - Database a collection of interrelated files
- Records in one file (or table) are physically
related to records in another file (or table) - Applications are built around the integrated
database
4Pros and Cons of Databases
- Pros
- Data independence from applications increases
adaptability and flexibility - Superior scalability
- Ability to share data across applications
- Less redundancy (total non-redundancy is not
always achievable)
- Cons
- More complex than file technology
- Somewhat slower performance
- Investment in DBMS and database experts
- Need to adhere to design principles to realize
benefits - Increased vulnerability due data consolidation
5Fields Records Files/Tables
Field
(also referred to as attributes)
Record
Field 1
Field 2
Field m
File/Table
Record 1
Record 2
Record n
6Data Modeling
7Data Modeling
- Data modeling a technique for organizing and
documenting a systems data. Sometimes called
database modeling. - Entity relationship diagram (ERD) a data model
utilizing several notations to depict data in
terms of the entities and relationships described
by that data.
8Sample Entity Relationship Diagram (ERD)
Entity
Relation
Unique identifier
9Data Modeling Concepts Entities and Relationships
- Entity - person, place, thing or event about
which we wish to collect data - Relationship - A natural association that exists
between one or more entities. The relationship
may represent an event or affinity that links the
entities.
10Data Modeling Concepts Cardinality
- The minimum and maximum number of occurrences of
one entity that may be related to a single
occurrence of the other entity. -
bidirectional
Or simply is enrolled in
11Cardinality Notations
12Data Modeling Concepts Degree
- Degree the number of entities that participate
in the relationship. - A relationship between two entities is called a
binary relationship. - A relationship between different instances of
the same entity is called a recursive
relationship.
13Data Modeling Concepts Degree
- Relationships may exist between more than two
entities and are called - N-ary relationships.
- The example ERD depicts a ternary relationship.
14Data Modeling Concepts Degree
- Associative entity an entity that inherits its
primary key from more than one other entity
(called parents). - Each part of that concatenated key points to one
and only one instance of each of the connecting
entities.
Associative Entity
15Data Modeling Concepts Foreign Keys
- Foreign key a primary key of an entity that is
used in another entity to identify instances of a
relationship. - A foreign key always matches the primary key in
the another entity - A foreign key may or may not be unique (generally
not) - The entity with the foreign key is called the
child. - The entity with the matching primary key is
called the parent.
16Data Modeling Concepts Foreign Keys
Primary Key
Primary Key
Foreign Key Duplicated from primary key of Major
entity (not unique)
17Resolving Nonspecific Relationships
Some non-specific relations must be resolved
differently
The verb or verb phrase of a many-to-many
relationship sometimes suggests other entities.
Creates another non-specific relationship
18Resolving Nonspecific Relationships
Many-to-many relationships can be resolved with
an associative entity.
19Generalization
Generalization
Supertype
Subtype
20Logical Model Development Stages
- Context Data model
- To establish project scope
- Key-base data model
- Eliminate nonspecific relationships
- Add associative entities
- Include primary and alternate keys
- Precise cardinalities
- Fully attributed data model
- All remaining attributes
- Subsetting criteria
- Normalized data model
21The Context Data Model
Non-specific relationship
Non-specific relationship
22The Key-based Data Model
Associative Entity
Associative Entity
23The Key-based Data Model With Generalization
24The Fully-Attributed Data Model
25What is a Good Data Model?
- A good data model is simple.
- Data attributes that describe any given entity
should describe only that entity. - Each attribute of an entity instance can have
only one value. - A good data model is essentially nonredundant.
- Each data attribute, other than foreign keys,
describes at most one entity. - Look for the same attribute recorded more than
once under different names. - A good data model should be flexible and
adaptable to future needs.
26Normalization
- Normalization a data analysis technique that
organizes data into groups to form nonredundant,
stable, flexible, and adaptive entities.
27Normalization 1NF, 2NF, 3NF
- First normal form (1NF) an entity whose
attributes have no more than one value for a
single instance of that entity - Any attributes that can have multiple values
actually describe a separate entity, possibly an
entity and relationship. - Second normal form (2NF) an entity whose
nonprimary-key attributes are dependent on the
full primary key. - Any nonkey attributes that are dependent on only
part of the primary key should be moved to any
entity where that partial key is actually the
full key. This may require creating a new entity
and relationship on the model. - Third normal form (3NF) an entity whose
nonprimary-key attributes are not dependent on
any other non-primary key attributes. - Any nonkey attributes that are dependent on other
nonkey attributes must be moved or deleted.
Again, new entities and relationships may have to
be added to the data model.
28Relational Databases
29What is a Relational Database?
- The relational database is a type of database
that treats data as if they were stored in two
dimensional tables - This allows data in one table to be related to
data in another table - Tables must have a data field in common to be
related to each other
30Example Database
- Employee Table
- Department Table
31Relational Database Concepts
- Entity - person, place, thing or event about
which we wish to collect data - Business employees, departments
- University students, courses
- Attribute characteristic of an entity
- Employee name, address, SIN, department, salary
- Courses course , section , prof., room
32Concepts Continued
- Record (tuple) a group of related fields
- Domain or range values that an attribute can
have (must be same in every record) - usually numbers text
- Table a group of records of the same type
33Keys
- Primary key (PK) a field in a record that
uniquely identifies that record - can be made up of more than one field if
necessary - in the Employee table the SIN would be the PK
34Foreign Key
- A field that relates records in different tables
to each other - Matches to the PK field of another table
35Entity/Referential Integrity
- Entity integrity the PK field must be unique
and is not allowed to be empty (NULL) - Referential integrity a record of table must
exist if pointed to by a FK in another table
36Referential Integrity Example
37Relational Algebra Operations
- A relational database needs a set of operations
to manipulate data - Operations allow users to make requests to a
database - These operations result in a new table, with the
possibility of having fields from multiple tables
38Relation Algebra Operations
- Select (s) select a subset of records from a
table - Subset is determined by a selection condition
- sltselection conditiongt (T)
- In the ltselection conditiongt use attribute names,
comparison operators, and constant values
39Relation Algebra Operations
- Project (p) selects a subset of columns from
one or more tables - Columns are determined from an attribute list
- p ltattribute listgt (T)
- ltattributegt list can be any of the fields in the
table T.
40Relation Algebra Examples
- sltsalary 100000gt (Employee)
41Relational Algebra Examples
- pltName, Addressgt (Employee)
42Aggregate Functions
- Used to find the SUM, AVERAGE, MAXIMUM, MINIMUM,
and COUNT - COUNT is only used to count the number of records
selected. - ltgrouping attributesgt ltfunction listgt (T)
43Aggregate Function Example
- ltDepartment gt ltCOUNT SIN, AVERAGE Salarygt
(Employee)
44DBMS
- Database Management System
- Responsible for storing, retrieving and managing
data in a database. - Manages connections between computer users and
the database. - Connections allow users to access and/or create
data in the database.
45DBMS
- The DBMS ensures that the rules which are set for
the data are maintained while retrieving and
modifying the data. - This frees programmers from having to always code
these checks - DBMS ensures integrity and security of the data.
Eg. prevents referential integrity.
46DBMS
- Most common management system for databases is
the Relational Database Management System
(RDBMS). - Common RDBMS are
- Oracle PostgreSQL
- IBMs DB2 MySQL
- Microsoft Access would be an example of a single
or small user DBMS.
47Structured Query Language (SQL)
-
- SQL is a standard programming language used to
query a database. Either by retrieving or
modifying data stored in the database.
48History of SQL
- Originally called SEQUEL when it was created by
IBM in the late 70s. - Before becoming a universally accepted query
language
49SQL Used to Define RDBs
- Data Definition Language (DDL)
- Used to declare data schemas
- The domain of values associated with each
attribute. - Integrity constraints.
- The set of indices to be maintained for each
relation. - Security and authorization information for each
relationship
50SQL Commands
- A typical SQL query has the form
- select A1, A2., An
- from R1,r2,.,Rn
- where H
- As represents attributes
- Rs represents relations
- H presents predicates (e.g. R1 lt some value)
51SQL Commands
- SQL is used to create the various tables within
the database. - E.g. of command
- create table customer(customer-name char(20),cus
tomer-street char(30),customer-city char(30),pri
mary key (customer-name))
52SQL Commands
- SQL is also used to create relationships between
these tables - E.g. of command
- create table account(account-number char(10),bra
nch-name char(15),balance integer,primary key
(account-number), foreign key (branch-name)
references branch)