Title: Database Structures
1Database Structures
- Shyh-Kang Jeng
- Department of Electrical Engineering/
- Graduate Institute of Communication Engineering
- National Taiwan University
2Database and Flat File
- Database
- Multidimensional storage system
- Contains internal links between its entries
- Information can be accessible from a variety of
perspectives - Flat file
- One-dimensional storage system
- Presents information from a single point of view
3Example
- Flat file can provide
- only a list of compositions arranged by composers
- Database can present
- all works by a single composer
- all composers who wrote a particular type of
music - all the composers who wrote variations of another
composers work
4File-Oriented Information System
5Database-Oriented Information System
6Schema and Subschema
- Schema
- Description of the entire database structure for
maintenance - Subschema
- Description of portion of the database pertinent
to a users needs - Example University Database
- Schema Student data and faculty data linked
together - Subschema faculty records not including
employment history - Subschema faculty records not including the
linkage between students and advisors
7Concept Layers
8Database Implementation
- Application layer
- Handles the communication with the user
- Determines the overall systems external
characteristics - Database management system (DBMS)
- Accomplished the actual manipulation of the
database
9Separation of Application Software and DBMS
- Simplifies the design process
- Isolate maintenance details in DBMS
- Flexible for distributed database
- Provides a means for controlling access
- Achieves data independence
- If the organization of the database changes, the
application software based on unaltered subschema
does not need to be modified
10Database Models
- Concept view of the database
- The requests are translated into the actions of
the actual data storage system by DBMS routines - The application software can be written as though
the information in a database were stored
according to the conceptual database model rather
than the actual storage system - Relational database model and object-oriented
database model are most common
11Host Language
- General-purpose programming languages used to
write application software - Often extended by routines in DBMS
- Within this programming environment the database
is manipulated as though it were organized
according to the conceptual database model
12Relational Database Model
- Most popular today
- Simple structure
- Stores data in rectangular tables, called
relations - Tuples
- Rows in a relation
- Attributes
- Columns in a relation
13A Relation Containing Employee Information
14A Relation Containing Redundancy
15Problems with Redundancy
- Lack of efficiency
- e.g., multi-assignment of jobs to an employee
- e.g., job shared by more than one employee
- Data integrity
- e.g., deletes a tuple with the only description
of a job - Combines more than one concept into a single
relation
16EMPLOYEE Relation
17JOB Relation
18ASSIGNMENT Relation
JobId
TermDate
EmplId
StartDate
19Finding the Departments in Which Employee 23Y34
Has Worked
20Mini Review
- Based on the EMPLOYEE, JOB, and ASSIGNMENT
relations, answer the following questions - Who is the secretary in the accounting department
with experience in the personnel department? - Who is the floor manager in the sales department?
- What job does G. Jerry Smith currently hold?
21Answers
- G. Jerry Smith
- Cheryl H. Clark
- S26Z
22A Decomposition with Information Loss
23Nonloss Decomposition
- A relation decomposed into smaller relations
without losing information
24Relational Operations
- SELECT
- Select certain tuples possessing certain
characteristics and to place these selected
tuples in a new relation - PROJECT
- Extract columns and place the result in a new
relation - JOIN
- Combine different relations into one
25SELECT Example
26PROJECT Example
27JOIN Example
28JOIN Example
29JOIN Application
30Obtaining List of Employee ID and Working
Department
- NEW1 ? JOIN ASSIGNMENT and JOB where
Assignment.JobId JOB.JobId - NEW2 ? SELECT from NEW1 where ASSIGNMENT.TermDate
- LIST ? PROJECT ASSIGNMENT.EmplId, JOB.Dept from
NEW2
31Mini Review
- Based on the EMPLOYEE, JOB, and ASSIGNMENT
relations, write a sequence of relational
operations to obtain a list of all job titles
within the personnel department
32Answer
- TEMP ? SELECT from JOB
- where Dept PERSONNEL
- LIST ? PROJECT JobTitle from TEMP
33Mini Review
- Based on the EMPLOYEE, JOB, and ASSIGNMENT
relations, write a sequence of relational
operations to obtain a list of employee names
along with the employees department
34Answer
- TEMP1 ? JOIN JOB and ASSIGNMENT
- where JOB.JobId ASSIGNMENT.JobId
- TEMP2 ? SELECT from TEMP1
- where TermDate
- TEMP3 ? JOIN EMPLOYEE and TEMP2
- where EMPLOYEE.EmplId TEMP2.EmplId
- RESULT ? PROJECT Name, Dept from TEMP3
35Relational DBMS
- Include routines to perform SELECT, PROJECT, and
JOIN operations - These routines can be called from the application
software - The application software can be written as though
the data were actually stored in the simple
tabular form of the relational model - Store the relation as an indexed file or utilize
hashing techniques to provide rapid access to
entries
36SQL
- Structured Query Language
- Developed by IBM, Standardized by ANSI
- Example
- select EmplId, Dept
- from ASSIGNMENT, JOB
- where ASSIGNMENT.JobId JOB.JobId
- and ASSIGNMENT.TermDate
37SQL Examples
- Example
- select Name, Address
- from EMPLOYEE
- Example
- select EmplId, Name, Address, SSNum
- from EMPLOYEE
- where Name Cheryl H. Clark
38SQL Examples
- Example
- select EMPLOYEE.Name, ASSIGNMENT.StartDate
- from EMPLOYEE, ASSIGNMENT
- where EMPLOYEE.EmplId ASSIGNMENT.EmplId
39SQL Examples
- Example
- insert into EMPLOYEE
- values(42Z12, Sue A Bert, 33 Fair St.,
444661111) - Example
- delete from EMPLOYEE
- where Name G. Jerry Smith
- Example
- update EMPLOYEE
- set Address 1812 Napoleon Ave.
- where Name Joe E. Baker
40Mini Review
- Based on the EMPLOYEE, JOB, and ASSIGNMENT
relations, write a sequence of SQL statements to
obtain a list of employee names along with the
employees department
41Answer
- select EMPLOYEE.Name, JOB.Dept
- from JOB, ASSIGNMENT, and EMPLOYEE
- where (JOB.JobId ASSIGNMENT.JobId )
- and (ASSIGNMENT.EmplId
- EMPLOYEE.EmplId)
- and (ASSIGNMENT.TermDate )
42Object-Oriented Databases
- Consisting of objects that are linked to each
other to reflect their relationships - Example
- Classes EMPLOYEE, JOB, ASSIGNMENT
- Each of these objects contains methods describing
how the object should respond to messages
regarding its contents and relationships. - Not need to write an exterior procedure
43Object-Oriented Database
44Object-Oriented DBMS
- Maintains links between objects
- Creates pointer system that may be required to
record these associations - Provides permanent storage for the objects
entrusted to it (persistency) - Allows the entire software system to be designed
in the same paradigm - Encapsulates the technicalities of different data
formats - Example intelligent objects, multimedia databases
45Database Integrity
- DBMS for personal use are relatively simple
- The errors can be easily corrected
- Financial loss is limited
- Large, multiuser, commercial databases are with
higher stakes - Cost of incorrect or lost data can be enormous
- DBMS has to maintain the databases integrity
46Commit/Rollback Protocol
- Commit Point
- The point at which all steps in a transaction
have been recorded in the log - Has the information to reconstruct the
transaction since the last backup - Rollback
- Undo an incomplete transaction
- Done when equipment malfunctions or illegal
access or deadlock and so on occur - Cascading rollback problem
47Incorrect Summary Problem
- One transaction is in the middle of transferring
funds from one account to another - Another transaction tries to compute the total
deposits in the bank - Could result in a total that is either too large
or too small depending on the order of
transactions
48Incorrect Summary Problem
Start transferring
Calculating the balance
Finish transferring
T2
T3
T1
49Lost Update Problem
- Exemplified by two transactions, each of which
makes a deduction from the same account - One transaction reads the accounts current
balance at the point when the other has just read
the balance but has not yet calculated the new
balance - Both transactions will base their deductions on
the same initial balance - The effect of one of the deductions will not be
reflected in the database
50Lost Update Problem
Read balance
Read balance
Put back new balance
Put back new balance
T1
T2
51Locking Protocol
- Items within a database that are currently being
used are marked with locks - Shared locks
- Corresponding to transactions not going to alter
the data item - Other transactions are also allowed to view the
data item - Exclusive locks
- Corresponding to transactions going to alter the
data item - Other transactions are not allowed to access the
data item
52Deadlocks and Solution
- Deadlocks occur when two transactions
- Require exclusive access to the same two data
items - Each obtains exclusive access to one of the items
- Each insists on waiting for the other
- Wound-wait protocol
- Younger transaction is forced to release all its
data items and roll back its activities
53Sequential files
- Sequential file file whose contents can only be
read in order - Reader must be able to detect end-of-file (EOF)
- Data can be stored in logical records, sorted by
a key field - Greatly increases the speed of batch updates
54A procedure for merging two sequential files
55Applying the merge algorithm
56The structure of a simple employee file
57Indexed files
- Index list of (key, location) pairs
- Sorted by key values
- location where the record is stored
58Opening an indexed file
59Hashing
- Each record has a key
- The master file is divided into buckets
- A hash function computes a bucket number for each
key value - Each record is stored in the bucket corresponding
to the hash of its key
60Hashing the key field
61The rudiments of a hashing system
62Collisions in Hashing
- Collision when two keys hash to the same bucket
- Major problem when table is over 75 full
- Solution increase number of buckets and rehash
all data
63Data mining
- Data mining a set of techniques for discovering
patterns in collections of data - Relies heavily on statistical analyses
- Data warehouse static data collection to be
mined - Data cube data presented from many perspectives
to enable mining - Raises significant ethical issues when it
involves personal information
64Data mining strategies
- Class description
- Class discrimination
- Cluster analysis
- Association analysis
- Outlier analysis
- Sequential pattern analysis
65Social impact of database technology
- Problems
- Massive amounts of personal data are being
collected - Often without knowledge or meaningful consent of
affected people - Data merging produces new, more invasive
information - Errors are widely disseminated and hard to
correct - Remedies
- Existing legal remedies largely ineffective
- Negative publicity may be more effective
66Exercise
- Review Problems
- 7, 8, 9, 10, 11, 20, 37, 40, 47, 54