Title: DBMS and Data Management
1DBMS and Data Management
2DBMS
- Definition A database management system (DBMS)
is a set of software that are used to define,
store, manipulate and control the data in a
database. - define --- define data types, structures and
constraints. - store --- store data to support efficient access.
- manipulate --- perform retrieval and update
operations using a query language. - control --- control access to data.
- Database System Database DBMS
3Integrity, Security
- Topics
- Integrity Constraints within a DBMS
- Security of Data
4 Database Integrity
DBMS cannot protect against all accidental
errors a date being entered incorrectly, mistyp
ing a name, BUT DBMS should protect against
unreasonable entries, updates, deletions
etc. age gt 120 salary gt max_salary making
employees redundant without removing them from
the payroll data.
5Types of Integrity Constraints
- 1. The data model underlying the DBMS may have
its own integrity constraints. - For example the relational model has two
forms of integrity rule which must be satisfied - Entity Integrity
- Referential Integrity
- 2. There are also a set of constraints regarding
permitted values for one or more attributes. - Domain Constraints
- 3. In addition most DBMS permit additional
(general) constraints known as - user specified integrity constraints
- (aka enterprise constraints, or
business rules) - (These are constraints which the data must
satisfy in order to be meaningful to the
organisation - User actually being data or
database administrator) - Ideally the DBMS should enforce ALL the above
types of constraint - however it often falls to
the application program to protect against
integrity violations.
6- Relational Model Entity Integrity
- 1. a database reflects some part of reality
- 2. unique objects in the "real" world must be
identifiable and unique within the database - 3. objects are identified using identifiers
(keys) made from 1 or more attributes - 4. identifiers must be
- unique (for a given object
type) minimal (have no spurious attributes) - That is the reason why PRIMARY KEY is NEEDED
7- SQL setting the primary key, (with 2
attributes) - Create TABLE t (
- attributeA type,
- attributeB type,
- PRIMARY KEY (attributeA, attributeB),
- )
8Justification a null valued key implies a
missing, or unknown, identity. If the identity
is missing then the object in the real world is
not identifiable. If the identity is unknown
then you cannot know if the object is already in
the database - therefore you cannot tell that if
the identity is unique.
9Relational Model Referential Integrity A
foreign key is an attribute (possibly composite)
of one relation, whose value matches the primary
key of some tuple in a relation. (sometimes
called a posted identifier)
SQL setting the foreign key, Create TABLE t
( attributeA type, attributeB
type, attributeC type, PRIMARY KEY
(attributeA, attributeB), FOREIGN KEY
(attributeC) REFERENCES anothertable(referecedAtt
ribute), )
10- During update or delete
- Restricted
- Update/Delete only if no foreign key
exists - Cascades
- Update/Delete operation cascades to
update/delete the foreign key entries - Nullifies (or set default value)
- Update/Delete primary key values
- Null (or default value) in the foreign
key entry.
11Domain Constraints
Each value appearing in a particular place in the
database should come from a specific domain of
values appropriate to the use of the value.
- These can be defined as
- an explicit set of values
- supplier code domain (S1, S2, S3, S4, S5)
- a range of valid values
- age domain (0 .. 120)
- or a pattern which values must satisfy
- employee code domain ( AAAA9999AAA)
- (4 letters followed by 4 digits followed by 3
letters) - or (sometimes) just a datatype
- size domain integer
-
- or combinations of the above.
12SQL for domain constrain
- Create TABLE t (
- attributeA DATE,
- attributeB VARCHAR2(30), NOT NULL
- attributeC CHAR(5), NOT NULL
- )
13User Specified Integrity Constraints
- Categories of User Specified Integrity
Constraints - Relation rules
- record constraints
- set constraints
- which may be
- static
- dynamic (before and after alterations)
14Relation Rules
What combination of attribute values are
acceptable in a relation?
- Record constraints
- Set constraints
treat the relation one row at a time For all
Vehicle.Type CAR then Vehicle.Wheels
(3,4) Vehicle.Type CYCLE then Vehicle.Wheels
(2,3) Vehicle.Type HOVERCRAFT then
Vehicle.Wheels null This prevents meaningless
combinations of data for an individual object in
the database.
apply to the relation as a whole, that is all
data in the table SUM(employee.salary) lt limit
COUNT(distinct dept.car) lt 10 AVG(staff.overtime)
IN /- 10 of 6
15Relation rules may be Static or DynamicStatic
constraints
must hold between data items at ALL
times. employee.salary lt manager.salary IN
employee, manager WHERE employee.mgr
manager.mgr
Dynamic constraints must hold between database
states before AND after changes to the relation
or relations T1(salary) - T0(salary) gt 0 after
update(salary) at time T1. That is, any change to
salaries must be increasing and the integrity
constraints may relate to several tables.
16Responses to Integrity Violation
Attempts to alter a database in ways which
violate integrity must be prevented Responses
include reject the attempted operation request/
prompt for alternative values abort the
current transaction Integrity checking may be
performed on input on deletion on
update deferred to some convenient time (end of
transaction)
17Deferred Integrity Constraints Checking
Some checks MUST be deferred to a transaction
boundary (the transaction is the unit of
integrity). Consider a pair of
constraints For each project entry there
must be at least one employee. For each
employee there must be a project on which
they are employed
18 Security.
- Why do we need security?
- Many benefits accrue from the use of a DBMS,
centralised data storage and access management
making the contents of the database available to
the user. - However, this centralised storage potentially
means that all users have access to all possible
data. Clearly this is not secure, additional
commands must be available to prevent users
accessing data which requires sensitive
management. - Security features are required to prevent
unauthorised - Disclosure
- Alteration
- Destruction
- of objects in the database.
19The Classical Solution to Security.
- 1. Identify the user.
- access through a login sequence requiring a user
identifier.
- 2. Authenticate the users identity.
- require the user to input a secure password.
- 3. Match the user identifier with a list of
privileges. - 4. Privileges such as
- what type of operation can be performed?
- on what objects in the database?
- 5. Before a transaction is executed check objects
and operations required for the transaction
execution against the privileges granted to the
user.
20The above would use an authority matrix to
determine which users had what rights on what
objects.
In a relational database the above would form
part of the database itself.
21Example Privileges
SQL includes the grant statement (DCL) to grant
privileges on a table or view. GRANT
privilege_list ALL ALL PRIVILEGES ON
object_name TO authorisation_id_list
PUBLIC where privilege_list consists of a comma
separated list whose components come from select
insert delete update column_list references
column_list references gives a user the right
to create foreign keys that reference to the
table concerned. The person who issues the
statement must own the tables or views specified
in the statement. GRANT select, insert ON stock
TO smith GRANT ALL ON product TO fox GRANT update
(stock_code, stock_price) ON stock TO brown GRANT
select ON branch TO PUBLIC
22The reverse of granting access privileges is to
REVOKE privileges. REVOKE privilege_list ALL
ALL PRIVILEGES ON object_name FROM
authorisation_id_list PUBLIC REVOKE select,
insert ON stock FROM smith REVOKE ALL ON product
FROM fox REVOKE update (stock_code,
stock_price) ON stock FROM brown REVOKE select
ON branch FROM PUBLIC
23- In order to limit a users access to data we
could specify a set of views on to a relation. - The views would expose only those parts of the
relation which the user is permitted to see, an
abstraction of the underlying data. - No access privileges are granted on the
underlying table, only on the views.
- We use the External Schema to limit access to the
Conceptual Data.
Consider, A database concerned with personnel
information which includes salary information,
the database is used both by the payroll office
which needs to see information on NI, Salary,
working hours etc. and also by the medical staff
who need to see records of home address and
phone numbers for emergency contacts.
24Clearly we wish to restrict the payroll view to
exclude the medical information, and restrict the
medical view to exclude the financial
information. personnel
(USING SQL) CREATE VIEW medical
AS SELECT Name, Address, Phone,
Emergency_contact FROM personnel
CREATE VIEW payroll AS SELECT Name,
Address, NI, Salary, weekly_hours FROM
personnel GRANT select ON payroll TO
payroll_clerk GRANT select ON medical
TO nurse Will do what we require.
25Notice that the view can be quite
sophisticated CREATE VIEW zwu (name,
address, salary) AS SELECT Name,
Address, Salary FROM personnel WHERE
personnel.Name "Zimin Wu" GRANT select ON zwu
TO Zimin Or how about summarising information
in the payroll file, without making the
individual salaries accessible?
CREATE VIEW salary_summary (total_salary)
AS SELECT sum(Salary) FROM
personnel GRANT select ON
salary_summary TO bank_manager So, the view is a
useful 'access limiting' tool.
26A actual scenario
- Suppose a teacher asks the teaching assistant
(TA)to key in the scores of those students who
have taken the re-sit exam of Database(2701) for
him in a database system. What solution should
the teacher have in order to protect the privacy
of each student? (Hint show only necessary
information and grant proper privileges to the
TA) - Student information Table
- ScoreSID,CID,score,resit,sname
27Transaction and Concurrency Control
- Topics
- Large Scale Data Management
- Lots of Data
- Lots of Users
- Lots of
- What are Transactions?
-
-
- Concurrency Control
- Why?
- How?
28Large Scale Data Management Systems
- To put into perspective the problems of large
scale database management systems consider the
requirements for the implementation of an
electronic cash system (e.g. smart cards) - There are 400 Billion recorded consumer payment
transactions each year. - Person to Person transactions account for a
further 400 Billion. - There are 2 Trillion US dollars worth of sub 10
transactions world-wide per year. - e-cash (Internet payment mechanisms) are
expected to drop to the sub cent level for
one-off per page access. - Banks like to keep accurate, secure records of
ALL financial activities. - Credit card transactions are authorised in
approx 30 secs - no matter how busy the system is
(how many simultaneous attempts to authorise
cards occur). -
29Transaction Processing
- A database transaction represents a real world
task. - A transaction can be defined as a program unit
that has four core properties - Atomicity (a logical unit of work)
- all or nothing, transactions are indivisible
units. - Transactions either completely succeed fail no
partial, or completely successes - Consistency (logical unit of integrity)
- moves database from one consistent state to
another.
30- Independence (logical unit of concurrency)
- transactions execute independently of other
transactions which may be occurring
simultaneously within the database system. -
- Durability (logical unit of recovery)
- failures in the database system cannot cause the
effects of completed transactions to be undone.
Once committed the effects are recoverable
after failure.
31How to represent a Transaction.
- How to represent a Transaction.
- Most DBMS (or Database Languages) have
mechanisms similar to the SQL description which
follows (they do differ from system to system) - A program unit can be one or more statements -
we talk about multi-statement transactions (MST).
A DBMS needs user support to inform it of
the boundaries of a transaction. - Marking Transaction Boundaries
- A transaction begins with the user's first
executable SQL statement.SQL has two simple
commands in the Data Manipulation Language to
allow the user to mark the end of transactions.
32- COMMIT
- marks the end of a transaction - and is used
where the effects of the transaction are to be
made visible to the other transactions in the
database. - ROLLBACK
- also marks the end of a transaction - it is used
where you want to undo the effects of the
transaction (rollback the state of the database)
to the state it was in just before the BEGIN
TRANSACTION was executed. (undoing the whole
transaction)
33The Need for Concurrency Control
- Most databases are used in multi-user mode
resulting in more than one transaction occurring
simultaneously in the system. - Databases frequently have lots of users.
- (Airline booking, Telesales Operations, Credit
Card Mgt.) - Within a collection of transactions individual
statements might be interleaved as well as the
underlying operating system causing processes to
be interleaved.
34- CLASSIC PROBLEMS
- Without some form of concurrency management the
effects of simultaneous transactions may
interfere causing three possible types of
problem - The lost update
- The uncommitted dependency
- Inconsistent analysis.
35The Lost Update Problem
At time after T4 the update performed by
transaction A is lost the stored value of V is
30.
36The Uncommitted Dependency
The effect of the rollback in Transaction B is to
undo the update of V at time T2. However,
Transaction A has fetched the updated value of V
at time T3. If another transaction executed the
same code as A after T4 the result would be
different.
37Inconsistent Analysis
What is the value of the sum total? Is it
right?
38Requirements of Concurrency Control
To avoid these problems transactions must be
serializable.
A non-serial schedule is any sequence of reads
and writes performed by a set of concurrent
transactions. A serial schedule is a schedule
where the read and write operations of each
transaction are executed consecutively without
any interleaved operations from other
transactions.
39There is no interference, but still the two
sequences lead to different results.
40Concurrency Control Locking.
- Basically, locking is simple.
- When a transaction is going to use some record
or table and cannot risk that object changing
then it acquires a lock on the object. - The effect of the lock is to prevent other
transactions from gaining access to that object. - As usual the actual locking operations in most
DBMS are more complex than the above, for example
- locking a whole table may be unduly restrictive
- the use of a single mode locking strategy
prevents shared 'read only' access - Serialisability is not guaranteed with locks
only.
41Most DBMS use
- Locking Modes
- Read Locks
- reading is none destructive so multiple readers
can share a read locked object - writers are prevented from obtaining read locked
objects - Write Locks
- writers are destructive, readers and writers
cannot obtain write locked objects - Some DBMSs (e.g. Oracle) provide for lock
conversion (aka upgrade) automatically as
appropriate
42- Multiple Level
- locks can be acquired at different levels, e.g.
table and row levels in Oracle. - may be escalated in certain DBMSs
- Locking Phase
- Growing Phase
- locks are acquired throughout the transaction
no locks are released until the end of the
growing phase - Shrinking Phase
- after 1st lock is released no additional locks
can be acquired - This strategy ensures serializability.
43- How does INGRES do it?
- Setting Locks
- set automatically by the DBMS
- Types of Locking
- X - exclusive (write) locks, only one
transaction at a time may hold this per
resource. - S - shared (read) locks, multiple transactions
can hold an S lock on this resource, but no
transaction can obtain an X lock on a resource
while any S lock is held on it. - IX/IS - intended X/S locks, taken on tables
before X and S locks are taken at page level. - Locking Levels
- Page level locks, physical device pages used to
limit locks to relevant pieces of a table
(needs the table to be physically distributed in
a predictable manner, e.g. hashed key). - Table level locks, lock the whole table.
44Summary of locking level usage in INGRES
Oracle uses a similar approach, but SELECT
statement (without FOR UPDATE in SELECT) does not
lock data.
45Page Vs Table Locking.
- Lock Escalation
- If a query uses gt 10 pages then a table lock is
taken. This prevents a large number of page locks
being accumulated during execution. - If during execution more than 10 page locks are
acquired on a table then they are promoted to a
table lock, and the accumulated page locks are
dropped.
46Example 1 SELECT FROM emp WHERE name
"Jack" Actions 1) acquire table level IS
lock on emp. 2) acquire page level lock on
page 2 of emp table, assuming "Jack" is on page
2. 3) execute query. 4) drop all
locks. Example 2 BEGIN TRANSACTION SELECT FROM
dept WHERE dname "ADMIN" UPDATE emp SET
salary40000 WHERE position "MGR" END
TRANSACTION Actions 1) acquire table level
IS lock on dept. 2) acquire page level S
lock on page 1 of dept. 3) execute select
query. 4) acquire table level X lock on
emp, (position is not the primary key and is not
indexed so cannot use page locking) 5)
execute update query. 6) drop ALL locks.
47What if you request a lock on a locked
table? Lock Compatibility
- An attempt to obtain a lock on a previously
locked object may result in the requesting
transaction being blocked. - A blocked action is held in a wait state
until the resource requested is freed. - BUT
- this blocking may lead to DEADLOCK!
48 Consider
DEADLOCK!
49- Question
- If the DBMS can detect deadlock can the above
situation be resolved adequately? - Answer Yes (with varying levels of
sophistication). - INGRES aborts (issues a rollback) one of the
transactions arbitrarily and reports an error to
the user. - Oracle reports an error to one of the
transactions and expects the signalled
transation to roll back explicitly to allow the
other transaction to continue. - More elegant systems should rollback one or
other transaction (silently, no need to concern
the user) and reschedule the transaction from
the information in the system log. - Really elegant systems will rollback the
transaction which had performed the least work
at the time the deadlock was detected. - Beware of transaction starvation (Livelock)
solvable with a priority system.
5016.6 Deadlock Handling
- System is deadlocked if there is a set of
transactions such that every transaction in the
set is waiting for another transaction in the
set. - Deadlock prevention protocols ensure that the
system will never enter into a deadlock state. - Some prevention strategies
- Require that each transaction locks all its data
items before it begins execution
(predeclaration). - Impose partial ordering of all data items and
require that a transaction can lock data items
only in the order specified by the partial order
(graph-based protocol).
51More Deadlock Prevention Strategies
- Following schemes use transaction timestamps for
the sake of deadlock prevention alone. - wait-die scheme non-preemptive(???)
- older transaction may wait for younger one to
release data item. Younger transactions never
wait for older ones they are rolled back
instead. - a transaction may die several times before
acquiring needed data item - wound-wait scheme preemptive
- older transaction wounds (forces rollback) of
younger transaction instead of waiting for it.
Younger transactions may wait for older ones. - may be fewer rollbacks than wait-die scheme.
52Deadlock prevention (Cont.)
- Both in wait-die and in wound-wait schemes, a
rolled back transactions is restarted with its
original timestamp. - Older transactions thus have precedence over
newer ones, and starvation is hence avoided. - Timeout-Based Schemes
- a transaction waits for a lock only for a
specified amount of time. After that, the wait
times out and the transaction is rolled back. - thus deadlocks are not possible
- simple to implement but starvation is possible.
- Also difficult to determine good value of the
timeout interval.
53Deadlock Detection
- Deadlocks can be described as a wait-for graph,
which consists of a pair G (V,E), - V is a set of vertices (all the transactions in
the system) - E is a set of edges each element is an ordered
pair Ti ?Tj. - If Ti ? Tj is in E, then there is a directed
edge from Ti to Tj, implying that Ti is waiting
for Tj to release a data item. - When Ti requests a data item currently being held
by Tj, then the edge Ti Tj is inserted in the
wait-for graph. This edge is removed only when Tj
is no longer holding a data item needed by Ti. - The system is in a deadlock state if and only if
the wait-for graph has a cycle. Must invoke a
deadlock-detection algorithm periodically(????)
to look for cycles.
54Deadlock Detection (Cont.)
Wait-for graph with a cycle
Wait-for graph without a cycle
55Deadlock Recovery
- When deadlock is detected
- Some transaction will have to rolled back (made a
victim) to break deadlock. - Select that transaction as victim that will incur
minimum cost. - Rollback -- determine how far to roll back
transaction - Total rollback Abort the transaction and then
restart it. - More effective to roll back transaction only as
far as necessary to break deadlock. - Starvation happens if same transaction is always
chosen as victim. - Include the number of rollbacks in the cost
factor to avoid starvation