Title: An Introduction to DBMS Technology
1An Introduction to DBMS Technology
Yelena Yesha Olga Streltchenko
2Presentation Overview
- Database functionality
- Relational Data Model
- SQL
- Web-Database Connectivity
3Transaction
- A transaction is an exchange of
- Information
- Goods
- Services
- Currency/currencies
- Transaction properties
- Atomicity a transaction must be all or nothing.
- Consistency a transaction takes the system from
one consistent state to another. - Isolation.
- Durability.
4Databases and Information Economy
- Shift from computation to information
- corporate computing
- personal computing and the Internet
- scientific computing.
- Growing importance of transaction-orientation and
information retrieval. - The database field concentrates on the efficient
management of large amounts of persistent,
reliable shared data.
5Database versus File System
- Database management systems (DBMS) is a software
that provides transaction support by implementing
- Data independence
- Data access efficiency
- Concurrency control
- Data integrity
- Reliability
- Security
- Data distribution and heterogeneity.
6Data Independence and Access Efficiency
- DBMS allows to avoid rewriting all access
routines every time the data format changes or
data is added/modified/deleted. - insulate applications from data storage details.
- Logical independence protection from changes in
logical structure of data. - Physical independence protection from changes
in physical structure of data. - DBMS maintains data structures and implements
algorithms allowing to avoid linear search - indexing search in O(log n)
- fast access even on complex data queries.
7Concurrency Control and Data Integrity
- Interleaving actions of different applications
boosts performance. - DBMS insures semantically correct access to the
same data by concurrent applications - two programs accessing the same data at the same
time can result in an inconsistent update - implement sharing in a controlled manner.
- Data semantics may require certain constraints to
be satisfied. - DBMS guarantees that application programs comply
with the constraints when adding/modifying the
data.
8Reliability and Security
- DBMS provides techniques for recovery from
software and hardware failures - guarantee survival of the data across
catastrophes. - DBMS prevents unauthorized users from
accessing/modifying data or denying service to
other users.
9Data Distribution and Heterogeneity
- Centralization is the enemy of scalability
- a vast number of modern applications are
distributed. - Data sharing in a distributed environment is a
challenge. - Heterogeneity applies to networks, hardware,
operating systems, programming languages, data
formats, etc. - Distributed applications must mask the
differences. - Need distributed data management.
10Categories of Data Models
- High-level or conceptual
- entities, attributes, relationships.
- Representational or implementation or logical
- relational, network hierarchical,
object-oriented, object-relational. - Physical or low-level
- data storage.
11Levels of Abstraction in a Database
- Schema versus Instance
- schema description of the data that
- captures data types, relationships, constraints
on the data - meta-data (data about data), knowledge, e.g.,
Employees(EmpName, EmpNo, Dept, Sal) - is independent of any application program
- changes infrequently
- instance set of records/tuples/rows for that
schema, the actual data in the database at a
given time - time-varying
- e.g., ltJane, 201, Shoe, 1Mgt,ltSusan, 302, Toy, 1Mgt
123-schema Architecture
- Physical level description of a database
- how things are stored on disk
- files, record structures,
- indices,
- data structures for disk blocks,
- methodology for dealing with too long records,
etc. - Conceptual level description of a database
- The description of application data (its schema)
using one of the traditional data models.
133-Schema Architecture (cont'd)
- View-level description of a database
- What users of a particular application see
- their own customized schema, e.g., for payroll,
for the ticket agent, for a simulation program. - Multiple levels
- helps with data independence
- helps with maintenance.
- Many views, single logical and physical schema.
- Levels of abstraction give data independence.
14The Entity-Relational Model
- Entity a distinguishable object.
- Entity set a set of entities all of the same
type. - Attribute a single property of an entity
- simple vs composite
- single-valued vs multi-valued
- stored vs derived
- null values.
- Domain set of values permitted for that
attribute.
15The E-R Model (2)
- Relationship an association between two or more
entities. - Relationship set a set of relationships all of
the same type - There is no correct schema for a batch of data.
Which schema is best depends on the application. - Many basic data modelling choices depend on an
understanding of the application.
16Data Model
- Data model notation for describing data, plus a
set of operations used to manipulate that data. - a set of primitives for defining the structure of
a DB - a set of operations for specifying the retrievals
and updates on a DB - relational, hierarchical, network,
object-oriented.
17The Relational Model (Codd 1970)
- The relational data model is the most important
data model currently existing. - Value-oriented, i.e., allows operations on
relations whose results are relations, thus
enables to combine operations. - As opposed to object-oriented models, in which
- Operations cannot be applied to the result of
other operations - The result of an operation may be a new data
type, and operations may not be available for
this type.
18Definitions Domain Relation
- A domain is a set of atomic values.
- A relation is a finite subset of the cartesian
product of a finite list of domains - relation is a set of tuples
- order of tuples is irrelevant and
- no relation has 2 identical tuples
- each tuple value is atomic
- no composite attributes
- no multi-valued attributes.
19Relational Model (contd)
- Everything is represented by relations
- Formally Given sets D1, D2, ....Dn (not
necessarily distinct), a relation R ? D1 X D2 X
...X Dn - Di 's are the domains and n is the arity
(degree) of R - elements of R are called tuples
- number of tuples in R is the cardinality of R
- relational data model helps to view a relation as
a table - Observe the following properties
20Relational Model (contd)
- Everything is represented by relations
- Given sets D1, D2, ....Dn (not necessarily
distinct), a relation R ? D1 X D2 X ...X Dn - Di 's are the domains and n is the arity (degree)
of R - elements of R are called tuples
- number of tuples in R is the cardinality of R.
- Relational data model helps to view a relation as
a table - each row represents a tuple (record)
- each column represents an attribute (field).
- Properties
- no two rows are identical
- the ordering of tuples is unimportant
- the ordering of columns is important.
21Keys
- Let R be a relation schema and K ? R.
- K is a superkey of R if it can uniquely identify
any tuple in any r(R). There are no tuples t and
t' such that tK t'K. - K is a candidate key if K is a minimal superkey.
There is no K' ? K such that K' is also a
superkey of r(R) - A primary key is one of the candidate keys,
remaining candidate keys are alternate keys - Every relation has a key.
- A key is a property of a relation schema, not a
relation.
22Integrity Constraints
- Relational database schema is a set of relation
schemas and a set of integrity constraints - Integrity constraint condition that must be true
for any instance of a database. - Integrity constraints are expected to hold on
every database instance of the schema - Integrity constraints
- Structural
- key constraint uniqueness of keys
- entity integrity constraint no primary key value
can be null - referential integrity constraint reference from
relation R to relation S must refer to an
existing tuple of S - Semantic.
23Foreign Keys
- A set of attributes (FK) of R is a foreign key if
- the attributes in FK have the same domain as the
primary key (PK) attributes of another relation
S, and - for each instance of R, the values of FK occur as
a value of PK for some instance in S, or is null. - In the relational model, the only way an entity
can reference another entity is through the value
of the primary key of the second entity. - Foreign keys don't have to be unique or non-null,
but if one component is null, then all components
must be null.
24E-R to Relations (I.e., Defining Relations)
- Done using DDL (Data Definition Language)
- Name whole database schema
- Declare domains for attributes
- Define relations
- name
- attribute names and domains
- primary and other keys
- foreign keys
25Translating from E-R
- Represent entity set E by a relation whose
attributes are all the E-R attributes of E. Then
each tuple represents one entity of E. - To represent relation R between entity sets E1,
, Ek, create relation R with key attributes of
E1, , key attributes of Ek, as attributes
(rename duplicates). Each tuple of the relation
represents one combination of entities that are
related to one another. - You might have some redundant relations, which
you can delete.
26Schema Normalization
- Formal theory of database design
- based on grouping attributes in a particular way
using attribute dependencies to achieve good
schemas - 1NF, 2NF, 3NF, BCNF, 4NF,
- Goal
- dont store redundant information
- can represent everything (otherwise, the schema
is useless!)
27Query and Update Languages
- DDL data definition language
- used by DBA
- to define schemas, create views, create indices
- DML data manipulation language
- used by sophisticated casual user
- to query data or
- update data
28Relational Query Languages
- Query languages allow manipulation and retrieval
of data from a database. - Relational model supports simple, powerful query
languages - strong formal foundation based on logic
- allows for optimization.
- Two mathematical languages form the basis for
relational languages (e.g., SQL) and for
implementation - Relational Algebra More operational, useful for
representing execution plans - Relational Calculus Lets users describe what
they want, rather than how to compute it
(non-operational, declarative). - Basic operations
- selection, projection, cross-product,
set-difference, union, intersection, join,
division
29SQL
- SQL (Structured Query Language) is the query
language for the System R developed at IBM San
Jose Astraham, Gray, Lindsay, Selinger .. - SQL is now the query language for IBM's DB2 and
the de-facto standard on most commercial RDBMS. - SQL is a comprehensive language providing
statements for data definition, query and update.
Hence it is both DDL and DML.
30SQL (contd)
- SQL allows to create views, it can be embedded in
a general-purpose programming language (C or
PASCAL). - SQL has one basic statement for retrieving data
from the database - the SELECT statement
SELECT ltattribute listgt
FROM lttable listgt
WHERE ltconditiongt - Standards
- SQL or SQL1 (ANSI 1986)
- SQL2 or SQL-92 (ANSI 1992)
- SQL3 underway extends SQL with OO and other
concepts.
31SQL Data Types
- Numeric
- Integers of various ranges INTEGER (or INT),
SMALLINT - Real numbers of various precision FLOAT, REAL,
DOUBLE PRECISION - Formatted numbers DECIMAL(i,j) or DEC(i,j) or
NUMERIC(i,j). - Character Strings
- Fixed length n CHAR(n) or CHARACTER(n)
- Variable length of maximum n VARCHAR(n) or CHAR
VARYING(n) (default n 1). - Bit strings
- Fixed length n BIT(n)
- Varying length of maximum n VARBIT(n) or BIT
VARYING(n).
32SQL Data Types (contd)
- Date Time SQL2
- DATE (10 positions) YYYY-MM-DD
- TIME (8 positions) HHMMSS
- TIMESTAMPdate, time with 6 fractions of seconds
and optional time zone TIME(i) defines i decimal
fractions of seconds - (81i positions) HHMMSSddd...d
- TIME WITH TIME ZONE includes the displacement
from standard universal time zone 1300 to
-1259 (6 additional positions)
HHMMSS/-HHMM - INTERVAL Year/Month or Day/TIME
33Data Definition Language
- DDL is used to define the (schema of) database
- to create a database schema
- to create a domain
- to create, drop. alter a table
- to create, remove an index defunct in SQL2
- to create or drop a view
- to define integrity constraints
- to define access privileges to users (Oracle
CONNECT, RESOURCE, DBA) - to GRANT or REVOKE privileges ON/TO object/user
- SQL2 supports multiple schemas
- CREATE SCHEMA name AUTHORIZATION user
- CREATE SCHEMA EMPLOYEE AUTHORIZATION yesha
34SQL Schema
- EMP(Name,SSN,DNO,BirthPlace)
- DEPT(DName,DNO,MGRSSN)
- PROJECT(PName,PNO,PLocation,DNum)
- WORKSON(ESSN,PNO,Hours)
- CREATE SCHEMA 'COMPANY'
- CREATE TABLE EMP
- (
- EName name_dom NOT NULL,
- SSN CHAR(9) NOT NULL,
- DNO INTEGER NOT NULL,
- BirthPlace city_dom,
- PRIMARY KEY(SSN),
- FOREIGN KEY (DNO) REFERENCES DEPT (DNO)
- )
35Drop
- DROP command can be used to remove
- a schema
- DROP SCHEMA Company CASCADE
- DROP SCHEMA Company RESTRICT
- CASCADE option removes everything tuples,
tables, domains, ... - RESTRICT option removes the schema if it has no
elements in it - a table
- DROP TABLE EMP CASCADE
- DROP SCHEMA EMP RESTRICT
- CASCADE option removes the table and all
references to it - RESTRICT option removes the table if it is not
referenced
36Alter
- The ALTER allows to
- alter the domain of an attribute
- ALTER TABLE Student
- ALTER GPA NUMBER(4,2)
- set or drop default value of an attribute
- ALTER TABLE Student
- ALTER GPA DROP DEFAULT
- ALTER TABLE Student
- ALTER GPA SET DEFAULT 0.00
- add a new attribute to a relation
- ALTER TABLE Student
- ALTER Admission DATE
- drop an attribute (not in SQL1)
- ALTER TABLE Student
- DROP GPA CASCADE/RESTRICT
37Data Manipulation Language
- SELECT
- tuple queries
- aggregate queries
- INSERT
- DELETE
- UPDATE
- CREATE VIEW
38Data Query SELECT
- Used for retrieval.
- Used to specify subqueries for retrieval and for
the other operations. - Not the sigma or select operator of the
relational algebra. - The general form of a SELECT statement
- SELECT ltattribute listgt
- FROM lttable listgt
- WHERE ltconditiongt
- GROUP BY ltattribute listgt
- HAVING ltconditiongt
- ORDER BY ltattribute,ASC/DESC pairgt
39Relational Operators in SQL
- Projection SELECT A,B FROM R
- Selection SELECT FROM R WHERE F
- The WHERE condition can be a Boolean combination
of conditions. - Product of two tables, A X B
- SELECT R.?, S.?
- FROM R, S
40Data Update
- Examples
- Create a new instance of an entity explicitly
with all details - INSERT table VALUES (v1, v2, ..., vn) inserts a
tuple (v1, v2, ..., vn) into table - Unspecified columns get their default value if
available,NULL if allowed (and no default is
defined), (fails otherwise) - INSERT table(c1, c2, cm) VALUES (v1, v2, ..., vm)
inserts a tuple with m columns set to (v1, v2,
..., vm).
41Data Update (contd)
- Examples
- UPDATE table SET c1 v1, ..., cm vm WHERE
expr - Update all instances matching some local
criterion UPDATE table SET ? WHERE ? - Remove instances matching some local criterion
DELETE table WHERE ? - Remove instances matching a non-local criterion
- DELETE table FROM table, table2 WHERE AND ?
-
42SQL Views
- An SQL view is a table derived from other tables
- base (physical) tablesultimately depend on these
defining tables - other views
- Views are
- usually virtual
- sometimes materialized
43View Specification
- Views are specified with the paradigm
- CREATE VIEW view
- AS SELECT
- The SELECT part is the defining query
- In a view definition, we can
- define column names of view
- use aggregation (GROUP BY)
44View Resolution
- Views are
- computed by a sort of macro expansion when
neededview resolution - query modification compute fresh
- view materialization store
- always up to date modifications to the defining
tables are automatically reflected in the view
45Updating Views
- A view tuple may have as its source a combination
of base tuples, because of - joins
- aggregations
- Therefore, updating a view
- is nontrivial to determine
- may potentially lead to more than one update on
the defining relations - is often not allowed
46Updating Views (2)
- Restricted kinds of views may be updated. The
subselect must have - only one defining table
- columns including a candidate key of the table
- only column names, not expressions
- no joins, e.g., no correlative subqueries
- no DISTINCT keyword
- no aggregates
47CHECK OPTION
- Specified for updatable views
- Checks whether an INSERT or UPDATE to a view
would immediately cause the tuple to disappear
from the view - the new tuple should satisfy the WHERE clause of
the view definition
48SQL Constraints
- SQL allows declarative constraints such as
- CREATE ASSERTION assertion-name
- CHECK (enhanced subselect query)
- The DBMS checks if any ASSERTION is ever violated