Title: File systems -> centralized DBS -> distributed DBS
1Introduction
- File systems -> centralized DBS -> distributed DBS
- Review of DB technologies
- Why DDBS? Pros and cons
- What is a DDBS?
- What is provided in a DDBS?
- What are the main design issues (components) of a DDBS?
2From File Systems to DBS
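- File systems
- What is a file system?
- A file contains data for a program
- When do I need a file?
- I want to store some data for later use by a program:
    #include <fstream>
    int main() {
        const int DATA_SIZE = 100;                // capacity of the array
        int data_array[DATA_SIZE];                // buffer for values from the file
        std::ifstream data_file("numbers.dat");   // open the input file
        int i;                                    // index variable
        // ... (the remaining statements are not shown on the slide)
    }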
3Database Applications
- Why do you use a database for your application?
- A database system provides facilities for better management and control of data
- It is easier to access the data than in a file system
- In the development of a database application, you start with data modeling first (define the databases)
- NOTE: In file systems, you may start with the program first
4From File Systems to DBS
- Each application (program) has its own data descriptions (files)
- Why do you design a new file? You need to write a program
- New applications mean new files may have to be created
- Files for different applications may have different formats, and the programs may be written in different languages
- Disadvantages
- Low degree of data sharing
- High degree of data redundancy: the same (or similar) data may be stored in different files
- Data isolation: (replicated) data is scattered in various files in different formats, so it is difficult to write programs to retrieve the data and to keep updates consistent (data consistency problem)
- Security problem (lack of a centralized controller)
5File Systems
6Database Applications
- Database applications
- contain a collection of interrelated data and a set of programs that allow users to access and modify data
- provide an abstract view of the data (data abstraction): hide the details of how the data are stored and maintained
- share data and model real-world information (entities and entity relationships)
- Why do you need to design a database? You want to model the information about the application
- The system provides
- the definitions of the structures of the data
- mechanisms for manipulation of data (information)
- safety information for recovery and security
7Database Management
8Database application example
- Begin
- input(flight_no, date, customer_name)
- EXEC SQL SELECT STSOLD, CAP
- INTO temp1, temp2
- FROM FLIGHT
- WHERE FNO = flight_no AND DATE = date
- ...
- END
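The lookup in this example can be tried out with a minimal sketch using Python's standard sqlite3 module. The column types, the sample row, and the helper name read_seats are assumptions made for illustration; only the table name, the column names, and the two conditions come from the slide.

```python
import sqlite3

# Hypothetical in-memory FLIGHT table; types and the sample row are
# assumptions for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FLIGHT (FNO TEXT, DATE TEXT, STSOLD INTEGER, CAP INTEGER)"
)
conn.execute("INSERT INTO FLIGHT VALUES ('CX100', '2024-01-15', 98, 100)")


def read_seats(conn, flight_no, date):
    """Return (STSOLD, CAP) for one flight, or None if no row matches."""
    return conn.execute(
        "SELECT STSOLD, CAP FROM FLIGHT WHERE FNO = ? AND DATE = ?",
        (flight_no, date),
    ).fetchone()


# Plays the role of "INTO temp1, temp2" in the embedded SQL on the slide.
temp1, temp2 = read_seats(conn, "CX100", "2024-01-15")
print(temp1, temp2)
```

The "?" placeholders play the role of the host variables flight_no and date: the program supplies values and the database system decides how to find the matching row.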
9Data Abstraction
- Separate application from data (information)
- Three levels hide how the information is stored and manipulated from applications
- Physical level (lower level)
- describes how the data are actually stored on disk (physical format)
- Conceptual level
- describes what data are actually stored and the relationships that exist among data (relations)
- View level
- describes only the part of the entire database in which the user is interested. Different users may have different views (created by applications)
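The conceptual and view levels can be illustrated with a small sketch, again using sqlite3. The EMP schema, its rows, and the view name EMP_DIRECTORY are hypothetical; the point is only that an application querying the view sees part of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full EMP relation (hypothetical schema and rows).
conn.execute(
    "CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT, SAL INTEGER)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [("E1", "Ann", "Programmer", 25000), ("E2", "Bob", "Analyst", 30000)],
)

# View level: a directory application sees names and titles, not salaries.
conn.execute("CREATE VIEW EMP_DIRECTORY AS SELECT ENAME, TITLE FROM EMP")

directory = conn.execute(
    "SELECT * FROM EMP_DIRECTORY ORDER BY ENAME"
).fetchall()
view_columns = [
    col[0] for col in conn.execute("SELECT * FROM EMP_DIRECTORY").description
]
print(view_columns, directory)
```

The physical level does not appear in the code at all: how SQLite lays out pages on disk is hidden from both the table definition and the view.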
10ANSI/SPARC Architecture
11File Systems? Databases Systems?
- Which one do you prefer to use for your applications?
Currently, database systems are widely used for various computer applications. The reasons are: better organization of real-world information (data modeling of real-world entities), better definition, control and manipulation of data, sharing of data, etc. There is a tradeoff between processing cost and efficiency in information management.
12Relational Database
- Data modeling techniques: relational database, object-oriented database, hierarchical DB, etc.
- Relational DB
- databases are tables (regular shape, fixed no. of attributes)
- a relation R defined over n sets D1, D2, ..., Dn is a set of n-tuples <d1, d2, ..., dn> such that d1 ∈ D1, d2 ∈ D2, ..., dn ∈ Dn
- a table consists of rows and columns
- a row is a record (tuple) and a column is an attribute (field)
- a table may be defined with a key (keys)
- a key uniquely identifies a tuple in a relation
- the records in a table are unordered (why??)
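The definition of a relation as a set of n-tuples can be made concrete with a short sketch. The domains, the sample tuples, and the helper names is_relation and is_key are hypothetical. Because R is a set, it has no inherent order, which is one way to answer the "why??" above.

```python
from itertools import product

# Two hypothetical domains.
D1 = {"E1", "E2"}      # employee numbers
D2 = {"Ann", "Bob"}    # employee names

# Every possible 2-tuple over the domains; a relation is a subset of this.
cartesian = set(product(D1, D2))
R = {("E1", "Ann"), ("E2", "Bob")}


def is_relation(r, *domains):
    """True if every tuple has one value from each domain, in order."""
    return all(
        len(t) == len(domains) and all(v in d for v, d in zip(t, domains))
        for t in r
    )


def is_key(r, positions):
    """True if the attributes at `positions` identify each tuple uniquely."""
    seen = {tuple(t[i] for i in positions) for t in r}
    return len(seen) == len(r)


print(is_relation(R, D1, D2), is_key(R, [0]))
```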
13Example
14Relational DBS
- Normalization
- break a large "bad" table into several smaller "good" tables
- what is a good table (relation)? (from a transaction management viewpoint)
- No repetition, update, insertion or deletion anomalies
- a step-by-step reversible process of replacing a given collection of relations by successive collections in which relations have a progressively simpler and more regular structure
- Integrity rules
- are constraints that define consistent states of the database, e.g., referential integrity (relationships between records)
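The "reversible" part of normalization can be sketched as follows. The EMP_DEPT table and its rows are hypothetical: the department name is repeated for every employee, the table is split into two smaller ones, and joining them back recovers the original.

```python
# A "bad" table: (ENO, ENAME, DNO, DNAME); DNAME repeats for each employee.
emp_dept = {
    ("E1", "Ann", "D1", "Research"),
    ("E2", "Bob", "D1", "Research"),
    ("E3", "Cy", "D2", "Sales"),
}

# Decompose by projection into two "good" tables.
emp = {(eno, ename, dno) for eno, ename, dno, _ in emp_dept}
dept = {(dno, dname) for _, _, dno, dname in emp_dept}

# Reverse the step with a join on DNO.
rejoined = {
    (eno, ename, dno, dname)
    for eno, ename, dno in emp
    for d, dname in dept
    if d == dno
}

print(rejoined == emp_dept, len(dept))
```

After the split, each department name is stored once, so renaming a department is a single update instead of one per employee.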
15Referential Integrity
Customer Table
Sale Table
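Referential integrity between a customer table and a sale table might look like the sketch below. The column names (CNO, CNAME, SNO, AMOUNT), the rows, and the helper try_insert_sale are assumptions, since the slide's tables are not reproduced here. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# Hypothetical schemas: every SALE row must point at an existing CUSTOMER.
conn.execute("CREATE TABLE CUSTOMER (CNO INTEGER PRIMARY KEY, CNAME TEXT)")
conn.execute(
    "CREATE TABLE SALE ("
    "SNO INTEGER PRIMARY KEY, "
    "CNO INTEGER NOT NULL REFERENCES CUSTOMER(CNO), "
    "AMOUNT INTEGER)"
)
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Ann')")
conn.execute("INSERT INTO SALE VALUES (10, 1, 500)")  # refers to customer 1


def try_insert_sale(sno, cno, amount):
    """Insert a sale; return False if the database rejects it."""
    try:
        conn.execute("INSERT INTO SALE VALUES (?, ?, ?)", (sno, cno, amount))
        return True
    except sqlite3.IntegrityError:
        return False


# Customer 99 does not exist, so this sale would be an orphan record.
orphan_accepted = try_insert_sale(11, 99, 700)
print(orphan_accepted)
```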
16Relational Data Language
- Manipulation of data
- Relational algebra and relational calculus
- Relational algebra
- consists of a set of operators that operate on relations
- each operator takes one or two relations as operands to produce a result relation
- operators: select, project, Cartesian product, union and set difference (join, semi-join, natural join, ...)
- Relational calculus
- specifies the formal description of the result without specifying how to obtain it, e.g., using SQL
- specifies a condition or an action on a domain (data set)
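The five basic operators can be sketched in a few lines of Python. The representation is an assumption chosen for brevity: a relation is a pair (attribute names, set of tuples), and the sample relations are hypothetical. The last lines build a join from Cartesian product plus select, which is how the join operator in the parenthesized list can be derived.

```python
def select(rel, pred):
    """Keep the tuples for which pred(row as dict) is true."""
    attrs, rows = rel
    return attrs, {r for r in rows if pred(dict(zip(attrs, r)))}


def project(rel, keep):
    """Keep only the named attributes; duplicates collapse in the set."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def cartesian_product(r1, r2):
    """Pair every tuple of r1 with every tuple of r2."""
    (a1, rows1), (a2, rows2) = r1, r2
    return a1 + a2, {x + y for x in rows1 for y in rows2}


def union(r1, r2):
    assert r1[0] == r2[0], "operands must have the same attributes"
    return r1[0], r1[1] | r2[1]


def difference(r1, r2):
    assert r1[0] == r2[0], "operands must have the same attributes"
    return r1[0], r1[1] - r2[1]


EMP = (("ENO", "ENAME"), {("E1", "Ann"), ("E2", "Bob")})
ASG = (("A_ENO", "PNO"), {("E1", "P1")})

# Join = Cartesian product followed by a select on the matching attribute.
joined = select(cartesian_product(EMP, ASG), lambda t: t["ENO"] == t["A_ENO"])
names = project(joined, ["ENAME"])
print(names)
```

Each operator returns a relation in the same representation, so operators compose freely; that closure property is what the phrase "to produce a result relation" refers to.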
17SAMPLE SQL
- SELECT EMP.ENAME
- FROM EMP, ASG, PROJ
- WHERE EMP.ENO = ASG.ENO
- AND ASG.PNO = PROJ.PNO
- AND PROJ.PNAME = 'CAD/CAM'
- UPDATE PAY
- SET SAL = 25000
- WHERE PAY.TITLE = 'Programmer'
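The two statements can be run against a toy database as a rough check. Everything about the data is hypothetical: the column lists are reduced to what the statements touch, the rows are made up, and the project table is assumed to be joined on a PNO column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas and rows, just enough to run the statements.
conn.executescript(
    """
    CREATE TABLE EMP  (ENO TEXT, ENAME TEXT);
    CREATE TABLE ASG  (ENO TEXT, PNO TEXT);
    CREATE TABLE PROJ (PNO TEXT, PNAME TEXT);
    CREATE TABLE PAY  (TITLE TEXT, SAL INTEGER);
    INSERT INTO EMP  VALUES ('E1', 'Ann'), ('E2', 'Bob');
    INSERT INTO ASG  VALUES ('E1', 'P1'), ('E2', 'P2');
    INSERT INTO PROJ VALUES ('P1', 'CAD/CAM'), ('P2', 'Database');
    INSERT INTO PAY  VALUES ('Programmer', 20000), ('Analyst', 30000);
    """
)

# Names of employees assigned to the CAD/CAM project.
cad_cam_staff = [
    row[0]
    for row in conn.execute(
        "SELECT EMP.ENAME FROM EMP, ASG, PROJ "
        "WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO "
        "AND PROJ.PNAME = 'CAD/CAM'"
    )
]

# Set the salary for the Programmer title.
conn.execute("UPDATE PAY SET SAL = 25000 WHERE PAY.TITLE = 'Programmer'")
programmer_sal = conn.execute(
    "SELECT SAL FROM PAY WHERE TITLE = 'Programmer'"
).fetchone()[0]

print(cad_cam_staff, programmer_sal)
```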
18Distributed Computing
- Non-centralized computing (not parallel computing)
- A number of (autonomous) processing elements (not necessarily homogeneous) that are interconnected by a computer network (may be a mobile network)
- The processes cooperate with each other to perform their assigned tasks
- Requires the support of a network (delay in communication)
Example: Begin; Read A (at site 1); Read B (at site 2); compute C from A and B; Write C (at site 3); End
[Figure: sites 1, 2 and 3 holding data items A, B and C]
19What are the main differences between DS and
DDBS?
- Distributed database systems are one of the topics under distributed systems
- Distributed systems normally deal with the lower-level communications between processes
- Distributed database systems provide procedures and facilities to support higher-level database applications
- Processes -> transactions
- Concentrate on the upper middleware layer and the interactions with applications (not on communication)
20DS Vs DDBS
- A distributed system consists of four levels: operating systems, network, middleware and applications
[Figure: layer stack (Applications, Middleware, Operating System, Network (physical)) with the ranges covered by Distributed Database Systems and by Distributed Systems marked]
21Motivation of Distributed Database Systems
22Motivation of Distributed Database Systems
- Centralization
- put the components of a system at a centralized site
- Integration
- How different system components (or processing elements) are joined together. They need not reside at the same site, i.e., in a DDBS, the system components are distributed at several sites
- They need to work together (and depend on each other) to support the applications
- Degree of integration
- Coupling: how closely the components (of a DDBS) are related
- weak and strong coupling
- Synchronization: how the actions of the components are related
- synchronous (mostly strong coupling) vs. asynchronous
23Motivation of Distributed Database Systems
Site 1 Site 2 Site 3 Site 4
Application
Middleware (Oracle, Sybase, DB2, etc.)
Network and Operating systems
24Motivation of Distributed Database Systems
Site 1 Site 2 Site 3 Site 4
Application
Middleware
Network and Operating systems
25From Centralized DBS to DDBS
- What is distributed in a DDBS?
- Processing logic and functions
- Data
- Control
- Distribution of data: partition a database into fragments and distribute the fragments at several sites (nodes)
- Distribution of processing: a transaction may access data items located at several sites (a transaction may have several processes at several sites)
- Distribution of control: several sites work together to manage a transaction and control the data
26Why Distributed Database Systems
- Organizational reasons
- many applications are distributed in nature, e.g., banking systems, air-flight booking systems, and Internet applications
- Interconnection of existing databases (degree of sharing)
- integration of databases of different applications by a network
- Incremental growth (expandability)
- the growth of an application may require creating sub-systems at other sites
- Autonomy
- each site may have its own database for its own applications (transactions)
27Benefits of Distributed Database Systems
- Performance considerations (in addition to data correctness)
- distribution of workload to several sites (load balancing)
- reduce response time (the time duration between completion time and submission time of a request)
- processing delay + communication delays
- attempt to reduce the access delay and access cost for data items at remote sites
- Process the transaction (request) at the local site
- Reliability and availability
- duplication of information at several sites
- higher reliability even with site failures
28Data Locality
[Figure: Application 01 accessing data items A and B across a local site and a remote site; access mix 90 (A), 10 (B)]
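The effect of data locality on access delay can be estimated with a small calculation. This is a sketch under stated assumptions: the figures 90 and 10 are read as the percentage of accesses going to A and to B, and the local and remote delays (1 ms and 50 ms) are invented for illustration.

```python
def expected_access_delay(access_mix, delay_by_item):
    """Average delay per access, weighted by how often each item is used."""
    return sum(
        fraction * delay_by_item[item] for item, fraction in access_mix.items()
    )


# Assumption: 90% of accesses go to A and 10% to B.
access_mix = {"A": 0.9, "B": 0.1}

# Hypothetical delays in milliseconds.
local_ms, remote_ms = 1.0, 50.0

# Placement 1: the frequently used item A is at the local site.
a_local = expected_access_delay(access_mix, {"A": local_ms, "B": remote_ms})

# Placement 2: A is at the remote site instead.
a_remote = expected_access_delay(access_mix, {"A": remote_ms, "B": local_ms})

print(a_local, a_remote)
```

Under these made-up numbers, keeping the heavily used item local lowers the average delay considerably, which is the motivation for placing data where it is used most.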
29Cost of Distributed Database Systems
- Greater complexity
- Management of distributed data and higher cost to maintain data integrity (data correctness)
- database partition, consistency and mutual consistency (the replicated data items have the same value)
- Management of distributed transactions
- processing of transactions in a distributed environment
- atomicity of distributed transactions (all-or-none)
- Impact of network
- message loss and errors, disconnection and partition
- Security
- security measures become more complicated and have to be added at each site
30What is a Distributed Database System?
- An informal definition
- A distributed database (DDB) is a collection of data items which belong logically to the same system but are distributed at several sites over a computer network
- A distributed database system (DDBS) is
- software (or a system) that manages the DDB
- It provides access mechanisms that make this distribution transparent to users (different degrees of transparency)
31Implicit Assumptions
- Data
- stored at multiple sites
- Distributed database
- is a database, not a collection of files
- data items are logically related, as exhibited in the users' access patterns
- Processors
- at different sites are interconnected by a computer network, not multiprocessors (what are the differences? Communication delays)
- DDBS
- Each site is a full-fledged DBS (each site shares the responsibility for data management and processing applications)
- not a remote file system
32What are NOT DDBSs?
- Not just a timesharing computer system
- Not a loosely or tightly coupled multiprocessor system
- Not a database system that resides at one of the nodes of a network of computers: this is a centralized database on a network node
33Centralized DBS on a Network
34Distributed DBS Environment
35Applications (conventional)
- Manufacturing, especially multi-plant manufacturing
- Banking systems
- Corporate management information systems (MIS)
- Airline reservation systems
- Hotel chains
- Any organization which has a decentralized
organization structure
36New database applications
- CAD/CAM design (object based database)
- Project management - workflow (long transactions)
- Knowledge discovery (data mining)
- Real-time applications (flight navigation and military command and control)
- Telephone management systems (location management)
- System monitoring (stock trading systems)
- E-commerce and Internet applications
- Sensor systems
37Distributed DBS Promises
- Transparent management of distributed, fragmented, and replicated data (fragmentation divides a large table into several smaller tables)
- Transparency
- The separation of the higher-level semantics of a system from the lower-level implementation issues
- Data transparency
- The applications do not know how the data are represented physically
- Location transparency
- The applications do not know the locations of the required data
- Replication transparency
- The applications do not know whether the required data are replicated or not
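Location and replication transparency can be sketched as a thin layer that keeps a catalog of where each item lives. The class names Site and TransparentDB, the catalog layout, and the "first available copy" rule are all hypothetical simplifications; a real DDBS does much more (updates, consistency among copies, failures during a read).

```python
class Site:
    """A hypothetical site holding some data items; `up` marks availability."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up


class TransparentDB:
    """Applications call read(item) and never name a site or a copy."""

    def __init__(self, sites):
        self.catalog = {}  # item -> list of sites that hold a copy
        for site in sites:
            for item in site.data:
                self.catalog.setdefault(item, []).append(site)

    def read(self, item):
        for site in self.catalog.get(item, []):
            if site.up:
                return site.data[item]
        raise LookupError(f"no available copy of {item!r}")


s1 = Site("site1", {"A": 1})
s2 = Site("site2", {"A": 1, "B": 2})  # A is replicated at both sites
db = TransparentDB([s1, s2])

value_a = db.read("A")
value_b = db.read("B")
print(value_a, value_b)
```

If site1 fails, db.read("A") still succeeds from the copy at site2, and the calling code does not change.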
38Distributed DBS Promises
- Fragmentation transparency
- The applications do not know whether the required data are fragmented or not
- Horizontal fragmentation: selection
- Vertical fragmentation: projection
- Hybrid (both horizontal and vertical)
- Improved reliability/availability through distributed transactions and data replication
- Improved performance (locality) (get the data at its local site)
- Easier and more economical system expansion (distribution)
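Horizontal and vertical fragmentation can be illustrated with a short sketch. The EMP rows, the SITE attribute, and the helper names are hypothetical. Horizontal fragments are produced by selection and recombined by union; vertical fragments are produced by projection (keeping the key in each fragment) and recombined by a join on that key.

```python
# Hypothetical relation, one dict per row.
EMP = [
    {"ENO": "E1", "ENAME": "Ann", "SITE": "HK", "SAL": 25000},
    {"ENO": "E2", "ENAME": "Bob", "SITE": "NY", "SAL": 30000},
]


def horizontal_fragment(rows, pred):
    """Selection: keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if pred(r)]


def vertical_fragment(rows, key, attrs):
    """Projection: keep the key plus the chosen attributes of every row."""
    return [{a: r[a] for a in (key, *attrs)} for r in rows]


# Horizontal: one fragment per site; the union rebuilds the relation.
hk = horizontal_fragment(EMP, lambda r: r["SITE"] == "HK")
ny = horizontal_fragment(EMP, lambda r: r["SITE"] == "NY")
rebuilt_h = hk + ny

# Vertical: salary data kept apart; a join on ENO rebuilds the relation.
names = vertical_fragment(EMP, "ENO", ["ENAME", "SITE"])
pay = vertical_fragment(EMP, "ENO", ["SAL"])
pay_by_key = {r["ENO"]: r for r in pay}
rebuilt_v = [{**n, **pay_by_key[n["ENO"]]} for n in names]

print(rebuilt_h == EMP, rebuilt_v == EMP)
```

With fragmentation transparency, an application would query EMP and the system would perform the union or join behind the scenes.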
39Transparent Access
40Distributed Database User View
What are the advantages of defining a DDBS in
this way?
41Distributed DBS - Reality
42Distributed DBS Issues
- Distributed database design
- How to distribute the database (fragments)
- Replicated and non-replicated database distribution
- A related problem in directory management (central vs. distributed)
- Query processing
- Convert transactions to data manipulation instructions
- Optimization problem (query optimization)
- min{cost = data transmission + local processing}
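The optimization objective can be read as: among the candidate execution strategies, pick the one whose data transmission cost plus local processing cost is smallest. The sketch below assumes exactly that reading; the two strategy names and all cost figures are invented, and a real optimizer would estimate costs from statistics rather than take them as given.

```python
def total_cost(strategy):
    """Cost of a strategy = data transmission + local processing."""
    return strategy["data_transmission"] + strategy["local_processing"]


# Hypothetical candidate strategies with made-up cost units.
strategies = {
    "ship_whole_relation": {
        "data_transmission": 1000,
        "local_processing": 50,
    },
    "filter_at_remote_site_first": {
        "data_transmission": 100,
        "local_processing": 120,
    },
}

best = min(strategies, key=lambda name: total_cost(strategies[name]))
print(best, total_cost(strategies[best]))
```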
43Distributed DBS Issues
- Concurrency control and deadlock resolution
- Synchronization of concurrent accesses
- Consistency and isolation of transactions' effects
- Deadlock management (e.g., in two-phase locking)
- Reliability and recovery (commitment, logging and checkpointing)
- How to make the system resilient to failures
- Atomicity and durability
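One common way to reason about deadlock management is a wait-for graph: an edge T1 -> T2 means T1 is waiting for a lock held by T2, and a cycle means deadlock. The sketch below is an illustration, not a method stated on the slides; the function names and the two-site scenario are hypothetical. It shows why the distributed case is harder: each site's local graph is acyclic, yet the merged graph has a cycle.

```python
def merge_graphs(*graphs):
    """Combine per-site wait-for graphs into one global graph."""
    merged = {}
    for g in graphs:
        for txn, waits_on in g.items():
            merged.setdefault(txn, set()).update(waits_on)
    return merged


def has_deadlock(wait_for):
    """Depth-first search; reaching a node still on the path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(wait_for) | {t for ts in wait_for.values() for t in ts}
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for m in wait_for.get(n, ()):
            if color[m] == GREY:
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


site1 = {"T1": {"T2"}}  # at site 1, T1 waits for T2
site2 = {"T2": {"T1"}}  # at site 2, T2 waits for T1

local_deadlocks = (has_deadlock(site1), has_deadlock(site2))
global_deadlock = has_deadlock(merge_graphs(site1, site2))
print(local_deadlocks, global_deadlock)
```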
44Relationship Between Issues
Transaction Processing
45Related Issues
- Operating system support
- Operating system with proper support for database operations
- Open systems and interoperability
- Distributed Multi-database Systems
- More portable scenario
46Database Technology Timeline
[Figure: database technology timeline, running from simple data management to global enterprise management]
- Periods: Early 80s; Late 80s; Early - Mid 90s; Late 90s - 21st C
- Database generations: Pre-relational; Early relational; Client-server relational; Enterprise-capable relational; Internet computing
- Workloads and capabilities shown: Simple OLTP; Simple transactions, on-line backup and recovery; Stored procedures, triggers; Active database; Packaged vertical applications; Data warehouse, high-end OLTP; Scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability; Support for all types of data, extensibility, objects; Middleware (messaging, queues, events), Java, CORBA, Web interfaces
47Current State of DDBSs
- These applications require
- Large numbers of users/transactions
- High performance
- High availability (7x24 operations)
- Scalability
- High levels of security
- Administrative support
- Good utilities
48References
- Ozsu Ch1, Ch2 (overview)
- Ceri Ch1