Distributed, Parallel and Internet Databases

1
Distributed, Parallel and Internet Databases
  • David Nelson
  • CAT
  • January 2004

2
Contents
  • Definitions
  • Distributed Database Systems
  • Homogeneous and Heterogeneous Systems
  • Federated DBMS Systems
  • X/OPEN DTP Standard
  • Internet Databases
  • Client/Server Architectures, advantages and
    disadvantages
  • Web Database Approaches
  • Further Reading

3
Definitions
  • Distributed Database System (DDBS)
    • DDBS users are able to run applications at
      each node
  • Parallel Database System
    • nodes are black boxes
  • Federated Database System (FDBS)
    • a DDBS is usually a single application
      distributed over various sites
    • an FDBS is a set of multiple cooperating systems

4
Traditional Architecture
  • Traditional Database Systems are based on a
    two-tier client-server architecture

Client
  • User interface
  • Main business and data processing logic

Database Server
  • Server-side validation
  • Database access

5
Distributed Database Systems
  • System needs facilities to be able to
  • perform distributed query optimization
  • manage distributed transactions
  • manage data replication
  • Homogeneous DDBS is the simplest case
  • several sites, each running applications on same
    DBMS with same schema and transactions
  • location transparency
  • can communicate over large distances, and are
    autonomous
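
Location transparency in the homogeneous case can be illustrated with a minimal sketch. The site names, the `staff` table, and the `CATALOG` and `select` helpers are all hypothetical, and plain dictionaries stand in for real DBMS instances; the only point is that the caller names a table, never a site.

```python
# Hypothetical sites, each holding a fragment of the same "staff" table
# (same schema everywhere, as in a homogeneous DDBS).
SITES = {
    "london": {"staff": [{"id": 1, "name": "Ann", "city": "London"}]},
    "glasgow": {"staff": [{"id": 2, "name": "Bob", "city": "Glasgow"}]},
}

# Global catalog: which sites hold fragments of each table.
CATALOG = {"staff": ["london", "glasgow"]}


def select(table, predicate=lambda row: True):
    """Query by table name only; the catalog decides which sites are visited."""
    rows = []
    for site in CATALOG[table]:
        rows.extend(row for row in SITES[site][table] if predicate(row))
    return rows


everyone = select("staff")
glaswegians = select("staff", lambda row: row["city"] == "Glasgow")
```

A real system would also optimize the query, for example by skipping sites whose fragment cannot match the predicate; this sketch visits every site.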

6
Heterogeneous DDBMS
  • Several existing databases (using different
    DBMSs) linked into a single system
  • Problems
  • variation in costs of operation between sites
  • some operations may not be available at some
    sites
  • some DBMSs cannot read records of others
  • varying base types
  • Requesting site must
  • have detailed knowledge of operation of remote
    system
  • assume remote system has only rudimentary
    functionality
  • make programmer do query composition by hand

7
Federated DBMS
  • A collection of independently managed,
    heterogeneous database systems
  • allow partial and controlled sharing of data
    without affecting existing applications
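
One way to picture the federated-to-local schema mapping is as a rename table per site. The sketch below is illustrative only: the database names, column names, and the `federated_customers` helper are assumptions, not part of any particular federated DBMS.

```python
# Hypothetical local databases, each with its own column names.
LOCAL_DATA = {
    "sales_db": [{"cust_no": 1, "cust_name": "Ada"}],
    "support_db": [{"id": 2, "full_name": "Grace", "ticket_count": 5}],
}

# Federated-to-local schema mapping: federated attribute -> local attribute.
# Attributes that are not listed (e.g. ticket_count) are not shared,
# which is one way to get partial and controlled sharing.
MAPPINGS = {
    "sales_db": {"customer_id": "cust_no", "name": "cust_name"},
    "support_db": {"customer_id": "id", "name": "full_name"},
}


def federated_customers():
    """Return every site's customers expressed in the federated schema."""
    result = []
    for site, mapping in MAPPINGS.items():
        for row in LOCAL_DATA[site]:
            result.append({fed: row[local] for fed, local in mapping.items()})
    return result


customers = federated_customers()
```

The local rows are never modified, so existing local applications keep working against their own schema.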

[Diagram: federated schema, federated to local schema mapping, local schema (shown twice)]

8
Client-Server Systems
  • DB applications can be classified according to
    granularity of database accesses
  • coarse grained - transactions which access large
    volumes of data, e.g. decision support, large
    scientific apps
  • medium grained - access groups of up to several
    dozen records, e.g. order entry, conventional
    commercial apps
  • fine grained - typically navigational, accessing
    small portions of linked records, e.g. CAD, CASE
    etc.

9
Coarse Grained Systems
  • Few large interactions with database
  • will benefit from parallel execution of queries
    on the server
  • may be necessary to bring back useful data to
    client for sequential processing

[Diagram: two client applications and a database server]

10
Fine Grained Systems
  • Often advantageous to request large data
    transfers from server to local cache
  • high locality of reference gives good performance
  • built-in inter-query parallelism
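
The benefit of bulk transfers into a local cache can be sketched as follows. The `Server` and `CachingClient` classes and the page size are hypothetical; the sketch only counts round trips to show how locality of reference pays off.

```python
class Server:
    """Stands in for the database server and counts round trips."""

    def __init__(self, records, page_size):
        self.records = records
        self.page_size = page_size
        self.requests = 0

    def fetch_page(self, page_no):
        """Return one whole page of records as {record_id: record}."""
        self.requests += 1
        start = page_no * self.page_size
        stop = min(start + self.page_size, len(self.records))
        return {i: self.records[i] for i in range(start, stop)}


class CachingClient:
    """Client-side cache that is filled one page at a time."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def get(self, record_id):
        if record_id not in self.cache:
            # Cache miss: pull the whole page, not just the one record.
            page_no = record_id // self.server.page_size
            self.cache.update(self.server.fetch_page(page_no))
        return self.cache[record_id]


server = Server(records=[f"part-{i}" for i in range(100)], page_size=10)
client = CachingClient(server)
visited = [client.get(i) for i in range(25)]  # navigational access to 25 records
```

Here 25 record accesses cost 3 server requests, because neighbouring records arrive together. The sketch ignores cache invalidation, which a real client cache would need.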

[Diagram: two application clients, each with a client database (cache), and a database server]

11
Data Placement
  • Two aims in spreading the data over the nodes
  • balance load between nodes (or disks)
  • minimize data transfers between nodes
  • Aims are incompatible - need a compromise
  • place frequently co-referenced records on the
    same node
  • ensure groups are evenly distributed between
    nodes
  • Optimal size depends on range of cost-effective
    granule sizes, and workload characteristics
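
The compromise above can be sketched with a simple greedy heuristic: keep each group of co-referenced records whole (to limit transfers) and give the next-largest group to the least-loaded node (to balance load). The group names and load figures are made up, and this heuristic is just one possible approach, not the method the slides prescribe.

```python
import heapq


def place_groups(group_loads, num_nodes):
    """Assign each group of co-referenced records to exactly one node.

    group_loads maps a group name to an estimated access load
    (hypothetical units). Returns {node index: [group names]}.
    """
    # Min-heap of (current load, node index): least-loaded node pops first.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {node: [] for node in range(num_nodes)}
    # Place the largest groups first; ties are broken by name.
    ordered = sorted(group_loads.items(), key=lambda kv: (-kv[1], kv[0]))
    for group, load in ordered:
        node_load, node = heapq.heappop(heap)
        placement[node].append(group)
        heapq.heappush(heap, (node_load + load, node))
    return placement


def node_loads(placement, group_loads):
    """Total estimated load per node."""
    return {n: sum(group_loads[g] for g in gs) for n, gs in placement.items()}


workload = {"orders_2003": 50, "orders_2004": 40, "customers": 30,
            "parts": 20, "suppliers": 10}
placement = place_groups(workload, num_nodes=2)
loads = node_loads(placement, workload)
```

With this workload the two nodes end up with loads of 80 and 70. Smaller groups would balance more evenly but split more co-referenced records across nodes, which is the granule-size trade-off the slide refers to.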

12
Parallel Database Architectures
  • Classified according to the elements that are
    shared
  • Shared Memory Systems
  • Shared Disk Systems
  • Shared Nothing Systems
  • Hybrid Systems

13
Shared Memory Systems
  • All processors share disks and main memory
  • Inter-query parallelism achieved by adding more
    processors
  • intra-query parallelism is difficult
  • Does not scale well beyond 20 processors
  • Many commercial systems (Oracle, DB2, Ingres)

[Diagram: several processors sharing one memory]

14
Shared Disk Systems
  • Consists of nodes (processors and memory) and a
    pool of shared disks
  • Easy to port existing systems
  • by mapping transactions to single nodes (gives
    inter-query parallelism)
  • Limited by volume of traffic on communications
    network

[Diagram: several nodes, each with its own processor and memory]

15
Shared Nothing Systems
  • Nodes linked by a communications network
  • Data allocated to nodes
  • every query must be parallelised
  • load balancing is difficult, as data is
    statically distributed
  • Heavily reliant on intra-query parallelism
  • e.g. ICL Goldrush
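
A shared-nothing query can be sketched as scatter and gather: every node runs the same sub-query on its own fragment and a coordinator combines the partial results. The node count, the modulo partitioning rule, and the function names are assumptions; threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def partition(rows, num_nodes):
    """Statically allocate each (key, amount) row to a node by its key."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[key % num_nodes].append((key, amount))
    return nodes


def local_sum(fragment):
    """The sub-query each node runs against its own fragment only."""
    return sum(amount for _, amount in fragment)


def parallel_total(nodes):
    """Intra-query parallelism: scatter the sub-query, gather, combine."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(local_sum, nodes))
    return sum(partials), partials


rows = [(order_id, order_id * 10) for order_id in range(1, 10)]
nodes = partition(rows, NUM_NODES)
total, partials = parallel_total(nodes)
```

Because the allocation is fixed when the data is loaded, a skewed key distribution would leave some nodes with far more work than others, which is the load-balancing difficulty noted above.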

[Diagram: several nodes, each with its own processor and memory]

16
Hybrid Systems
  • May have a top-level shared disk architecture
  • Individual nodes are themselves shared memory
    systems
  • Good compromise between load balancing and
    scalability

17
Web Architecture
  • Need for enterprise scalability causes problems
    which can be solved by a three-tier architecture

Client
  • User interface

Application Server
  • Business logic
  • Data processing logic

Database Server
  • Server-side validation
  • Database access

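
The division of work between the three tiers can be sketched in a single process. An in-memory SQLite database stands in for the database server, and the `orders` table and the `place_order` and `client` functions are hypothetical; in a real deployment each tier would run on its own machine.

```python
import sqlite3

# Database server tier: storage plus server-side validation (a CHECK constraint).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (item TEXT NOT NULL, qty INTEGER CHECK (qty > 0))")


def place_order(item, qty):
    """Application server tier: business logic and data processing."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
    except sqlite3.IntegrityError:
        return {"ok": False, "message": "order rejected by the database"}
    total = db.execute("SELECT SUM(qty) FROM orders").fetchone()[0]
    return {"ok": True, "message": f"{total} items on order"}


def client(item, qty):
    """Client tier: user interface only; it formats the application tier's reply."""
    reply = place_order(item, qty)
    return ("OK: " if reply["ok"] else "ERROR: ") + reply["message"]


first = client("widget", 3)
second = client("widget", -1)
```

The client never issues SQL itself, so the application tier can be replicated or replaced without touching the user interface.
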
18
Web as a Database Platform
  • Advantages
  • DBMS advantages
  • Simplicity
  • Graphical User Interface
  • Standardization
  • Cross-platform support
  • Transparent network access
  • Scalable deployment
  • Innovation

19
Web as a Database Platform
  • Disadvantages
  • Reliability
  • Security
  • Cost
  • Scalability
  • Limited HTML Functionality
  • Statelessness
  • Bandwidth
  • Performance
  • Immaturity of development tools

20
Approaches
  • CGI
  • Server Side Includes
  • HTTP Cookies
  • API (non-CGI gateways)
  • ODBC
  • Java (JDBC, JSQL, JRB)
  • JavaScript, JScript
  • Microsoft Active Platform (ASP, ADO, ActiveX)
  • PHP (PHP: Hypertext Preprocessor)
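
Of the approaches listed, HTTP cookies are the usual answer to the statelessness problem: the server hands the browser a session identifier and looks it up again on each later request. The sketch below uses Python's standard `http.cookies` module; the `sid` cookie name, the in-memory `SESSIONS` store, and the `handle_request` function are assumptions, and real session identifiers would need to be unguessable.

```python
from http.cookies import SimpleCookie

SESSIONS = {}  # hypothetical server-side session store: session id -> basket


def handle_request(cookie_header, item=None):
    """Handle one stateless request; return (cookie to send back, basket)."""
    cookie = SimpleCookie(cookie_header)
    sid = cookie["sid"].value if "sid" in cookie else None
    if sid not in SESSIONS:
        # First visit (or unknown id): start a new session.
        sid = f"s{len(SESSIONS) + 1}"
        SESSIONS[sid] = []
    if item is not None:
        SESSIONS[sid].append(item)
    reply = SimpleCookie()
    reply["sid"] = sid
    return reply["sid"].OutputString(), list(SESSIONS[sid])


# Two separate requests; only the cookie links them together.
cookie1, basket1 = handle_request("", item="book")
cookie2, basket2 = handle_request(cookie1, item="pen")
```

Without the cookie, the second request would look like a new visitor and would start an empty basket.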

21
Further Reading
  • Connolly and Begg, Database Systems, chapters 22,
    28, 29, and appendix H.
  • Ozsu and Valduriez, Principles of Distributed
    Database Systems, 2nd edition
  • everything you ever wanted to know about
    distributed database systems
  • Chaudhri and Zicari, Succeeding with Object
    Databases, 2001.
  • Elmasri, Fundamentals of Database Systems, 3rd
    edition, 1999.