DISTRIBUTED DBMS ARCHITECTURE

Slides: 63

Provided by: Wojt3

Transcript and Presenter's Notes

1
DISTRIBUTED DBMS ARCHITECTURE
2
DBMS STANDARDIZATION

Based on components.
The components of the system are defined together
with the interrelationships between components.
A DBMS consists of a number of components, each
of which provides some functionality.
Based on functions.
The different classes of users are identified and
the functions that the system will perform for
each class are defined. System specifications in
this category typically specify a hierarchical
structure for the user classes.

3
DBMS STANDARDIZATION

Based on data.
The different types of data are identified, and
an architectural framework is specified which
defines the functional units that will realize or
use data according to these different views. This
approach (also referred to as the datalogical
approach) is claimed to be the preferable choice
for standardization activities.

4
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE

The ANSI/SPARC architecture is claimed to be
based on the data organization. It recognizes
three views of data: the external view, which is
that of the user, who might be a programmer; the
internal view, that of the system or machine; and
the conceptual view, that of the enterprise. For
each of these views, an appropriate schema
definition is required.

5
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE
6
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE

At the lowest level of the architecture is the
internal view, which deals with the physical
definition and organization of data. At the other
extreme is the external view, which is concerned
with how users view the database. Between these
two ends is the conceptual schema, which is an
abstract definition of the database. It is the
"real world" view of the enterprise being modeled
in the database.

7
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE
8
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE

The square boxes represent processing functions,
whereas the hexagons are administrative roles.
The arrows indicate data, command, program, and
description flow, whereas the "I"-shaped bars on
them represent interfaces.
The major component that permits mapping between
different data organizational views is the data
dictionary / directory (depicted as a triangle),
which is a meta-database.
The database administrator is responsible for
defining the internal schema definition.
The enterprise administrator's role is to prepare
the conceptual schema definition.
The application administrator is responsible for
preparing the external schemas for applications.

9
DBMS STANDARDIZATION - ANSI/SPARC ARCHITECTURE

Systems are characterized with respect to
(1) the autonomy of local systems,
(2) their distribution,
(3) their heterogeneity.

10
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
AUTONOMY

Autonomy refers to the distribution of control,
not of data. It indicates the degree to which
individual DBMSs can operate independently.
Three alternatives:
tight integration
semiautonomous systems
total isolation

11
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
AUTONOMY

Tight integration. A single image of the entire
database is available to any user who wants to
share the information, which may reside in
multiple databases. From the users' perspective,
the data is logically centralized in one database.
Semiautonomous systems. The DBMSs can operate
independently. Each of these DBMSs determines
which parts of its own database it will make
accessible to users of other DBMSs.
Total isolation. The individual systems are
stand-alone DBMSs, which know neither of the
existence of other DBMSs nor how to communicate
with them.

12
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
DISTRIBUTION

Distribution refers to the distribution of data.
Of course, we are considering the physical
distribution of data over multiple sites; the
user sees the data as one logical pool.
Two alternatives:
client / server distribution
peer-to-peer distribution (full distribution)

13
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
DISTRIBUTION

Client / server distribution. Client / server
distribution concentrates data management duties
at servers while the clients focus on providing
the application environment including the user
interface. The communication duties are shared
between the client machines and servers. Client /
server DBMSs represent the first attempt at
distributing functionality.
Peer-to-peer distribution. There is no distinction
of client machines versus servers. Each machine
has full DBMS functionality and can communicate
with other machines to execute queries and
transactions.

14
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
HETEROGENEITY

Heterogeneity may occur in various forms in
distributed systems, ranging from hardware
heterogeneity and differences in networking
protocols to variations in data managers.
Representing data with different modeling tools
creates heterogeneity because of the inherent
expressive powers and limitations of individual
data models. Heterogeneity in query languages not
only involves the use of completely different
data access paradigms in different data models,
but also covers differences in languages even
when the individual systems use the same data
model.

15
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
ALTERNATIVES

The dimensions are identified as A (autonomy),
D (distribution) and H (heterogeneity).
The alternatives along each dimension are
identified by numbers as 0, 1 or 2.
A0 - tight integration
A1 - semiautonomous systems
A2 - total isolation
D0 - no distribution
D1 - client / server systems
D2 - peer-to-peer systems
H0 - homogeneous systems
H1 - heterogeneous systems

16
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
ALTERNATIVES

(A0, D0, H0)
If there is no distribution or heterogeneity,
the system is a set of multiple DBMSs that are
logically integrated.
(A0, D0, H1)
If heterogeneity is introduced, one has multiple
data managers that are heterogeneous but provide
an integrated view to the user.
(A0, D1, H0)
The more interesting case is where the database
is distributed even though an integrated view of
the data is provided to users (client / server
distribution).

17
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
ALTERNATIVES

(A0, D2, H0)
The same type of transparency is provided to the
user in a fully distributed environment. There is
no distinction among clients and servers, each
site providing identical functionality.
(A1, D0, H0)
These are semiautonomous systems, which are
commonly termed federated DBMSs. The component
systems in a federated environment have
significant autonomy in their execution, but
their participation in the federation indicates
that they are willing to cooperate with others in
executing user requests that access multiple
databases.

18
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
ALTERNATIVES

(A1, D0, H1)
These are systems that introduce heterogeneity
as well as autonomy, what we might call a
heterogeneous federated DBMS.
(A1, D1, H1)
Systems of this type introduce distribution by
placing component systems on different machines.
They may be referred to as distributed,
heterogeneous federated DBMS.
(A2, D0, H0)
Now we have full autonomy. These are
multidatabase systems (MDBS). The components have
no concept of cooperation. Without heterogeneity
and distribution, an MDBS is an interconnected
collection of autonomous databases.

19
ARCHITECTURAL MODELS FOR DISTRIBUTED DBMSs -
ALTERNATIVES

(A2, D0, H1)
This case is realistic, maybe even more so than
(A1, D0, H1), in that we always want to build
applications which access data from multiple
storage systems with different characteristics.
(A2, D1, H1) and (A2, D2, H1)
These two cases are considered together because
of the similarity of the problem. They both represent
the case where component databases that make up
the MDBS are distributed over a number of sites -
we call this the distributed MDBS.

20
DISTRIBUTED DBMS ARCHITECTURE

Client / server systems - (Ax, D1, Hy)
Distributed databases - (A0, D2, H0)
Multidatabase systems - (A2, Dx, Hy)
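
The three patterns above can be read as filters over the (A, D, H) space. The following is a minimal sketch, assuming the numeric codes from the earlier slide (A and D in 0..2, H in 0..1); the function name `classify` and the label strings are illustrative choices, not part of the slides. Because the patterns overlap, the function returns every class that matches.

```python
from typing import List


def classify(a: int, d: int, h: int) -> List[str]:
    """Return the architecture classes that the point (A, D, H) falls into."""
    if a not in (0, 1, 2) or d not in (0, 1, 2) or h not in (0, 1):
        raise ValueError("expected A, D in {0, 1, 2} and H in {0, 1}")
    classes = []
    if d == 1:                      # (Ax, D1, Hy)
        classes.append("client/server system")
    if (a, d, h) == (0, 2, 0):      # (A0, D2, H0)
        classes.append("distributed database")
    if a == 2:                      # (A2, Dx, Hy)
        classes.append("multidatabase system")
    return classes


print(classify(0, 2, 0))
print(classify(2, 1, 1))
```

A point such as (A2, D1, H1) matches two patterns at once, which is why a list is returned rather than a single label.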

21
DISTRIBUTED DBMS ARCHITECTURE - CLIENT / SERVER
SYSTEMS

This provides a two-level architecture, which
makes it easier to manage the complexity of modern
DBMSs and the complexity of distribution.
The server does most of the data management work
(query processing and optimization, transaction
management, storage management).
The client hosts the application and the user
interface (and manages the data that is cached at
the client and the transaction locks).

22
DISTRIBUTED DBMS ARCHITECTURE - CLIENT / SERVER
SYSTEMS

This architecture is quite common in relational
systems where the communication between the
clients and the server(s) is at the level of SQL
statements.

23
DISTRIBUTED DBMS ARCHITECTURE - CLIENT / SERVER
SYSTEMS

Multiple client - single server
From a data management perspective, this is not
much different from centralized databases since
the database is stored on only one machine (the
server) which also hosts the software to manage
it. However, there are some differences from
centralized systems in the way transactions are
executed and caches are managed.
Multiple client - multiple server
In this case, two alternative management
strategies are possible: either each client
manages its own connection to the appropriate
server, or each client knows of only its home
server, which then communicates with other
servers as required.
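
The two multiple client - multiple server strategies can be contrasted with a toy in-memory model. This is only a sketch: the classes `Server`, `DirectClient` and `HomeServerClient`, the `peers` attribute and the sample tables are all hypothetical names invented for illustration, and real systems would use network connections rather than object references.

```python
class Server:
    def __init__(self, name, tables):
        self.name = name
        self.tables = dict(tables)   # table name -> list of rows
        self.peers = {}              # server name -> Server, used for forwarding

    def query(self, table):
        """Answer locally, or forward to a peer that holds the table."""
        if table in self.tables:
            return self.name, self.tables[table]
        for peer in self.peers.values():
            if table in peer.tables:
                return peer.name, peer.tables[table]
        raise KeyError(table)


class DirectClient:
    """Strategy 1: the client manages its own connection to each server."""

    def __init__(self, servers):
        # The client itself keeps the table -> server routing map.
        self.route = {t: s for s in servers for t in s.tables}

    def query(self, table):
        return self.route[table].query(table)


class HomeServerClient:
    """Strategy 2: the client knows only its home server."""

    def __init__(self, home):
        self.home = home

    def query(self, table):
        return self.home.query(table)


s1 = Server("s1", {"emp": [("E1", "Ana")]})
s2 = Server("s2", {"proj": [("P1", "CAD")]})
s1.peers["s2"] = s2
s2.peers["s1"] = s1

direct = DirectClient([s1, s2])
via_home = HomeServerClient(s1)
print(direct.query("proj"))
print(via_home.query("proj"))
```

Both clients get the same answer; the difference is where the routing knowledge lives (in the client for the first strategy, in the servers for the second).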

24
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS

The physical data organization on each machine
may be different.
Local internal schema (LIS) - the individual
internal schema definition at each site.
Global conceptual schema (GCS) - describes the
enterprise view of the data.
Local conceptual schema (LCS) - describes the
logical organization of data at each site.
External schemas (ESs) - support user
applications and user access to the database.

25
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS
26
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS

In this case, the ANSI/SPARC model is extended
by the addition of a global directory / dictionary
(GD/D) to permit the required global mappings.
The local mappings are still performed by local
directory / dictionary (LD/D). The local database
management components are integrated by means of
global DBMS functions. Local conceptual schemas
are mappings of global schema onto each site.

27
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS

The detailed components of a distributed DBMS.
Two major components
user processor
data processor

28
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS

User processor
user interface handler - is responsible for
interpreting user commands as they come in, and
formatting the result data as it is sent to the
user,
semantic data controller - uses the integrity
constraints and authorizations that are defined
as part of the global conceptual schema to check
if the user query can be processed,
global query optimizer and decomposer -
determines an execution strategy to minimize a
cost function, and translates the global queries
into local ones using the global and local
conceptual schemas as well as the global directory,
distributed execution monitor - coordinates the
distributed execution of the user request.

29
DISTRIBUTED DBMS ARCHITECTURE - PEER-TO-PEER
DISTRIBUTED SYSTEMS

Data processor
local query optimizer - is responsible for
choosing the best access path to access any data
item,
local recovery manager - is responsible for
making sure that the local database remains
consistent even when failures occur,
run-time support processor - physically accesses
the database according to the physical commands
in the schedule generated by the query optimizer.
This is the interface to the operating system and
contains the database buffer (or cache) manager,
which is responsible for maintaining the main
memory buffers and managing the data accesses.

30
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE

Models using a Global Conceptual Schema (GCS)
The GCS is defined by integrating either the
external schemas of local autonomous databases or
parts of their local conceptual schemas.
If heterogeneity exists in the system, then two
implementation alternatives exist: unilingual
and multilingual.
Models without a Global Conceptual Schema (GCS)
The existence of a global conceptual schema in a
multidatabase system is a controversial issue.
There are researchers who even define a
multidatabase management system as one that
manages several databases without the global
schema.

31
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models using a GCS
32
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models using a GCS

A unilingual multi-DBMS requires the users to
utilize possibly different data models and
languages when both a local database and the
global database are accessed.
Any application that accesses data from multiple
databases must do so by means of an external view
that is defined on the global conceptual schema.
One application may have a local external schema
(LES) defined on the local conceptual schema as
well as a global external schema (GES) defined on
the global conceptual schema.

33
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models using a GCS

An alternative is the multilingual architecture,
where the basic philosophy is to permit each user
to access the global database by means of an
external schema, defined using the language of
the user's local DBMS.
The multilingual approach obviously makes
querying the databases easier from the user's
perspective. However, it is more complicated
because we must deal with translation of queries
at run time.

34
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models without a GCS
35
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models without a GCS

The architecture identifies two layers: the local
system layer and the multidatabase layer on top
of it.
The local system layer consists of a number of
DBMSs, which present to the multidatabase layer
the part of their local database they are willing
to share with users of the other databases. This
shared data is presented either as the actual
local conceptual schema or as a local external
schema definition.
The multidatabase layer consists of a number of
external views, each of which may be defined on
one local conceptual schema or on multiple
conceptual schemas. Thus
the responsibility of providing access to
multiple databases is delegated to the mapping
between the external schemas and the local
conceptual schemas.

36
DISTRIBUTED DBMS ARCHITECTURE - MDBS ARCHITECTURE -
models without a GCS

The MDBS provides a layer of software that runs
on top of these individual DBMSs and provides
users with the facilities of accessing various
databases.
Fig. represents a nondistributed multi-DBMS. If
the system is distributed, we would need to
replicate the multidatabase layer to each site
where there is a local DBMS that participates in
the system.

37
DISTRIBUTED DBMS ARCHITECTURE - GLOBAL DIRECTORY
ISSUE

The global directory includes information about
the location of the fragments as well as the
makeup of the fragments.
The directory is itself a database that contains
meta-data about the actual data stored in the
database.
We have three dimensions:
1. type
2. location
3. replication

38
DISTRIBUTED DBMS ARCHITECTURE - GLOBAL DIRECTORY
ISSUE

Type
A directory may be either global to the entire
database or local to each site. In other words,
there might be a single directory containing
information about all the data in the database,
or a number of directories, each containing the
information stored at one site.
Location
The directory may be maintained centrally at one
site, or in a distributed fashion by distributing
it over a number of sites.
Replication
There may be a single copy of the directory or
multiple copies.

39
DISTRIBUTED DBMS ARCHITECTURE - GLOBAL DIRECTORY
ISSUE

These three dimensions are orthogonal to one
another. The unrealistic combinations have been
designated by a question mark.

40
DISTRIBUTED DATABASE DESIGN
41
DISTRIBUTED DATABASE DESIGN

The organization of distributed systems can be
investigated along three orthogonal dimensions:
1. Level of sharing
2. Behavior of access patterns
3. Level of knowledge on access pattern behavior

42
DISTRIBUTED DATABASE DESIGN

Level of sharing
no sharing - each application and its data
execute at one site,
data sharing - all the programs are replicated at
all the sites, but data files are not,
data plus program sharing - both data and
programs may be shared.
Behavior of access patterns
static - access patterns of user requests do not
change over time,
dynamic - access patterns of user requests change
over time.
Level of knowledge on access pattern behavior
complete information - the access patterns can
reasonably be predicted and do not deviate
significantly from the predictions,
partial information - there are deviations from
the predictions.

43
ALTERNATIVE DESIGN STRATEGIES

Two major strategies that have been identified
for designing distributed databases are
the top-down approach
the bottom-up approach

44
ALTERNATIVE DESIGN STRATEGIES - TOP-DOWN DESIGN
PROCESS
45
ALTERNATIVE DESIGN STRATEGIES - TOP-DOWN DESIGN
PROCESS

view design - defining the interfaces for end
users,
conceptual design - is the process by which the
enterprise is examined to determine entity types
and relationships among these entities. One can
possibly divide this process into two related
activity groups:
entity analysis - is concerned with determining
the entities, their attributes, and the
relationships among these entities,
functional analysis - is concerned with
determining the fundamental functions with which
the modeled enterprise is involved.

46
ALTERNATIVE DESIGN STRATEGIES - TOP-DOWN DESIGN
PROCESS

distribution design - designing the local
conceptual schemas by distributing the entities
over the sites of the distributed system. The
distribution design activity consists of two
steps:
fragmentation
allocation
physical design - is the process that maps the
local conceptual schemas to the physical storage
devices available at the corresponding sites,
observation and monitoring - the result is some
form of feedback, which may result in backing up
to one of the earlier steps in the design.

47
ALTERNATIVE DESIGN STRATEGIES - BOTTOM-UP DESIGN
PROCESS

Top-down design is a suitable approach when a
database system is being designed from scratch.
If a number of databases already exist and the
design task involves integrating them into one
database, the bottom-up approach is suitable for
this type of environment. The starting point of
bottom-up design is the individual local
conceptual schemas. The process consists of
integrating local schemas into the global
conceptual schema.

48
DISTRIBUTION DESIGN ISSUES - REASONS FOR
FRAGMENTATION

The important issue is the appropriate unit of
distribution. For a number of reasons it is only
natural to consider subsets of relations as
distribution units.
If the applications that have views defined on a
given relation reside at different sites, two
alternatives can be followed with the entire
relation being the unit of distribution: either
the relation is not replicated and is stored at
only one site, or it is replicated at all or some
of the sites where the applications reside.
The fragmentation of relations typically results
in the parallel execution of a single query by
dividing it into a set of subqueries that operate
on fragments. Thus, fragmentation typically
increases the level of concurrency and therefore
the system throughput.

49
DISTRIBUTION DESIGN ISSUES - REASONS FOR
FRAGMENTATION

There are also disadvantages of fragmentation:
if the applications have conflicting requirements
which prevent decomposition of the relation into
mutually exclusive fragments, those applications
whose views are defined on more than one fragment
may suffer performance degradation,
the second problem is related to semantic data
control, specifically to integrity checking.

50
DISTRIBUTION DESIGN ISSUES - FRAGMENTATION
ALTERNATIVES

There are clearly two alternatives:
horizontal fragmentation
vertical fragmentation
The fragmentation may, of course, be nested. If
the nestings are of different types, one gets
hybrid fragmentation.
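
As a small illustration of nesting, the sketch below applies a horizontal fragmentation (a selection on tuples) and then a vertical fragmentation (a projection on attributes) to one of the resulting fragments. The relation `EMP`, its attributes and the helper names `horizontal` and `vertical` are invented for the example and do not come from the slides.

```python
def horizontal(rows, predicate):
    """Horizontal fragment: keep the tuples that satisfy a predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, attributes):
    """Vertical fragment: project every tuple onto a subset of attributes."""
    return [{a: r[a] for a in attributes} for r in rows]


EMP = [
    {"eno": 1, "name": "Ana", "city": "Paris", "salary": 50},
    {"eno": 2, "name": "Budi", "city": "Jakarta", "salary": 40},
    {"eno": 3, "name": "Chen", "city": "Paris", "salary": 60},
]

# Horizontal step: employees located in Paris.
paris = horizontal(EMP, lambda r: r["city"] == "Paris")

# Vertical step nested inside the horizontal fragment: a hybrid fragment.
paris_pay = vertical(paris, ["eno", "salary"])
print(paris_pay)
```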

51
DISTRIBUTION DESIGN ISSUES - DEGREE OF FRAGMENTATION

The extent to which the database should be
fragmented is an important decision that affects
the performance of query execution.
The degree of fragmentation goes from one
extreme, that is, not to fragment at all, to the
other extreme, to fragment to the level of
individual tuples (in the case of horizontal
fragmentation) or to the level of individual
attributes (in the case of vertical
fragmentation).

52
DISTRIBUTION DESIGN ISSUES - CORRECTNESS RULES OF
FRAGMENTATION

Completeness
If a relation instance R is decomposed into
fragments R1, R2, ..., Rn, each data item that can
be found in R can also be found in one or more of
the Ri's. This property is important in
fragmentation since it ensures that the data in a
global relation is mapped into fragments without
any loss.
Reconstruction
If a relation R is decomposed into fragments
$F_R = \{R_1, R_2, \ldots, R_n\}$, it should be
possible to define a relational operator $\nabla$
such that
$R = \nabla R_i, \quad \forall R_i \in F_R$
The reconstructability of the relation from its
fragments ensures that constraints defined on the
data in the form of dependencies are preserved.

53
DISTRIBUTION DESIGN ISSUES - CORRECTNESS RULES OF
FRAGMENTATION

Disjointness
If a relation R is horizontally decomposed into
fragments R1, R2, ..., Rn and data item di is in
Rj, it is not in any other fragment Rk (k ≠ j).
This criterion ensures that the horizontal
fragments are disjoint. If relation R is
vertically decomposed, its primary key attributes
are typically repeated in all its fragments.
Therefore, in case of vertical partitioning,
disjointness is defined only on the nonprimary
key attributes of a relation.
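
The three correctness rules can be checked mechanically for a horizontal fragmentation. The sketch below assumes a relation is modeled as a Python set of tuples and that the reconstruction operator for horizontal fragments is set union; the function names and the sample relation are illustrative only.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in at least one fragment."""
    return all(any(t in f for f in fragments) for t in relation)


def reconstruct(fragments):
    """Reconstruction: for horizontal fragments the operator is union."""
    result = set()
    for f in fragments:
        result |= f
    return result


def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    seen = set()
    for f in fragments:
        if seen & f:
            return False
        seen |= f
    return True


R = {(1, "Paris"), (2, "Jakarta"), (3, "Paris")}
R1 = {t for t in R if t[1] == "Paris"}
R2 = {t for t in R if t[1] != "Paris"}

print(is_complete(R, [R1, R2]), reconstruct([R1, R2]) == R, is_disjoint([R1, R2]))
```

For a vertical fragmentation the same checks would have to ignore the repeated primary key attributes and use a join instead of a union.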

54
DISTRIBUTION DESIGN ISSUES - ALLOCATION ALTERNATIVES

The reasons for replication are reliability and
efficiency of read-only queries.
Read-only queries that access the same data items
can be executed in parallel since copies exist on
multiple sites.
The execution of update queries causes trouble
since the system has to ensure that all the
copies of the data are updated properly.
The decision regarding replication is a
trade-off which depends on the ratio of the
read-only queries to the update queries.

55
DISTRIBUTION DESIGN ISSUES - ALLOCATION ALTERNATIVES

A nonreplicated database (commonly called a
partitioned database) contains fragments that are
allocated to sites, and there is only one copy of
any fragment on the network.
In case of replication, either the database
exists in its entirety at each site (fully
replicated database), or fragments are
distributed to the sites in such a way that
copies of a fragment may reside in multiple sites
(partially replicated database).
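
The three allocation alternatives can be told apart from the placement of copies alone. A minimal sketch, assuming an allocation is given as a mapping from fragment name to the set of sites holding a copy; the function name `replication_kind` and the site and fragment names are hypothetical.

```python
def replication_kind(allocation, sites):
    """Classify an allocation (fragment -> set of sites holding a copy)."""
    all_sites = set(sites)
    copies_per_fragment = [set(c) for c in allocation.values()]
    if any(not copies for copies in copies_per_fragment):
        raise ValueError("every fragment must be stored on at least one site")
    if all(len(copies) == 1 for copies in copies_per_fragment):
        return "partitioned"            # exactly one copy of each fragment
    if all(copies == all_sites for copies in copies_per_fragment):
        return "fully replicated"       # whole database at every site
    return "partially replicated"       # some fragments have several copies


sites = ["S1", "S2", "S3"]
print(replication_kind({"F1": {"S1"}, "F2": {"S2"}}, sites))
print(replication_kind({"F1": set(sites), "F2": set(sites)}, sites))
print(replication_kind({"F1": {"S1", "S2"}, "F2": {"S3"}}, sites))
```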

56
DISTRIBUTION DESIGN ISSUES - ALLOCATION ALTERNATIVES
57
DISTRIBUTION DESIGN ISSUES - INFORMATION
REQUIREMENTS

The information needed for distribution design
can be divided into four categories:
database information,
application information,
communication network information,
computer system information.

58
DISTRIBUTION DESIGN ISSUES - FRAGMENTATION

Horizontal fragmentation partitions a relation
along its tuples.
There are two versions of horizontal fragmentation:
Primary horizontal fragmentation of a relation is
performed using predicates that are defined on
that relation.
Derived horizontal fragmentation is the
partitioning of a relation that results from
predicates being defined on another relation.
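
The difference between the two versions can be shown on two toy relations. In this sketch, `PAY` is fragmented by a predicate on its own `salary` attribute (primary), while `EMP` is fragmented according to which `PAY` fragment its `title` appears in (derived, computed here as a semijoin). The relations, attribute names and the 30000 threshold are assumptions made up for the example.

```python
PAY = [
    {"title": "Engineer", "salary": 40000},
    {"title": "Programmer", "salary": 24000},
]
EMP = [
    {"eno": "E1", "name": "Ana", "title": "Engineer"},
    {"eno": "E2", "name": "Budi", "title": "Programmer"},
    {"eno": "E3", "name": "Chen", "title": "Engineer"},
]


def primary_fragment(rows, predicate):
    """Primary: the predicate is defined on the relation being fragmented."""
    return [r for r in rows if predicate(r)]


def derived_fragment(member_rows, owner_fragment, attr):
    """Derived: keep member tuples whose `attr` value occurs in the owner fragment."""
    keys = {o[attr] for o in owner_fragment}
    return [m for m in member_rows if m[attr] in keys]


PAY1 = primary_fragment(PAY, lambda r: r["salary"] <= 30000)
PAY2 = primary_fragment(PAY, lambda r: r["salary"] > 30000)
EMP1 = derived_fragment(EMP, PAY1, "title")
EMP2 = derived_fragment(EMP, PAY2, "title")
print([e["eno"] for e in EMP1], [e["eno"] for e in EMP2])
```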

59
DISTRIBUTION DESIGN ISSUES - FRAGMENTATION

Vertical fragmentation partitions a relation into
a set of smaller relations so that many user
applications will run on only one fragment.
Vertical fragmentation is inherently more
complicated than horizontal partitioning.
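
A vertical fragment is a projection that keeps the primary key, so that the original relation can be rebuilt by joining the fragments on that key. The sketch below assumes a relation `PROJ` with key `pno`; the relation, its attributes and the helper names are hypothetical.

```python
def vertical_fragment(rows, key, attributes):
    """Project onto `attributes`, always repeating the primary key."""
    cols = [key] + [a for a in attributes if a != key]
    return [{c: r[c] for c in cols} for r in rows]


def join_on_key(left, right, key):
    """Rebuild tuples by joining two vertical fragments on the primary key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


PROJ = [
    {"pno": "P1", "pname": "CAD", "budget": 150000, "loc": "Montreal"},
    {"pno": "P2", "pname": "DB", "budget": 135000, "loc": "New York"},
]

PROJ1 = vertical_fragment(PROJ, "pno", ["budget"])        # used by budgeting apps
PROJ2 = vertical_fragment(PROJ, "pno", ["pname", "loc"])  # used by directory apps
print(join_on_key(PROJ1, PROJ2, "pno") == PROJ)
```

An application that only needs budgets would touch `PROJ1` alone, which is the goal stated above; one that needs all attributes pays for the join.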

60
DISTRIBUTION DESIGN ISSUES - ALLOCATION

Allocation problem
There is a set of fragments F = {F1, F2, ..., Fn}
and a network consisting of sites
S = {S1, S2, ..., Sm}, on which a set of
applications Q = {q1, q2, ..., qq} is running.
The allocation problem involves finding the
optimal distribution of F to S.

61
DISTRIBUTION DESIGN ISSUES - ALLOCATION

One of the important issues that needs to be
discussed is the definition of optimality.
Optimality can be defined with respect to two
measures [Dowdy and Foster, 1982]:
Minimal cost. The cost consists of the cost of
storing each Fi at a site Sj, the cost of
querying Fi at Sj, the cost of updating Fi at all
sites where it is stored, and the cost of data
communication. The allocation problem, then,
attempts to find an allocation scheme that
minimizes the cost function.

62
DISTRIBUTION DESIGN ISSUES - ALLOCATION

Performance. The allocation strategy is designed
to maintain a performance metric. Two well-known
ones are to minimize the response time and to
maximize the system throughput at each site.
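
The minimal-cost measure can be illustrated with a deliberately simplified model: no replication (each fragment is stored at exactly one site), queries and updates lumped into one access frequency per originating site, and a fixed communication cost per remote access. These simplifications, the function names and all numbers are assumptions for the sketch; the slides do not give a concrete cost function. The search is exhaustive, so it is only usable for tiny inputs.

```python
from itertools import product


def total_cost(assignment, storage, access, comm):
    """Cost of storing fragment i at site assignment[i] (no replication)."""
    cost = 0
    for frag, site in enumerate(assignment):
        cost += storage[frag][site]
        # Each access issued from `origin` pays the communication cost to `site`.
        for origin, freq in enumerate(access[frag]):
            cost += freq * comm[origin][site]
    return cost


def best_allocation(storage, access, comm):
    """Try every fragment-to-site assignment and keep the cheapest one."""
    n_frags, n_sites = len(storage), len(comm)
    candidates = product(range(n_sites), repeat=n_frags)
    return min(candidates, key=lambda a: total_cost(a, storage, access, comm))


storage = [[1, 1], [1, 1]]    # storage[f][s]: cost of keeping fragment f at site s
access = [[10, 0], [0, 10]]   # access[f][o]: accesses to fragment f from site o
comm = [[0, 5], [5, 0]]       # comm[o][s]: cost of one access from site o to site s

best = best_allocation(storage, access, comm)
print(best, total_cost(best, storage, access, comm))
```

With these numbers each fragment ends up at the site that accesses it, since any other placement adds communication cost without saving on storage.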
