Title: COMP 5138 Relational Database Management Systems
1COMP 5138Relational Database Management Systems
Semester 2, 2007 Lecture 8 Database System
Architectures
2Todays Agenda
- Database System Architectures
- Centralized systems
- Client-Server systems
- Three-tier systems
- Presentation layer technologies
- Application layer technologies
3Data-intensive Systems
- Three types of functionality
- The system architecture determines whether these
three components reside on a single system
(1-tier) or whether they are distributed across
several tiers
- Presentation Services
- Input keyboard/mouse
- Output monitor/printer
GUI Interface
Application Services- Business rules - I/O
processing
Procedures, functions, programs
Data Management(Storage Logic) - data storage
and retrieval
DBMS activities
4Single-User System
centralized system
DBMS
presentation application services
services
user module
- Presentation Services - displays forms, handles
flow of information to/from screen - Application Services - implements user request,
interacts with DBMS - Transactional properties automatic (isolation is
trivial) or not required (this is not really an
enterprise) - DBMS runs within the user process
- Examples Access or Berkeley DB
5Centralized Multi-User System
- Dumb terminals connected to mainframe
- Application and presentation services on
mainframe - ACID properties required
- Atomicity - Consistency - Isolation - Durability
- Isolation DBMS sees an interleaved schedule
- Atomicity and durability system supports a major
enterprise - Transaction abstraction, implemented by DBMS,
provides ACID properties - More details on implementing transactions in week
11
6Centralized Multi-User System
communication
central machine
presentation application services
services
DBMS
presentation application services
services
user module
dumb terminal
7Evaluation
- All functionality combined into a single tier
- Centralised system runs on a single computer
system and does not interact with other computer
systems - System resources must be used for both DBMS and
application logic - Usually mainframe, but also PC
- Benefit
- Easy maintenance and central administration
- Drawback
- Bad scalability for larger number of users
- Typically only dumb terminals as a user interface
8Two-Tier Architectures Client-Server
- Functionality divided between client and server
- typically separated by a network
- Server systems satisfy requests of m client
systems
9Front-End versus Back-End
- Database functionality can be divided into
- Back-end manages access structures, query
evaluation and optimization, concurrency control
and recovery. - Front-end consists of tools such as forms,
report-writers, and graphical user interface
facilities. - The interface between the front-end and the
back-end is through SQL or through an application
program interface.
SQL user-interface
formsinterface
report writer
graphical interface
front-end
interface(SQL API)
SQLengine
back-end
10Thin versus Thick Client
- Database applications data management
application logic presentation - Work division Thin client
- Client responsible for presentation (typically
GUI) - Server business logic and data management
- Work division Thick client
- Client implements both user interface and
business logic - Server data management only
Server
Data Management
ApplicationLogic
?
?
11C/S Architecture Zoom-In
- Client and server are completely separated
- 2 separated processes / systems
- Communication between client server via network
- Usage of a special protocol, e.g. JDBC, Net8
(Oracle)
Server
Client
DB Server
Client
Process
Process
DB ServerLibrary
DB ClientLibrary / Driver
Network
OperatingSystem
OperatingSystem
12DB Server Internals
DB Server
Monolithic Server single DB process
Multiple Server multiple DB processes
Symmetric per AP one DB process
Asymmetric dynamic assignment of APs to DB
processes
13Monolithic Server
- Exactly one server process for all clients
(One-to-many) - DBMS server process typically prioritised by
operating system - Server uses multi-threading
- Server does own resource management
- Examples Sybase, MS SQL Server
AP Client i
DBMS Server
AP Client 1
AP Client k
CL
CL
CL
operating system 1
operating system n
network
14Multiple Server
- DBMS consists of several processes
- Scheduling done by the operating system
- Communication between
- server processes via shared memory
- clients and servers via operating system or
network - Two variants
- Symmetric - each client is mapped to exactly one
server process static mapping with a certain
number of n servers pre-generated ? maximal
degree of parallelism is n - Asymmetric - a dispatcher connects a client to a
server process. Again, a certain number of
servers are instantiated beforehand, but degree
of parallelism can be higher - Examples Oracle, Informix, DB2
15Asymmetric Multiple Server
DBMS Server k
AP Client i
DBMS Server 1
AP Client 1
AP Client m
Dispatcher
CL
CL
CL
sharedmem
Operating system 1
Operating system n
network
16Oracle Server Architecture
17Oracle Server Configurations
18Oracle Client-Server Connections
19Comparison
- Thick Clients
- Makes good use of resources on desktop machines
which are often lightly loaded - Allows server machine to concentrate on disk
transfers etc - More responsive
- Some responses dont need to cross the network
- Eg data validation done in client
- May scale better
- Perhaps smaller and fewer data transfers between
client and server (if many different
presentations are needed of the same data)
- Thin clients
- Server doesnt need to trust clients
- No threat to data integrity from malicious client
- Less complex administration and maintenance
- Eg no need to deploy upgrades or patches to
application logic? - May scale better
- Perhaps smaller and fewer data transfers between
client and server (if results of database queries
are further reduced and filtered in application
layer, to just what user needs)
The best design varies according to technology
changes
20Three-Tier Architectures
Client Program
Presentation Tier
- primary interface to the user (e.g. web
browser) - needs to adapt to different display devices
(PC, PDA, ...)
Web / Application Server
Middle Tier
- implements business logic and workflow
- may access several different DBMSs
Database System
Data ManagementTier
- one or more standard database management systems
21 Three-Tiered Model of TPS
database server machine
applic. server machines
trans. server machines
client machines
present. server
DBMS
trans. server
applic. server
present. server
communication
22Interconnection/Clustering of Servers in
Three-Tiered Model
presentation server
presentation server
presentation server
presentation server
application server
application server
transaction server
transaction server
database server
database server
23E-Commerce as Example
Web Browser
http
Web Server (dynamic web pages such as CGI or JSP)
JDBC or ODBC
- Web browser is client
- Application logic in dynamic web pages
- Servlets or Java Server Pages (JSP) Java code
that runs in web server - Non-Java world ASP, PHP, CGI,
- Database access from servlets / JSPs via JDBC
Database System
24More complicated example
Web Browser
Web Server (servlets JSP)
http
RMI
Application Server
- Dynamic web pages mostly presentation
- Again servlets or JSPs
- Application Server
- Container for application logice.g. Enterprise
Java Beans (EJB) - Resource mgmt. (e.g. db connections)
- Database system
- is accessed from application serverusing JDBC or
ODBC
EJB
EJB
EJB
EJB
JDBC ODBC
Database System
25Example 1 Course Enrolment System
- Build a system in which students can enroll in
courses - Client Program
- Log in different users (students, staff,
faculty), display forms and human-readable output - Web Server
- Logic to add a course, drop a course, create a
new course, etc. - Database System
- Student info, course info, instructor info,
course availability, pre-requisites, etc.
Browser
Client
Web Server
Server
Client
Server
Database(s)
26Example 2 Internet Shop
Browser
- Web Browser GUI, navigation, presentation,
simple input checks -
- WWW ServerLog in different users, session
management, display forms etc. - Application ServerLogic for placing orders,
purchasing, order status checks, etc. - Databaseitem infos, customer data, orders,
Client
Server
Web Server
Client
Application Server
Server
Client
Server
Database(s)
27Technologies for the Three Tiers
Client Program (Web Browser)
HTML JavaScript
HTTP
Application Logic(Web Server Application
Server)
Servlets JSP ASP J2EE EJB .NET
JDBC SQLJ OLEDB
Data Management(Database System)
SQL, XML Stored Procedures
28Advantages of Three-Tier Architectures
- Heterogeneous Systems
- Tiers can be independently maintained, modified,
and replaced - Thin Clients
- Only presentation layer at clients (web browser)
- Integrated data access
- Several database systems can be handled
transparently at the middle tier - Central management of connections
- Scalability
- Replication at middle tier allows scalability of
business logic - Software development
- Code for business logic is centralised
- Interaction between tiers through well-defined
APIsOne can reuse standard components at each
tier
29Review Question
- What kind of system architecture has Oracles
iSQLPlus, which we are using in the labs? - 1-Tier (centralized)
- 2-Tier (Client/Server)
- 3-Tier
30Todays Agenda
- Database System Architectures
- Presentation layer technologies
- HTML, Javascript
- Application layer technologies
31Overview of the Presentation Tier
- Recall Functionality of the presentation tier
- Primary interface to the user
- Needs to adapt to different display devices (PC,
PDA, cell phone, voice access?) - Simple functionality, such as field validity
checking - We will cover
- HTML Forms How to pass data to the middle tier
- JavaScript Simple functionality at the
presentation tier
32HTML Forms
- Common way to communicate data from client to
middle tier - General format of a form
- ltFORM ACTIONpage.jsp METHODGET NAMELogin
Formgtlt/FORMgt - Components of an HTML FORM tag
- ACTION Specifies URI that handles the content
- METHOD Specifies HTTP GET or POST method
- NAME Name of the form can be used in
client-side scripts to refer to the form
33Inside HTML Forms
- INPUT tag
- Attributes
- TYPE text (text input field), password (text
input field where input is, reset (resets all
input fields) - NAME symbolic name, used to identify field value
at the middle tier - VALUE default value
- Example ltINPUT TYPEtext Nametitlegt
- Example form
- ltform method"POST" action"TableOfContents.jsp"gt
- ltinput type"text" name"userid"gt
- ltinput type"password" name"password"gt
- ltinput type"submit" value"Login
name"submit"gt - ltinput typereset valueCleargt
- lt/formgt
34Passing Arguments
- Two methods GET and POST
- GET
- Form contents go into the submitted URI
- Structureaction?name1value1name2value2name3
value3 - Action name of the URI specified in the form
- (name,value)-pairs come from INPUT fields in the
form empty fields have empty values (name) - Example from previous password formTableOfConten
ts.jsp?useridjohnpasswordjohnpw - Note that the page named action needs to be a
program, script, or page that will process the
user input
35HTTP GET Encoding Form Fields
- Form fields can contain general ASCII characters
that cannot appear in an URI - A special encoding convention converts such field
values into URI-compatible characters - Convert all special characters to xyz, were
xyz is the ASCII code of the character. Special
characters include , , , , etc. - Convert all spaces to the character
- Glue (name,value)-pairs from the form INPUT tags
together with to form the URI
36HTML Forms A Complete Example
- ltform method"POST" action"TableOfContents.jsp"gt
- lttable align "center" border"0" width"300"gt
- lttrgt
- lttdgtUseridlt/tdgt
- lttdgtltinput type"text" name"userid"
size"20"gtlt/tdgt - lt/trgt
- lttrgt
- lttdgtPasswordlt/tdgt
- lttdgtltinput type"password" name"password"
size"20"gtlt/tdgt - lt/trgt
- lttrgt
- lttd align "center"gtltinput type"submit"
value"Login - name"submit"gtlt/tdgt
- lt/trgt
- lt/tablegt
- lt/formgt
37JavaScript
- Goal Add functionality to the presentation tier.
- Sample applications
- Detect browser type and load browser-specific
page - Form validation Validate form input fields
- Browser control Open new windows, close existing
windows (example pop-up ads) - Usually embedded directly inside the HTML with
the ltSCRIPTgt lt/SCRIPTgt tag. - ltSCRIPTgt tag has several attributes
- LANGUAGE specifies language of the script (such
as javascript) - SRC external file with script code
- ExampleltSCRIPT LANGUAGEJavaScript
SRCvalidate.jsgtlt/SCRIPTgt
38JavaScript (Contd.)
- If ltSCRIPTgt tag does not have a SRC attribute,
then the JavaScript is directly in the HTML file. - ExampleltSCRIPT LANGUAGEJavaScriptgtlt!--
alert(Welcome to our bookstore)//--gtlt/SCRIPTgt - Two different commenting styles
- lt!-- comment for HTML, since the following
JavaScript code should be ignored by the HTML
processor - // comment for JavaScript in order to end the
HTML comment
39JavaScript (Contd.)
- JavaScript is a complete scripting language
- Variables
- Assignments (, , )
- Comparison operators (lt,gt,), boolean operators
(, , !) - Statements
- if (condition) statements else statements
- for loops, do-while loops, and while-loops
- Functions with return values
- Create functions using the function keyword
- f(arg1, , argk) statements
40JavaScript A Complete Example
- HTML Form
- ltform method"POST
- action"TableOfContents.jsp"gt
- ltinput type"text" name"userid"gt
- ltinput type"password" name"password"gt
- ltinput type"submit" value"Login
name"submit"gt - ltinput typereset valueCleargt
- lt/formgt
- Associated JavaScript
- ltscript language"javascript"gt
- function testLoginEmpty()
-
- loginForm document.LoginForm
- if ((loginForm.userid.value "")
- (loginForm.password.value ""))
-
- alert('Please enter values for userid and
password.') - return false
-
- else return true
-
- lt/scriptgt
41Stylesheets
- Idea Separate display from contents, and adapt
display to different presentation formats - Two aspects
- Document transformations to decide what parts of
the document to display in what order - Document rending to decide how each part of the
document is displayed - Why use stylesheets?
- Reuse of the same document for different displays
- Tailor display to users preferences
- Reuse of the same document in different contexts
- Two stylesheet languages
- Cascading style sheets (CSS) For HTML documents
- Extensible stylesheet language (XSL) For XML
documents
42CSS Cascading Style Sheets
- Defines how to display HTML documents
- Many HTML documents can refer to the same CSS
- Can change format of a website by changing a
single style sheet - ExampleltLINK RELstyle sheet TYPEtext/css
HREFbooks.css/gt - Each line consists of three parts
- selector property value
- Selector Tag whose format is defined
- Property Tags attribute whose value is set
- Value value of the attribute
43CSS Cascading Style Sheets
- Example style sheet
- body background-color yellow
- h1 font-size 36pt
- h3 color blue
- p margin-left 50px color red
- The first line has the same effect as
- ltbody background-coloryellowgt
44XSL
- Language for expressing style sheets
- More at http//www.w3.org/Style/XSL/
- Three components
- XSLT XSL Transformation language
- Can transform one document to another
- More at http//www.w3.org/TR/xslt
- XPath XML Path Language
- Selects parts of an XML document
- More at http//www.w3.org/TR/xpath
- XSL Formatting Objects
- Formats the output of an XSL transformation
- More at http//www.w3.org/TR/xsl/
45Todays Agenda
- Database System Architectures
- Presentation layer technologies
- Application layer technologies
- CGI, application servers, Servlets,
JavaServerPages, maintaining state
46Overview of the Middle Tier
- Recall Functionality of the middle tier
- Encodes business logic
- Connects to database system(s)
- Accepts form input from the presentation tier
- Generates output for the presentation tier
- We will cover
- CGI Protocol for passing arguments to programs
running at the middle tier - Application servers Runtime environment at the
middle tier - Servlets Java programs at the middle tier
- JavaServerPages Java scripts at the middle tier
- Maintaining state How to maintain state at the
middle tier. Main focus Cookies.
47CGI Common Gateway Interface
- Goal Transmit arguments from HTML forms to
application programs running at the middle tier - Details of the actual CGI protocol unimportant à
libraries implement high-level interfaces - Disadvantages
- The application program is invoked in a new
process at every invocation (remedy FastCGI) - No resource sharing between application programs
(e.g., database connections) - Remedy Application servers
48CGI Example
- HTML form
- ltform actionfindbooks.cgi methodPOSTgt
- Type an author name
- ltinput typetext nameauthorNamegt
- ltinput typesubmit valueSend itgt
- ltinput typereset valueClear formgt
- lt/formgt
- Perl code
- use CGI
- dataInnew CGI
- dataIn-gtheader()
- authorNamedataIn-gtparam(authorName)
- print(ltHTMLgtltTITLEgtArgument passing
testlt/TITLEgt) - print(The author name is authorName)
- print(lt/HTMLgt)
- exit
49Application Servers
- Idea Avoid the overhead of CGI
- Main pool of threads of processes
- Manage connections
- Enable access to heterogeneous data sources
- Other functionality such as APIs for session
management
50Application Server Process Structure
HTTP
Web Browser
Web Server
C Application
JavaBeans
JDBC
Application Server
DBMS 1
ODBC
DBMS 2
Pool of Servlets
51Servlets
- Java Servlets Java code that runs on the middle
tier - Platform independent
- Complete Java API available, including JDBC
- Example
- import java.io.
- import java.servlet.
- import java.servlet.http.
- public class ServetTemplate extends HttpServlet
- public void doGet(HTTPServletRequest
request, HTTPServletResponse response)throws
SerletExpection, IOException PrintWriter
outresponse.getWriter() - out.println(Hello World)
-
52Servlets (Contd.)
- Life of a servlet?
- Webserver forwards request to servlet container
- Container creates servlet instance (calls init()
method deallocation time calls destroy()
method) - Container calls service() method
- service() calls doGet() for HTTP GET or doPost()
for HTTP POST - Usually, dont override service(), but override
doGet() and doPost()
53Servlets A Complete Example
- public class ReadUserName extends HttpServlet
- public void doGet( HttpServletRequest request,
- HttpSevletResponse response)
- throws ServletException, IOException
- reponse.setContentType(text/html)
- PrintWriter outresponse.getWriter()
- out.println(ltHTMLgtltBODYgt\n ltULgt \n
- ltLIgt request.getParameter(userid) \n
- ltLIgt request.getParameter(password)
\n - ltULgt\nltBODYgtlt/HTMLgt)
-
- public void doPost( HttpServletRequest request,
- HttpSevletResponse response)
- throws ServletException, IOException
- doGet(request,response)
-
54Java Server Pages
- Servlets
- Generate HTML by writing it to the PrintWriter
object - Code first, webpage second
- JavaServerPages
- Written in HTML, Servlet-like code embedded in
the HTML - Webpage first, code second
- They are usually compiled into a Servlet
55JavaServerPages Example
- lthtmlgt
- ltheadgtlttitlegtWelcome to BNlt/titlegtlt/headgt
- ltbodygt
- lth1gtWelcome back!lt/h1gtlt String nameNewUser
- if (request.getParameter(username) ! null)
namerequest.getParameter(username) -
- gt
- You are logged on as user ltnamegt
- ltpgt
- lt/bodygt
- lt/htmlgt
56Maintaining State
- HTTP is stateless.
- Advantages
- Easy to use dont need anything
- Great for static-information applications
- Requires no extra memory space
- Disadvantages
- No record of previous requests means
- No shopping baskets
- No user logins
- No custom or dynamic content
- Security is more difficult to implement
57Application State
- Server-side state
- Information is stored in a database, or in the
application layers local memory - Client-side state
- Information is stored on the clients computer in
the form of a cookie - Hidden state
- Information is hidden within dynamically created
web pages
58Application State
59Server-Side State
- Many types of Server side state
- 1. Store information in a database
- Data will be safe in the database
- BUT requires a database access to query or
update the information - 2. Use application layers local memory
- Can map the users IP address to some state
- BUT this information is volatile and takes up
lots of server main memory
5 million IPs 20 MB
60Server-Side State
- Should use Server-side state maintenance for
information that needs to persist - Old customer orders
- Click trails of a users movement through a
site - Permanent choices a user makes
61Client-side State Cookies
- Storing text on the client which will be passed
to the application with every HTTP request. - Can be disabled by the client.
- Are wrongfully perceived as "dangerous", and
therefore will scare away potential site visitors
if asked to enable cookies1 - Are a collection of (Name, Value) pairs
1http//www.webdevelopersjournal.com/columns/state
ful.html
62Client State Cookies
- Advantages
- Easy to use in Java Servlets / JSP
- Provide a simple way to persist non-essential
data on the client even when the browser has
closed - Disadvantages
- Limit of 4 kilobytes of information
- Users can (and often will) disable them
- Should use cookies to store interactive state
- The current users login information
- The current shopping basket
- Any non-permanent choices the user has made
-
63Creating A Cookie
- Cookie myCookie
- new Cookie(username", jeffd")
- response.addCookie(userCookie)
- You can create a cookie at any time
64Accessing A Cookie
- Cookie cookies request.getCookies()
- String theUser
- for(int i0 iltcookies.length i)
- Cookie cookie cookiesi
- if(cookie.getName().equals(username))
- theUser cookie.getValue()
-
- // at this point theUser username
- Cookies need to be accessed BEFORE you set your
response header - response.setContentType("text/html")
- PrintWriter out response.getWriter()
65Cookie Features
- Cookies can have
- A duration (expire right away or persist even
after the browser has closed) - Filters for which domains/directory paths the
cookie is sent to - See the Java Servlet API and Servlet Tutorials
for more information
66Hidden State
- Often users will disable cookies
- You can hide data in two places
- Hidden fields within a form
- Using the path information
- Requires no storage of information because the
state information is passed inside of each web
page
67Hidden State Hidden Fields
- Declare hidden fields within a form
- ltinput typehidden nameuser
valueusername/gt - Users will not see this information (unless they
view the HTML source) - If used prolifically, its a killer for
performance since EVERY page must be contained
within a form.
68Hidden State Path Information
- Path information is stored in the URL request
- http//server.com/index.htm?userjeffd
- Can separate fields with an character
- index.htm?userjeffdpreferencepepsi
- There are mechanisms to parse this field in Java.
Check out the javax.servlet.http.HttpUtils
parserQueryString() method.
69Multiple state methods
- Typically all methods of state maintenance are
used - User logs in and this information is stored in a
cookie - User issues a query which is stored in the path
information - User places an item in a shopping basket cookie
- User purchases items and credit-card information
is stored/retrieved from a database - User leaves a click-stream which is kept in a log
on the web server (which can later be analyzed)
70Todays Agenda
- Extra non-examined material
- Parallel System
- Distributed System
- See RG ch 22 for more information
71Parallel Systems
- Parallel database systems consist of multiple
processors and multiple disks connected by a fast
interconnection network. - A coarse-grain parallel machine consists of a
small number of powerful processors - A massively parallel or fine grain parallel
machine utilizes thousands of smaller processors. - Two main performance measures
- throughput --- the number of tasks that can be
completed in a given time interval - response time --- the amount of time it takes to
complete a single task from the time it is
submitted
72Terminology
Ideal
- Speed-Up
- More resources means proportionally less time for
given amount of data. - Scale-Up
- If resources increased in proportion to increase
in data size, time is constant.
Xact/sec. (throughput)
degree of parallel
Ideal
sec./Xact (response time)
degree of parallel
73Factors Limiting Speedup and Scaleup
- Speedup and scale-up are often sub-linear due to
- Startup costs Cost of starting up multiple
processes may dominate computation time, if the
degree of parallelism is high. - Interference Processes accessing shared
resources (e.g.,system bus, disks, or locks)
compete with each other, thus spending time
waiting on other processes, rather than
performing useful work. - Skew Increasing the degree of parallelism
increases the variance in service times of
parallel executing tasks. Overall execution time
determined by slowest of parallel executing tasks.
74Parallel System Architectures
- Shared memory -- processors share a common memory
- Shared disk -- processors share a common disk
- Shared nothing -- processors share neither a
common memory nor common disk - Hierarchical -- hybrid of the above architectures
75Parallel System Architectures
76Shared Memory
- Processors and disks have access to a common
memory, typically via a bus or through an
interconnection network. - Extremely efficient communication between
processors - Data in shared memory can be accessed by any
processor without having to move it using
software. - Downside architecture is not scalable beyond 32
or 64 processors since the bus or the
interconnection network becomes a bottleneck - Widely used for lower degrees of parallelism (4
to 8).
77Shared Disk
- All processors can directly access all disks via
an interconnection network, but the processors
have private memories. - The memory bus is not a bottleneck
- Architecture provides a degree of fault-tolerance
if a processor fails, the other processors can
take over its tasks since the database is
resident on disks that are accessible from all
processors. - Examples IBM Sysplex and DEC clusters (now part
of Compaq) running Rdb (now Oracle Rdb) were
early commercial users - Downside bottleneck now occurs at
interconnection to the disk subsystem. - Shared-disk systems can scale to a somewhat
larger number of processors, but communication
between processors is slower.
78Shared Nothing
- Node consists of a processor, memory, and one or
more disks. Processors at one node communicate
with another processor at another node using an
interconnection network. A node functions as the
server for the data on the disk or disks the node
owns. - Examples Teradata, Tandem, Oracle-n CUBE
- Data accessed from local disks (and local memory
accesses) do not pass through interconnection
network, thereby minimizing the interference of
resource sharing. - Shared-nothing multiprocessors can be scaled up
to thousands of processors without interference. - Main drawback cost of communication and
non-local disk access sending data involves
software interaction at both ends.
79Hierarchical
- Combines characteristics of shared-memory,
shared-disk, and shared-nothing architectures. - Top level is a shared-nothing architecture
nodes connected by an interconnection network,
and do not share disks or memory with each other. - Each node of the system could be a shared-memory
system with a few processors. - Alternatively, each node could be a shared-disk
system, and each of the systems sharing a set of
disks could be a shared-memory system.
80Distributed DBMS
- Data is stored at several sites, each managed by
a DBMS that can run independently. - Distributed Data Independence Users should not
have to know where data is located (extends
Physical and Logical Data Independence
principles). - Distributed Transaction Atomicity Users should
be able to write transactions accessing multiple
sites just like local transactions.
81Recent Trends
- Users have to be aware of where data is located,
i.e., Distributed Data Independence and
Distributed Transaction Atomicity are not
supported. - These properties are hard to support efficiently.
- For globally distributed sites, these properties
may not even be desirable due to administrative
overheads of making the location of the data
transparent.
82Types of Distributed Databases
- Homogeneous Every site runs the same type of
DBMS. - Heterogeneous Different sites run different
DBMSs (different RDBMSs or even non-relational
DBMSs).
Gateway
DBMS1
DBMS2
DBMS3
83Storing Data
TID
- Fragmentation
- Horizontal
- Vertical
- Replication
- Gives increased availability
- Faster query evaluation
t1
t2
t3
t4
84Distributed Catalog Management
- Must keep track of how data is distributed across
sites. - Must be able to name each replica of each
fragment. - To preserve local autonomy
- ltlocal-name, birth-sitegt
- Site Catalog Describes all objects (fragments,
replicas) at a site Keeps track of replicas of
relations created at this site. - To find a relation, look up its birth-site
catalog. - Birth-site never changes, even if relation is
moved.
85Distributed Queries
SELECT AVG(S.age) FROM Sailors S WHERE S.rating gt
3 AND S.rating lt 7
- Horizontally Fragmented Tuples with rating lt 5
at Sydney, gt 5 at Melbourne. - Must compute SUM(age), COUNT(age) at both sites.
- If WHERE contained just S.ratinggt6, just one
site. - Vertically Fragmented sid and rating at Sydney,
sname and age at Melbourne, tid at both. - Must reconstruct relation by join on tid, then
evaluate the query. - Replicated Sailors copies at both sites.
- Choice of site based on local costs, shipping
costs.