ICS 214B: Transaction Processing and Distributed Data Management - PowerPoint PPT Presentation

1 / 56
About This Presentation
Title:

ICS 214B: Transaction Processing and Distributed Data Management

Description:

Internet London Payroll app Annual Bonus app London New York Payroll app New York Hong Kong Payroll app Hong Kong London Emp NY Emp HK Emp Distribution provides ... – PowerPoint PPT presentation

Number of Views:134
Avg rating:3.0/5.0
Slides: 57
Provided by: Chen154
Learn more at: https://www.ics.uci.edu
Category:

less

Transcript and Presenter's Notes

Title: ICS 214B: Transaction Processing and Distributed Data Management


1
ICS 214B Transaction Processing and Distributed
Data Management
  • Lecture 8 Distributed Database Systems
  • Professor Chen Li

2
So far Centralized DB systems
  • Software

P
M
...
  • Simplifications
  • single front end
  • one place to keep locks
  • if processor fails, system fails, ...

3
Next distributed database systems
  • Multiple processors ( memories)
  • Heterogeneity and autonomy of components

4
Why do we need Distributed Databases?
  • Example Big Corp. has offices in London, New
    York, and Hong Kong.
  • Employee data
  • EMP(ENO, NAME, TITLE, SALARY, )
  • Where should the employee data table reside?

5
Big Corp. Data Access Pattern
  • Mostly, employee data is managed at the office
    where the employee works
  • E.g., payroll, benefits, hire and fire
  • Periodically, Big Corp needs consolidated access
    to employee data
  • E.g., Big Corp. changes benefit plans and that
    affects all employees.
  • E.g., Annual bonus depends on global net profit.

6
New York Payroll app
London Payroll app
EMP
London
New York
Hong Kong Payroll app
Problem NY and HK payroll apps run very slowly!
Hong Kong
7
New York Payroll app
London Payroll app
London Emp
NY Emp
London
New York
Hong Kong Payroll app
Much better!!
Hong Kong
HK Emp
8
New York Payroll app
London Payroll app
Annual Bonus app
London Emp
NY Emp
London
New York
Hong Kong Payroll app
Distribution provides opportunities for parallel
execution
Hong Kong
HK Emp
9
New York Payroll app
London Payroll app
Annual Bonus app
London Emp
NY Emp
London
New York
Hong Kong Payroll app
Hong Kong
HK Emp
10
New York Payroll app
London Payroll app
Annual Bonus app
Lon, NY Emp
NY, HK Emp
London
New York
Hong Kong Payroll app
Replication improves availability
Hong Kong
HK, Lon Emp
11
Heterogeneity and Autonomy

Application
Files
Stock ticker tape
RDBMS
Portfolio
History of dividends, ratios,...
12
  • We will study data management
  • with multiple processors and possible
  • autonomy, heterogeneity
  • Impact on
  • Data organization
  • Query processing
  • Access structures
  • Concurrency control
  • Recovery

13
  • We also study transaction monitors
  • Coordinate transaction execution
  • Multiple DBMSs
  • High performance
  • Have workflow facilities
  • Manage communications with client terminals

14
DB architectures
  • (1) Shared memory

P
P
P
...
M
15
DB architectures
  • (2) Shared disk

...
M
M
...
16
DB architectures
  • (3) Shared nothing

...
17
DB architectures
  • (4) Hybrid example Hierarchical or Clustered

18
Issues for selecting architecture
  • Reliability
  • Scalability
  • Geographic distribution of data
  • Data clusters
  • Performance
  • Cost

19
Parallel or distributed DB system?
  • More similarities than differences!

20
  • Typically, parallel DBs
  • Fast interconnect
  • Homogeneous software
  • High performance is goal
  • Transparency is goal

21
  • Typically, distributed DBs
  • Geographically distributed
  • Data sharing is goal (may run into
  • heterogeneity, autonomy)
  • Disconnected operation possible

22
Distributed Database Challenges
  • Distributed Database Design
  • Deciding what data goes where
  • Depends on data access patterns of major
    applications
  • Two subproblems
  • Fragmentation partition tables into fragments
  • Allocation allocate fragments to nodes

23
Distributed Database Challenges
  • Distributed Query Processing
  • Centralized query plan goal minimize number of
    disk I/Os
  • Additional factors in distributed scenario
  • Communication costs
  • Opportunity for parallelism
  • Space of possible query plans is much larger!

24
Distributed Database Challenges
  • Distributed Concurrency Control
  • Transactions span nodes
  • Must be globally serializable
  • Two main approaches
  • Locking
  • Timestamps
  • Distributed Deadlock Management
  • Multiple data copies need to be kept in sync
    when updates occur

25
Distributed Database Challenges
  • Reliability of Distributed Databases
  • Centralized database failure model
  • processor fails
  • Distributed database failure model
  • One or more processors may fail
  • Network may fail
  • Network may be partitioned
  • Data must be kept in sync

26
? To illustrate synchronization problems
Two Generals Problem
27
The one general problem (Trivial!)
  • ? Battlefield

G
Troops
28
The two general problem
  • lt-------------------------------gt

Blue army
Red army
Enemy
Blue G
Red G
messengers
29
Rules
  • Blue and red army must attack at same time
  • Blue and red generals synchronize through
    messengers
  • Messengers can be lost

30
Distributed Database Challenges
  • Heterogeneity

Application
Files
Stock ticker tape
RDBMS
Portfolio
History of dividends, ratios,...
31
Distributed Database Challenges
  • Autonomy
  • Example unable to get statistics
  • for query optimization
  • Example blue general may have mind of his (or
    her) own!

32
Next Topic
  • Distributed DB Design

33
Distributed DB Design
  • Top-down approach
  • - have DB
  • - how to split and allocate the sites
  • Bottom-up approach
  • - multi-database (possibly heterogeneous,
    autonomous)
  • - no design issues!

34
Two issues in DDB design
  • Fragmentation
  • Allocation
  • Note issues not independent,
  • but will cover separately

35
  • Employee relation E (,name,loc,sal,)
  • 40 of queries 40 of queries
  • Qa select Qb select
  • from E from E
  • where locSa where locSb
  • and and ...

36
  • NM Loc Sal
  • E

5
Sa
10
Joe
7
Sally
Sb
25
8
Tom
Sa
15
..
..
F
NM Loc Sal
NM Loc Sal
5
Sa
10
Joe
7
Sb
25
Sally
..
8
Tom
Sa
15
..
At Sb
At Sa
37
  • F F1, F2
  • F1 ?locSa(E) F2 ?locSb(E)

? called primary horizontal fragmentation
38
Fragmentation
  • Horizontal Primary
  • depends on local attributes
  • R Derived
  • depends on foreign relation
  • Vertical
  • R

39
Three common horizontal fragmentation techniques
  • Round robin
  • Hash partitioning
  • Range partitioning

40
Round robin
  • R D0 D1 D2
  • t1 t1
  • t2 t2
  • t3 t3
  • t4 t4
  • ... t5
  • Evenly distributes data
  • Good for scanning full relation
  • Not good for point or range queries
  • Not suitable for databases distributed over WAN

41
Hash partitioning
  • R D0 D1 D2
  • t1?h(k1)2 t1
  • t2?h(k2)0 t2
  • t3?h(k3)0 t3
  • t4?h(k4)1 t4
  • ...
  • Good for point queries on key also for joins on
    key
  • Not good for range queries point queries not on
    key
  • If hash function good, even distribution
  • Not suitable for databases distributed over a WAN

42
Range partitioning
  • R D0 D1 D2
  • t1 A5 t1
  • t2 A8 t2
  • t3 A2 t3
  • t4 A3 t4
  • ...

partitioning vector
4
7
V0 V1
  • Good for point queries on A also for joins on A
  • Good for some range queries on A
  • Need to select good vector else unbalanced
  • data skew, execution skew

43
Which are good fragmentations?
  • Example
  • F F1, F2
  • F1 ? sallt10 E F2 ? salgt20 E

? Problem Some tuples lost!
44
Which are good fragmentations?
Second example F F3, F4 F3 ? sallt10 E
F4 ? salgt5 E
? Tuples with 5 lt sal lt 10 are duplicated...
45
  • Better design
  • Example F F5, F6, F7
  • F5 ?sal ? 5 E F6 ?5ltsallt10 EF7
    ?sal ? 10 E
  • ? Then replicate F6 if convenient
  • (part of allocation problem)

46
Desired properties for fragmentation
  • R ? F F1, F2, , Fn
  • Completeness
  • For every data item x ? R, ? Fi?F such that
    x?Fi
  • Disjointness
  • ?x?Fi, ?? Fj such that x?Fj, i ? j
  • Reconstruction
  • There is function g such that
  • R g(F1, F2, , Fn)

47
Desired properties for horizontal fragmentation
  • R ? F F1, F2, , Fn
  • Completeness
  • For every tuple t?R, ? Fi?F such that t?Fi
  • Disjointness
  • ?t?Fi, ?? Fj such that t?Fj, i ? j
  • Reconstruction can safely ignore
  • Completeness ? R

48
How do we get completeness and disjointness?
  • (1) Check it manually!
  • e.g., F1 ?sallt10 E F2 ?sal?10 E

49
How do we get completeness and disjointness?
  • (2) Automatically generate fragments
  • with these properties
  • Horizontal fragments are defined by selection
    predicates
  • Generate a set of selection predicates with the
    desired properties

50
Example of generation
  • Say queries use predicates
  • Alt10, Agt5, Loc SA, Loc SB
  • Next - generate minterm predicates
  • - eliminate useless ones
  • Given simple predicates Pr p1, p2,.. pn
  • minterm predicates are of the form
  • p1 ? p2 ? ? pn
  • where pk is pk or is pk

51
Minterm predicates (part I)
  • (1) Alt10 ? Agt5 ? LocSA ? LocSB
  • (2) Alt10 ? Agt5 ? LocSA ? (LocSB)
  • (3) Alt10 ? Agt5 ? (LocSA) ? LocSB
  • (4) Alt10 ? Agt5 ? (LocSA) ? (LocSB)
  • (5) Alt10 ? (Agt5) ? LocSA ? LocSB
  • (6) Alt10 ? (Agt5) ? LocSA ? (LocSB)
  • (7) Alt10 ? (Agt5) ? (LocSA) ? LocSB
  • (8) Alt10 ? (Agt5) ? (LocSA) ? (LocSB)

52
Minterm predicates (part II)
  • (9) (Alt10) ? Agt5 ? LocSA ? LocSB
  • (10) (Alt10) ? Agt5 ? LocSA ?(LocSB)
  • (11) (Alt10) ? Agt5 ?(LocSA) ? LocSB
  • (12) (Alt10) ? Agt5 ?(LocSA) ?(LocSB)
  • (13) (Alt10) ?(Agt5) ? LocSA ? LocSB
  • (14) (Alt10) ?(Agt5) ? LocSA ?(LocSB)
  • (15) (Alt10) ?(Agt5) ?(LocSA) ? LocSB
  • (16) (Alt10) ?(Agt5) ?(LocSA) ?(LocSB)

53
Final fragments
  • F2 5 lt A lt 10 ? LocSA
  • F3 5 lt A lt 10 ? LocSB
  • F6 A ? 5 ? LocSA
  • F7 A ? 5 ? LocSB
  • F10 A ? 10 ? LocSA
  • F11 A ? 10 ? LocSB

54
Note elimination of useless fragments depends on
application semantics
  • e.g. if LOC could be ? SA, ? SB, we need to
    add fragments
  • F4 5 ltA lt10 ? Loc ? SA ? Loc ? SB
  • F8 A ? 5 ? Loc ? SA ? Loc ? SB
  • F12 A ? 10 ? Loc ? SA ? Loc ? SB

55
Why does this algorithm work?
  • Must prove that the set of fragments is
  • Complete
  • Disjoint

56
Summary
  • Given simple predicates Pr p1, p2,.. pn
  • minterm predicates are
  • Mm m ? pk, 1 ? k ?n
  • where pk is pk or is pk

pk?Pr
  • Fragments ?m R for all m ? M are
  • complete and disjoint
Write a Comment
User Comments (0)
About PowerShow.com