R*: An Overview of the Architecture - PowerPoint PPT Presentation

Provided by: tanan3


1
R*: An Overview of the Architecture
  • R. Williams et al.
  • IBM Almaden Research Center

2
Outline
  • Environment and Data Definitions
  • Object Naming
  • Distributed Catalogs
  • Transaction Management and Commit Protocols
  • Query Preparation
  • Query Execution
  • SQL Additions and Changes

3
Environment and Data Definitions
  • CICS as the underlying communication model
  • Data distribution
    • Dispersed
    • Replicated
    • Partitioned
      • Horizontal
      • Vertical
    • Snapshot
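A minimal sketch of the placement options, assuming a table is just a list of Python dicts and a site is a string label (both hypothetical simplifications). Horizontal partitioning sends whole rows to different sites, vertical partitioning splits the columns and repeats a key in every fragment so rows can be rejoined, and replication gives every site a full copy:

```python
# Hypothetical EMP table; column names and values are illustrative only.
ROWS = [
    {"empno": 1, "name": "Ada", "dept": "SJ", "salary": 100},
    {"empno": 2, "name": "Bob", "dept": "NY", "salary": 90},
    {"empno": 3, "name": "Cy", "dept": "SJ", "salary": 80},
]


def horizontal_partition(rows, site_of):
    """Assign each whole row to the site chosen by site_of(row)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(site_of(row), []).append(row)
    return fragments


def vertical_partition(rows, key, column_groups):
    """Split the columns into groups; every fragment repeats the key column."""
    return [
        [{col: row[col] for col in (key, *group)} for row in rows]
        for group in column_groups
    ]


def rejoin_vertical(fragments, key):
    """Rebuild full rows by joining the vertical fragments on the key column."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


def replicate(rows, sites):
    """Give every site its own full copy of the table."""
    return {site: [dict(row) for row in rows] for site in sites}
```

For example, partitioning ROWS horizontally on dept places employees 1 and 3 at SJ and employee 2 at NY.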

4
Figure 1 from paper
5
Figure 21.4 from CS 432 text
6
Object Naming
  • System Wide Names (SWN)
    • USER@USER_SITE.OBJECT_NAME@BIRTH_SITE
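The four-part system-wide name can be sketched as a small value type with a parser. The class, function, and site names here are hypothetical; the sketch assumes no component contains '@' and that the user-site component contains no '.':

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    user: str         # user who created the object
    user_site: str    # site of that user
    object_name: str  # name chosen by the user
    birth_site: str   # site where the object was created

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"


def parse_swn(text: str) -> SystemWideName:
    """Split USER@USER_SITE.OBJECT_NAME@BIRTH_SITE into its four components."""
    parts = text.split("@")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '@' separators: {text!r}")
    user, middle, birth_site = parts
    if "." not in middle:
        raise ValueError(f"missing '.' between USER_SITE and OBJECT_NAME: {text!r}")
    user_site, object_name = middle.split(".", 1)
    return SystemWideName(user, user_site, object_name, birth_site)
```

For instance, "BOB@SANJOSE.EMP@NEWYORK" parses to user BOB, user site SANJOSE, object EMP, and birth site NEWYORK. Because the name records where the object was created rather than where it currently lives, it can stay fixed even if the table later moves.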

7
Distributed Catalogs
  • Local site maintains objects in its database
  • Catalog entry may be cached
  • Entries are versioned
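One way to picture cached, versioned catalog entries: a remote site reuses its cached copy without asking the object's home site, and a compiled plan remembers the versions it was built against so that a changed entry can be detected before the plan runs. The classes and the check-before-execution policy are illustrative assumptions, not R*'s actual catalog code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    swn: str       # system-wide name of the object
    location: str  # site currently storing the object
    version: int   # incremented every time the entry changes


class HomeCatalog:
    """Authoritative entries kept at the object's own site (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def update(self, swn: str, location: str) -> None:
        old = self._entries.get(swn)
        version = old.version + 1 if old is not None else 1
        self._entries[swn] = CatalogEntry(swn, location, version)

    def fetch(self, swn: str) -> CatalogEntry:
        return self._entries[swn]


class SiteCache:
    """Cached copies at another site; they may silently become stale."""

    def __init__(self, home: HomeCatalog) -> None:
        self.home = home
        self._cache: dict[str, CatalogEntry] = {}

    def lookup(self, swn: str) -> CatalogEntry:
        # Serve from the cache; contact the home catalog only on a miss.
        if swn not in self._cache:
            self._cache[swn] = self.home.fetch(swn)
        return self._cache[swn]

    def invalidate(self, swn: str) -> None:
        self._cache.pop(swn, None)


def stale_dependencies(plan_versions: dict[str, int], home: HomeCatalog) -> list[str]:
    """Names whose entry changed since the plan was compiled against plan_versions."""
    return [swn for swn, v in plan_versions.items() if home.fetch(swn).version != v]
```

If stale_dependencies returns a non-empty list, the site would invalidate those cached entries and recompile the plan.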

8
Transaction Management and Commit Protocol
  • Transaction number
    • SITE.SEQ_NUM (or SITE.TIME)
  • Two-phase commit (2PC)
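A sketch of transaction numbering in the SITE.SEQ_NUM form plus a bare-bones two-phase commit coordinator. Everything runs in one process, and logging, timeouts, and recovery, all of which a real implementation needs, are omitted; the class names are hypothetical:

```python
import itertools


class TransactionNumbers:
    """Hypothetical generator of SITE.SEQ_NUM transaction numbers for one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._seq = itertools.count(1)

    def new_number(self) -> str:
        return f"{self.site}.{next(self._seq)}"


class Participant:
    """A site taking part in a distributed transaction."""

    def __init__(self, site: str, can_commit: bool = True) -> None:
        self.site = site
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self, txn: str) -> bool:
        # Phase 1 vote. A real site would force a prepare record to its log
        # before voting yes; this sketch only records the state.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, txn: str, commit: bool) -> None:
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(txn: str, participants: list) -> bool:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare(txn) for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(txn, decision)
    return decision
```

Prefixing the sequence number with the originating site keeps transaction numbers unique across sites without any coordination between them.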

9
Query Preparation
  • Name resolution
  • Authorization check
  • Distributed compilation
    • Global plan generation/optimization
    • Local access path selection
    • Local optimization
    • Local view materialization

10
Figure 2 from paper
11
Cost Model
  • 3 weighted components
    • I/O
    • CPU
    • Message
      • Number of messages sent
      • Number of bytes sent
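The components combine as a weighted sum. Because the message component is counted two ways (messages and bytes), a plan's estimated cost has four terms: W_CPU × instructions + W_I/O × page I/Os + W_MSG × messages + W_BYTE × bytes. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    cpu: float   # cost per instruction executed
    io: float    # cost per page I/O
    msg: float   # cost per message sent
    byte: float  # cost per byte sent


@dataclass(frozen=True)
class Counts:
    instructions: float
    page_ios: float
    messages: float
    bytes_sent: float


def total_cost(w: Weights, n: Counts) -> float:
    """Weighted sum of the CPU, I/O, and message (count + bytes) components."""
    return (
        w.cpu * n.instructions
        + w.io * n.page_ios
        + w.msg * n.messages
        + w.byte * n.bytes_sent
    )
```

Keeping the weights separate from the counts lets the same plan be re-costed for a different machine or network by changing only the weights.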

12
Query Execution
  • Synchronous vs asynchronous execution
  • Distributed concurrency control
  • Deadlock detection and resolution
  • Crash recovery
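A sketch of the detection step: each site contributes its local wait-for edges, and a cycle in their union is a deadlock that no single site can see on its own. This simply unions the graphs in one place; a real distributed detector avoids shipping every local graph to a central site, and resolution (choosing a victim to abort) is left out. Transaction names follow the SITE.SEQ_NUM form; the values are hypothetical:

```python
def global_wait_for(*local_graphs):
    """Union of the wait-for edges reported by each site."""
    return [edge for graph in local_graphs for edge in graph]


def find_cycle(wait_for_edges):
    """Return the transactions on one cycle of the wait-for graph, or None.

    Each edge (a, b) means transaction a waits for a lock held by b.
    """
    graph = {}
    for waiter, holder in wait_for_edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())

    unvisited, on_path, done = 0, 1, 2
    state = {txn: unvisited for txn in graph}
    path = []

    def visit(txn):
        state[txn] = on_path
        path.append(txn)
        for nxt in sorted(graph[txn]):
            if state[nxt] == on_path:  # back edge: a cycle closes here
                return path[path.index(nxt):]
            if state[nxt] == unvisited:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        state[txn] = done
        return None

    for txn in sorted(graph):
        if state[txn] == unvisited:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None
```

With site SJ reporting that SJ.1 waits for NY.4 and site NY reporting that NY.4 waits for SJ.1, neither local graph has a cycle, but their union does.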

13
Figure 3 from paper
14
SQL Additions and Changes
  • DEFINE SYNONYM
  • DISTRIBUTE TABLE
    • HORIZONTALLY
    • VERTICALLY
    • REPLICATED
  • DEFINE SNAPSHOT
  • REFRESH SNAPSHOT
  • MIGRATE TABLE

15
R* Optimizer Validation and Performance
Evaluation for Distributed Queries
  • Lothar F. Mackert
  • Guy M. Lohman
  • IBM Almaden Research Center

16
Outline
  • Distributed Compilation/Optimization
  • Instrumentation
  • Experiments and Results

17
Distributed Compilation/Optimization
  • Issues
    • Join site
    • Transfer methods
      • Ship whole
      • Fetch matches
    • Cost model
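To compare the two transfer methods, a rough message and byte count can be sketched. The model assumes fixed-width tuples, that ship whole packs tuples into full messages, and that fetch matches sends one request and receives one reply per outer tuple; it is a simplification, not the R* optimizer's formulas:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    messages: int
    bytes_sent: int


def ship_whole(inner_tuples: int, tuple_bytes: int, msg_bytes: int) -> Transfer:
    """Send the entire inner table to the join site in full-size messages."""
    total = inner_tuples * tuple_bytes
    return Transfer(messages=math.ceil(total / msg_bytes), bytes_sent=total)


def fetch_matches(outer_tuples: int, key_bytes: int,
                  matches_per_probe: int, tuple_bytes: int) -> Transfer:
    """For each outer tuple, send its join value and receive the matching inner tuples.

    Assumes each request and each reply fits in a single message.
    """
    requests = outer_tuples
    replies = outer_tuples
    bytes_sent = outer_tuples * (key_bytes + matches_per_probe * tuple_bytes)
    return Transfer(messages=requests + replies, bytes_sent=bytes_sent)
```

With made-up numbers (500 inner tuples of 100 bytes, 4096-byte messages, 500 outer tuples with 5 matches each, 4-byte join values), ship whole needs 13 messages and 50,000 bytes, while fetch matches needs 1000 messages and 252,000 bytes. Fetch matches becomes attractive when the outer side is small relative to the inner table.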

18
Weights Estimation
  • CPU: inverse of MIPS
  • I/O: average seek, latency, and transfer time
  • MSG: number of instructions per message
  • BYTE: effective transmission speed of network
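These estimates translate into weights expressed in seconds per unit. A sketch, assuming the message weight is the per-message instruction count converted to time through the CPU weight, and that one I/O costs a seek plus rotational latency plus one page transfer; the function name and parameter values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    cpu: float   # seconds per instruction
    io: float    # seconds per page I/O
    msg: float   # seconds per message
    byte: float  # seconds per byte transmitted


def estimate_weights(mips: float, avg_seek_s: float, avg_latency_s: float,
                     page_transfer_s: float, instructions_per_msg: float,
                     bytes_per_second: float) -> Weights:
    w_cpu = 1.0 / (mips * 1e6)                           # inverse of MIPS
    w_io = avg_seek_s + avg_latency_s + page_transfer_s  # one random page I/O
    w_msg = instructions_per_msg * w_cpu                 # assumption: message cost in CPU time
    w_byte = 1.0 / bytes_per_second                      # inverse of effective network speed
    return Weights(w_cpu, w_io, w_msg, w_byte)
```

For example, estimate_weights(1.0, 0.02, 0.01, 0.002, 10000, 1e6) gives roughly 1e-6 s per instruction, 0.032 s per I/O, 0.01 s per message, and 1e-6 s per byte.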

19
Figure 2 from paper
20
Instrumentation
  • Distributed EXPLAIN
  • Distributed COLLECT COUNTERS
  • Force optimizer

21
Experiment I
  • Transfer method
  • Merge-scan join of 2 tables
  • 500 tuples in each table
  • Both tables projected to 50%
  • 100 different values for join attribute
  • Join result 2477 tuples
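The reported join size is close to what the usual uniformity estimate predicts. Assuming both tables draw their join values uniformly from the same 100 values, each value appears 5 times per table, so the join should yield about 100 × 5 × 5 = 2500 tuples, within 1% of the observed 2477. The formula below is the common textbook estimate, not necessarily the one R* uses:

```python
def estimated_join_size(n_r: int, n_s: int, distinct_r: int, distinct_s: int) -> float:
    """Textbook equi-join estimate under uniformity and containment of values."""
    return n_r * n_s / max(distinct_r, distinct_s)


estimate = estimated_join_size(500, 500, 100, 100)  # 500 * 500 / 100
observed = 2477  # join result reported on the slide
relative_error = abs(estimate - observed) / observed
```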

22
Figure 4 from paper
23
Figure 3 from paper
24
Experiment II
  • Distributed vs local join
  • Join of 2 tables
  • 1000 tuples in each table
  • Both tables projected to 50%
  • 3000 different values for join attribute

25
Figure 5 from paper
26
Figure 6 from paper
27
Experiment III
  • Relative importance of cost components

28
Figures 7, 8, 9, and 10 from paper
29
Experiment IV
  • Optimizer evaluation
  • Accurate estimates of the number of messages and bytes sent
    (< 2% difference)
  • Better estimates when tables are more distributed

30
Experiment V
  • Alternative distributed join methods
    • Dynamically created indexes
    • Semijoins
    • Bloomjoins
  • 2 tables
    • 1000 tuples in outer table
    • Inner table varied from 100 to 6000 tuples
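Both alternatives shrink a table before it is shipped. A semijoin sends the exact set of join values; a Bloomjoin sends a compact bit vector (a Bloom filter) instead, which is smaller but admits false positives, so the final join must still compare values. The sketch assumes rows are dicts, uses SHA-256 from Python's hashlib to derive filter positions, and picks the filter parameters (1024 bits, 3 hash functions) arbitrarily; all names are hypothetical:

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def semijoin_reduce(rows, other_keys, key):
    """Keep only rows whose join value appears in the exact shipped value set."""
    wanted = set(other_keys)
    return [r for r in rows if r[key] in wanted]


def bloomjoin_reduce(rows, other_keys, key, num_bits=1024, num_hashes=3):
    """Keep rows whose join value passes the shipped Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for k in other_keys:
        bf.add(k)
    return [r for r in rows if bf.might_contain(r[key])]
```

The Bloomjoin result is always a superset of the semijoin result, since a Bloom filter never rejects a value that was added.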

31
Figures 11 and 12 from paper
32
Other Experiments
  • Clustered index
    • Bloomjoins < Semijoins < R*
  • 50% projection
    • Site 1: Bloomjoins < Semijoins < R*
    • Site 2: Bloomjoins < R* << Semijoins
  • Wider join column
    • Bloomjoins < R* << Semijoins