Title: Querying Large Databases
1Querying Large Databases
2Purpose
- Research for efficient algorithms and software
architectures of query engines.
3Query Execution Engine Architecture
- Query processing algorithms physical algebra
- Data Model logical algebra
4Sorting Hashing
- Both are memory intensive.
- Memory Concerns
- - Merge Efficiency memory
- management.
- - Hash table overflow
5Aggregation and Duplicate Removal
- Aggregation Concept
- Describes a set of objects with one value.
- Algorithms
- Three Types
- - Nested Loops
- - Sorting
- - Hashing
-
6Aggregation Duplicate Removal
- Nested Loops
- - Easiest of the three
- - Doesnt work well for large inputs
- Sorting
- - Sort for common elements which results in a
simple duplicate removal. - - Should remove duplicates as early as
possible.
7Aggregation Duplicate Removal
- Hashing
- - Hash on group attributes.
- - Can perform duplicate removal when
creating hash table. - Algorithm Analysis
- Sorting and hashing functions are logarithmic
with input size
8Complex Query Execution Plan
- Purpose
- - To schedule a query with several operations
optimally - Ideas
- - Right-deep plans
- - Left-deep plans
-
9Complex Query Execution Plan
- Prediction
- - Use a decision tree of sub-plans
- - Done by using choose-plan operators
- Major Concern
- - Optimal resource allocation
10Parallel Query Execution Mechanism
- Goal
- Obtain speed-up scale-up
- Speed-up
- - Uses extra hardware for constant size
problem - - Linear speed-up is optimal
- - Can be expressed as parallel efficiency
11Parallel Query Execution Mechanism
- Scale-up
- - Uses same resources with altered problem
size - - Can be expressed as parallel efficiency.
12Parallel Query Execution Mechanism
- Parallel Vs Distributed Systems
- Distributed
- - Locally Autonomous
- - Also uses Parallelism
-
13Parallel Query Execution Mechanism
- Parallel
- - One center of control
- - Three types
- Shared memory
- Shared Disk
- Distributed Memory
-
14Parallel Query Execution Mechanism
- Three forms of parallelism
- - Inter Query Servicing multiple requests at
the same time - - Inter Operator Pipelining
- - Intra Operator Execute a single operator
in multiple processors
15Parallel Query Execution Mechanism
- Implementation
- Bracket Models
- Operator Models
- Bracket Model
- Goal Generic process template that receives
and sends data and performs one operation at a
time
16Parallel Query Execution Mechanism
- Number of inputs is limited to two
- Can be run in parallel by having many
templates in the system running simultaneously. - Operator Model
- Goal Insert parallel operators in an ordered
plan
17Parallel Query Execution Mechanism
- Uses the exchange operator
- Exchange operator
- - Does not manipulate data
- - Provides capabilities for parallel query
processing - - Changes a complex query into a single
process
18Parallel Algorithms
- Idea More focus on algorithms and parallel
execution - Parallel selections and updates
- - Disk input and output should be made
parallel - - Selection Maintain indices near stored
data - - Updates Use keys for partitioning
attributes -
19Parallel Algorithms
- Parallel Sorting
- -classified by
- - number of parallel inputs
- - number of parallel outputs
- - Algorithms consists of local sort and a data
exchange step -
20Parallel Algorithms
- - Major Concern
- - Deadlock can be avoided by using range
partitioning - - having a sufficient size data exchange
buffer - - using a modified sort algorithm
21Query Optimization
- Uses the differences between logical and physical
aspects - Must keep track of the properties of the inputs
- Cost models focus on throughput measures
22Tuning query performance
- Focus
- - Guidelines for improving query performance
- Guidelines for three points of view
- - implementor and vendor
- - database administrator
- - application programmer
23Tuning Query Performance
- Implementor
- System should support indexing and clustering
- Query optimizer should be reliable and
accurate - Administrator
- Ensure usage of system facilities
24Tuning Query Performance
- carefully choose physical database design
- provide available and efficient processing
resources - Application Programmer
- Provide high level queries