Querying Large Databases - PowerPoint PPT Presentation

About This Presentation
Title:

Querying Large Databases

Description:

Querying Large Databases. Rukmini Kaushik. Purpose. Research ... well for large inputs. Sorting ... Uses extra hardware for constant size problem - Linear speed ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 25
Provided by: rukmini1
Learn more at: http://www.cs.bsu.edu
Category:

less

Transcript and Presenter's Notes

Title: Querying Large Databases


1
Querying Large Databases
  • Rukmini Kaushik

2
Purpose
  • Research for efficient algorithms and software
    architectures of query engines.

3
Query Execution Engine Architecture
  • Query processing algorithms physical algebra
  • Data Model logical algebra

4
Sorting Hashing
  • Both are memory intensive.
  • Memory Concerns
  • - Merge Efficiency memory
  • management.
  • - Hash table overflow

5
Aggregation and Duplicate Removal
  • Aggregation Concept
  • Describes a set of objects with one value.
  • Algorithms
  • Three Types
  • - Nested Loops
  • - Sorting
  • - Hashing

6
Aggregation Duplicate Removal
  • Nested Loops
  • - Easiest of the three
  • - Doesnt work well for large inputs
  • Sorting
  • - Sort for common elements which results in a
    simple duplicate removal.
  • - Should remove duplicates as early as
    possible.

7
Aggregation Duplicate Removal
  • Hashing
  • - Hash on group attributes.
  • - Can perform duplicate removal when
    creating hash table.
  • Algorithm Analysis
  • Sorting and hashing functions are logarithmic
    with input size

8
Complex Query Execution Plan
  • Purpose
  • - To schedule a query with several operations
    optimally
  • Ideas
  • - Right-deep plans
  • - Left-deep plans

9
Complex Query Execution Plan
  • Prediction
  • - Use a decision tree of sub-plans
  • - Done by using choose-plan operators
  • Major Concern
  • - Optimal resource allocation

10
Parallel Query Execution Mechanism
  • Goal
  • Obtain speed-up scale-up
  • Speed-up
  • - Uses extra hardware for constant size
    problem
  • - Linear speed-up is optimal
  • - Can be expressed as parallel efficiency

11
Parallel Query Execution Mechanism
  • Scale-up
  • - Uses same resources with altered problem
    size
  • - Can be expressed as parallel efficiency.

12
Parallel Query Execution Mechanism
  • Parallel Vs Distributed Systems
  • Distributed
  • - Locally Autonomous
  • - Also uses Parallelism

13
Parallel Query Execution Mechanism
  • Parallel
  • - One center of control
  • - Three types
  • Shared memory
  • Shared Disk
  • Distributed Memory

14
Parallel Query Execution Mechanism
  • Three forms of parallelism
  • - Inter Query Servicing multiple requests at
    the same time
  • - Inter Operator Pipelining
  • - Intra Operator Execute a single operator
    in multiple processors

15
Parallel Query Execution Mechanism
  • Implementation
  • Bracket Models
  • Operator Models
  • Bracket Model
  • Goal Generic process template that receives
    and sends data and performs one operation at a
    time

16
Parallel Query Execution Mechanism
  • Number of inputs is limited to two
  • Can be run in parallel by having many
    templates in the system running simultaneously.
  • Operator Model
  • Goal Insert parallel operators in an ordered
    plan

17
Parallel Query Execution Mechanism
  • Uses the exchange operator
  • Exchange operator
  • - Does not manipulate data
  • - Provides capabilities for parallel query
    processing
  • - Changes a complex query into a single
    process

18
Parallel Algorithms
  • Idea More focus on algorithms and parallel
    execution
  • Parallel selections and updates
  • - Disk input and output should be made
    parallel
  • - Selection Maintain indices near stored
    data
  • - Updates Use keys for partitioning
    attributes

19
Parallel Algorithms
  • Parallel Sorting
  • -classified by
  • - number of parallel inputs
  • - number of parallel outputs
  • - Algorithms consists of local sort and a data
    exchange step

20
Parallel Algorithms
  • - Major Concern
  • - Deadlock can be avoided by using range
    partitioning
  • - having a sufficient size data exchange
    buffer
  • - using a modified sort algorithm

21
Query Optimization
  • Uses the differences between logical and physical
    aspects
  • Must keep track of the properties of the inputs
  • Cost models focus on throughput measures

22
Tuning query performance
  • Focus
  • - Guidelines for improving query performance
  • Guidelines for three points of view
  • - implementor and vendor
  • - database administrator
  • - application programmer

23
Tuning Query Performance
  • Implementor
  • System should support indexing and clustering
  • Query optimizer should be reliable and
    accurate
  • Administrator
  • Ensure usage of system facilities

24
Tuning Query Performance
  • carefully choose physical database design
  • provide available and efficient processing
    resources
  • Application Programmer
  • Provide high level queries
Write a Comment
User Comments (0)
About PowerShow.com