Query Processing in Connectivity-Challenged Environments

About This Presentation
Title:

Query Processing in Connectivity-Challenged Environments

Description:

Title: Text Mining: An Overview Author: Sharma Last modified by: SC Created Date: 7/9/2004 6:12:24 PM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 21
Provided by: Shar3222
Learn more at: https://web.mst.edu

Transcript and Presenter's Notes



1
Query Processing in Connectivity-Challenged Environments
  • Priyanka Puri
  • Sharma Chakravarthy
  • Gururaj Poornima
  • Mohan Kumar
  • Information Technology Laboratory
  • Computer Science and Engineering Department
  • The University of Texas at Arlington, Arlington, TX 76009
  • Email: sharma_at_cse.uta.edu
  • URL: http://itlab.uta.edu/sharma

2
  • This effort is supported by AFRL under Contract
    Number FA8750-09-2-0199
  • Sanjay Madria and Raytheon (Waseem Naqvi) are
    also involved in this project

3
Query Processing
  • Has been addressed in the context of centralized
    DBMSs
  • Has been addressed in the context of distributed
    DBMSs
  • Cost-based plan generation is typically used
  • So, is there anything more/new to do?

4
[Figure: UAV 1 through UAV 5 and Ground Controllers 1, 2, ..., n]
5
[Figure: Ground Controller 1 and Ground Controller 2]
6
Currently
  • Data is dumped into a central server and queried
  • Bandwidth and QoS issues are not addressed
  • No collaboration among nodes
  • No continuous query processing, notification,
    fusion, context usage, or real-time or near
    real-time support

7
Proposed long-term Architecture
  • Limited Resources, Mobility, Heterogeneity, Disconnections
  • Network of computing nodes: Unmanned vehicles, Sensors, Robots, PCs, Servers, Ground Controlling devices
  • Queries, Tasks, Requests, Continuous Queries, Publish/Subscribe
  • SOA Distributed Middleware: Task planning, Join computation, Composition, pub/sub, Context-aware Notification, Resource Management, Data management
  • Context/Knowledge Base
  • Fault Tolerance Services
  • Local fusion/Materialization
  • Publish/Subscribe Capability
  • Query Capability
  • Raw data / fused data / data from other nodes

8
  • Query Processing

9
MyObjects Table at each node
Timestamp | Node_id | Longitude | Latitude | Obj_type | Obj_desc     | Object_ptr
8 bytes   | 4 bytes | 4 bytes   | 4 bytes  | 8 chars  | Varchar (64) | Pointer (8 bytes)

Total width: 100 bytes

Cardinality (number of tuples), selectivity, and
replication sites of the data are known (part of the
metadata)
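
The 100-byte total can be checked from the per-column sizes above. The sketch below is only an illustration: it assumes one byte per character and counts Obj_desc at its maximum length, and the helper names are hypothetical.

```python
# Field widths in bytes, copied from the MyObjects slide.
MYOBJECTS_FIELDS = [
    ("Timestamp", 8),
    ("Node_id", 4),
    ("Longitude", 4),
    ("Latitude", 4),
    ("Obj_type", 8),    # 8 chars, assumed 1 byte per char
    ("Obj_desc", 64),   # Varchar(64), counted at its maximum length
    ("Object_ptr", 8),  # pointer
]


def tuple_width(fields):
    """Total width in bytes of one full tuple."""
    return sum(width for _, width in fields)


def projected_width(fields, keep):
    """Width in bytes after projecting onto the attribute names in keep."""
    return sum(width for name, width in fields if name in keep)


print(tuple_width(MYOBJECTS_FIELDS))                                # 100
print(projected_width(MYOBJECTS_FIELDS, {"Latitude", "Node_id"}))  # 8
```

Projecting before shipping matters here: a two-attribute projection is 8 bytes per tuple instead of 100.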
10
Query Plan Format
Operation 1 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Operation 2 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
...
Operation n | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
11
Operations in Plan format
Operation | Param      | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Select    | A > 100    | R1       | 1            | Null     | Null         | R1          | 1
Project   | A1, A3, A4 | R1       | 1            | Null     | Null         | R1          | 1
Move      | Null       | R1       | 1            | Null     | Null         | R           | 2
Copy      | Null       | R1       | 1            | Null     | Null         | R14         | 4
SemiJoin  | A = C      | R        | 2            | R2       | 2            | SR1         | 2
Join      | B = D      | R12      | 2            | R2       | 2            | JR1         | 2
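
One way to hold such a plan in memory is one record per row, with the eight columns as fields. This is a minimal sketch, not the project's actual data structure: the class name `PlanStep` and the method `ships_data` are hypothetical, and Null is mapped to `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanStep:
    """One row of the plan format: operation, operands, result, and sites."""
    operation: str
    param: Optional[str]
    operand1: str
    operand1_loc: int
    operand2: Optional[str]
    operand2_loc: Optional[int]
    result_name: str
    result_loc: int

    def ships_data(self) -> bool:
        """True when the result lands on a different site than operand 1."""
        return self.result_loc != self.operand1_loc


# The Project and Copy rows from the slide.
plan = [
    PlanStep("Project", "A1, A3, A4", "R1", 1, None, None, "R1", 1),
    PlanStep("Copy", None, "R1", 1, None, None, "R14", 4),
]

shipping = [step.operation for step in plan if step.ships_data()]
print(shipping)  # ['Copy']
```

Keeping the operand and result sites on every row is what lets a plan record data movement alongside the relational operations.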
12
Plan using Semijoin chains
  • SELECT c1 R1
  • MOVE R11 To Site2
  • SELECT c2 R2
  • SJ R11 R21 J1
  • MOVE J1 To Site3
  • SELECT c3 R3
  • SJ J1 R31 J2
  • MOVE J2 To Site2
  • SJ J2 R21 J3
  • MOVE J3 To Site1
  • SJ J3 R11 J4
  • COPY R To Site7 J
  • Total Cost: 14720 + 32000 = 46720

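[Figure: semijoin chain across Sites 1, 2, 3, and 7, with select/project applied at Sites 1, 2, and 3]
Relation sizes: R1 = 1000, R2 = 5000, R3 = 3000
After select/project: R11 = 800, R21 = 3000, R31 = 600
Intermediate results: J1 = 1200, J2 = 240, J3 = 1200, J4 = 320
Attributes shipped: lat; long; long, nodeid; lat, nodeid
Transfer costs: 3200, 4800, 1920, 4800; final copy of J to Site 7: 32000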
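
The total on this slide is the sum of the four MOVE costs plus the final COPY cost. The sketch below redoes that arithmetic. The slide does not state its cost model, so `transfer_cost` (tuples shipped times bytes per shipped tuple) is an assumption; it does reproduce the 3200 and 32000 figures.

```python
# Costs as listed on the slide for the four MOVE steps and the final COPY.
move_costs = [3200, 4800, 1920, 4800]
copy_cost = 32000


def transfer_cost(num_tuples, bytes_per_tuple):
    """Assumed model (not stated on the slide): tuples x bytes per tuple."""
    return num_tuples * bytes_per_tuple


semijoin_total = sum(move_costs)
total = semijoin_total + copy_cost
print(semijoin_total, total)    # 14720 46720

# Under the assumed model: 800 tuples of a 4-byte attribute,
# and 320 full 100-byte tuples for the final copy.
print(transfer_cost(800, 4))    # 3200
print(transfer_cost(320, 100))  # 32000
```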
13
Semi-join/join plan generation
  • We are developing algorithms for generating the
    plan space and pruning it for generating best
    (or good) plan for each input query (expressed
    as a join query)
  • It is a cost-based algorithm based on System R
    and SDD approaches extended to include
    connectivity and bandwidth issues
  • The complexity of plan generation is k^n, where n
    is the number of joins and k is the number of
    alternatives for each join
  • Assuming fewer than 5 joins in a query
  • Integrate replication into the algorithm
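
The k^n figure can be seen by enumerating one alternative per join. The sketch below is a brute-force illustration only, not the pruning algorithm under development; the alternatives, unit costs, and function names are hypothetical.

```python
from itertools import product


def enumerate_plans(num_joins, alternatives):
    """Yield every assignment of one alternative to each join (k^n plans)."""
    return product(alternatives, repeat=num_joins)


def best_plan(num_joins, alternatives, cost_of):
    """Exhaustive search; feasible only because n is assumed small."""
    return min(enumerate_plans(num_joins, alternatives), key=cost_of)


# Hypothetical alternatives and per-step costs, for illustration only.
ALTERNATIVES = ("join", "semijoin")
UNIT_COST = {"join": 5, "semijoin": 3}


def toy_cost(plan):
    return sum(UNIT_COST[step] for step in plan)


plans = list(enumerate_plans(4, ALTERNATIVES))
print(len(plans))                            # 16, i.e. 2^4
print(best_plan(4, ALTERNATIVES, toy_cost))  # ('semijoin', 'semijoin', 'semijoin', 'semijoin')
```

With fewer than 5 joins the space stays small enough to enumerate, which is why the bound on n matters.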

14
Plan Generation Alternatives
  • A Query Plan (QP) is a numbered sequence of
    operations for executing a Query
  • A QP includes how data is moved as part of
    execution
  • Plan generation alternatives:
  • Static plan: generated once and executed in a
    distributed manner
  • Dynamic plan: generated incrementally at each
    node as the query progresses, using current
    connectivity information
  • Parallel plan: partial plans are executed in
    parallel
  • Interactive plan: get some estimates by asking
    nodes that have relevant data

15
Static plan
  • The physical plan generated will have node
    information for data propagation.
  • This will be mapped to actual connectivity by
    the physical layer for execution
  • It is possible that no connectivity exists by the
    time execution is performed for a generated query
    plan
  • In that case, either a new plan can be generated
    (using the same algorithm, but with current
    metadata) or an alternative approach can be used
    to incrementally modify the plan
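
The check-then-regenerate fallback can be sketched as follows. This is a simplified illustration under stated assumptions: only direct, undirected links are considered (no multi-hop routing), and `broken_moves`, `execute_or_replan`, and the `replan` callback are hypothetical names.

```python
def broken_moves(moves, links):
    """Return the planned (src, dst) transfers that have no current link."""
    return [
        (src, dst)
        for src, dst in moves
        if (src, dst) not in links and (dst, src) not in links
    ]


def execute_or_replan(moves, links, replan):
    """Keep the static plan if every transfer is possible, else regenerate."""
    if not broken_moves(moves, links):
        return moves
    return replan(links)  # same algorithm, current connectivity


# Site-to-site transfers of a semijoin chain: 1 to 2 to 3 to 2 to 1.
planned = [(1, 2), (2, 3), (3, 2), (2, 1)]
links_now = {(1, 2), (1, 3)}

print(broken_moves(planned, links_now))  # [(2, 3), (3, 2)]
```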

16
Dynamic plan
  • Generate plan for the first join and defer the
    rest of the plan
  • Join plans are generated one at a time
  • Current connectivity information can be used
  • Result size estimation will also be more accurate
  • Query execution and (partial) plan generation are
    intertwined
  • Does not increase the complexity of plan
    generation or plan execution (compared to static)
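
The interleaving of planning and execution can be sketched as a loop that plans only the next join after re-reading connectivity. All names here are hypothetical, and the selection rule (prefer a join whose two sites are currently linked) is an assumption for illustration.

```python
def run_dynamic(pending_joins, get_links, choose_next, execute):
    """Plan one join at a time, re-reading connectivity before each choice."""
    executed = []
    remaining = list(pending_joins)
    while remaining:
        links = get_links()                   # current connectivity snapshot
        step = choose_next(remaining, links)  # plan only the next join
        execute(step)
        remaining.remove(step)
        executed.append(step)
    return executed


def choose_linked(remaining, links):
    """Assumed rule: prefer a join whose sites are linked right now."""
    for join in remaining:
        if join in links:
            return join
    return remaining[0]


# Connectivity changes between steps; each join is a (site, site) pair.
snapshots = iter([{(2, 3)}, {(1, 2)}])
order = run_dynamic(
    [(1, 2), (2, 3)],
    lambda: next(snapshots),
    choose_linked,
    lambda step: None,  # stand-in for actually running the join
)
print(order)  # [(2, 3), (1, 2)]
```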

17
Parallel plan
  • All local operations/computations (select,
    project, and even some joins) can be done in
    parallel
  • Join plans are still generated one at a time
  • Increases message/information exchange
  • Current connectivity information can be used
  • Result size estimation will also be more accurate
  • Dealing with responses, plan generation, and
    execution may be slightly more complicated than
    in the previous cases
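
Running the local select/project steps concurrently can be sketched with a thread pool, where each thread stands in for one node. This is an illustration only: the row layout, the predicate, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def local_select_project(rows, predicate, columns):
    """Select rows matching predicate, then project onto columns."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def run_local_ops_in_parallel(site_rows, predicate, columns):
    """Submit one local select/project per site and collect the results."""
    with ThreadPoolExecutor() as pool:
        futures = {
            site: pool.submit(local_select_project, rows, predicate, columns)
            for site, rows in site_rows.items()
        }
        return {site: future.result() for site, future in futures.items()}


sites = {
    1: [{"node_id": 1, "lat": 10}, {"node_id": 1, "lat": 50}],
    2: [{"node_id": 2, "lat": 70}],
}
result = run_local_ops_in_parallel(sites, lambda r: r["lat"] > 20, ["lat"])
print(result)  # {1: [{'lat': 50}], 2: [{'lat': 70}]}
```

The join steps would still be planned one at a time from these per-site results, as the slide notes.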

18
Interactive plan
  • When a query comes in, send out requests for
    local processing and get processing time and size
    information
  • Use the above to generate partial plans
  • Join plans are still generated using information
    obtained interactively
  • Increases message/information exchange
  • Current connectivity information can be used
  • Result size estimation will also be more accurate
  • Combines dynamic and parallel execution in an
    interactive manner
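
The request/response step can be sketched as collecting a (size, time) estimate from each node and ordering the nodes by it. The smallest-result-first rule is an assumption, not something the slides specify; the sizes reuse the select/project results from the semijoin example, and the times are invented.

```python
def gather_estimates(nodes, probe):
    """Ask each node for (result_size, processing_time) of its local work."""
    return {node: probe(node) for node in nodes}


def order_by_estimated_size(estimates):
    """Assumed heuristic: start from the node with the smallest local result."""
    return sorted(estimates, key=lambda node: estimates[node][0])


# Hypothetical replies: node -> (result size, processing time in seconds).
replies = {1: (800, 0.2), 2: (3000, 0.5), 3: (600, 0.1)}

estimates = gather_estimates([1, 2, 3], replies.__getitem__)
print(order_by_estimated_size(estimates))  # [3, 1, 2]
```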

19
Replication Issues
  • Algorithm for Replication
  • Single copy replication that minimizes the data
    transmission cost and maximizes the number of
    paths (to deal with connectivity)
  • Algorithm for Replication utilization
  • Given a replication, determine the utility of
    that replica in terms of query evaluation cost
    for a reasonable load
  • Reconcile the above two to come up with a
    replication strategy that balances the competing
    tradeoffs
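
The trade-off between transmission cost and number of paths can be illustrated with a simple score. Everything here is an assumption made for illustration: the number of direct links at a site is used as a rough proxy for the number of paths, the weighted difference is one arbitrary way to combine the two goals, and the topology and costs are invented.

```python
def placement_score(site, source, links, cost, weight=1.0):
    """Lower is better: shipping cost minus a bonus per link at the site."""
    degree = sum(1 for a, b in links if site in (a, b))
    return cost[(source, site)] - weight * degree


def choose_replica_site(source, candidates, links, cost, weight=1.0):
    """Pick the candidate site with the lowest placement score."""
    return min(
        candidates,
        key=lambda site: placement_score(site, source, links, cost, weight),
    )


# Hypothetical topology and shipping costs from source site 1.
links = {(1, 2), (2, 3), (2, 4), (3, 4)}
cost = {(1, 2): 10, (1, 3): 12, (1, 4): 12}

print(choose_replica_site(1, [2, 3, 4], links, cost, weight=2.0))  # 2
```

Site 2 wins here because it is both the cheapest to reach and the best connected; with other weights the two goals can pull in different directions.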

20
Thank You !