Telegraph: Ideas - PowerPoint PPT Presentation

About This Presentation
Title:

Telegraph: Ideas


Slides: 22
Provided by: joehell
Learn more at: https://dsf.berkeley.edu

Transcript and Presenter's Notes

Title: Telegraph: Ideas


1
Telegraph: Ideas & Status
2
Overview
  • Folks
  • Amol Deshpande, Mohan Lakhamraju, Vijayshankar
    Raman
  • Rob von Behren, Steve Gribble, Matt Welsh
  • Kris Hildrum
  • Hellerstein, Franklin, Brewer, Papadimitriou (ITR
    team)
  • Roots
  • Regres think-tank
  • Carey, Hellerstein, Stonebraker, 1998-99
  • CONTROL project (Online Aggregation, etc.)
  • UC Berkeley 96-present
  • Query Scrambling
  • Franklin & Urhan, UMD
  • Inktomi experiences
  • Jaguar (Welsh & Culler)

3
Telegraph Goals
  • Query all the data in the world
  • ITR: internet sources and services
  • Endeavour: sensors
  • Also shared-nothing DBMS done better
  • Unify and redesign storage engines
  • DBMS, HTTP server, cluster-based FS
  • Reject multi-threading in favor of event-flow
    state machines
  • storage manager = a query plan over events
  • Cluster-centric recovery scheme

4
Today
  • Status on storage manager
  • Event flow and state machines
  • Simplified transactional API
  • Experiences with Jaguar
  • Status
  • Continuously adaptive dataflow
  • Eddies & Rivers
  • Applications to event flow: storage mgr is a
    dataflow plan too
  • Open Questions

5
State Machines
  • Web servers/proxies, cache consistency HW use
    FSMs
  • Order 100x-1000x more concurrent clients than
    threads allow
  • One thread per concurrent HW activity
  • FSMs for multiplexing threads on connections
  • Thesis: apply query plan technology to state
    machines
  • We understand data flow
  • Optimization composition of FSMs
  • MS Research Pipeline Server
  • State machine gives better cache locality
    (old-fashioned DB batching of I/O on chip!)
  • A theme in the TinyOS research too
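The idea of one thread multiplexed over many connections, each tracked by a small state machine, can be sketched roughly as follows. This is an illustrative Python sketch, not Telegraph code: the `Connection` class, its three states, and the event names are all hypothetical.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    READ_REQUEST = auto()
    FETCH_DATA = auto()
    SEND_REPLY = auto()
    DONE = auto()


class Connection:
    """Hypothetical per-connection state machine: no thread, just a state."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.state = State.READ_REQUEST
        self.log = []

    def handle(self, event):
        """Advance at most one step in response to an event; never blocks."""
        self.log.append((self.state.name, event))
        if self.state is State.READ_REQUEST and event == "readable":
            self.state = State.FETCH_DATA
        elif self.state is State.FETCH_DATA and event == "io_done":
            self.state = State.SEND_REPLY
        elif self.state is State.SEND_REPLY and event == "writable":
            self.state = State.DONE


def run_event_loop(connections, events):
    """A single thread drains the event queue and dispatches to the FSMs."""
    queue = deque(events)
    while queue:
        conn_id, event = queue.popleft()
        connections[conn_id].handle(event)


conns = {i: Connection(i) for i in range(1000)}
# Interleave events across all connections, as a real event source might.
events = [(i, ev) for ev in ("readable", "io_done", "writable") for i in conns]
run_event_loop(conns, events)
finished = sum(c.state is State.DONE for c in conns.values())
```

Here a thousand connections make progress with no per-connection thread; the cost of each connection is one small object plus its queued events.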

6
Gribble I/O Core (v6!!)
[Diagram labels: SN_ Hashtable (fsm) and stub (each shown twice), buffer cache (fsm), lock mngr (fsm), HT work queue, and two thread boundaries]
7
Mohan: API for Xact & Recovery
Read
Lock
Update
Unlock
Scan
Begin
Commit
Abort
Deadlock Detect
Pin
Readaction Updateaction
Unpin
Recoveryaction
Flush
Commit/Abort-action
8
Jaguar
  • Two Basic Features
  • Rather than JNI, map certain bytecodes to inlined
    assembly code
  • Do this judiciously, maintaining type safety!
  • Pre-Serialized Objects (PSOs)
  • Can lay down a Java object container over an
    arbitrary VM range outside Java's heap.
  • With these, you have everything you need
  • Inlining and PSOs allow direct user-level access
    to network buffers, disk device drivers, etc.
  • PSOs allow buffer pool to be pre-allocated, and
    tuples in the pool to be pointed at
  • Matt Welsh
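As a loose analogy only, the "pre-allocated buffer pool whose tuples are pointed at in place" idea can be sketched in Python with `struct` and `memoryview`. This does not use or describe Jaguar's API; the tuple layout, the pool size, and the `slot` helper are assumptions made for the example.

```python
import struct

# Hypothetical fixed tuple layout: little-endian int32 key, int64 value.
TUPLE_FORMAT = struct.Struct("<iq")
POOL_SLOTS = 4

pool = bytearray(TUPLE_FORMAT.size * POOL_SLOTS)  # allocated once, up front
view = memoryview(pool)


def slot(i):
    """Return a zero-copy window onto slot i of the pool."""
    start = i * TUPLE_FORMAT.size
    return view[start:start + TUPLE_FORMAT.size]


# Write a tuple directly into slot 2, then read it back in place.
TUPLE_FORMAT.pack_into(slot(2), 0, 7, 700)
key, value = TUPLE_FORMAT.unpack_from(slot(2))

# A write through the window changes the pool itself: no copy was made.
slot(2)[0] = 8
key_after, _ = TUPLE_FORMAT.unpack_from(pool, 2 * TUPLE_FORMAT.size)
```

The point of the sketch is that a tuple is a window onto pool memory rather than a separately allocated object, so nothing is copied or serialized on access.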

9
Storage Manager Status
  • Working!
  • Transactions and recovery too
  • Gribble's hashtable indexes currently don't talk
    to Mohan's stuff
  • Complete version and numbers for VLDB 2000,
    mid-February
  • Lessons
  • Debugger support for state machine development
    needed
  • Thinking about where to multiplex and queue in a
    state machine is NOT EASY (but we're learning)
  • Jaguar isn't quite there yet
  • e.g. GC control
  • But we're getting there
  • Need to keep Welsh and Culler aboard

10
Query Processing Challenges
  • The world is a messy place
  • performance varies widely over time
  • River lessons on NOW (NowSort experience)
  • Internet
  • MEMS for sure!
  • performance metadata usually unavailable or wrong
  • no runstats on the web
  • Users are unpredictable
  • want to get early answers, control queries as
    they run
  • Plus Mariposa/Millennium-esque issues
  • local autonomy, costs for access, etc.

11
ITR Example Scenario
  • What do the French think about farm subsidies?
  • How would you do this on the web today?
  • Translate query into French via BabelFish
  • Find a French search engine, restrict domains to
    .fr
  • Fetch matches and translate back to English via
    BabelFish
  • Feed to a text summarizer like NetSumm
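The manual steps above amount to a small dataflow pipeline. The sketch below wires them together in Python with stand-in functions; nothing here calls BabelFish, a search engine, or NetSumm. The function names, the tiny in-memory corpus, and its URLs are all made up for illustration.

```python
def translate(text, src, dst):
    """Stand-in for a translation service (stub: only tags the text)."""
    return f"[{src}->{dst}] {text}"


def search(query, domain_suffix):
    """Stand-in for a search engine restricted to one domain suffix.

    The stub ignores `query` and filters a made-up corpus by host name.
    """
    corpus = {
        "http://example.fr/a": "texte sur les subventions agricoles",
        "http://example.com/b": "text about farm subsidies",
        "http://example.fr/c": "encore des subventions",
    }
    return [body for url, body in sorted(corpus.items())
            if url.split("/")[2].endswith(domain_suffix)]


def summarize(docs, max_docs=2):
    """Stand-in for a text summarizer (stub: keeps the first few docs)."""
    return " | ".join(docs[:max_docs])


def answer(question):
    french_query = translate(question, "en", "fr")       # step 1
    matches = search(french_query, ".fr")                 # step 2
    english = [translate(m, "fr", "en") for m in matches]  # step 3
    return summarize(english)                              # step 4


result = answer("farm subsidies")
```

Each stage is a separate operator, which is what lets a query processor reorder, replace, or reroute around any of them.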

12
Behavior Along the Way
  • Speed changes
  • Site that was fast suddenly slows down
  • Behavior changes
  • Site that was returning few answers starts
    returning lots (selectivity)
  • Failures
  • Site won't respond. Choose an alternate server.
  • Ordering affects answers
  • summarize then translate? Or vice versa?

13
Standard Query Engine Won't Cut It
  • Can't adapt while running
  • need a continuous query optimizer
  • need to handle midstream failover
  • Reload, alternate sites
  • Uses the wrong QP algorithms
  • Can't produce incremental results
  • need CONTROL-based dataflow algorithms
  • Can't understand cost/quality tradeoffs
  • maybe I'd settle for something cheesier if it
    went faster -- e.g. use an English search engine
    in the US
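Midstream failover to an alternate site might look roughly like the Python sketch below. It rests on a strong assumption that the slides do not make: every alternate site returns the same results in the same order, so a reader can resume from an offset. The site functions and the offset-based interface are hypothetical.

```python
def fetch_with_failover(sites, start=0):
    """Yield results from the first site; on failure, resume from the next.

    `sites` is a list of callables that take an offset and return an
    iterator over results from that offset onward (hypothetical interface).
    """
    offset = start
    for site in sites:
        try:
            for item in site(offset):
                yield item
                offset += 1
            return  # this site finished cleanly
        except ConnectionError:
            continue  # midstream failure: retry from `offset` elsewhere
    raise ConnectionError("all sites failed")


DATA = ["r0", "r1", "r2", "r3", "r4"]


def flaky_site(offset):
    """Serves results until it stops responding at the third item."""
    for i in range(offset, len(DATA)):
        if i == 2:
            raise ConnectionError("site stopped responding")
        yield DATA[i]


def mirror_site(offset):
    """A healthy alternate serving the same data."""
    yield from DATA[offset:]


results = list(fetch_with_failover([flaky_site, mirror_site]))
```

The consumer sees one uninterrupted stream; results already delivered before the failure are not fetched again.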

14
QP Framework: Eddies
Avnur & Hellerstein, SIGMOD 2000
  • Need an adaptive query processor
  • respond to changes mid-stream
  • Eddy
  • a pipelining object router
  • works well with ops that have
  • frequent moments of symmetry
  • adjusts flow adaptively
  • objects flow in different orders
  • visit each op once before output
  • simple policy for routing
  • never give out a new object if there's a used one
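A much-simplified eddy can be sketched in Python as a router that tracks which operators each tuple has already visited. This sketch handles only single-table filters, runs sequentially, and takes the routing policy as a plain function; the `Eddy` class and its interface are illustrative assumptions, not the design from the paper.

```python
from collections import deque


class Eddy:
    def __init__(self, operators, choose):
        self.operators = operators  # name -> predicate (filters only)
        self.choose = choose        # policy: picks one name from candidates
        self.visits = {name: 0 for name in operators}

    def run(self, source):
        new = deque(source)  # tuples that have visited no operator yet
        used = deque()       # (tuple, names of operators already visited)
        output = []
        while new or used:
            # Routing rule from the slide: prefer a used tuple over a new one.
            if used:
                tup, done = used.popleft()
            else:
                tup, done = new.popleft(), frozenset()
            pending = [n for n in self.operators if n not in done]
            if not pending:
                output.append(tup)  # visited every operator once: emit
                continue
            name = self.choose(pending)
            self.visits[name] += 1
            if self.operators[name](tup):
                used.append((tup, done | {name}))
            # Otherwise the tuple failed the filter and is dropped.
        return output


ops = {"even": lambda x: x % 2 == 0, "tiny": lambda x: x < 4}

even_first = Eddy(ops, choose=lambda pending: pending[0])
out_a = even_first.run(range(20))

tiny_first = Eddy(ops, choose=lambda pending: pending[-1])
out_b = tiny_first.run(range(20))
```

Both routings produce the same answer, but the second does less total work because the more selective filter sees tuples first. That difference in work is what an adaptive routing policy tries to exploit.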

15
Simple Eddies Learn Input Rates
  • Two single-table, unchanging filters
  • one fast, one slow
  • both have same probability of output
    (selectivity)
  • most tuples visit the fast op first
  • policy + finite queues result in back pressure
  • slow op almost always finds a used tuple from
    fast op
  • fast op rarely finds a used tuple

16
Simple Eddies Output Rate
  • Again, two single-table static filters
  • one low probability of output, one high
  • equal costs
  • Back-pressure slightly worse than random
  • low-probability should be favored
  • but it is more likely to find used tuples
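Why the low-probability filter should be favored follows from a short expected-cost calculation, sketched here in Python. The selectivities 0.1 and 0.9 and the unit cost are example values, not figures from the talk.

```python
def expected_cost(selectivities, cost_per_op=1.0):
    """Expected work per input tuple when filters run in the given order."""
    total, surviving = 0.0, 1.0
    for s in selectivities:
        total += surviving * cost_per_op  # only surviving tuples reach this op
        surviving *= s
    return total


low_first = expected_cost([0.1, 0.9])   # 1.0 + 0.1 * 1.0
high_first = expected_cost([0.9, 0.1])  # 1.0 + 0.9 * 1.0
```

With equal costs, running the low-probability filter first means far fewer tuples ever reach the second operator.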

17
An Aside: n-Arm Bandits
  • A little machine learning problem
  • Each arm pays off differently
  • Explore? Or Exploit?
  • Sometimes want to randomly choose an arm
  • Usually want to go with the best
  • If probabilities are static, dampen exploration
    over time
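One common way to balance exploring and exploiting while dampening exploration over time is an epsilon-greedy rule with a decaying epsilon. The Python sketch below uses that rule as an example; the slides do not name a specific algorithm, and the 1/sqrt(t) decay schedule and payoff probabilities are assumptions.

```python
import random


def decaying_epsilon_greedy(payoff_probs, pulls, seed=0):
    """Play a Bernoulli bandit; return how often each arm was pulled."""
    rng = random.Random(seed)
    n = len(payoff_probs)
    counts = [0] * n
    totals = [0.0] * n
    for t in range(1, pulls + 1):
        epsilon = 1.0 / t ** 0.5  # exploration is dampened as t grows
        untried = [a for a in range(n) if counts[a] == 0]
        if untried:
            arm = untried[0]  # try every arm at least once
        elif rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: pick an arm at random
        else:
            # Exploit: pick the arm with the best observed average payoff.
            arm = max(range(n), key=lambda a: totals[a] / counts[a])
        reward = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


counts = decaying_epsilon_greedy([0.1, 0.5, 0.9], pulls=5000)
```

Decaying exploration only makes sense when the payoff probabilities are static, as the slide notes; if they drift, some exploration has to continue indefinitely.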

18
Learning Eddies
  • Tuple routing is basically a bandit problem
  • which operator should I choose next?
  • Complicated by back pressure
  • Bandit problems queueing theory
  • Lottery Scheduling implementation
  • Each operator starts with k tickets
  • When multiple operators request a tuple, hold a
    lottery; holder of winning ticket gets it
  • When an operator takes a tuple, it earns a ticket
  • When an operator produces a tuple, it is charged
    a ticket
  • Works well in practice for some things
  • Problems with delayed sources, joins
  • Kris Hildrum studying formal proofs of
    convergence
  • Ticket policy needs work. Mechanism looks robust.
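The ticket rules above can be sketched in Python as follows. This is a toy simulation with two filters and no back pressure or queues, so it shows only the ticket mechanism; the `LotteryRouter` class, the pass probabilities, and the floor of one ticket per operator are assumptions added for the example.

```python
import random


class LotteryRouter:
    def __init__(self, operator_names, k=10, seed=0):
        self.tickets = {name: k for name in operator_names}  # k tickets each
        self.rng = random.Random(seed)

    def pick(self, requesters):
        """Hold a lottery among the operators requesting a tuple."""
        # Assumption: floor at one ticket so every weight stays positive.
        weights = [max(self.tickets[n], 1) for n in requesters]
        return self.rng.choices(requesters, weights=weights, k=1)[0]

    def took_tuple(self, name):
        self.tickets[name] += 1  # earn a ticket for consuming a tuple

    def produced_tuple(self, name):
        self.tickets[name] -= 1  # pay a ticket for emitting a tuple


def simulate(pass_prob, n_tuples=2000, seed=0):
    router = LotteryRouter(list(pass_prob), seed=seed)
    rng = random.Random(seed + 1)
    first_choice = {name: 0 for name in pass_prob}
    for _ in range(n_tuples):
        pending = list(pass_prob)
        first = True
        while pending:
            name = router.pick(pending)
            if first:
                first_choice[name] += 1
                first = False
            router.took_tuple(name)
            pending.remove(name)
            if rng.random() < pass_prob[name]:
                router.produced_tuple(name)
            else:
                break  # the tuple was filtered out
    return router.tickets, first_choice


tickets, first_choice = simulate({"selective": 0.1, "permissive": 0.9})
```

An operator that consumes many tuples but emits few accumulates tickets, so it wins more lotteries and tends to see tuples first, which is the ordering a static optimizer would pick for these two filters.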

19
Open Eddy Questions
  • Eddy addresses the operator ordering problem
  • Remaining problems
  • operator choice (hash join or index join?)
  • source choice, overlap, failover Ninja?
  • delayed sources
  • short jobs
  • resource mgmt (memory allocation)
  • distributed work and parallelism
  • Sensor (i.e. sequence) operations
  • What changes when data-ordering matters?
  • What are the ops for sensors?
  • Streaming media?
  • Objects not discretely differentiated??

20
Putting it together
  • Current eddy/river in C
  • Prototypes in Java, but not state machines
  • Probably do a rewrite in state machine format
  • Thesis: every piece of the system is a query
    plan
  • Apply eddies to event routing in the storage
    manager?
  • To network protocol?

21
Cross-pollination
  • Telegraph QP and Ninja Paths
  • DB, IStore, and OceanStore students looking at
    adaptive storage location
  • OceanStore orthogonal to Telegraph storage
    manager? But let's combine!
  • DB and IStore efforts apply to clusters
  • MEMS and sensors
  • As soon as eddy/river rewrite done, we need to
    look at sensor apps and ops
  • TinyOS
  • Good state machine lessons at the boundary
  • Data flow between the devices??
  • Negotiation
  • Eddies and pricing fits into this! I.e. we have
    the infrastructure for dynamic pricing and
    re-routing on the way.