Yanlei Diao - PowerPoint PPT Presentation

About This Presentation
Title: Yanlei Diao

Description:
Title: Learning Based Web Query Processing; Author: dom; Last modified by: Yanlei Diao; Created Date: 4/8/2000 11:17:47 AM; Document presentation format: PowerPoint PPT presentation

Slides: 23
Provided by: Dom193

Transcript and Presenter's Notes

Title: Yanlei Diao


1
Towards an Internet-Scale XML Dissemination Service
  • Yanlei Diao
  • Shariq Rizvi
  • Michael J. Franklin
  • EECS, U.C. Berkeley

2
Outline
  • XML dissemination services
  • System model
  • Core techniques
  • Status and conclusions

3
Applications of XML Dissemination
  • News feeds via RSS (Really Simple Syndication)
    • My Yahoo!: updated headlines from BBC, CNet, and NPR.
  • Mobile services
    • Mobile operators connect content providers with
      millions of clients running a multitude of
      operating systems.
  • Stock tickers
    • QuoteMedia: fast access to real-time and
      historical stock data.
  • Online auctions
    • freebidingtools.com: create your own feed for
      your favorite eBay search.
  • Network monitoring
    • Ganglia: a distributed monitoring system for
      clusters and grids.

4
YFilter: An XML Dissemination Service
  • User queries: specifications of data interests,
    written in an XML query language.
  • Data sources: continuously publish XML data
    items.
  • The service: delivers to each user the XML data
    items that match her data interests; the
    delivered results are presented in a customized
    format.
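The following toy sketch shows this service model. It registers one query per user. For each published item, it returns the users whose query matches, together with a "customized" result. It is not YFilter and rests on several assumptions. Queries use the small XPath subset supported by Python's `xml.etree.ElementTree`. The class name `DisseminationService` is hypothetical. The customized format is reduced to extracting element text.

```python
import xml.etree.ElementTree as ET


class DisseminationService:
    """Toy pub/sub service (hypothetical): users register queries, sources publish XML items."""

    def __init__(self):
        # user -> query in ElementTree's XPath subset, evaluated relative to the item root
        self.subscriptions = {}

    def subscribe(self, user, query):
        self.subscriptions[user] = query

    def publish(self, xml_text):
        """Return {user: results} for every user whose query matches the item."""
        root = ET.fromstring(xml_text)
        deliveries = {}
        for user, query in self.subscriptions.items():
            hits = root.findall(query)
            if hits:
                # Stand-in for a customized output format: the text of each match.
                deliveries[user] = [(h.text or "").strip() for h in hits]
        return deliveries


svc = DisseminationService()
svc.subscribe("alice", ".//headline")
svc.subscribe("bob", ".//quote[@symbol='ACME']")
svc.subscribe("carol", ".//quote[@symbol='XYZ']")
item = "<feed><headline>Markets rally</headline><quote symbol='ACME'>42.5</quote></feed>"
print(svc.publish(item))  # {'alice': ['Markets rally'], 'bob': ['42.5']}
```

Because the item has no quote for XYZ, carol receives nothing.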

5
ONYX: Large-Scale XML Dissemination
  • ONYX: Operator Network using YFilter for XML
    Dissemination.
  • An overlay network of information brokers running
    YFilter.
  • Underlying infrastructures:
    • A dedicated network
    • Peer-to-peer
    • Collaboration among administrative domains

6
Design Space: Expressiveness
  • Expressiveness: the data model and query language a
    service supports.
  • Subject-based
    • Messages: a subject label.
    • Queries: a specific label or a wildcard.
  • Predicate-based
    • Messages: attribute-value pairs.
    • Queries: a set of predicates.
  • XML filtering
    • Messages: XML.
    • Queries: a subset of XPath 1.0.
  • XML filtering and transformation
    • Messages: XML.
    • Queries: a subset of XQuery.
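The first two points in this design space can be contrasted in a few lines. The semantics below are assumptions chosen for illustration, not those of any particular system:

- subject labels use `fnmatch`-style `*` wildcards;
- predicates are a conjunction of `(attribute, operator, value)` triples.

```python
import operator
from fnmatch import fnmatchcase


def subject_match(pattern, subject):
    """Subject-based: a message has one subject label; a query is a label or wildcard pattern."""
    return fnmatchcase(subject, pattern)


OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def predicate_match(predicates, message):
    """Predicate-based: a message is attribute-value pairs; a query is a conjunction of predicates."""
    return all(attr in message and OPS[op](message[attr], value)
               for attr, op, value in predicates)


msg = {"symbol": "ACME", "price": 42.5}
print(subject_match("stocks.*", "stocks.nasdaq"))                           # True
print(subject_match("stocks.nasdaq", "news.bbc"))                           # False
print(predicate_match([("symbol", "=", "ACME"), ("price", ">", 40)], msg))  # True
print(predicate_match([("price", "<", 40)], msg))                           # False
```

Neither model can express constraints on nested structure, which is what the XML filtering rows add.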

7
Design Space: Why Distributed Processing?
  • Privacy
    • Regulations, e.g., California Senate Bill No. 1386.
    • Policies, e.g., customers' data stay behind the
      firewall.
  • Locality of data interests
    • Disseminate regional data directly to local
      subscribers.
  • Scalability
    • Data volume: up to thousands of messages per
      second; message sizes from 1 KB to 20 KB.
    • Query population: up to millions.
    • Frequency of query updates: from daily to
      every few minutes.
    • Result volume: can amplify the input data volume
      by a large factor.
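A back-of-envelope calculation shows the scale implied by these numbers. It assumes 1,000 messages per second, the low end of "thousands", at the 20 KB maximum size. To illustrate result amplification, it also assumes a hypothetical 50 deliveries per message.

```python
def input_kb_per_s(msgs_per_sec, msg_size_kb):
    """Raw input data rate into the service."""
    return msgs_per_sec * msg_size_kb


def output_kb_per_s(msgs_per_sec, msg_size_kb, deliveries_per_msg):
    """Output rate, assuming each delivered result is about as large as its input message."""
    return input_kb_per_s(msgs_per_sec, msg_size_kb) * deliveries_per_msg


print(input_kb_per_s(1_000, 20))       # 20000 KB/s, about 20 MB/s
print(output_kb_per_s(1_000, 20, 50))  # 1000000 KB/s, about 1 GB/s
```

Even this modest, hypothetical fan-out turns about 20 MB/s of input into about 1 GB/s of output if everything is served from one place.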

8
Related Systems
9
Content of the Paper
  • Content-driven routing
    • Needs to handle both structural and value-based
      constraints.
    • Leverages YFilter's NFA-based operator networks,
      with distributed construction.
  • Filtering power of routing (i.e., the fraction of
    messages filtered)
    • Filtering power can be inherently limited.
    • Use query partitioning (where possible) to improve
      it.
  • Distributed transformation
    • Currently done either at the publisher's side or at
      the edge brokers.
    • Perform cascading message transformation during
      routing.
  • Efficient XML transmission
    • Costs: the verbosity of XML, and XML parsing at each
      routing step.
    • Investigate different XML formats for XML
      transmission.
  • Detailed architectural design
  • Other optimization techniques

10
Outline
  • XML dissemination services
  • System model
  • Core techniques
  • Status and conclusions

11
Operations on Data/Query Flows
[Figure: operations on data/query flows, including a transformation query]
12
System Tasks on Data/Query Planes
  • Processing planes: the query plane and the data plane.

Table: System tasks (rows) by processing plane (columns: Query Plane, Data Plane)
  • Content-driven routing
  • Incremental transformation
  • Final query processing
13
Outline
  • XML dissemination services
  • System model
  • Core techniques
  • Status and conclusions

14
Routing Table Design
  • A routing table: a mapping from output links to
    routing queries.
  • A routing query: the data interests of the queries
    downstream of an output link.
  • The data interest of a query: its XPath expressions,
    i.e., the for and where clauses of FLWOR expressions.
  • Routing table design
    • a canonical form of routing queries,
    • a representation of routing tables, and
    • an algorithm constructing them from a distributed
      query population.
  • Two (conflicting) goals
    • High filtering power of routing: the fraction of
      messages filtered in routing.
    • High routing efficiency: the number of messages
      routed per second.
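A routing table of this kind can be sketched as a dictionary from output link to a list of path expressions. A message is forwarded on a link if any of that link's paths selects some element of the message. The sketch makes several assumptions:

- it handles only linear paths with child (`/`) and descendant (`//`) steps, and no value predicates;
- the link names are hypothetical;
- filtering power is measured over (message, link) pairs.

```python
import xml.etree.ElementTree as ET


def parse(path):
    """'/a//b' -> [('child', 'a'), ('desc', 'b')]"""
    steps, axis = [], "child"
    for tok in path.split("/")[1:]:
        if tok == "":
            axis = "desc"
        else:
            steps.append((axis, tok))
            axis = "child"
    return steps


def match_seq(steps, seq):
    """True if the linear path `steps` selects the last element of root-to-node tag sequence `seq`."""
    if not steps:
        return not seq
    axis, label = steps[0]
    if axis == "child":
        return bool(seq) and label in ("*", seq[0]) and match_seq(steps[1:], seq[1:])
    # Descendant axis: skip zero or more elements before matching `label`.
    return any(label in ("*", seq[k]) and match_seq(steps[1:], seq[k + 1:])
               for k in range(len(seq)))


def root_paths(elem, prefix=()):
    """Yield the root-to-node tag sequence of every element."""
    here = prefix + (elem.tag,)
    yield here
    for child in elem:
        yield from root_paths(child, here)


def route(routing_table, xml_text):
    """Return the output links whose routing query (a disjunction of paths) matches."""
    seqs = list(root_paths(ET.fromstring(xml_text)))
    return [link for link, paths in routing_table.items()
            if any(match_seq(parse(p), s) for p in paths for s in seqs)]


def filtering_power(routing_table, messages):
    """Fraction of (message, link) pairs that routing does NOT forward."""
    total = len(messages) * len(routing_table)
    forwarded = sum(len(route(routing_table, m)) for m in messages)
    return 1 - forwarded / total


table = {
    "link1": ["/nitf//tobject.subject"],
    "link2": ["/rss/channel/item", "//quote"],
}
msgs = [
    "<nitf><body><tobject.subject/></body></nitf>",
    "<rss><channel><item/></channel></rss>",
    "<feed><entry/></feed>",
]
print([route(table, m) for m in msgs])         # [['link1'], ['link2'], []]
print(round(filtering_power(table, msgs), 3))  # 0.667
```

Evaluating every path separately, as done here, favors simplicity over routing efficiency. That is the gap a shared operator network addresses.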

15
YFilter Basics
  • An XML filtering and transformation engine that
    processes multiple queries in a shared fashion.
  • A Non-Deterministic Finite Automaton (NFA)-based
    operator network.
  • Benefits for routing
    • Fast structure matching.
    • A small maintenance cost for query updates.
    • Extensibility for supporting new operators.

Q1: /nitf[head/pubdata[@edition.area="SF"]]//tobject.subject[@tobject.subject.type="Stock"]
Q2: /nitf[head/pubdata[@edition.area="SF"]]//tobject.subject[@tobject.subject.matter="fishing"]
  • Y. Diao and M.J. Franklin. Query Processing for
    High-Volume XML Message Brokering. VLDB 2003.
  • Y. Diao, et al. Path Sharing and Predicate
    Evaluation for High-Performance XML Filtering.
    TODS, Dec. 2003.
  • YFilter v1.0 release: coming later this month!
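The structural parts of Q1 and Q2 illustrate the idea of a shared NFA. The two queries differ only in their value predicates. Each decomposes into the linear paths /nitf/head/pubdata and /nitf//tobject.subject, so in a shared automaton both queries reuse the same states. This is a simplified sketch in the spirit of YFilter, not its implementation:

- value predicates are omitted, as is the step that combines a query's branches;
- a `//` step becomes a state with a self-loop on any tag;
- the class name `SharedPathNFA` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse(path):
    """'/a//b' -> [('child', 'a'), ('desc', 'b')]"""
    steps, axis = [], "child"
    for tok in path.split("/")[1:]:
        if tok == "":
            axis = "desc"
        else:
            steps.append((axis, tok))
            axis = "child"
    return steps


class SharedPathNFA:
    """One NFA for many linear path queries; common prefixes share states."""

    def __init__(self):
        self.children = []  # state -> {label: next state}; label may be '*'
        self.dslash = []    # state -> its '//' state (self-loop on any tag), or None
        self.is_loop = []   # True for '//' states
        self.queries = []   # state -> ids of paths accepted in this state
        self._new_state()   # state 0 is the start state

    def _new_state(self, loop=False):
        self.children.append({})
        self.dslash.append(None)
        self.is_loop.append(loop)
        self.queries.append(set())
        return len(self.children) - 1

    def add(self, path, qid):
        s = 0
        for axis, label in parse(path):
            if axis == "desc":  # reuse or create this state's single '//' state
                if self.dslash[s] is None:
                    self.dslash[s] = self._new_state(loop=True)
                s = self.dslash[s]
            nxt = self.children[s].get(label)
            if nxt is None:     # no shared transition yet: add one
                nxt = self._new_state()
                self.children[s][label] = nxt
            s = nxt
        self.queries[s].add(qid)

    def _closure(self, states):
        # Epsilon move from each state into its '//' state, if it has one.
        return set(states) | {self.dslash[s] for s in states if self.dslash[s] is not None}

    def match(self, xml_text):
        matched = set()

        def visit(elem, active):
            nxt = set()
            for s in active:
                for label in (elem.tag, "*"):
                    if label in self.children[s]:
                        nxt.add(self.children[s][label])
                if self.is_loop[s]:
                    nxt.add(s)  # '//' states stay active at every depth
            nxt = self._closure(nxt)
            for s in nxt:
                matched.update(self.queries[s])
            for child in elem:
                visit(child, nxt)

        visit(ET.fromstring(xml_text), self._closure({0}))
        return matched


nfa = SharedPathNFA()
for q in ("Q1", "Q2"):  # both queries share the same two structural paths
    nfa.add("/nitf/head/pubdata", q + ":pubdata")
    nfa.add("/nitf//tobject.subject", q + ":subject")
doc = ("<nitf><head><pubdata edition.area='SF'/></head><body><tobject>"
       "<tobject.subject tobject.subject.type='Stock'/></tobject></body></nitf>")
print(len(nfa.children))       # 6 states for 4 registered paths
print(sorted(nfa.match(doc)))  # ['Q1:pubdata', 'Q1:subject', 'Q2:pubdata', 'Q2:subject']
```

Adding Q2 creates no new states here. This is one way to see why query updates can be cheap in a shared automaton.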

16
Our Solution
  • Routing queries are a disjunction of path
    expressions.
    • Each XPath expression (equivalent to the for and
      where clauses of FLWOR expressions) is a routing
      query.
    • Multiple routing queries can be connected by "or".
  • Routing table representation
    • Merge routing queries into a single combined
      operator network.
  • Construction algorithm
    • Map(): a user query → a routing query in the
      canonical form.
    • Collect(): routing queries sent from child
      brokers → a routing table.
    • Aggregate(): all the routing queries (at a node)
      → a new routing query.
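The three construction steps can be sketched over a tree of brokers, in which each broker forwards one aggregated routing query to its parent. The sketch simplifies in several ways:

- a routing query is a `frozenset` of XPath strings (their disjunction), not a merged operator network;
- a user query is reduced to its for/where paths plus an unused return clause;
- the names `UserQuery`, `Broker`, `map_query`, `collect` and `aggregate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserQuery:
    paths: list              # XPath expressions from the for/where clauses
    return_clause: str = ""  # transformation part; not used for routing


@dataclass
class Broker:
    name: str
    local_queries: list = field(default_factory=list)
    children: list = field(default_factory=list)
    routing_table: dict = field(default_factory=dict)  # child name -> routing query


def map_query(q):
    """Map(): user query -> routing query in canonical form (a disjunction of paths)."""
    return frozenset(q.paths)


def collect(broker):
    """Collect(): routing queries sent up by child brokers -> a routing table."""
    broker.routing_table = {c.name: aggregate(c) for c in broker.children}
    return broker.routing_table


def aggregate(broker):
    """Aggregate(): all routing queries at a node -> one routing query for the parent."""
    table = collect(broker)
    local = [map_query(q) for q in broker.local_queries]
    return frozenset().union(*local, *table.values())


b1 = Broker("b1", [UserQuery(["/nitf//tobject.subject"])])
b2 = Broker("b2", [UserQuery(["/rss/channel/item", "//quote"])])
root = Broker("root", children=[b1, b2])
combined = aggregate(root)
print(sorted(combined))                   # ['//quote', '/nitf//tobject.subject', '/rss/channel/item']
print(sorted(root.routing_table["b2"]))  # ['//quote', '/rss/channel/item']
```

Aggregating by plain union keeps routing exact but lets the routing query grow with the subtree. This is where the generalization step on a later slide would apply.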

17
An Example Scenario
18
Example (continued)
19
Sharing and Short-cut Evaluation
  • A problem with sharing
    • Separate routing query representations allow
      short-cut evaluation.
    • With a combined one, sharing may sacrifice the
      short-cut evaluation strategy.
  • Solution: dynamic pruning of the operator network
    at runtime
    • Each operator/NFA state has a static set of
      broker ids that it can reach.
    • The system keeps a dynamic set of broker ids that
      have been reached.
    • YFilter execution is extended to prune the
      operator network using these sets.
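The pruning rule can be sketched on a simplified shared structure. The sketch uses a trie of child-axis-only paths instead of a full NFA; that swap, and the names `State`, `build` and `route`, are illustrative assumptions. Each state carries a static `reach` set of broker ids. During one message, a dynamic `reached` set grows, and a state is skipped once everything it can reach has already been reached.

```python
import xml.etree.ElementTree as ET


class State:
    def __init__(self):
        self.children = {}    # tag -> State
        self.brokers = set()  # brokers whose routing query is accepted in this state
        self.reach = set()    # static: brokers accepted here or in any descendant state


def build(routing_table):
    """routing_table: broker id -> list of child-axis-only paths such as '/a/b'."""
    root = State()
    for broker, paths in routing_table.items():
        for p in paths:
            s = root
            for tag in p.strip("/").split("/"):
                s = s.children.setdefault(tag, State())
            s.brokers.add(broker)

    def annotate(s):
        s.reach = set(s.brokers)
        for c in s.children.values():
            s.reach |= annotate(c)
        return s.reach

    annotate(root)
    return root


def route(root, xml_text, prune=True):
    """Return (brokers reached, number of states evaluated) for one message."""
    reached, visits = set(), 0  # dynamic set of brokers already known to match

    def visit(state, elem):
        nonlocal visits
        if prune and state.reach <= reached:
            return  # nothing new can be reached through this state: skip it
        visits += 1
        reached.update(state.brokers)
        for child in elem:
            nxt = state.children.get(child.tag)
            if nxt is not None:
                visit(nxt, child)

    doc = ET.fromstring(xml_text)
    first = root.children.get(doc.tag)
    if first is not None:
        visit(first, doc)
    return reached, visits


trie = build({"B1": ["/a/b"], "B2": ["/a/b/c", "/a/d"]})
doc = "<a><b><c/></b><b><c/></b><d/></a>"
for prune in (False, True):
    reached, visits = route(trie, doc, prune)
    print(prune, sorted(reached), visits)
# False ['B1', 'B2'] 6
# True ['B1', 'B2'] 3
```

Both runs reach the same brokers. With pruning, the second `b` subtree and the `d` state are never evaluated.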

20
Other Routing Considerations
  • Content Generalization
    • Large routing tables can be a problem.
    • Introduce content generalization as an additional
      step in Collect() or Aggregate().
    • Generalization methods.
    • Trade off filtering power for routing (space)
      efficiency.
  • Filtering Power of Routing
    • Fraction of messages filtered by routing, which depends on
      • the selectivity of the union of the user queries at
        the node, and
      • the loss in precision in the routing queries
        representing this node.
    • If inherently low, partition the query population
      to improve it.
      • An exclusiveness pattern, e.g., /a/b/@id.
      • Identify a set of such patterns, and partition
        queries using them.
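Both considerations on this slide can be sketched briefly. The slide does not name specific methods, so both functions below are assumptions chosen for illustration.

- **Generalization.** One possible method truncates child-axis routing paths to a fixed prefix length. This shrinks the table without dropping any message the original paths would forward, at the cost of precision.
- **Partitioning.** The sketch assumes every message carries exactly one value at /a/b/@id, so each message needs only one valued partition. Queries with no constraint on that attribute must still see every message.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def generalize(paths, max_steps):
    """Truncate child-axis paths to at most `max_steps` steps.

    A message with an element at /a/b/c also has one at /a/b, so a prefix never
    drops a message its original path would forward; it can only forward more."""
    return {"/" + "/".join(p.strip("/").split("/")[:max_steps]) for p in paths}


def partition_queries(queries):
    """queries: query id -> required value of the exclusive /a/b/@id, or None if unconstrained."""
    parts = defaultdict(set)
    for qid, value in queries.items():
        parts[value].add(qid)  # partition None: queries that must see every message
    return parts


def target_partitions(parts, xml_text):
    """Assumes at most one /a/b/@id value per message, hence at most one valued partition."""
    root = ET.fromstring(xml_text)
    b = root.find("b") if root.tag == "a" else None
    value = b.get("id") if b is not None else None
    targets = {None} if None in parts else set()
    if value in parts:
        targets.add(value)
    return targets


routing_paths = {"/a/b/c", "/a/b/d", "/a/e/f", "/x/y"}
print(sorted(generalize(routing_paths, 2)))  # ['/a/b', '/a/e', '/x/y']
parts = partition_queries({"q1": "17", "q2": "17", "q3": "42", "q4": None})
print(sorted(parts["17"]))  # ['q1', 'q2']
print(target_partitions(parts, "<a><b id='42'/><c/></a>") == {None, "42"})  # True
```

The unconstrained partition (q4 here) limits the gain, which is one reason to look for a set of exclusiveness patterns rather than a single one.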

21
Status and Conclusions
  • Queries bring intelligence to the network routing
    fabric.
  • We present a detailed architectural design of
    ONYX.
  • We address fundamental issues:
    • YFilter's NFA-based operator networks are good
      for routing!
    • Locality of data interests is key to filtering
      power!
  • Status: YFilter release, XML transmission, and other
    implementation work underway.
  • This is an area full of opportunities for
    optimization:
    • Improving routing efficiency.
    • Improving filtering power of routing.
    • Incremental message transformation.
    • Sharing among different processing tasks.
    • Schema-based optimization.

22
Questions
ONYX