Selective Dissemination of Streaming XML - PowerPoint PPT Presentation

Provided by: Het72
Learn more at: http://web.cs.ucla.edu

1
Selective Dissemination of Streaming XML
  • By Hyun Jin Moon, Hetal Thakkar

2
Overview
  • Introduction
  • Background
  • XFilter
  • Architecture
  • Implementation
  • Optimizations
  • Experiments/Analysis
  • Conclusion
  • Related Work: XTrie

3
Introduction
  • Information Dissemination
  • Enormous Amount of Data
  • Lots of Users
  • User Profiles
  • Bag of Keywords
  • Selective Distribution of Data
  • Applications
  • Stocks, Sports, Traffic, Electronic Personalized
    Newspapers, Entertainment, etc.

4
Introduction (Cont'd)
  • Emergence of XML as Standard of Information
    Exchange on Internet
  • Utilize Structure of XML for Better Dissemination
  • Use XPath(s) for User Profile
  • Optimizations for Searching a Streaming XML
    Document for Many XPaths
  • XFilter
  • XTrie Structure

5
Background
  • SDI Structure
  • XPath
  • XML Parsers
  • DOM
  • SAX

6
Background: SDI Architecture

7
Background: XPath
  • Query Structure and Data
  • Enough Complexity for Dissemination
  • Constructs
  • Relative Path
  • //product[price/msrp < 300]/name

8
Background: XML Parser
  • DOM: Document Object Model
  • SAX: Simple API for XML
  • Standard Interface for Event-Based XML Parsing
  • Suitable for Streaming XML
  • Example
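
The example from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of event-based parsing using Python's standard xml.sax module. The names EventLogger and log_events are hypothetical, and the level counter is an addition meant to mirror the Level input that the XFilter handlers take.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX events together with the current nesting level."""

    def __init__(self):
        super().__init__()
        self.level = 0
        self.events = []

    def startElement(self, name, attrs):
        # The root element is at level 1.
        self.level += 1
        self.events.append(("start", name, self.level))

    def endElement(self, name):
        self.events.append(("end", name, self.level))
        self.level -= 1

    def characters(self, content):
        # Skip whitespace-only text between tags.
        if content.strip():
            self.events.append(("chars", content.strip(), self.level))


def log_events(xml_text):
    """Parse a document in one pass and return the list of events seen."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

The parser never builds a tree: each handler sees one event and then the input is gone, which is why this style suits streaming documents.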

9
XFilter
  • Architecture
  • Implementation
  • Optimizations
  • List Balancing
  • Prefiltering
  • Experiments/Analysis
  • Conclusions

10
XFilter Architecture

11
XFilter Implementation
  • Filter Engine
  • Brute Force Approach
  • Instead,
  • Decompose Queries into Path Nodes
  • Create a Query Index from Path Nodes
  • Build a Finite State Machine on the Query Index
  • As a Document Arrives Traverse the FSM for All
    Queries (In One Pass)

12
XFilter Implementation
  • Path Nodes
  • QueryId
  • Position
  • Sequence Number for Path Node in the Query
    (XPath)
  • RelativePos
  • Relative Distance in the Document
  • Level (Can be Updated During Evaluation)
  • Absolute Level in the XML Document, at Which the
    Path Node should be Checked
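
A minimal sketch of how a query might be decomposed into path nodes with these fields. The slide gives only the field names, so the encoding here is an assumption: -1 marks a distance or level left unknown by //, 0 marks a level to be filled in during evaluation, and wildcard steps are folded into RelativePos. PathNode and decompose are hypothetical names, and only absolute paths built from /, // and * are handled.

```python
from dataclasses import dataclass


@dataclass
class PathNode:
    query_id: str
    element: str
    position: int      # sequence number of the node within the query
    relative_pos: int  # levels below the previous node, -1 if unknown ("//")
    level: int         # level to check: 0 = set during evaluation, -1 = any


def decompose(query_id, xpath):
    """Split an absolute path of /, // and * steps into path nodes."""
    nodes = []
    wildcards = 0       # "*" steps seen since the last named element
    descendant = False  # a "//" seen since the last named element
    # Drop the empty string that precedes the leading "/".
    for step in xpath.split("/")[1:]:
        if step == "":
            # An empty step means the separator was "//".
            descendant = True
        elif step == "*":
            wildcards += 1
        else:
            if descendant:
                relative_pos, level = -1, -1
            elif not nodes:
                # First node at a known distance from the root.
                relative_pos, level = 0, 1 + wildcards
            else:
                relative_pos, level = 1 + wildcards, 0
            nodes.append(
                PathNode(query_id, step, len(nodes) + 1, relative_pos, level)
            )
            wildcards, descendant = 0, False
    return nodes
```

Under these assumptions, /a/b//c yields three nodes: a must appear at level 1, b one level below a, and c at any depth below b.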

13
XFilter Implementation
  • Query Index
  • Hash Table
  • Key: Element Names that Appear in XPath
    Expressions
  • Data: Two Lists Containing Path Nodes
  • Candidate List: Current Node of Each Query,
    Representing the Current State of the Query
  • Wait List: Path Nodes Representing Future States
14
XFilter Implementation

15
XFilter Implementation
  • Start Element Handler
  • Inputs: Name, Level, and Attribute-Values of the
    Element
  • Action
  • Look-up Element Name in Query Index
  • Examine Nodes in Candidate List
  • Check Level, etc.
  • If All Checks Succeed AND Final Path Node of
    Query Then the Document is Deemed to Match the
    Query
  • Else If All Checks Succeed Then Move the Query to
    its Next State
  • Else Do Nothing

16
XFilter Implementation
  • End Element Handler
  • Input: Element Name
  • Action
  • Delete the Corresponding Path Nodes from the
    Candidate List (to Restore the Previous State)
  • Element Character Handler
  • Input: Data
  • Action: Similar to Start Element Handler
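
The start and end element handlers can be sketched as a small SAX-driven matcher. This is a simplified illustration, not the XFilter implementation: it handles only absolute paths built from /, // and *, it omits attribute and content filters (so there is no character handler), and MiniFilter, decompose and matching_queries are hypothetical names. A level of -1 is assumed to mean "any level".

```python
import xml.sax
from collections import defaultdict


def decompose(xpath):
    """Turn a path of /, // and * steps into (element, relative_pos, level)."""
    nodes, wildcards, descendant = [], 0, False
    for step in xpath.split("/")[1:]:
        if step == "":
            descendant = True            # separator was "//"
        elif step == "*":
            wildcards += 1
        else:
            if descendant:
                nodes.append((step, -1, -1))
            elif not nodes:
                nodes.append((step, 0, 1 + wildcards))
            else:
                nodes.append((step, 1 + wildcards, 0))
            wildcards, descendant = 0, False
    return nodes


class MiniFilter(xml.sax.ContentHandler):
    def __init__(self, queries):
        super().__init__()
        self.paths = {qid: decompose(xp) for qid, xp in queries.items()}
        # element name -> list of (query id, node index, level to check)
        self.candidates = defaultdict(list)
        for qid, nodes in self.paths.items():
            element, _, level = nodes[0]
            self.candidates[element].append((qid, 0, level))
        self.level = 0
        self.promoted = []   # per open element: entries it added
        self.matched = set()

    def startElement(self, name, attrs):
        self.level += 1
        added = []
        # Iterate over a snapshot so nodes promoted by this event are not
        # examined until a later event.
        for qid, pos, level in list(self.candidates.get(name, [])):
            if level not in (-1, self.level):
                continue                 # level check failed: do nothing
            nodes = self.paths[qid]
            if pos == len(nodes) - 1:
                self.matched.add(qid)    # final path node: document matches
                continue
            # Move the query to its next state.
            element, relative_pos, _ = nodes[pos + 1]
            next_level = -1 if relative_pos == -1 else self.level + relative_pos
            entry = (qid, pos + 1, next_level)
            self.candidates[element].append(entry)
            added.append((element, entry))
        self.promoted.append(added)

    def endElement(self, name):
        # Undo the promotions made when this element was opened.
        for element, entry in self.promoted.pop():
            self.candidates[element].remove(entry)
        self.level -= 1


def matching_queries(xml_text, queries):
    engine = MiniFilter(queries)
    xml.sax.parseString(xml_text.encode("utf-8"), engine)
    return engine.matched
```

For a document with a at level 1, b at level 2 and c at level 3, this sketch matches /a/b/c, //b/c and /a//c but rejects /a/c, because c is expected at level 2 and arrives at level 3.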

17
XFilter Implementation
  • Example
  • Start Document
  • Start Element: a, Level 1
  • Start Element: b, Level 2
  • Start Element: c, Level 3
  • End Element: c
  • End Element: b
  • End Element: a

18
XFilter Implementation

19
XFilter Implementation
  • Advanced Features
  • Attribute Filter
  • Start Element Event Handler
  • Content Filter
  • Element Character Handler
  • Nested Path Expression
  • Treat Nested Sub-Queries as Another Query

20
XFilter Optimizations
  • List Balancing (LB)
  • Basic Approach: First Path Node of Each Query
    is Placed in the Candidate List
  • Low Selectivity
  • Instead, Apply Candidate List Balancing
  • When Adding a New Query to the Query Index, the
    Path Node Whose Element has the Shortest
    Candidate List is Chosen as the Pivot Node
  • Prefix
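
The pivot choice can be sketched in a few lines. This is an illustration only: add_query_balanced is a hypothetical name, each query is reduced to its ordered element names, ties go to the earliest node, and the sketch does not show how the part of the query before the pivot is verified.

```python
from collections import defaultdict


def add_query_balanced(candidate_lists, query_id, elements):
    """Register a query under the element with the shortest candidate list.

    candidate_lists maps element name -> list of (query id, pivot index).
    Returns the 0-based index of the chosen pivot within elements.
    """
    pivot = min(
        range(len(elements)),
        key=lambda i: len(candidate_lists[elements[i]]),
    )
    candidate_lists[elements[pivot]].append((query_id, pivot))
    return pivot


# Usage: three queries that all start with the popular element "a".
lists = defaultdict(list)
pivots = [
    add_query_balanced(lists, "Q1", ["a", "b"]),
    add_query_balanced(lists, "Q2", ["a", "c"]),
    add_query_balanced(lists, "Q3", ["a", "b", "c"]),
]
```

With the basic approach all three queries would sit in the candidate list for a. Here each of a, b and c ends up with one entry, so a start tag for a triggers one check instead of three.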

21
XFilter Optimizations
  • Prefiltering
  • Eliminate Queries that Contain Element Name(s)
    not Present in the Document
  • Yan and Garcia-Molina's Key-Based Algorithm
  • Assign a Key Element to Each Query
  • Create an Occurrence Table for Each Arriving
    Document
  • Occurrence Table: Hash Table
  • Key: Element Name
  • Data: Queries Whose Key is this Element
  • Only Queries in Occurrence Table are Checked
    Further
  • Thus, Each Input Document is Parsed Twice
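
The prefiltering pass can be sketched as follows. This is a rough illustration under stated assumptions: each query is reduced to the element names it mentions, the first name is arbitrarily used as the key element, and NameCollector and prefilter are hypothetical names. The pass shown here is the extra parse that precedes the real filtering pass.

```python
import xml.sax
from collections import defaultdict


class NameCollector(xml.sax.ContentHandler):
    """First pass: gather the set of element names in the document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def startElement(self, name, attrs):
        self.names.add(name)


def prefilter(xml_text, queries):
    """Return the occurrence table for one document.

    queries maps query id -> list of element names; the first name is
    treated as the key element (an assumption made for this sketch).
    """
    key_index = defaultdict(list)   # key element -> queries keyed on it
    for qid, elements in queries.items():
        key_index[elements[0]].append(qid)

    collector = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), collector)

    occurrence = {}                 # rebuilt for every arriving document
    for name in collector.names:
        # Keep a query only if every element it names is in the document.
        survivors = [
            qid for qid in key_index.get(name, [])
            if set(queries[qid]) <= collector.names
        ]
        if survivors:
            occurrence[name] = survivors
    return occurrence
```

Only the queries listed in the returned table would be handed to the filter engine for the second parse.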

22
XFilter Experimental Setup

23
Experiment 1.1: The Effect of the Number of Profiles
  • Number of Profiles (Standing XPath Queries)
    Changes
  • Basic Algorithm Gives the Worst Performance
  • List Balance Improves on the Basic Algorithm
  • Prefiltering Leads to a Greater Speed-Up Than LB
  • 2.6% of Profiles Match a Given Document
  • Basic Algorithm Examines 12% of Profiles
  • Prefiltering Examines Only 3.5% of Profiles

24
Experiment 1.2: The Effect of the Number of Profiles
  • Number of Profiles Changes, Same as Before
  • Skewed Selection of Elements Leads to
    Unbalanced Query Index (Hash Table) in Basic
    Algorithm
  • List Balance is Effective in Balancing the Hash
    Table

25
Experiment 2.1: The Effect of Depth
  • Maximum Depth of XML Documents and Queries Changes
  • More Depth -> More Checking -> Greater Filtering
    Time
  • List Balance and Prefiltering Graphs cross at
    Depth 8. With Higher Depth,
  • Less Prefiltering
  • LB Benefits with More Choices of Pivot Elements

26
Experiment 2.2: The Effect of Depth
  • Maximum Depth of XML Documents and Queries Changes
  • Skewed Selection of Elements
  • LB Effectively Balances the Skewed Hash Table
  • After Level 4, the Presence of Element Names in
    the Queries does not Change Much Due to Skewed
    Distribution. Workload Characteristics Remain
    Similar.

27
Experiment 3: The Effect of Wildcards
  • Wildcard (*) Usage Probability in Queries
    Changes
  • Prefiltering is Slower with More Wildcards
  • Prefiltering Spends Extra Time Attempting to
    Filter, but it Cannot Filter Out Wildcards
  • However, it is Unlikely that Many Profiles will
    have such a High Proportion of Wildcards.

28
Experiment 4.1: The Effect of Filter
  • Injected a New Fixed Attribute Named dummy into
    the Documents with Certain Probability
  • Created a Simple Element Node Filter Containing
    Only that Fixed Attribute
  • (e.g. [@dummy="true"])
  • In this Experiment, a Single Element Node Filter
    is Placed at Different Levels of the Query with
    Fixed Query Selectivity of 10%
  • The Deeper the Filter, the Longer it Takes to Test

29
Experiment 4.2: The Effect of Filter
  • Filters are Placed at Level 2, with Varying
    Selectivity.
  • Logarithmic Scale on Selectivity
  • For All Algorithms, Performance is not Heavily
    Affected by Filter Selectivity

30
Summary of Results
  • These Experiments Demonstrate that,
  • XFilter approach is scalable
  • The Extensions Provide Substantial Improvements
  • List Balance is Effective When the Distribution
    of Elements in Queries is Highly Skewed
  • Prefiltering is Effective in Reducing the Number
    of Profiles to Examine
  • Combination of LB-Prefiltering Provides the Best
    Performance in All Cases
  • Considering that Distribution of Elements in
    Queries of SDI Applications is Highly Skewed, and
    Prefiltering Requires a Space Overhead, Simple LB
    is Preferable in Many Practical Cases

31
Conclusions
  • XFilter: an XML Document Filtering System for
    Selective Dissemination of Information (SDI)
  • Expressive Profiles in XPath Query Language
  • Profile Indexing and Matching Algorithms Based on
    an FSM Approach
  • Optimization Techniques
  • List Balancing
  • Prefiltering

32
Related Work: XTrie
  • Efficient Filtering of XML Documents with XPath
    Expressions, ICDE 2002
  • Supports Complex XPath Expressions (As Opposed to
    Simple, Single-Path Specifications)
  • e.g. /a/b[c/d//e][g//e/f]//*/*/e/f
  • Supports Both Ordered and Unordered Matching of
    XML Data
  • Ordered Matching: //a//b/following-sibling::d/c
  • Substring-Based Query Indexing
  • 2 to 4 Times Faster Than XFilter