Selective Dissemination of Streaming XML

Provided by: Het72

Learn more at: http://web.cs.ucla.edu

1
Selective Dissemination of Streaming XML

By Hyun Jin Moon, Hetal Thakkar

2
Overview

Introduction
Background
XFilter
Architecture
Implementation
Optimizations
Experiments/Analysis
Conclusion
Related Work: XTrie

3
Introduction

Information Dissemination
Enormous Amount of Data
Lots of Users
User Profiles
Bag of Keywords
Selective Distribution of Data
Applications
Stocks, Sports, Traffic, Electronic Personalized
Newspapers, Entertainment, etc.

4
Introduction (Cont'd)

Emergence of XML as Standard of Information
Exchange on Internet
Utilize Structure of XML for Better Dissemination
Use XPath(s) for User Profile
Optimizations for Searching a Streaming XML
Document for Many XPaths
XFilter
XTrie Structure

5
Background

SDI Structure
XPath
XML Parsers
DOM
SAX

6
Background: SDI Architecture

7
Background: XPath

Query Structure and Data
Enough Complexity for Dissemination
Constructs
Relative Path
//product[price/msrp < 300]/name

8
Background: XML Parser

DOM: Document Object Model
SAX: Simple API for XML
Standard Interface for Event-Based XML Parsing
Suitable for Streaming XML
Example
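The event-based model can be illustrated with a minimal sketch that uses Python's standard xml.sax module. The LevelTracker class and the sax_events helper are hypothetical names introduced here for illustration; they are not part of the presentation. The handler records each start and end event together with the current nesting level, which is the information a streaming filter works from.

```python
import xml.sax


class LevelTracker(xml.sax.ContentHandler):
    """Records SAX events together with the current nesting level."""

    def __init__(self):
        super().__init__()
        self.level = 0
        self.events = []

    def startElement(self, name, attrs):
        # Entering an element increases the depth by one.
        self.level += 1
        self.events.append(("start", name, self.level))

    def endElement(self, name):
        # Record the event at the element's own level, then step back out.
        self.events.append(("end", name, self.level))
        self.level -= 1


def sax_events(xml_text):
    """Parse the document once and return the recorded event list."""
    handler = LevelTracker()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = sax_events("<a><b><c/></b></a>")
for event in events:
    print(event)
```

The document is never held in memory as a tree; the handler only sees one event at a time, which is why this style suits streaming XML.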

9
XFilter

Architecture
Implementation
Optimizations
List Balancing
Prefiltering
Experiments/Analysis
Conclusions

10
XFilter Architecture

11
XFilter Implementation

Filter Engine
Brute Force Approach
Instead,
Decompose Queries into Path Nodes
Create a Query Index from Path Nodes
Build a Finite State Machine on the Query Index
As a Document Arrives Traverse the FSM for All
Queries (In One Pass)

12
XFilter Implementation

Path Nodes
QueryId
Position
Sequence Number for Path Node in the Query
(XPath)
RelativePos
Relative Distance in the Document
Level (Can be Updated During Evaluation)
Absolute Level in the XML Document, at Which the
Path Node should be Checked
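The decomposition into path nodes can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the PathNode class and the decompose function are hypothetical names, only paths that start with / and use /, // and * steps are handled, and the value conventions (-1 for a distance or level that is unknown because of //, 0 for a level that is filled in during evaluation) are assumptions chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class PathNode:
    query_id: str
    element: str
    position: int      # sequence number of the node within the query
    relative_pos: int  # distance to the previous node; -1 means any distance (//)
    level: int         # absolute level to check; 0 = set later, -1 = unknown


def decompose(query_id, xpath):
    """Split a simple XPath that starts with '/' into path nodes."""
    nodes = []
    descendant = False  # has a '//' been seen since the previous path node?
    distance = 0        # steps taken since the previous path node
    steps = xpath.split("/")[1:]  # drop the empty string before the leading '/'
    for step in steps:
        if step == "":   # an empty step comes from '//'
            descendant = True
            continue
        distance += 1
        if step == "*":  # wildcards add distance but create no path node
            continue
        if not nodes:
            relative_pos = 0
            level = -1 if descendant else distance
        elif descendant:
            relative_pos, level = -1, -1
        else:
            relative_pos, level = distance, 0
        nodes.append(PathNode(query_id, step, len(nodes) + 1, relative_pos, level))
        descendant, distance = False, 0
    return nodes


for node in decompose("Q1", "/a/b//c"):
    print(node)
```

For /a/b//c the first node is pinned to level 1, the second is one step below the first, and the third may appear at any depth below the second.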

13
XFilter Implementation

Query Index
Hash Table
Key: Element Names that Appear in XPath
Expressions
Data: Two Lists Containing Path Nodes
Candidate List: Current Node of Each Query,
Representing Current State of the Query
Wait List: Path Nodes Representing Future States
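A minimal sketch of such a query index, assuming each query has already been reduced to its ordered list of element names. The build_query_index function is a hypothetical name, and (query id, position) pairs stand in for full path nodes to keep the example short.

```python
from collections import defaultdict


def build_query_index(queries):
    """Build {element name: {"candidate": [...], "wait": [...]}}.

    queries maps a query id to its ordered list of element names.
    Each list entry is a (query_id, position) pair standing in for a path node.
    """
    index = defaultdict(lambda: {"candidate": [], "wait": []})
    for query_id, elements in queries.items():
        for position, element in enumerate(elements, start=1):
            # Basic approach: only the first path node starts as a candidate;
            # all later nodes wait until the query reaches them.
            which = "candidate" if position == 1 else "wait"
            index[element][which].append((query_id, position))
    return dict(index)


index = build_query_index({"Q1": ["a", "b", "c"], "Q2": ["b", "c"]})
for element, lists in index.items():
    print(element, lists)
```

Element b shows both roles at once: it is the current state of Q2 and a future state of Q1.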

14
XFilter Implementation

15
XFilter Implementation

Start Element Handler
Inputs: Name, Level, and Attribute Values of the
Element
Action
Look-up Element Name in Query Index
Examine Nodes in Candidate List
Check Level, etc.
If All Checks Succeed AND Final Path Node of
Query Then the Document is Deemed to Match the
Query
Else If All Checks Succeed Then Move the Query to
its Next State
Else Do Nothing

16
XFilter Implementation

End Element Handler
Input: Element Name
Action
Delete the Corresponding Path Nodes from the
Candidate List (to Restore the Previous State)
Element Character Handler
Input: Data
Action: Similar to Start Element Handler
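The interplay of the start and end element handlers can be sketched with a deliberately reduced matcher. It assumes absolute, child-only paths such as /a/b/c, so the level check reduces to "position equals current level"; attribute filters, wildcards and // are left out. MiniFilter and its method names are hypothetical and only illustrate the candidate list mechanics described above.

```python
from collections import defaultdict


class MiniFilter:
    """Toy matcher for absolute, child-only paths such as /a/b/c."""

    def __init__(self, queries):
        self.queries = queries               # query id -> list of element names
        self.candidates = defaultdict(list)  # element name -> [(query id, position)]
        for query_id, elements in queries.items():
            self.candidates[elements[0]].append((query_id, 1))
        self.level = 0
        self.promoted = []                   # one list of promotions per open element
        self.matched = set()

    def start_element(self, name):
        self.level += 1
        added = []
        # Iterate over a copy because promotions may append to this same list.
        for query_id, position in list(self.candidates[name]):
            if position != self.level:       # level check failed: do nothing
                continue
            elements = self.queries[query_id]
            if position == len(elements):    # final path node: document matches
                self.matched.add(query_id)
            else:                            # move the query to its next state
                next_element = elements[position]
                next_node = (query_id, position + 1)
                self.candidates[next_element].append(next_node)
                added.append((next_element, next_node))
        self.promoted.append(added)

    def end_element(self, name):
        # Undo the promotions made when this element was opened.
        for element, node in self.promoted.pop():
            self.candidates[element].remove(node)
        self.level -= 1


f = MiniFilter({"Q1": ["a", "b", "c"], "Q2": ["a", "c"], "Q3": ["b"]})
for name in ["a", "b", "c"]:
    f.start_element(name)
for name in ["c", "b", "a"]:
    f.end_element(name)
print(f.matched)
```

Only Q1 matches the event sequence a, b, c: Q2 expects c directly under a, and Q3 expects b at the root. After the last end event the candidate lists are back to their initial contents, which is the restoring step the end element handler exists for.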

17
XFilter Implementation

Example
Start Document
Start Element a Level 1
Start Element b Level 2
Start Element c Level 3
End Element c
End Element b
End Element a

18
XFilter Implementation

19
XFilter Implementation

Advanced Features
Attribute Filter
Start Element Event Handler
Content Filter
Element Character Handler
Nested Path Expression
Treat Nested Sub-Queries as Another Query

20
XFilter Optimizations

List Balancing (LB)
Basic Approach: First Path Node of Each Query is
Placed in the Candidate List
Low Selectivity
Instead, Apply Candidate List Balancing
When Adding a New Query to the Query Index, the
Path Node with the Shortest Candidate List is
Chosen as the Pivot Node
Prefix
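The pivot choice can be sketched as follows, assuming each query is given as its ordered list of element names. The add_query_balanced function is a hypothetical name, and breaking ties in favor of the earliest path node is an assumption of this sketch.

```python
from collections import defaultdict


def add_query_balanced(candidate_lists, query_id, elements):
    """Place the pivot node of a query in the shortest candidate list.

    candidate_lists maps element name -> list of (query id, pivot position).
    Returns the 1-based position chosen as pivot; ties go to the earliest node.
    """
    pivot = min(range(len(elements)),
                key=lambda i: len(candidate_lists[elements[i]]))
    candidate_lists[elements[pivot]].append((query_id, pivot + 1))
    return pivot + 1


lists = defaultdict(list)
workload = [("Q1", ["a", "b", "c"]), ("Q2", ["a", "b"]), ("Q3", ["a", "c"])]
pivots = [add_query_balanced(lists, q, path) for q, path in workload]
print(pivots)
print(dict(lists))
```

With the basic approach all three queries would sit in the candidate list of a; here each list holds one entry. The path nodes in front of a pivot still have to be verified when the pivot fires, which this sketch does not do.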

21
XFilter Optimizations

Prefiltering
Eliminate Queries, which have Element Name(s)
that are not Present in the Document
Yan and Garcia-Molina's Key-Based Algorithm
Assign Key Element of the Queries
Create Occurrence Table for Each Arriving
Document
Occurrence Table: Hash Table
Key: Element Name
Data: Queries Whose Key is this Element
Only Queries in Occurrence Table are Checked
Further
Thus, Each Input Document is Parsed Twice
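A minimal sketch of the prefiltering pass, under several assumptions: queries are given as lists of element names with wildcards already dropped, the key of a query is simply its first element name, and a query survives only if every one of its element names occurs in the document. NameCollector and prefilter are hypothetical names; the key-assignment policy of the original algorithm is not specified in the slides.

```python
import xml.sax
from collections import defaultdict


class NameCollector(xml.sax.ContentHandler):
    """First parse: collect the set of element names in the document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def startElement(self, name, attrs):
        self.names.add(name)


def prefilter(xml_text, queries):
    """Return (occurrence table, ids of queries that need a full check)."""
    collector = NameCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), collector)
    # Assumption: the key element of a query is its first element name.
    by_key = defaultdict(list)
    for query_id, elements in queries.items():
        by_key[elements[0]].append(query_id)
    # Occurrence table: only keys that actually occur in this document.
    occurrence = {name: by_key[name] for name in collector.names if name in by_key}
    survivors = {q for ids in occurrence.values() for q in ids
                 if set(queries[q]) <= collector.names}
    return occurrence, survivors


queries = {"Q1": ["a", "b", "c"], "Q2": ["a", "x"], "Q3": ["y", "a"]}
occurrence, survivors = prefilter("<a><b><c/></b></a>", queries)
print(occurrence, survivors)
```

Q3 never enters the occurrence table because its key y is absent, and Q2 is dropped because x is absent. Only the survivors go on to the second, structural pass over the document.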

22
XFilter Experimental Setup

23
Experiment 1.1: The Effect of Number of Profiles

Number of Profiles (Standing XPath Queries)
Changes
Basic Algorithm Gives the Worst Performance
List Balancing Improves on the Basic Algorithm
Prefiltering Leads to a Greater Speed-Up Than LB
2.6% of Profiles Match a Given Document
Basic Algorithm Examines 12% of Profiles
Prefiltering Examines Only 3.5% of Profiles

24
Experiment 1.2: The Effect of Number of Profiles

Number of Profiles Changes, Same as Before
Skewed Selection of Elements Leads to
Unbalanced Query Index (Hash Table) in Basic
Algorithm
List Balance is Effective in Balancing the Hash
Table

25
Experiment 2.1: The Effect of Depth

Maximum Depth of XML Documents and Queries Changes
More Depth -> More Checking -> Greater Filtering
Time
List Balance and Prefiltering Graphs Cross at
Depth 8. With Higher Depth,
Prefiltering Eliminates Fewer Queries
LB Benefits from More Choices of Pivot Elements

26
Experiment 2.2: The Effect of Depth

Maximum Depth of XML Documents and Queries Changes
Skewed Selection of Elements
LB Effectively Balances the Skewed Hash Table
After Level 4, the Presence of Element Names in
the Queries does not Change Much Due to Skewed
Distribution. Workload Characteristics Remain
Similar.

27
Experiment 3: The Effect of Wildcard

Wildcard (*) Usage Probability in Queries
Changes
Prefiltering is Slower with More Wildcards
Prefiltering Spends Extra Time Attempting to
Filter, but it cannot Filter Out the Wildcards
However, it is Unlikely that Many Profiles will
have such a High Proportion of Wildcards.

28
Experiment 4.1: The Effect of Filter

Injected a New Fixed Attribute Named dummy into
the Documents with Certain Probability
Created a Simple Element Node Filter Containing
Only that Fixed Attribute
(e.g. [@dummy = "true"])
In this Experiment, a Single Element Node Filter
is Placed in Different Levels of the Query with
Fixed Query Selectivity of 10%
The Deeper the Filter, the Longer it Takes to Test

29
Experiment 4.2: The Effect of Filter

Filters are Placed at Level 2, with Varying
Selectivity.
Logarithmic Scale on Selectivity
For All Algorithms, Performance is not Heavily
Affected by Filter Selectivity

30
Summary of Results

These Experiments Demonstrate that:
XFilter Approach is Scalable
The Extensions Provide Substantial Improvements
List Balance is Effective When the Distribution
of Elements in Queries is Highly Skewed
Prefiltering is Effective in Reducing the Number
of Profiles to Examine
Combination of LB-Prefiltering Provides the Best
Performance in All Cases
Considering that Distribution of Elements in
Queries of SDI Applications is Highly Skewed, and
Prefiltering Requires a Space Overhead, Simple LB
is Preferable in Many Practical Cases

31
Conclusions

XML Document Filtering System XFilter for
Selective Dissemination of Information (SDI)
Expressive Profiles in XPath Query Language
Profile Indexing and Matching Algorithms Based on
a FSM Approach
Optimization Techniques
List Balancing
Prefiltering

32
Related Work: XTrie

"Efficient Filtering of XML Documents with XPath
Expressions", ICDE 2002
Supports Complex XPath Expressions (As Opposed to
Simple, Single-Path Specifications)
e.g. /a/b[c/d//e][g//e/f]//*/*/e/f
Supports Both Ordered and Unordered Matching of
XML Data
Ordered Matching: //a//b/following-sibling::d/c
Substring-Based Query Indexing
2 to 4 Times Faster Than XFilter
