Title: Fast Trie Data Structures


1
Fast Trie Data Structures
  • Seminar On Advanced Topics In Data Structures
  • Jacob Katz
  • December 1, 2001
  • Dan E. Willard, 1981, New Trie Data Structures
    Which Support Very Fast Search Operations

2
Agenda
  • Problem statement
  • Existing solutions and motivation for a new one
  • P-Fast tries and their complexity
  • Q-Fast tries and their complexity
  • X-Fast tries and their complexity
  • Y-Fast tries and their complexity

3
Problem statement
  • Let S be a set of N records with distinct integer
    keys in the range [0, M), supporting the
    following operations:
  • MEMBER(K): does the key K belong to the set?
  • SUCCESSOR(K): find the least element that is
    greater than K
  • PREDECESSOR(K): find the greatest element that
    is less than K
  • SUBSET(K1, K2): produce a list of the elements
    whose keys lie between K1 and K2
  • The problem: find an efficient data structure
    supporting these operations
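
To pin down the semantics of the four operations, here is a minimal baseline sketch on a sorted Python list. It is not one of the trie structures discussed in the talk; the class name `SortedKeySet` is hypothetical, and treating the SUBSET bounds as inclusive is an assumption, since the slide does not say.

```python
from bisect import bisect_left, bisect_right


class SortedKeySet:
    """Baseline: the four operations on a sorted list (O(log N) per search)."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def member(self, k):
        i = bisect_left(self.keys, k)
        return i < len(self.keys) and self.keys[i] == k

    def successor(self, k):
        # Least element strictly greater than k, or None if there is none.
        i = bisect_right(self.keys, k)
        return self.keys[i] if i < len(self.keys) else None

    def predecessor(self, k):
        # Greatest element strictly less than k, or None if there is none.
        i = bisect_left(self.keys, k)
        return self.keys[i - 1] if i > 0 else None

    def subset(self, k1, k2):
        # Elements with k1 <= key <= k2 (inclusive bounds are an assumption).
        return self.keys[bisect_left(self.keys, k1):bisect_right(self.keys, k2)]
```

Any faster structure should return the same answers as this baseline, only with a better time bound for restricted integer keys.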

4
Existing solutions
  • AVL trees and 2-3 trees use O(N) space and
    O(log N) time in the worst case
  • With no restriction on the keys, better
    performance is impossible
  • Expected O(log log N) time is possible when keys
    are uniformly distributed
  • Stratified trees use O(M log log M) space and
    O(log log M) time in the worst case for integer
    keys in the range [0, M)
  • Disadvantage: O(M log log M) space is much
    larger than O(N) if M ≫ N

5
Motivation for another solution
  • A more space-efficient data structure is wanted
    for restricted keys, one that still maintains
    the time efficiency

6
The way to the solution
  • We first define the P-Fast Trie
  • O(√(log M)) time, O(N · √(log M) · 2^√(log M))
    space
  • Then show the Q-Fast Trie
  • improves the space requirement to O(N)
  • Then show the X-Fast Trie
  • O(log log M) time, O(N log M) space, no dynamic
    operations
  • Then show the Y-Fast Trie
  • O(log log M) time, O(N) space, no dynamic
    operations

7
What's a Trie
  • A trie of size (h, b) is a tree of height h and
    branching factor b
  • All keys can be regarded as integers in the range
    [0, b^h)
  • Each key K can be represented as an h-digit number
    in base b: K = K1 K2 K3 … Kh
  • Keys are stored at the leaf level; the path from
    the root follows the decomposition of the key
    into digits
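
The digit decomposition that defines the root-to-leaf path can be sketched as follows. The helper names `to_digits` and `from_digits` are hypothetical and chosen only for illustration.

```python
def to_digits(key, h, b):
    """Return the h base-b digits of key, most significant digit first."""
    if not 0 <= key < b ** h:
        raise ValueError("key out of range for a trie of size (h, b)")
    digits = []
    for _ in range(h):
        key, d = divmod(key, b)   # peel off the least significant digit
        digits.append(d)
    return digits[::-1]


def from_digits(digits, b):
    """Inverse of to_digits: rebuild the key from its base-b digits."""
    key = 0
    for d in digits:
        key = key * b + d
    return key
```

For example, in a trie of size (3, 4) the key 27 = 1·16 + 2·4 + 3 follows branches 1, 2, 3 from the root.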

8
Trivial Trie
  • Each node stores a vector of branches
  • MEMBER(K): O(h)
  • visits O(h) nodes, spending O(1) time in each
  • SUCCESSOR(K)/PREDECESSOR(K): O(hb)
  • visits O(h) nodes, spending O(b) time in each
  • this is too much time
  • Observation: increasing b (the base of key
    representation, the branching factor) decreases h
    (the number of digits required to represent a
    key, the height of the tree) and vice versa
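
A minimal sketch of such a trivial trie, assuming each node is a plain Python list of b slots and the leaf slot holds the key itself (both are illustration choices, not taken from the slides). The class name `TrivialTrie` is hypothetical. MEMBER follows one slot per level; SUCCESSOR may scan a whole branch vector at every level, which is where the O(hb) bound comes from.

```python
class TrivialTrie:
    def __init__(self, h, b):
        self.h, self.b = h, b
        self.root = [None] * b

    def _digits(self, key):
        return [(key // self.b ** (self.h - 1 - i)) % self.b for i in range(self.h)]

    def insert(self, key):
        node = self.root
        digits = self._digits(key)
        for d in digits[:-1]:
            if node[d] is None:
                node[d] = [None] * self.b
            node = node[d]
        node[digits[-1]] = key            # leaf slot stores the key

    def member(self, key):
        node = self.root
        for d in self._digits(key):       # O(1) work per level, O(h) total
            node = node[d]
            if node is None:
                return False
        return True

    def successor(self, key):
        digits = self._digits(key)
        path = [self.root]                # nodes on the search path, by depth
        for d in digits[:-1]:
            nxt = path[-1][d]
            if nxt is None:
                break
            path.append(nxt)
        # Walk back up; at each level scan the branches right of the key's digit.
        for depth in range(len(path) - 1, -1, -1):
            node = path[depth]
            for d in range(digits[depth] + 1, self.b):    # O(b) scan
                if node[d] is not None:
                    return self._leftmost(node[d], depth + 1)
        return None

    def _leftmost(self, node, depth):
        # Descend to the smallest key below node, scanning each branch vector.
        while depth < self.h:
            node = next(c for c in node if c is not None)
            depth += 1
        return node
```

The linear scans inside `successor` and `_leftmost` are exactly the per-node cost that the faster variants aim to remove.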

9
Example for worst case complexity

10
P-Fast Trie Idea
  • Improve SUCCESSOR(k)/PREDECESSOR(k) time by
    overcoming the linear search in every
    intermediate node

11
P-Fast Trie
  • Each internal node v has additional fields:
  • LOWKEY(v): the leaf node containing the smallest
    key descending from v
  • HIGHKEY(v): the leaf node containing the largest
    key descending from v
  • INNERTREE(v): a binary tree of worst-case height
    O(log b) representing the set of digits directly
    descending from v
  • Each leaf node points to its immediate neighbors
    on the left and on the right
  • CLOSEMATCH(K): a query returning the node with key
    K if it exists in the trie, and returning
    PREDECESSOR(K) or SUCCESSOR(K) otherwise

12
CLOSEMATCH(k) Algorithm Intuitively
  • Starting from the root, look for k = k1 k2 … kh
  • If found, return it
  • If not, let v be the node at depth j from which
    there is no way further down:
  • kj ∉ INNERTREE(v)
  • Looking for kj in INNERTREE(v), find D, an
    existing digit in INNERTREE(v) that is either
  • the least digit greater than kj, or
  • the greatest digit less than kj
  • If D > kj, return LOWKEY(D's child of v);
    otherwise, if D < kj, return HIGHKEY(D's child
    of v)
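
The steps above can be sketched in Python under a few loudly stated assumptions: a dict stands in for each node's branch vector, a sorted list searched with `bisect` stands in for INNERTREE(v), the trie is built once from a fixed key set, and leaves are represented by the keys themselves. The names `Node`, `build` and `closematch` are hypothetical.

```python
from bisect import bisect_left


class Node:
    def __init__(self):
        self.children = {}   # digit -> Node, or the key itself at the leaf level
        self.digits = []     # sorted digits present (stand-in for INNERTREE)
        self.lowkey = None   # smallest key below this node
        self.highkey = None  # largest key below this node


def digits_of(key, h, b):
    return [(key // b ** (h - 1 - i)) % b for i in range(h)]


def build(keys, h, b):
    root = Node()
    for key in sorted(set(keys)):         # increasing order keeps `digits` sorted
        node = root
        for depth, d in enumerate(digits_of(key, h, b)):
            if node.lowkey is None:
                node.lowkey = key
            node.highkey = key
            if d not in node.children:
                node.digits.append(d)
                node.children[d] = key if depth == h - 1 else Node()
            node = node.children[d]
    return root


def closematch(root, key, h, b):
    """Return key if present, otherwise its predecessor or successor."""
    if not root.digits:
        return None                       # empty trie
    node = root
    for depth, d in enumerate(digits_of(key, h, b)):
        if d in node.children:
            node = node.children[d]       # O(1) step down
            continue
        # No way down: one search among the digits present at this node.
        i = bisect_left(node.digits, d)
        if i < len(node.digits):          # least digit greater than d
            child = node.children[node.digits[i]]
            return child if depth == h - 1 else child.lowkey
        child = node.children[node.digits[i - 1]]   # greatest digit less than d
        return child if depth == h - 1 else child.highkey
    return key
```

Note that only the node where the descent stops is searched, so the sketch performs at most h constant-time steps plus a single O(log b) search.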

13
P-Fast Trie Complexities
  • CLOSEMATCH(K) time complexity is O(h + log b)
  • Other queries require an O(1) addition to the
    CLOSEMATCH(K) complexity
  • Space complexity of such a trie is O(hbN)
  • Representing the input keys in base
    b = 2^√(log M) requires h = √(log M) digits;
    with such h and b, the desired complexities
    are achieved
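
As a worked illustration of this parameter choice, the short sketch below computes h and b for a few key widths. Rounding √(log M) up to an integer and taking logarithms base 2 are assumptions made so that the numbers come out whole; `pfast_parameters` is a hypothetical helper name.

```python
import math


def pfast_parameters(log_m):
    """Return (h, b) for keys below M = 2**log_m: h = ceil(sqrt(log M)), b = 2**h."""
    h = math.ceil(math.sqrt(log_m))
    b = 2 ** h
    return h, b


for log_m in (16, 32, 64):
    h, b = pfast_parameters(log_m)
    assert b ** h >= 2 ** log_m      # h base-b digits cover every key below M
    search_cost = h + h              # h + log2(b), since log2(b) = h
    nodes_per_key = h * b            # the hb factor in the O(hbN) space bound
    print(log_m, h, b, search_cost, nodes_per_key)
```

For 64-bit keys this gives h = 8 and b = 256: the search cost is on the order of 16 steps rather than 64, while the hb factor in the space bound is 2048 per key.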

14
Q-Fast Trie Idea
  • Improve space by splitting the set of keys into
    subsets
  • How to split is the problem:
  • To preserve the time complexity
  • To decrease the space complexity

15
Q-Fast Trie
  • Let S' denote an ordered list of keys from S:
  • 0 ≤ K1 < K2 < K3 < … < KL < M
  • Define
  • Si = {K ∈ S | Ki ≤ K < K(i+1)} for i < L
  • SL = {K ∈ S | K ≥ KL}
  • S' is a c-partition of S iff each Si has
    cardinality in the range [c, 2c-1]
  • A Q-Fast Trie of size (h, b, c) is a two-level
    structure
  • Upper part: a P-fast trie T of size (h, b)
    representing the set S', which is a c-partition
    of S
  • Lower part: a forest of 2-3 trees, where the
    i-th tree represents Si
  • The leaves of the 2-3 trees are connected to form
    an ordered list
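
One simple way to obtain a c-partition of a static key set, and to route a query through the two levels, is sketched below. This is an illustration only: sorted Python lists stand in for the 2-3 trees, a `bisect` over the separators stands in for the upper P-fast trie, and taking the smallest key of each group as its separator is an assumption. The names `c_partition`, `find_group` and `member` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def c_partition(sorted_keys, c):
    """Split sorted keys into consecutive groups with sizes in [c, 2c-1]."""
    n = len(sorted_keys)
    if n < c:
        raise ValueError("need at least c keys")
    groups = [sorted_keys[i:i + c] for i in range(0, n, c)]
    if len(groups) > 1 and len(groups[-1]) < c:
        tail = groups.pop()
        groups[-1].extend(tail)       # merged size is c + (1..c-1) <= 2c-1
    return groups


def find_group(separators, k):
    """Index of the group whose separator is the greatest one <= k."""
    return max(bisect_right(separators, k) - 1, 0)


def member(groups, separators, k):
    g = groups[find_group(separators, k)]   # upper level picks the group
    j = bisect_left(g, k)                   # lower level searches inside it
    return j < len(g) and g[j] == k
```

With N keys there are at most N/c groups, so the upper structure only has to represent N/c separators; that is the source of the space saving.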

16
Example of Q-Fast Trie

17
CLOSEMATCH(k) Algorithm Intuitively
  • Look for D = PREDECESSOR(k) in the upper part
  • O(h + log b)
  • Then search D's 2-3 tree for k
  • O(log c)

18
Q-Fast Trie Complexities
  • CLOSEMATCH(K) time complexity is
    O(h + log b + log c)
  • Other queries require an O(1) addition to the
    CLOSEMATCH(K) complexity
  • Space complexity is O(N + Nhb/c)
  • By choosing h = √(log M), b = 2^√(log M), and
    c = hb, the desired complexities are achieved

19
P/Q-Fast Trie Insertion/Deletion
  • P-fast trie
  • Use AVL trees for the INNERTREEs
  • O(h + log b) for insertion/deletion
  • Q-fast trie
  • O(h + log b + log c) for insertion/deletion
  • Maintenance of the c-partition property through
    tree splitting/merging in O(log c) time

20
X-Fast Trie Idea
  • P/Q-Fast tries use a top-down search to reach the
    wanted level, with a binary search in the node
    where the descent stops
  • Thus, P/Q-Fast Trie relies on the balance between
    the height of the tree and the branching factor
  • X-Fast trie idea: use binary search to find the
    wanted level
  • This requires being able to find the wanted node
    from its level alone, without a top-down pass
  • For the purpose of worst-case complexity, the
    branching factor is no longer important, since
    it only affects the base of the logarithm
21
X-Fast Trie
  • Part 1: a trie of height h and branching factor 2
    (representing all keys in binary)
  • Each node has an additional field DESCENDANT(v)
  • If v has no right branch, it points to the
    largest leaf descending from v (through the left
    branch)
  • If v has no left branch, it points to the
    smallest leaf descending from v (through the
    right branch)
  • All leaves form a doubly-linked list
  • A node v at height j may have descending leaves
    only in the range [(i-1)·2^j + 1, i·2^j] for some
    integer i; this i is called ID(v)
  • A node v at height j is called an ancestor of key
    K if ⌈K/2^j⌉ = ID(v)
  • BOTTOM(K) is the lowest ancestor of K

22
X-Fast Trie
  • Part 2: h + 1 Level Search Structures (LSS), each
    of which uses perfect hashing, as seen in the
    first lecture
  • Linear space, constant time

23
BOTTOM(k) Algorithm Intuitively
  • Binary search among the h + 1 different LSSs
  • Searching each LSS is O(1)
  • h = log M, therefore a binary search over the
    h + 1 LSSs is O(log log M)
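
The level search can be sketched as below. Two assumptions are made for illustration: Python sets stand in for the perfect-hashing LSSs, and a node at height j is identified by the prefix `k >> j` (a floor-based identifier chosen for convenience here, not the ID(v) numbering of the slides). The names `build_lss` and `bottom` are hypothetical; the key set must be non-empty and every key must be below 2^h.

```python
def build_lss(keys, h):
    """levels[j] holds the prefixes of all nodes at height j (j = 0 is the leaf level)."""
    return [{k >> j for k in keys} for j in range(h + 1)]


def bottom(levels, k):
    """Return (height, prefix) of the lowest ancestor of k that exists in the trie."""
    lo, hi = 0, len(levels) - 1       # invariant: an ancestor exists at height hi
    while lo < hi:
        mid = (lo + hi) // 2
        if (k >> mid) in levels[mid]: # one O(1) hash probe per step
            hi = mid                  # ancestor found; try lower levels
        else:
            lo = mid + 1              # no ancestor this low; go higher
    return lo, k >> lo
```

The binary search is valid because the existence of an ancestor at height j implies one at every greater height, so the predicate is monotone across the h + 1 levels.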

24
X-Fast Trie Complexities
  • BOTTOM(K) is O(log log M)
  • All queries require an O(1) addition to BOTTOM(K),
    with the assistance of the DESCENDANT field and
    the doubly-linked list
  • BOTTOM(K) is either K itself, or its DESCENDANT
    is PREDECESSOR(K) or SUCCESSOR(K)
  • Space is O(N log M)
  • No more than hN nodes in the trie (h = log M)
  • log M LSSs, each using O(N) space
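
To illustrate how a query finishes in O(1) once the lowest ancestor is known, here is a hedged static sketch of SUCCESSOR. Assumptions: Python dicts stand in for the LSSs, positions in a sorted list stand in for the doubly-linked leaf list, every node stores the positions of both its smallest and largest leaf (the structure on the slides keeps only the one DESCENDANT pointer it needs), and nodes are identified by the prefix `k >> j`. All function names are hypothetical; the key set must be non-empty with keys below 2^h.

```python
def build(keys, h):
    ks = sorted(set(keys))
    levels = [dict() for _ in range(h + 1)]
    for i, k in enumerate(ks):
        for j in range(h + 1):
            span = levels[j].setdefault(k >> j, [i, i])
            span[1] = i               # keys arrive sorted, so i is the largest so far
    return ks, levels


def bottom_height(levels, k):
    lo, hi = 0, len(levels) - 1
    while lo < hi:                    # binary search over the levels
        mid = (lo + hi) // 2
        if (k >> mid) in levels[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


def successor(ks, levels, k):
    j = bottom_height(levels, k)
    first, last = levels[j][k >> j]
    if j == 0:
        i = first + 1                 # k is stored: step right in the leaf list
    elif (k >> (j - 1)) & 1 == 0:
        i = first                     # k would go left, only a right branch exists:
                                      # the smallest leaf below is the successor
    else:
        i = last + 1                  # k would go right, only a left branch exists:
                                      # the largest leaf is the predecessor, step right
    return ks[i] if i < len(ks) else None
```

PREDECESSOR is symmetric: use `last` when only a left branch exists and `first - 1` otherwise.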

25
Y-Fast Trie Idea
  • Apply a partitioning technique similar to the one
    used to move from the P-Fast trie to the Q-Fast
    trie: c-partition all the keys into L subsets,
    each containing between c and 2c-1 keys
  • Upper part: an X-Fast trie representing S'
  • Lower part: a forest of binary trees of height
    O(log c)

26
Y-Fast Trie Complexities
  • The upper part can be searched in O(log log M)
    time and occupies no more than O((N/c) log M)
    space
  • Each binary tree can be searched in O(log c)
    time, and together they occupy O(N) space
  • Choosing c = log M gives O(N) space and
    O(log log M) time
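
A small arithmetic sketch of why c = log M brings the space down to O(N). The counts are illustrative "slots" with all constant factors dropped, and `space_estimates` is a hypothetical helper; only the orders of magnitude are meaningful.

```python
def space_estimates(n, log_m):
    """Rough slot counts for N keys below M = 2**log_m, constants dropped."""
    c = log_m                                # the choice c = log M
    representatives = n // c                 # at most N/c subsets of size >= c
    x_fast_alone = n * log_m                 # O(N log M)
    y_fast_upper = representatives * log_m   # O((N/c) log M)
    y_fast_lower = n                         # all binary trees together: O(N)
    return x_fast_alone, y_fast_upper + y_fast_lower
```

For one million 32-bit keys the estimate drops from about 32 million slots for an X-Fast trie alone to about 2 million for the two-level structure. The lower-level search costs O(log c) = O(log log M), so the overall time bound is unchanged.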

27
X/Y-Fast Trie Insertion/Deletion
  • LSSs have practically uncontrolled time
    complexity for dynamic operations
  • At least at the time the article was presented
  • Therefore, X/Y-Fast tries inherit this limitation