Title: Data Structures
1Data Structures Algorithms Radix Search
Richard Newman based on slides by S. Sahni and
book by R. Sedgewick
2Radix-based Keys
- Key has multiple parts
- Each part is an element of some set
- Character
- Numeral
- Key parts can be accessed (e.g., string si)
- Size of set is radix
3Advantages of Radix-based Search
- Good worst-case performance
- Simpler than balanced trees, etc.
- Fast access to data
- Easy way to handle variable-length keys
- Save space (part of key in structure)
4Disadvantages of Radix-based Search
- May be space-inefficient
- Performance depends on access to bytes of keys
- Must have distinct keys, or other way to handle
duplicate keys
5Digital Search Trees
- Similar to binary search trees
- Difference is that we use bits of the key to
determine subtree to search - Path in tree prefix of key
6Digital Search Trees
A
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
1
0
S
E
1
0
1
0
R
C
H
1
1
0
1
0
0
N
G
I
1
0
1
0
1
0
Note that binary tree is not sorted in BST sense
7Digital Search Trees
Prop 15.1 A search or insertion into a DST takes
about lg N comparisons on average, and about 2 lg
N comparisons in the worst case, in a tree built
from N keys. The number of comparisons is never
more than the number of bits in the search key.
8Tries
- Use bits of key to guide search like DST
- But keep keys in order like BST
- Allow recursive sort, etc.
- Pronounced try-ee or try
- Keys kept at leaves of a binary tree
9Tries
- Defn. 15.1 A trie is a binary tree that has
keys associated with each leaf, defined as
follows - a trie for an empty set is a null link
- a trie for a single key is a leaf w/key
- a trie for gt 1 key is an internal node with
left link referring to trie for keys that start
with 0, right for keys 1xxx
10Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
A
1
0
S
A
Construct tree to point where prefixes match
11Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
A
1
0
S
A
1
0
1
0
1
0
1
0
A
E
1
0
1
0
R
S
Construct tree to point where prefixes match
12Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
1
0
1
0
1
0
H
1
0
1
0
A
E
1
0
1
0
1
0
C
A
R
S
13Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
1
0
1
0
1
0
H
1
0
1
0
1
0
E
1
0
1
0
1
0
1
0
1
0
C
A
R
S
H
I
14Tries
- Prop. 15.2 The structure of a trie is
independent of key insertion order there is one
unique trie for any given set of distinct keys. - Prop. 15.3 Insertion or search for a random key
in a trie built from N random keys takes about lg
N bit comparisons on average, in the worst case,
bounded by bits in key
15Tries
- Annoying feature of tries
- One-way branching when keys have common prefix
- Prop. 15.4 A trie built from N random w-bit
keys has about N/lg 2 nodes on the average (about
1.44 N)
16Patricia Tries
- Annoying feature of tries
- One-way branching when keys have common prefix
- Two different types of nodes in trie
- Patricia tries fix both of these
- Practical Algorithm To Retrieve Information
Coded In Alphanumeric
17Patricia Tries
- Avoid one-way branching
- Keep at each node the index of the next bit to
test - Skip over common prefix!
- Avoid two types of nodes
- Store data in internal nodes
- Replace external links with back links
18Patricia Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
0
S
4
1
R
H
2
E
3
C
4
A
19Patricia Tries
Key Repr A 00001 S 10011 E 00101 R 10010 C 00011 H
01000 I 01001 N 01110 G 00111
0
S
4
1
R
H
2
E
3
C
4
A
20Patricia Tries
- Prop 15.5 Insertion or search in a patricia
trie built from N random bitstrings takes about
lg N bit comparisons on average, and about 2 lg N
in the worst case, but never more than the length
of the key.
21Map
- Radix search
- Digital Search Trees
- Tries
- Patricia Tries
- Multiway tries and TSTs
- Text string algorithms
22Multiway Tries
- Like radix sort, can get benefit from comparing
more than one bit at a time - Compare r bits, speed up search by a factor of r
- What could possibly be bad?
- Number of links is now R2r
- Can waste a lot of space!
23Multiway Tries
- Structure is (almost) the same as binary tries
- Except there are R branches
- Search start at root, leftmost digit
- Follow ith link if next R-ary digit is i
- If null link, then miss
- If reach leaf, it contains only key with prefix
matching path to it - compare
24Existence Tries
- Only keys, no records
- Insert/search
- Defn. 15.2 The existence trie for a set of keys
is - Empty set null link
- Non-empty set internal node with links for each
possible digit to tries built with the leading
digit omitted
25Existence Tries
- Convenient to return null on miss, dummy record
on hit - Convenient to have no duplicate keys and no key a
prefix of another key - Keys of fixed length, or
- Use termination character with value NULLdigit,
only used as sentinel
26Existence Tries
- No need to store any data
- All keys captured in trie structure
- If reach NULLdigit at the same time we run out of
key digits, search hit - Otherwise, search miss
- Insert search until find null link, then add
nodes for each of the remaining digits in the key
27Existence Tries
the time is now for
t
n
i
f
h
i
s
o
o
m
e
w
r
e
28Multi-way Tries
- R-ary branching
- Keys stored at leaves
- Path to leaf defines prefix of key stored at leaf
- Only build tree downward until prefixes become
distinct
29Multi-way Tries
- Defn. 15.3 The multiway trie for a set of keys
associated with leaves is - Set empty null link
- Singleton set leaf with key
- Larger set internal node with links for each
possible digit to tries built with the leading
digit omitted
30Multi-way Tries
31Summary
- Radix search
- Digital Search Trees
- Tries
- Patricia Tries
- Multiway tries and TSTs
- Text string algorithms