Learn more at: http://www.aladdin.cs.cmu.edu

1
An Approach to Generalized Hashing

Michael Klipper
With
Dan Blandford
Guy Blelloch

2
Hashing techniques currently available

Many hashing algorithms out there:
Separate chaining
Cuckoo hashing
FKS perfect hashing
Also many hash functions designed, including
several universal families
Good: O(1) expected amortized time for updates,
and many have O(1) worst-case time for searches
Bad: require fixed-length keys and fixed-length
data

3
What's so bad about fixed length?

Easy to waste a lot of space
Every hash bucket must be as large as the largest
item to be stored in the table.
This is a large problem for sparsely-filled
tables, or tables where large items occur
infrequently.
Hash tables are often building blocks to more
complicated structures, so optimizing them pays
off in a lot of places.

4
Example: A Graph Layout Where We Store Edges in a
Hash Table

Let's say u is a vertex of degree d and v_1, ..., v_d
are its neighbors. Let's say that v_0 = v_{d+1} = u
by convention. Then the entry representing the
edge (u, v_i) has key (u, v_i) and data (v_{i-1},
v_{i+1}).
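
The layout described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, a plain dict stands in for the hash table, and the form of the extra list-starting entry (key (u, u), data (v_d, v_1)) is an assumption, since the slide only says that such an entry exists. Self-loops are assumed absent.

```python
def build_edge_table(u, neighbors):
    """Map (u, v_i) -> (v_{i-1}, v_{i+1}), with v_0 = v_{d+1} = u by convention."""
    d = len(neighbors)
    ring = [u] + list(neighbors) + [u]      # v_0, v_1, ..., v_d, v_{d+1}
    table = {}
    for i in range(1, d + 1):
        table[(u, ring[i])] = (ring[i - 1], ring[i + 1])
    # Assumption: the extra entry that starts the list is keyed (u, u)
    # and points at the last and first neighbors.
    table[(u, u)] = (ring[d], ring[1]) if d else (u, u)
    return table


def neighbors_of(table, u):
    """Walk the linked list of u's neighbors, starting from the extra entry."""
    out = []
    _, v = table[(u, u)]
    while v != u:                           # reaching u again ends the list
        out.append(v)
        _, v = table[(u, v)]
    return out


table = build_edge_table("u", ["v1", "v2", "v3", "v4"])
```

Every entry here has the same shape, which is exactly the fixed-length key and data requirement that the rest of the talk tries to relax.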

[Figure: the hash table for a vertex u of degree 4 with neighbors v_1, ..., v_4, one entry per edge (u, v_i). An extra entry starts the list.]
5
An Idea for Compression

Instead of ((u, v_i), (v_{i-1}, v_{i+1})) in the table,
we will store
((u, v_i - u), (v_{i-1} - u, v_{i+1} - u)).
With this representation, we need O(k + n) space,
where k = Σ_{(u,v) ∈ E} log |u - v|.
A good labeling of the vertices will make many of
these differences small! But not all of them.
The following paper has details:

D. Blandford, G. E. Blelloch, and I. Kash.
Compact Representations of Separable Graphs. In
SODA, 2003, pages 342-351.
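
As a rough illustration of why the labeling matters, the sketch below difference-encodes an entry relative to u and evaluates the sum of log |u - v| over the edges for two labelings of the same path graph. The function names and the example graph are made up for this sketch, vertices are assumed to be integer labels, and base-2 logarithms are assumed.

```python
import math


def encode_entry(u, prev_v, v, next_v):
    """Store ((u, v - u), (prev_v - u, next_v - u)) instead of the raw labels."""
    return ((u, v - u), (prev_v - u, next_v - u))


def decode_entry(entry):
    """Recover ((u, v), (prev_v, next_v)) from a difference-encoded entry."""
    (u, dv), (dprev, dnext) = entry
    return ((u, u + dv), (u + dprev, u + dnext))


def k_of(edges):
    """Sum of log2 |u - v| over all edges (assumes u != v for every edge)."""
    return sum(math.log2(abs(u - v)) for u, v in edges)


# A path on 8 vertices, labeled in path order: every difference is 1.
good_edges = [(i, i + 1) for i in range(7)]

# The same path with its vertices relabeled in a scattered order.
labels = [0, 4, 1, 5, 2, 6, 3, 7]
bad_edges = list(zip(labels, labels[1:]))
```

With the in-order labeling every difference is 1, so each term of the sum is zero; the scattered labeling makes every term positive.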
6
First, a simpler problem

Variable-length data stored in arrays
It's like a hash table except that the indices
now are in the fixed range 0, ..., n-1 for n items in
the array.

We'll use the following data for our example in
these slides:
(0, 10110) (1, 0110) (2, 11111) (3, 0101)
(4, 1100) (5, 010) (6, 11011) (7, 00001111)
We'll assume that the word size of the
machine is 2 bytes.
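
To connect this data back to the earlier point about wasted space, here is a small calculation of what fixed-length slots would cost, assuming each slot is sized to the longest string and ignoring any per-slot overhead. The variable names are illustrative only.

```python
# The eight example strings, indexed 0..7 as on the slide.
items = ["10110", "0110", "11111", "0101", "1100", "010", "11011", "00001111"]

longest = max(len(s) for s in items)     # every fixed-length slot must hold this
fixed_bits = len(items) * longest        # space with one fixed-size slot per item
actual_bits = sum(len(s) for s in items) # bits of real data
wasted_bits = fixed_bits - actual_bits

print(fixed_bits, actual_bits, wasted_bits)  # prints: 64 38 26
```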
7
Key Idea: BLOCKS

Multiple data items can be crammed into a word,
so let's take advantage of that.
Two words in a block: one with data, one marking
off separations of strings
If the first index in a block is i, we'll label
the block as b_i

b_0
1st word: 10110 0110 11111 (last 2 bits unused)
2nd word: 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0
This is the block containing strings s_0 through
s_2 from our example.
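
A minimal sketch of the two-word block, using Python integers as 16-bit words with bit 0 taken to be the leftmost bit, as in the picture. The function names are hypothetical, strings are assumed nonempty, and the lookup scans the marker word bit by bit for clarity rather than using the word-at-a-time lookup tables the talk relies on for constant time.

```python
W = 16  # word size in bits (2 bytes, as assumed in the slides)


def pack_block(strings, w=W):
    """Pack nonempty bit strings into (data_word, marker_word)."""
    if sum(len(s) for s in strings) > w:
        raise ValueError("strings do not fit in one word")
    data = marks = 0
    pos = 0
    for s in strings:
        marks |= 1 << (w - 1 - pos)       # mark where this string starts
        for ch in s:
            if ch == "1":
                data |= 1 << (w - 1 - pos)
            pos += 1
    if pos < w:
        marks |= 1 << (w - 1 - pos)       # mark where the last string ends
    return data, marks


def get_string(data, marks, j, w=W):
    """Return the j-th string (0-based) in the block as a '0'/'1' string.

    The caller must know that the block really holds more than j strings.
    """
    starts = [p for p in range(w) if (marks >> (w - 1 - p)) & 1]
    start = starts[j]
    end = starts[j + 1] if j + 1 < len(starts) else w
    return format(data, f"0{w}b")[start:end]


data, marks = pack_block(["10110", "0110", "11111"])
```

For the three strings of b_0 this reproduces the marker word shown on the slide. Note that the marker word alone does not say how many strings a block holds, so the count has to come from the surrounding index structure.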
8
Organization of Blocks

Index structure (regular array): A[i] = 1 if and
only if string i starts a block
Hash table (one of the regular kind): if string
i starts a block, H(i) = address of b_i
Note that it is easy to split and merge blocks.

Key size invariant: two adjacent blocks (like b_0
and b_3 in the example) must have their sizes sum
to greater than the word size of the machine

i:    0 1 2 3 4 5 6 7
A[i]: 1 0 0 1 0 0 0 1
H(0) -> b_0, H(3) -> b_3, H(7) -> b_7
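
The organization above can be sketched as follows. To keep the example short, each block is represented as a plain Python list of bit strings rather than the two-word layout, a dict plays the role of H and holds the block itself rather than an address, and blocks are formed by a simple left-to-right greedy packing. All of these are assumptions of this sketch, not necessarily how the actual structure is built.

```python
W = 16  # word size in bits


def build(strings, w=W):
    """Greedily pack strings into blocks; return the index array A and table H."""
    A = [0] * len(strings)
    H = {}
    start, used = 0, 0
    for i, s in enumerate(strings):
        if len(s) > w:
            raise ValueError("string longer than a word")
        if i == 0 or used + len(s) > w:   # s does not fit: it starts a new block
            A[i] = 1
            H[i] = []
            start, used = i, 0
        H[start].append(s)
        used += len(s)
    return A, H


def lookup(A, H, i):
    """Find string i: walk back in A to the block start, then index the block."""
    start = i
    while A[start] == 0:
        start -= 1
    return H[start][i - start]


def invariant_holds(H, w=W):
    """Check that the sizes of any two adjacent blocks sum to more than w."""
    sizes = [sum(len(s) for s in H[b]) for b in sorted(H)]
    return all(x + y > w for x, y in zip(sizes, sizes[1:]))


items = ["10110", "0110", "11111", "0101", "1100", "010", "11011", "00001111"]
A, H = build(items)
```

On the example data this yields the blocks b_0, b_3 and b_7. The walk back in `lookup` is a linear scan here; it stays short because strings are nonempty, so a block start is never far away.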
9
A Rough Look at Space and Time Bounds for this
Array Structure

Let's say we have n items, and w is the word size
in bits of the machine. WLOG all data strings
are nonempty.
Let m = Σ_i |s_i|.

Lookup tables cut the time down to constant time
for finding a block and the string inside it,
since they can operate on entire words at
once. Indexing structures and hash table use O(w)
bits per block. Each block is on average half
full due to the invariant. O(m + w) bits used
and operations are O(1) time!

[Figure: the index array A, annotated: at most w strings per block; block starts are < w apart; on average a block is ½ full; O(m/w + 1) blocks!]
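
The block count can be derived from the size invariant. The following is a sketch of that argument, where B (the number of blocks) and c_j (the number of data bits used in the j-th block) are names introduced here for illustration, and m > 0 because the strings are nonempty.

```latex
\begin{align*}
c_j + c_{j+1} &> w
  && \text{(size invariant, adjacent blocks)} \\
m = \sum_{j=1}^{B} c_j
  &\ge \sum_{t=1}^{\lfloor B/2 \rfloor} \left( c_{2t-1} + c_{2t} \right)
  > \left\lfloor \tfrac{B}{2} \right\rfloor w
  && \text{(disjoint adjacent pairs, } B \ge 2\text{)} \\
B &\le 2 \left\lfloor \tfrac{B}{2} \right\rfloor + 1
  < \frac{2m}{w} + 1 = O\!\left( \frac{m}{w} + 1 \right) \\
\text{total bits} &= B \cdot O(w) = O(m + w)
\end{align*}
```

The same pairing argument gives the "half full" claim: two adjacent blocks have 2w bits of data capacity between them and hold more than w bits.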
10
Briefly, how we proceed from there

We can finally implement our generalized hash
table using an array of the type we just
described as the hash table.
There are more details; the following paper
explains them.

D. Blandford and G. E. Blelloch. Storing
Variable-Length Keys in Arrays, Sets, and
Dictionaries, with Applications. In Symposium on
Discrete Algorithms (SODA), 2005 (hopefully)
11
Great, but if there's a paper written on the
subject already, then what do I do?

A lot of this code isn't yet written. We haven't
yet checked to see that the code we have fulfills
the theoretical bounds, since we have to make
sure that any corners cut in the
programming are theoretically safe.
My job is to get a lot of this running and look
for optimizations.
Also, once this is running, we'll want to run
experiments to see how well it runs, especially
in modeling graphs.