Title: CSC 172 DATA STRUCTURES
CSC 172 DATA STRUCTURES
SETS and HASHING
- Unadvertised in-store special: SETS!
- In Java, see Weiss 4.8
- Simple idea: the Characteristic Vector
- HASHING... the main event.
Representation of Sets
- List
  - Simple; O(n) dictionary operations
- Binary Search Trees
  - O(log n) average time
  - Range queries, sorting
- Characteristic Vector
  - O(1) dictionary ops, but limited to small sets
- Hash Table
  - O(1) average for dictionary ops
  - Tricky to expand, no range queries
Characteristic Vectors
- Boolean strings whose positions correspond to the members of some fixed universal set
- A 1 in a location means that the element is in the set
- A 0 means that it is not (a sketch follows below)
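A minimal Java sketch of the idea, using java.util.BitSet as the characteristic vector over the universal set 0..63; the class and variable names here are illustrative, not from the slides:

    import java.util.BitSet;

    public class CharVecDemo {
        public static void main(String[] args) {
            // The BitSet is the characteristic vector; bit i records whether i is in the set.
            BitSet s = new BitSet(64);
            s.set(3);                        // insert 3: bit 3 becomes 1
            s.set(17);                       // insert 17
            System.out.println(s.get(3));    // lookup: true, 3 is in the set
            System.out.println(s.get(5));    // lookup: false, 5 is not
            s.clear(3);                      // delete 3: bit 3 becomes 0
            System.out.println(s.get(3));    // false
        }
    }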
MUSIC THEORY
- A chord is a set of notes played at the same time.
- Represented by a 12-bit vector called a pitch class
  - B, A#, A, G#, G, F#, F, E, D#, D, C#, C
- 000010010001 represents C major
- 000010001001 represents C minor
- Rotation is transposition
- Bit reversal is inversion (rotation is sketched below)
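A small Java sketch of the 12-bit pitch-class vector, with transposition done as a rotation of the bits; the bit order follows the slide (rightmost bit = C, leftmost = B), and the names are illustrative:

    public class PitchClassDemo {
        // Bit 0 = C, bit 1 = C#, ..., bit 11 = B, matching the slide's rightmost-is-C layout.
        static final int C_MAJOR = 0b000010010001;   // C, E, G
        static final int C_MINOR = 0b000010001001;   // C, D#, G

        // Transpose a chord up by n semitones: rotate the 12-bit vector left by n.
        static int transpose(int chord, int n) {
            n %= 12;
            return ((chord << n) | (chord >>> (12 - n))) & 0xFFF;
        }

        public static void main(String[] args) {
            // C major up 2 semitones is D major (D, F#, A): bits 2, 6, 9.
            System.out.println(Integer.toBinaryString(transpose(C_MAJOR, 2)));
        }
    }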
UNIX file privileges
- {user, group, others} x {read, write, execute}
- 9 possible privileges
- Type ls -l on UNIX
    total 142
    -rw-rw-r--  1 pawlicki  none     76 Jun 20  2000 PKG416.desc
    -rw-rw-r--  1 pawlicki  none  28906 Jun 20  2000 PKG416.pdf
    -rw-rw-r--  1 pawlicki  none   1849 Jun 20  2000 let.1
    -rw-rw-r--  1 pawlicki  none      0 Apr  2  1303 out
    -rw-rw-r--  1 pawlicki  none  39891 Jun 20  2000 stapp.uu
UNIX files
- The order is rwx for each of user (owner), group, and others
- So, a protection mode of 110100000 means that the owner may read and write (but not execute), the group can only read, and others cannot even read (see the sketch below)
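A quick Java sketch of testing individual privilege bits, assuming the 9-bit mode is kept in an int with the user bits highest, as in the slide's 110100000 example (the constant names are illustrative):

    public class UnixModeDemo {
        // Bit layout, high to low: user rwx, group rwx, others rwx.
        static final int USER_READ   = 0b100000000;
        static final int USER_WRITE  = 0b010000000;
        static final int GROUP_READ  = 0b000100000;
        static final int OTHERS_READ = 0b000000100;

        public static void main(String[] args) {
            int mode = 0b110100000;                          // the slide's example
            System.out.println((mode & USER_WRITE) != 0);    // true: owner may write
            System.out.println((mode & GROUP_READ) != 0);    // true: group may read
            System.out.println((mode & OTHERS_READ) != 0);   // false: others may not even read
        }
    }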
GAMBLING
- A deck has 52 cards
- 2C, 2H, 2S, 2D, 3C, .... KD, AC, AH, AS, AD
- Represent a hand as a vector of 52 bits
- 0000000000000000000000000000000000000000000000000101 is a pair of aces
- In Texas Hold'em everyone gets two hole cards and 5 board cards
- We can use bitwise operations to find hands (a sketch follows below)
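A rough Java sketch of the hand-as-bit-vector idea, assuming bit 0 is 2C and bit 51 is AD so the four aces occupy the top four bits (the mask and names are illustrative):

    public class CardHandDemo {
        // The four aces (AC, AH, AS, AD) are the last four cards in the slide's
        // ordering, so they sit in bits 48..51 of the 52-bit hand.
        static final long ACES = 0b1111L << 48;

        public static void main(String[] args) {
            // The slide's example hand: AH and AD are set.
            long hand = (1L << 49) | (1L << 51);
            // A bitwise AND plus a population count tells how many aces we hold.
            int aces = Long.bitCount(hand & ACES);
            System.out.println(aces >= 2);   // true: a pair of aces
        }
    }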
CV advantages
- If the universal set is small, sets can be represented by bits packed 32 to a word
- Insert, delete, and lookup are O(1) on the proper bit
- Union, intersection, and difference are implemented on a word-by-word basis (sketched below)
  - O(m), where m is the size of the universal set
  - Small constant factor (1/32)
  - Fast machine operations
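A minimal sketch of word-by-word set operations on a packed characteristic vector; this version packs 64 bits per long rather than 32 per word, and the names are illustrative:

    public class PackedSetDemo {
        // Union and intersection of two packed characteristic vectors, one word at a time.
        static long[] union(long[] a, long[] b) {
            long[] r = new long[a.length];
            for (int i = 0; i < a.length; i++) r[i] = a[i] | b[i];
            return r;
        }

        static long[] intersection(long[] a, long[] b) {
            long[] r = new long[a.length];
            for (int i = 0; i < a.length; i++) r[i] = a[i] & b[i];
            return r;
        }

        public static void main(String[] args) {
            long[] s = new long[2], t = new long[2];
            s[0] |= 1L << 3;  t[0] |= 1L << 3;   // both sets contain 3
            s[1] |= 1L << 5;                     // s also contains 69 (= 64 + 5)
            System.out.println(Long.bitCount(union(s, t)[1]));         // 1: 69 is in the union
            System.out.println(Long.bitCount(intersection(s, t)[1]));  // 0: but not in the intersection
        }
    }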
Hashing
- A cool way to get from an element x to the place where x can be found
- An array 0..B-1 of buckets
  - A bucket contains a list of set elements
  - B = number of buckets
- A hash function that takes potential set elements and quickly produces a random integer in 0..B-1
Example
- If the set elements are integers, then the simplest/best hash function is usually h(x) = x % B, i.e. h(x) = x - B*(x/B) with integer division (never negative)
- Suppose B = 6 and we wish to store the integers
  - 70, 53, 99, 94, 83, 76, 64, 30
- They belong in the buckets 4, 5, 3, 4, 5, 4, 4, and 0
- Note: if B = 7, the buckets are 0, 4, 1, 3, 6, 6, 1, 2 (checked in the snippet below)
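A tiny Java check of the slide's bucket assignments (illustrative code):

    public class BucketDemo {
        public static void main(String[] args) {
            int[] keys = {70, 53, 99, 94, 83, 76, 64, 30};
            for (int b : new int[] {6, 7}) {
                StringBuilder line = new StringBuilder("B = " + b + ": ");
                for (int k : keys) line.append(k % b).append(' ');   // h(x) = x % B
                System.out.println(line);
            }
            // Prints  B = 6: 4 5 3 4 5 4 4 0   and   B = 7: 0 4 1 3 6 6 1 2
        }
    }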
Pitfalls of Hash Function Selection
- We want to get a uniform distribution of elements into buckets
- Beware of data patterns that cause a non-uniform distribution
Example
- If the integers were all even, then B = 6 would cause only buckets 0, 2, and 4 to fill
- If we hashed words in the UNIX dictionary into 10 buckets by word length, then about 20% would go into bucket 7
Dictionary Operations
- Lookup
  - Go to the head of bucket h(x)
  - Search the bucket list to see if x is in the bucket
- Insertion: append to the bucket list if x is not found
- Deletion: list deletion from the bucket list (all three are sketched below)
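A bare-bones Java sketch of the three operations with separate chaining; the table size, key type, and names are illustrative:

    import java.util.LinkedList;

    public class ChainedHashSetDemo {
        static final int B = 6;                              // number of buckets
        @SuppressWarnings("unchecked")
        static LinkedList<Integer>[] table = new LinkedList[B];
        static { for (int i = 0; i < B; i++) table[i] = new LinkedList<>(); }

        static int h(int x) { return Math.floorMod(x, B); }

        // Lookup: go to bucket h(x) and walk its list.
        static boolean lookup(int x) { return table[h(x)].contains(x); }

        // Insertion: append to the bucket list if not found.
        static void insert(int x) { if (!lookup(x)) table[h(x)].add(x); }

        // Deletion: ordinary list deletion from the bucket list.
        static void delete(int x) { table[h(x)].remove(Integer.valueOf(x)); }

        public static void main(String[] args) {
            for (int k : new int[] {70, 53, 99, 94}) insert(k);
            System.out.println(lookup(99));   // true
            delete(99);
            System.out.println(lookup(99));   // false
        }
    }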
Analysis
- If we pick B to be near n, the number of elements in the set, then the average list is O(1) long
- Thus, dictionary ops take O(1) time on average
- Worst case: all elements go into one bucket
  - O(n)
Managing Hash Table Size
- If n gets as high as 2B, create a new hash table with 2B buckets
- Rehash every element into the new table (sketched below)
  - O(n) time total
- There were at least n inserts since the last rehash
  - All these inserts took time O(n)
  - Thus, we amortize the cost of rehashing over the inserts since the last rehash
  - A constant factor, at worst
- So, even with rehashing we get O(1) amortized time ops
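A minimal sketch of the doubling/rehash step for a separate-chaining table (illustrative names; it grows when n reaches 2B, as on the slide):

    import java.util.LinkedList;

    public class RehashDemo {
        static LinkedList<Integer>[] table = newTable(4);
        static int n = 0;                                    // number of stored elements

        @SuppressWarnings("unchecked")
        static LinkedList<Integer>[] newTable(int b) {
            LinkedList<Integer>[] t = new LinkedList[b];
            for (int i = 0; i < b; i++) t[i] = new LinkedList<>();
            return t;
        }

        static void insert(int x) {
            if (n >= 2 * table.length) rehash();             // n reached 2B: double the table
            table[Math.floorMod(x, table.length)].add(x);
            n++;
        }

        static void rehash() {
            LinkedList<Integer>[] old = table;
            table = newTable(2 * old.length);                // 2B buckets
            for (LinkedList<Integer> bucket : old)           // re-insert everything: O(n) total
                for (int x : bucket)
                    table[Math.floorMod(x, table.length)].add(x);
        }

        public static void main(String[] args) {
            for (int k = 0; k < 20; k++) insert(k);
            System.out.println(table.length);                // 16: the table doubled twice
        }
    }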
Collisions
- A collision occurs when two values in the set hash to the same value
- There are several ways to deal with this
  - Chaining (using a linked list or some secondary structure)
  - Open Addressing
    - Double hashing
    - Linear Probing
Chaining
- Very efficient time-wise
- Other approaches use less space
Open Addressing
- When a collision occurs, if the table is not full, find an available space
- Linear Probing
- Quadratic Probing
- Double Hashing
Linear Probing
- If the current location is occupied, try the next table location

    LinearProbingInsert(K)
        if (table is full) error
        probe = h(K)
        while (table[probe] is occupied)
            probe = (probe + 1) mod M
        table[probe] = K

- Walk along the table until an empty spot is found
- Uses less memory than chaining (no links)
- Takes more time than chaining (long walks)
- Deleting is a pain (mark a slot as having been deleted; see the sketch below)
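A compact Java sketch of linear-probing insert, lookup, and lazy delete: Integer slots where null means never used and a DELETED sentinel stands in for the "mark a slot" idea above (names are illustrative, and insert assumes the table is not full):

    public class LinearProbeDemo {
        static final int M = 13;
        static final Integer DELETED = Integer.MIN_VALUE;    // tombstone marker
        static Integer[] table = new Integer[M];             // null = never used

        static int h(int k) { return Math.floorMod(k, M); }

        static void insert(int k) {                          // assumes the table is not full
            int probe = h(k);
            while (table[probe] != null && !table[probe].equals(DELETED))
                probe = (probe + 1) % M;                     // try the next location
            table[probe] = k;
        }

        static boolean lookup(int k) {
            int probe = h(k);
            while (table[probe] != null) {                   // stop at a never-used slot
                if (table[probe].equals(k)) return true;
                probe = (probe + 1) % M;
            }
            return false;
        }

        static void delete(int k) {
            int probe = h(k);
            while (table[probe] != null) {
                if (table[probe].equals(k)) { table[probe] = DELETED; return; }  // mark, don't empty
                probe = (probe + 1) % M;
            }
        }

        public static void main(String[] args) {
            for (int k : new int[] {18, 41, 22, 59, 32, 31, 73}) insert(k);
            System.out.println(lookup(31));   // true: 31 ended up in slot 8
        }
    }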
Linear Probing Example
- h(K) = K % 13
- Insert 18, 41, 22, 59, 32, 31, 73
- h(K) = 5, 2, 9, 7, 6, 5, 8
- 18, 41, 22, 59, and 32 go straight into slots 5, 2, 9, 7, and 6
- 31 hashes to 5 (taken by 18), probes slots 6 and 7 (also taken), and lands in slot 8
- 73 hashes to 8 (taken by 31), probes slot 9 (taken by 22), and lands in slot 10
Double Hashing
- If the current location is occupied, try another table location
- Use two hash functions
- If M is prime, eventually every location will be examined

    DoubleHashInsert(K)
        if (table is full) error
        probe = h1(K)
        offset = h2(K)
        while (table[probe] is occupied)
            probe = (probe + offset) mod M
        table[probe] = K

- Many of the same (dis)advantages as linear probing
- Distributes keys more evenly than linear probing (a Java sketch follows)
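A short Java sketch of double-hashing insert with the hash functions used in the example slides below, h1(K) = K % 13 and h2(K) = 8 - K % 8 (illustrative code; insert assumes the table is not full):

    public class DoubleHashDemo {
        static final int M = 13;
        static Integer[] table = new Integer[M];

        static int h1(int k) { return k % 13; }
        static int h2(int k) { return 8 - k % 8; }           // never 0, so the probe always moves

        static void insert(int k) {                          // assumes the table is not full
            int probe = h1(k), offset = h2(k);
            while (table[probe] != null)
                probe = (probe + offset) % M;                // step by offset instead of 1
            table[probe] = k;
        }

        public static void main(String[] args) {
            for (int k : new int[] {18, 41, 22, 59, 32, 31, 73}) insert(k);
            for (int i = 0; i < M; i++)
                System.out.println(i + ": " + table[i]);     // 73 ends up in slot 3
        }
    }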
Quadratic Probing
- Don't step by 1 each time. Add i² to the hashed location h(x) (mod B, of course) for i = 1, 2, ... (see the snippet below)
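A one-method sketch of the quadratic probe sequence (illustrative values):

    public class QuadraticProbeDemo {
        public static void main(String[] args) {
            // First few quadratic probes for a key that hashes to h = 5 with B = 13.
            int h = 5, B = 13;
            for (int i = 1; i <= 5; i++)
                System.out.println((h + i * i) % B);   // h + i^2 (mod B): 6, 9, 1, 8, 4
        }
    }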
Double Hashing Example
- h1(K) = K % 13, h2(K) = 8 - K % 8
- Insert 18, 41, 22, 59, 32, 31, 73
- h1(K) = 5, 2, 9, 7, 6, 5, 8
- h2(K) = 6, 7, 2, 5, 8, 1, 7
- 18, 41, 22, 59, and 32 go straight into slots 5, 2, 9, 7, and 6
- 31 hashes to 5 (taken), steps by its offset of 1 past slots 6 and 7, and lands in slot 8
- 73 hashes to 8 (taken), steps by its offset of 7 to slots 2 and 9 (both taken), and lands in slot 3
Theoretical Results
Expected Probes
[Plot of expected probes versus load factor; only the axis values 0.5 and 1.0 survive in the text.]