ROAD MAP - PowerPoint PPT Presentation

1
DATA STRUCTURES AND ALGORITHMS
Lecture Notes 7 Prepared by Inanç TAHRALI
2
REVIEW
  • We have investigated the following ADTs
  • LISTS
  • Array
  • Linked List
  • STACKS
  • QUEUE
  • TREES
  • Binary Trees
  • Binary Search Trees
  • AVL Trees
  • What about their running times ?

3
Running times of important operations
              insertion   deletion    find
Array         O(n)        O(n)        O(n)
Linked list   O(1)        O(n)        O(n)
Tree          O(log n)    O(log n)    O(log n)
Can we decrease the running times further?
4
ROAD MAP
  • HASHING
  • General Idea
  • Hash Function
  • Separate Chaining
  • Open Addressing
  • Rehashing

5
Hashing
  • Hashing: implementation of hash tables
  • Hash table: an array of elements
  • Fixed size: TableSize
  • Search is performed on a part of the item: the key
  • Each key is mapped into a number
  • in the range 0 to TableSize-1
  • Used as array index
  • Mapping is done by the hash function
  • Simple to compute
  • Should ensure that any two distinct keys get different
    cells
  • How to perform insert, delete and find operations
    in O(1) time?

6
An ideal hash table
  • Each key is mapped to a different index !
  • Not always possible
  • many keys, finite indexes
  • Even distribution
  • Considerations
  • Choose a hash function
  • Decide what to do when two keys hash to the same
    value
  • Decide on table size

7
Hash function
  • If keys are integers
  • Hash function: return Key mod TableSize
  • Ex: TableSize = 10
  • Keys 120, 330, 1000 all hash to cell 0
  • TableSize should be prime
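
The effect of the table size can be checked with a small sketch. The helper names `hashInt` and `distinctCells` are made up for this illustration, and the prime 13 is only an example value; the slide itself does not name a specific prime.

```cpp
#include <cassert>
#include <vector>

// Hash an integer key as Key mod TableSize.
// Assumes key >= 0 and tableSize > 0.
int hashInt(int key, int tableSize) {
    assert(key >= 0 && tableSize > 0);
    return key % tableSize;
}

// Count how many different cells a set of keys lands in
// (hypothetical helper, used only to compare table sizes).
int distinctCells(const std::vector<int>& keys, int tableSize) {
    std::vector<bool> used(tableSize, false);
    int count = 0;
    for (int key : keys) {
        int cell = hashInt(key, tableSize);
        if (!used[cell]) {
            used[cell] = true;
            ++count;
        }
    }
    return count;
}
```

With TableSize 10 the keys 120, 330 and 1000 share one cell; with the prime 13 they land in cells 3, 5 and 12.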

8
Hash function
  • If keys are strings
  • Add ASCII values of the characters
  • Works poorly if TableSize is large and the number of characters is
    small
  • Ex: TableSize = 10000, number of characters in a
    key = 8
  • 127 * 8 = 1016 < 10000, so only cells 0 to 1016 can be used

int hash( const string & key, int tableSize )
{
    int hashVal = 0;
    for( int i = 0; i < key.length( ); i++ )
        hashVal += key[ i ];
    return hashVal % tableSize;
}
9
Hash function
  • If keys are strings
  • Use all characters
  • Σ (i = 0 to KeySize-1) Key[KeySize - i - 1] * 32^i
  • Early characters do not count (with long keys they are shifted out
    of the result)
  • Use only some number of characters
  • Use characters in odd positions

10
Hash function
  • If keys are strings
  • Use first three characters
  • 729 * key[2] + 27 * key[1] + key[0]
  • If the keys are not random, some part of the table
    is not used.

int hash( const string & key, int tableSize )
{
    return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
}
11
A good hash function
int hash( const string & key, int tableSize )
{
    int hashVal = 0;

    // Horner's rule with multiplier 37
    for( int i = 0; i < key.length( ); i++ )
        hashVal = 37 * hashVal + key[ i ];

    hashVal %= tableSize;
    if( hashVal < 0 )          // overflow may have made the value negative
        hashVal += tableSize;

    return hashVal;
}
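
As a variation on the routine above, the same multiplier-37 scheme can be written with unsigned arithmetic, so that overflow wraps around in a well-defined way and the negative-value check is no longer needed. This is a sketch under that assumption, not the slide's own code, and the name `hashString` is chosen here for illustration.

```cpp
#include <cassert>
#include <string>

// Polynomial string hash (Horner's rule, multiplier 37).
// Unsigned arithmetic wraps on overflow, so the result is never negative.
unsigned int hashString(const std::string& key, unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int hashVal = 0;
    for (unsigned char ch : key)
        hashVal = 37u * hashVal + ch;
    return hashVal % tableSize;
}
```

For the key "ab" the value before the final mod is 37 * 97 + 98 = 3687.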

12
Collision
  • Main programming detail is collision resolution
  • If, when an element is inserted, it hashes to the
    same value as an already inserted element, there
    is a collision.
  • There are several methods to deal with this
    problem
  • Separate chaining
  • Open addressing

13
Separate Chaining Hash Table
  • Keep a list of all elements that hash to the same
    value
  • TableSize = 10
  • is not a good choice
  • it is not prime

14
Type declaration for separate chaining hash table
template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

  private:
    vector<List<HashedObj> > theLists;   // The array of Lists
    const HashedObj ITEM_NOT_FOUND;
};

15
// Construct the hash table.
template <class HashedObj>
HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
{
}

// Make the hash table logically empty.
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    for( int i = 0; i < theLists.size( ); i++ )
        theLists[ i ].makeEmpty( );
}

// Deep copy.
template <class HashedObj>
const HashTable<HashedObj> & HashTable<HashedObj>::
operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
        theLists = rhs.theLists;   // copies the same member as the copy constructor
    return *this;
}

16
// Remove item x from the hash table.
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    theLists[ hash( x, theLists.size( ) ) ].remove( x );
}

// Find item x in the hash table.
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::
find( const HashedObj & x ) const
{
    ListItr<HashedObj> itr;
    itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
    if( itr.isPastEnd( ) )
        return ITEM_NOT_FOUND;
    else
        return itr.retrieve( );
}

17
// Insert item x into the hash table.
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
    ListItr<HashedObj> itr = whichList.find( x );

    if( itr.isPastEnd( ) )       // x is not in the list yet
        whichList.insert( x, whichList.zeroth( ) );
}
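
The routines above rely on the course's own List and ListItr classes. For readers without them, the same idea can be sketched with the standard library. `ChainedIntSet` is a hypothetical name; it stores non-negative ints only and hashes with key mod TableSize, which is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Minimal separate-chaining set of non-negative ints (illustrative only).
class ChainedIntSet {
public:
    explicit ChainedIntSet(int tableSize = 101) : lists_(tableSize) {
        assert(tableSize > 0);
    }

    bool contains(int x) const {
        const std::list<int>& chain = lists_[cell(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Insert x at the front of its chain unless it is already present.
    void insert(int x) {
        if (!contains(x)) {
            lists_[cell(x)].push_front(x);
            ++size_;
        }
    }

    // Remove x from its chain if it is present.
    void remove(int x) {
        std::list<int>& chain = lists_[cell(x)];
        auto it = std::find(chain.begin(), chain.end(), x);
        if (it != chain.end()) {
            chain.erase(it);
            --size_;
        }
    }

    // Load factor: number of elements / TableSize.
    double loadFactor() const {
        return static_cast<double>(size_) / lists_.size();
    }

private:
    int cell(int x) const {
        assert(x >= 0);
        return x % static_cast<int>(lists_.size());
    }

    std::vector<std::list<int>> lists_;  // one list per cell
    int size_ = 0;                       // number of stored elements
};
```

Inserting 0, 7 and 14 into a table of size 7 puts all three keys in the list at cell 0, which is the situation collision resolution has to handle.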

18
Analysis
  • Let λ be the load factor of a hash table
  • λ = number of elements / TableSize
  • λ is the average length of a list
  • Successful Find: ≈ λ/2 comparisons + time to
    evaluate hash function
  • Unsuccessful Find / Insert: ≈ λ comparisons + time
    to evaluate hash function
  • Good choice: λ ≈ 1

A disadvantage of separate chaining is the cost of
allocating and deallocating memory!
19
Open Addressing
  • If collision → try an alternate cell
  • h0(x), h1(x), h2(x), ...
  • hi(x) = (hash(x) + F(i)) mod TableSize
  • F(0) = 0
  • λ < 1
  • Good choice: λ < 0.5

20
Linear Probing
  • F is a linear function of i
  • F(i) = i
  • Insert keys
  • 89, 18, 49, 58, 69
  • When 49 is inserted, a collision occurs
  • It is put into the next available spot, cell 0
  • 58 collides with 18, 89 and 49
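
The insertion sequence can be traced with a short sketch. It assumes TableSize = 10 and hash(x) = x mod 10, which the slide text implies (49 wraps around to cell 0) but does not state; the function name and the use of -1 as the empty marker are choices made for this illustration.

```cpp
#include <cassert>
#include <vector>

// Insert keys into an open-addressing table using linear probing, F(i) = i.
// Returns the final table; -1 marks an empty cell.
// Assumes non-negative keys and fewer keys than cells, so a free cell exists.
std::vector<int> linearProbeInsert(const std::vector<int>& keys, int tableSize) {
    assert(tableSize > 0 && static_cast<int>(keys.size()) < tableSize);
    std::vector<int> table(tableSize, -1);
    for (int key : keys) {
        int pos = key % tableSize;
        while (table[pos] != -1)          // cell taken: try the next one
            pos = (pos + 1) % tableSize;  // wrap around at the end of the table
        table[pos] = key;
    }
    return table;
}
```

Under these assumptions 89 and 18 go to their home cells 9 and 8, 49 wraps to cell 0, 58 ends up in cell 1 after three collisions, and 69 ends up in cell 2.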

21
Linear Probing
  • Problem: It is not easy to delete an element
  • The element may have caused a collision before
  • Mark the element as deleted instead of removing it
  • Problem: Primary Clustering

22
Linear Probing
  • Analysis

Problem: Primary Clustering
23
Quadratic Probing
  • F(i) is a quadratic function
  • Ex: F(i) = i^2

24
Quadratic Probing
  • When 49 collides with 89, the next position attempted
    is one cell away
  • 58 collides at position 8. The cell one away is
    tried, another collision occurs. It is inserted
    into the cell 2^2 = 4 away
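
The same keys can be traced under quadratic probing. As before, TableSize = 10 and hash(x) = x mod 10 are assumptions taken from the linear probing example, and the function name is made up for this sketch.

```cpp
#include <cassert>
#include <vector>

// Insert keys using quadratic probing, F(i) = i^2.
// Returns the final table; -1 marks an empty cell.
// A key that finds no free cell within tableSize probes is skipped.
std::vector<int> quadraticProbeInsert(const std::vector<int>& keys, int tableSize) {
    assert(tableSize > 0);
    std::vector<int> table(tableSize, -1);
    for (int key : keys) {
        int home = key % tableSize;
        for (int i = 0; i < tableSize; ++i) {
            int pos = (home + i * i) % tableSize;  // i-th probe position
            if (table[pos] == -1) {
                table[pos] = key;
                break;
            }
        }
    }
    return table;
}
```

Here 49 lands in cell 0 (one away from 9), 58 in cell 2 (four away from 8) and 69 in cell 3.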

25
Quadratic Probing
  • Solves primary clustering problem
  • All empty cells may not be accessed
  • A loop around full cells may happen
  • Hash table not full but empty space not found
  • Theorem: If the table size is prime and λ < 0.5, a
    new element can always be inserted.
  • Problem: Secondary clustering!

26
Type declaration for open addressing hash table
template <class HashedObj>
class HashTable
{
  public:
    explicit HashTable( const HashedObj & notFound, int size = 101 );
    HashTable( const HashTable & rhs )
      : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), array( rhs.array ),
        currentSize( rhs.currentSize ) { }

    const HashedObj & find( const HashedObj & x ) const;

    void makeEmpty( );
    void insert( const HashedObj & x );
    void remove( const HashedObj & x );

    const HashTable & operator=( const HashTable & rhs );

    enum EntryType { ACTIVE, EMPTY, DELETED };

27
Type declaration for open addressing hash table
  private:
    struct HashEntry
    {
        HashedObj element;
        EntryType info;

        HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
          : element( e ), info( i ) { }
    };

    vector<HashEntry> array;
    int currentSize;
    const HashedObj ITEM_NOT_FOUND;

    bool isActive( int currentPos ) const;
    int findPos( const HashedObj & x ) const;
    void rehash( );
};

28
// Construct the hash table.
template <class HashedObj>
HashTable<HashedObj>::
HashTable( const HashedObj & notFound, int size )
  : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
{
    makeEmpty( );
}

// Make the hash table logically empty.
template <class HashedObj>
void HashTable<HashedObj>::makeEmpty( )
{
    currentSize = 0;
    for( int i = 0; i < array.size( ); i++ )
        array[ i ].info = EMPTY;
}

29
// Find item x in the hash table.
template <class HashedObj>
const HashedObj & HashTable<HashedObj>::
find( const HashedObj & x ) const
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        return array[ currentPos ].element;
    else
        return ITEM_NOT_FOUND;
}

// Method that performs quadratic probing resolution.
template <class HashedObj>
int HashTable<HashedObj>::findPos( const HashedObj & x ) const
{
    int collisionNum = 0;
    int currentPos = hash( x, array.size( ) );

    while( array[ currentPos ].info != EMPTY &&
           array[ currentPos ].element != x )
    {
        // i^2 - (i-1)^2 = 2i - 1, so each step adds the next odd number
        currentPos += 2 * ++collisionNum - 1;
        if( currentPos >= array.size( ) )   // wrap around
            currentPos -= array.size( );
    }
    return currentPos;
}

30
// Return true if currentPos exists and is active.
template <class HashedObj>
bool HashTable<HashedObj>::isActive( int currentPos ) const
{
    return array[ currentPos ].info == ACTIVE;
}

// Remove item x from the hash table.
template <class HashedObj>
void HashTable<HashedObj>::remove( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )
        array[ currentPos ].info = DELETED;   // lazy deletion
}

// Insert routine with quadratic probing (body given on slide 40)
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )

31
// Deep copy.
template <class HashedObj>
const HashTable<HashedObj> & HashTable<HashedObj>::
operator=( const HashTable<HashedObj> & rhs )
{
    if( this != &rhs )
    {
        array = rhs.array;
        currentSize = rhs.currentSize;
    }
    return *this;
}

32
Double Hashing
  • Use a second hash function
  • F(i) = i * hash2(x)
  • Poor example
  • hash2(x) = X mod 9
  • hash1(x) = X mod 10
  • TableSize = 10
  • If X = 99 what happens? hash2(99) = 0, so the probe never moves
  • Requirement: hash2(x) ≠ 0 for any X

33
Double Hashing
  • Good choice
  • hash2(x) = R - (X mod R)
  • R is a prime and < TableSize

34
Double Hashing
hash2(x) = 7 - (X mod 7)
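
A small sketch of double hashing with this second hash function follows. The keys 89, 18, 49, 58, 69, TableSize = 10 and hash1(x) = x mod 10 are reused from the earlier probing slides as an assumption, since this slide's figure did not survive; the function names are illustrative.

```cpp
#include <cassert>
#include <vector>

// Second hash function: R - (x mod R). The result is in 1..R, never 0.
int hash2(int x, int r) {
    assert(x >= 0 && r > 0);
    return r - (x % r);
}

// Insert keys using double hashing, F(i) = i * hash2(x).
// Returns the final table; -1 marks an empty cell.
// A key that finds no free cell within tableSize probes is skipped.
std::vector<int> doubleHashInsert(const std::vector<int>& keys,
                                  int tableSize, int r) {
    assert(tableSize > 0);
    std::vector<int> table(tableSize, -1);
    for (int key : keys) {
        int home = key % tableSize;
        int step = hash2(key, r);
        for (int i = 0; i < tableSize; ++i) {
            int pos = (home + i * step) % tableSize;  // i-th probe position
            if (table[pos] == -1) {
                table[pos] = key;
                break;
            }
        }
    }
    return table;
}
```

With R = 7 the step sizes for 49, 58 and 69 are 7, 5 and 1, so they land in cells 6, 3 and 0. By contrast, the poor choice X mod 9 gives a step of 0 for X = 99.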
35
Analysis
  • Random collision resolution
  • Probes are independent
  • No clustering problem
  • Unsuccessful search and Insert
  • Number of probes until an empty cell is found
  • (1 - λ): fraction of cells that are empty
  • 1 / (1 - λ): expected number of probes
  • Successful search
  • P(X) = number of probes when the element X is
    inserted
  • (1/N) Σ P(X), approximately
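
The two estimates above can be written as small helper functions. The names are hypothetical; the second function simply averages recorded probe counts, as in the (1/N) Σ P(X) expression.

```cpp
#include <cassert>
#include <vector>

// Expected probes for an unsuccessful search or an insert: 1 / (1 - lambda).
// Only meaningful for 0 <= lambda < 1.
double expectedProbesUnsuccessful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    return 1.0 / (1.0 - lambda);
}

// Average cost of a successful search: (1/N) * sum of P(X), where P(X) is
// the number of probes that were needed when X was inserted.
double averageProbesSuccessful(const std::vector<int>& probesAtInsert) {
    assert(!probesAtInsert.empty());
    double sum = 0.0;
    for (int p : probesAtInsert)
        sum += p;
    return sum / probesAtInsert.size();
}
```

At λ = 0.5 an insert is expected to take 2 probes; as λ approaches 1 the estimate grows without bound, which motivates rehashing.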

36
Rehashing
  • If λ gets large, the number of probes increases.
  • Operations start taking too long
    and insertions might fail
  • Solution: Rehashing with a larger TableSize
    (usually about twice as large)
  • When to rehash
  • if λ > 0.5
  • if insertion fails

37
Rehashing Example
  • Elements 13, 15, 24 and 6 are inserted into an
    open addressing hash table of size 7
  • H(X) = X mod 7
  • Linear probing is used to resolve collisions

38
Rehashing Example
  • If 23 is inserted, the table is over 70 percent
    full.

A new table is created. 17 is the first prime that is
twice as large as the old table size, so
Hnew(X) = X mod 17
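
The example can be replayed in code. This sketch assumes that old entries are reinserted in index order starting from cell 0 and uses -1 as the empty marker; both are choices made for the illustration, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

const int EMPTY_CELL = -1;  // marker for an unused cell (keys are assumed >= 0)

// Place one key with H(X) = X mod size and linear probing.
// Assumes the table still has at least one free cell.
void probeInsert(std::vector<int>& table, int key) {
    int size = static_cast<int>(table.size());
    int pos = key % size;
    while (table[pos] != EMPTY_CELL)
        pos = (pos + 1) % size;
    table[pos] = key;
}

// Build a new table with newSize cells and reinsert every stored key,
// scanning the old table from cell 0 upward.
std::vector<int> rehashInto(const std::vector<int>& oldTable, int newSize) {
    assert(newSize > static_cast<int>(oldTable.size()));
    std::vector<int> newTable(newSize, EMPTY_CELL);
    for (int key : oldTable)
        if (key != EMPTY_CELL)
            probeInsert(newTable, key);
    return newTable;
}
```

Inserting 13, 15, 24, 6 and 23 into a table of size 7 fills cells 6, 1, 3, 0 and 2, so 5 of 7 cells (about 71 percent) are used. After rehashing into 17 cells, 6 sits in cell 6, 23 in cell 7, 24 in cell 8, 13 in cell 13 and 15 in cell 15.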
39
Rehashing
  • Rehashing is an expensive operation
  • Running time is O(N)
  • Rehashing frees the programmer from worrying
    about table size
  • Amortized analysis: average over N operations
  • Operations take O(1) time on average

40
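// Insert routine with quadratic probing
template <class HashedObj>
void HashTable<HashedObj>::insert( const HashedObj & x )
{
    int currentPos = findPos( x );
    if( isActive( currentPos ) )     // x is already in the table
        return;
    array[ currentPos ] = HashEntry( x, ACTIVE );

    // Rehash once the table is more than half full
    if( ++currentSize > array.size( ) / 2 )
        rehash( );
}

// Expand the hash table.
template <class HashedObj>
void HashTable<HashedObj>::rehash( )
{
    vector<HashEntry> oldArray = array;

    // Create a new, empty table about twice as large
    array.resize( nextPrime( 2 * oldArray.size( ) ) );
    for( int j = 0; j < array.size( ); j++ )
        array[ j ].info = EMPTY;

    // Copy the active entries into the new table
    currentSize = 0;
    for( int i = 0; i < oldArray.size( ); i++ )
        if( oldArray[ i ].info == ACTIVE )
            insert( oldArray[ i ].element );
}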