Title: Welcome to CIS 068 !
Lesson 10: Data Structures
Overview
- Description, usage and Java implementation of
  - Collections
  - Lists
  - Sets
  - Hashing
Definition
- Data Structures
  - Definition (www.nist.gov): "An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person."
Efficiency
- "An organization of information for better algorithm efficiency..."
- Isn't the efficiency of an algorithm defined by its order of magnitude, O(...)?
Efficiency
- Yes, but the order of magnitude depends on the implementation, in particular on the data structure the algorithm works on.
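The following sketch illustrates the point. The same task, checking whether a value is present, is O(n) with a linear scan of an unsorted array. It is O(log n) with java.util.Arrays.binarySearch once the data is organized as a sorted array. The class and method names are illustrative only.

```java
import java.util.Arrays;

// Illustrative class: same task, two data organizations, two complexities
class SearchDemo {
    // O(n): inspect every element until the key is found
    static boolean linearContains(int[] data, int key) {
        for (int value : data) {
            if (value == key) {
                return true;
            }
        }
        return false;
    }

    // O(log n): only valid if 'sorted' is in ascending order;
    // binarySearch returns a negative value when the key is absent
    static boolean binaryContains(int[] sorted, int key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88};
        int[] sorted = data.clone();
        Arrays.sort(sorted); // {3, 7, 19, 42, 88}
        System.out.println(linearContains(data, 19));   // true
        System.out.println(binaryContains(sorted, 19)); // true
        System.out.println(binaryContains(sorted, 20)); // false
    }
}
```

The price of the faster search is keeping the array sorted, which makes insertion more expensive.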
Introduction
- Data structures define how a collection of data items (primitive data types or objects) is organized
- The structure provides different ways to access the data
- Different tasks need different ways to access the data
- Different tasks need different data structures
Introduction
- Typical properties of different structures
  - fixed length / variable length
  - access by index / access by iteration
  - duplicate elements allowed / not allowed
Examples
- Tasks
  - Read 300 integers
  - Read an unknown number of integers
  - Read the 5th element of a sorted collection
  - Read the next element of a sorted collection
  - Insert an element at the 5th position of a collection
  - Check whether an object is in a collection
Examples
- Although you can invent any data structure you want, there are classic structures, providing
  - coverage of most (classic) problems
  - analysis of efficiency
  - basic implementations in modern languages, like JAVA
Data Structures in JAVA
- Let's see what JAVA has to offer
The Collection Hierarchy
- Collection: top interface, specifying the requirements for all collections
Collection Interface
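The slides' table of Collection methods did not survive extraction. The sketch below exercises a few methods that java.util.Collection does declare: add, size, contains, remove and isEmpty. The variable is typed as Collection, so any implementing class could be substituted. ArrayList is an arbitrary choice, and the helper addAllAndCount is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;

class CollectionDemo {
    // Hypothetical helper: works for any Collection, since only interface methods are used
    static int addAllAndCount(Collection<String> c, String... items) {
        for (String s : items) {
            c.add(s);
        }
        return c.size();
    }

    public static void main(String[] args) {
        Collection<String> names = new ArrayList<>();
        addAllAndCount(names, "Apu", "Bob", "Daria");
        System.out.println(names.contains("Bob")); // true
        names.remove("Bob");
        System.out.println(names.contains("Bob")); // false
        System.out.println(names.size());          // 2
        System.out.println(names.isEmpty());       // false
    }
}
```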
Iterator Interface
- Purpose
  - Sequential access to collection elements
  - Note: the technique used so far, accessing elements sequentially by index, is not reasonable in general (why?)!
- Methods: hasNext(), next(), remove()
Iterator Interface
- The iterator points between the elements of the collection (figure: elements 1 to 5)
  - First position: hasNext() is true, remove() throws an exception
  - After 2 calls to next(): the returned element is 2, remove() deletes element 2
  - Position after the last next(): hasNext() is false
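The positions in the figure can be checked with a small sketch on java.util.ArrayList. After two calls to next(), remove() deletes element 2. Calling remove() before any next() throws IllegalStateException, as specified for Iterator.remove(). The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorPositionDemo {
    // Builds [1, 2, 3, 4, 5], calls next() twice, then removes the element
    // returned by the last next() call (element 2)
    static List<Integer> removeSecond() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        Iterator<Integer> it = list.iterator();
        it.next();   // returns 1
        it.next();   // returns 2
        it.remove(); // deletes 2, the last returned element
        return list;
    }

    // remove() before any call to next() is illegal
    static boolean removeBeforeNextFails() {
        Iterator<Integer> it = new ArrayList<>(Arrays.asList(1, 2, 3)).iterator();
        try {
            it.remove();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeSecond());          // [1, 3, 4, 5]
        System.out.println(removeBeforeNextFails()); // true
    }
}
```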
Iterator Interface Usage
- Typical usage of an iterator
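The original usage code is not preserved. The sketch below shows the common pattern: loop while hasNext() is true, fetch each element with next(), and use the iterator's own remove() to delete elements during the traversal. The summing-and-filtering task is a made-up example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;

class IteratorUsageDemo {
    // Hypothetical task: sum non-negative values and drop negative ones.
    // Deleting via it.remove() is safe while iterating.
    static int sumAndDropNegatives(Collection<Integer> c) {
        int sum = 0;
        Iterator<Integer> it = c.iterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value < 0) {
                it.remove();
            } else {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Collection<Integer> numbers = new ArrayList<>(Arrays.asList(4, -1, 6, -3));
        System.out.println(sumAndDropNegatives(numbers)); // 10
        System.out.println(numbers);                      // [4, 6]
    }
}
```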
Back to Collections
- AbstractCollection
AbstractCollection
- Facilitates the implementation of the Collection interface
  - Provides a skeletal implementation
- Implementation of a concrete class
  - Provide the data structure (e.g. an array)
  - Provide access to the data structure
AbstractCollection
- The concrete class must provide an implementation of Iterator
- To keep the data abstract, the implemented (non-abstract) methods of AbstractCollection access the data only through the Iterator methods
(Figure: class myCollection extends AbstractCollection, holds an int data array and implements Iterator, with iterator() returning this; AbstractCollection methods such as clear() obtain their Iterator via iterator().)
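The following is a sketch of the slide's idea. A hypothetical class IntCollection extends java.util.AbstractCollection and stores its data in an int array. It supplies only iterator() and size(), the two abstract methods. Inherited methods such as contains() and toString() are implemented by AbstractCollection on top of that iterator.

Unlike the figure, the iterator here is a separate anonymous object rather than the collection returning this. That design choice of the sketch lets several traversals run independently.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical read-only collection backed by an int array
class IntCollection extends AbstractCollection<Integer> {
    private final int[] data;

    IntCollection(int... values) {
        data = values.clone();
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int pos = 0; // index of the next element to return

            @Override
            public boolean hasNext() {
                return pos < data.length;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return data[pos++];
            }
        };
    }

    @Override
    public int size() {
        return data.length;
    }

    public static void main(String[] args) {
        IntCollection c = new IntCollection(3, 1, 4);
        // contains() and toString() are inherited from AbstractCollection,
        // which walks the data through iterator()
        System.out.println(c.contains(4)); // true
        System.out.println(c);             // [3, 1, 4]
    }
}
```

Because add() is not overridden, the inherited version throws UnsupportedOperationException, so this collection is effectively read-only.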
Back to Collections
- List Interface
List Interface
- Extends the Collection interface
- Adds methods to insert and retrieve objects by their position (index)
  - Note: the Collection interface could NOT specify the position
- A new Iterator, the ListIterator, is introduced
  - ListIterator extends Iterator, allowing bidirectional traversal (previous(), previousIndex(), ...)
List Interface
- Incorporates an index!
- A new Iterator type (can move forward and backward)
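The following is a minimal sketch of bidirectional traversal with java.util.ListIterator. The loop runs forward with hasNext()/next() to the end of the list, then backward with hasPrevious()/previous() using the same iterator. The method name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Walks forward to the end, then backward to the start,
    // recording every element in visiting order
    static List<String> forwardThenBackward(List<String> list) {
        List<String> visited = new ArrayList<>();
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            visited.add(it.next());
        }
        while (it.hasPrevious()) {
            visited.add(it.previous());
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        System.out.println(forwardThenBackward(list)); // [a, b, c, c, b, a]
    }
}
```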
Example: Selection-Sorting a List
Part 1: call to selection sort
- The actual implementation of List does not matter!
- Call to SelectionSort
- Use only the Iterator properties of ListIterator (upcasting)
Example: Selection-Sorting a List
Part 2: selection sort
- Access at index
- Fill
- Inner loop
- Swap
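The slide code is not preserved, so this is a hedged reconstruction of the idea rather than the original. A hypothetical selectionSort is written only against the List interface, using index access via get() and set(). The caller can therefore pass an ArrayList or a LinkedList. On a LinkedList each get(i) itself costs O(n), so the same code runs noticeably slower there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ListSelectionSort {
    // Hypothetical method: sorts any List<Integer> in place, ascending, using index access only
    static void selectionSort(List<Integer> list) {
        int n = list.size();
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            // inner loop: find the smallest element in positions i..n-1
            for (int j = i + 1; j < n; j++) {
                if (list.get(j) < list.get(min)) {
                    min = j;
                }
            }
            // swap positions i and min
            Integer tmp = list.get(i);
            list.set(i, list.get(min));
            list.set(min, tmp);
        }
    }

    public static void main(String[] args) {
        // The actual List implementation does not matter to selectionSort
        List<Integer> a = new ArrayList<>(Arrays.asList(5, 2, 9, 1));
        List<Integer> b = new LinkedList<>(Arrays.asList(5, 2, 9, 1));
        selectionSort(a);
        selectionSort(b);
        System.out.println(a); // [1, 2, 5, 9]
        System.out.println(b); // [1, 2, 5, 9]
    }
}
```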
Back to Collections
- AbstractList: ...again the implementation of some methods...
- Note: still ABSTRACT!
Concrete Lists
- ArrayList and Vector: at last, concrete implementations!
ArrayList and Vector
- Vector
  - Kept for compatibility reasons (only)
  - Use ArrayList instead
- ArrayList
  - The underlying data structure is an array
  - List properties add advantages over a plain array
    - Size can grow and shrink
    - Elements can be inserted and removed in the middle
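The following small sketch shows these List properties on java.util.ArrayList. The list starts without a fixed length, grows with add, accepts an insertion in the middle with add(index, element), and shrinks with remove(index). The helper name build is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayListDemo {
    static List<String> build() {
        List<String> list = new ArrayList<>(); // no fixed length needed
        list.add("a");
        list.add("c");
        list.add(1, "b"); // insert in the middle: [a, b, c]
        list.add("d");    // grows: [a, b, c, d]
        list.remove(0);   // shrinks, later elements shift left: [b, c, d]
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list);        // [b, c, d]
        System.out.println(list.get(1)); // c
    }
}
```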
An Alternative Implementation (1)
An Alternative Implementation (2)
An Alternative Implementation (3)
Collections
- The underlying array data structure has
  - advantages for index-based access
  - disadvantages for insertion/removal of middle elements: the following elements must be copied, so insertion/removal is O(n)
- Alternative: linked lists
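This sketch shows why inserting into the middle of an array is O(n). Every element behind the insertion point has to be copied one slot to the right. insertAt is a hypothetical helper that assumes the array still has spare capacity (size < a.length). It uses System.arraycopy, which copies overlapping ranges within one array correctly.

```java
import java.util.Arrays;

class ArrayInsertDemo {
    // Hypothetical helper: inserts value at index in a[0..size-1]; requires size < a.length.
    // Shifting the tail costs O(size - index) copies, i.e. O(n) in the worst case.
    static int insertAt(int[] a, int size, int index, int value) {
        if (size >= a.length) {
            throw new IllegalStateException("array is full");
        }
        System.arraycopy(a, index, a, index + 1, size - index);
        a[index] = value;
        return size + 1; // new logical size
    }

    public static void main(String[] args) {
        int[] a = new int[5];
        a[0] = 10;
        a[1] = 20;
        a[2] = 30;
        int size = insertAt(a, 3, 1, 15);
        System.out.println(Arrays.toString(Arrays.copyOf(a, size))); // [10, 15, 20, 30]
    }
}
```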
Linked List
- Flexible structure, providing
  - insertion and removal at any place in O(1) once that place has been reached, compared to O(n) for an array-based list
  - sequential access
  - random access in O(n), compared to O(1) for an array-based list
Linked List
- List of dynamically allocated nodes
- Nodes arranged into a linked structure
- The node data structure must provide
  - the data itself (example: the bead body)
  - a possible link to another node (example: the link)
- Children's pop-beads as an example of a linked list
Linked List
(Figure: an old node and a new node, each with a next link; the last next link is null.)
Connecting Nodes
- Creating the nodes
- Connecting
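The following is a sketch of a singly linked node: the data itself plus a link to the next node. It creates three nodes and connects them. The nested class Node and its fields data and link are assumptions chosen for illustration. The name link follows the slides' notation.

```java
class NodeDemo {
    // Hypothetical node: the data itself plus a link to the next node (null if none)
    static class Node {
        String data;
        Node link;

        Node(String data) {
            this.data = data;
        }
    }

    // Creates nodes "A", "B", "C" and connects them: A -> B -> C -> null
    static Node buildChain() {
        Node a = new Node("A"); // creating the nodes
        Node b = new Node("B");
        Node c = new Node("C");
        a.link = b;             // connecting
        b.link = c;             // c.link stays null: end of the list
        return a;
    }

    public static void main(String[] args) {
        Node head = buildChain();
        System.out.println(head.link.link.data); // C
    }
}
```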
Inserting Nodes
- Insert a new node r between p and q (q = p.link): p.link = r, r.link = q
- Afterwards, q can be accessed by p.link.link
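Inserting a node r behind p, using a minimal hypothetical Node class with fields data and link, can be sketched as follows. q is saved before p.link is overwritten, so the two assignments cannot lose the rest of the list. The helper join only exists to check the result.

```java
class InsertNodeDemo {
    // Hypothetical node class for this sketch
    static class Node {
        String data;
        Node link;

        Node(String data, Node link) {
            this.data = data;
            this.link = link;
        }
    }

    // Inserts a new node r directly after p; O(1) once p is known
    static Node insertAfter(Node p, String value) {
        Node q = p.link;             // old successor (may be null)
        Node r = new Node(value, q); // r.link = q
        p.link = r;                  // p.link = r
        return r;
    }

    // Concatenates the data of all nodes, for checking the result
    static String join(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.link) {
            sb.append(n.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node p = new Node("P", new Node("Q", null));
        insertAfter(p, "R");
        System.out.println(join(p));          // PRQ
        System.out.println(p.link.link.data); // Q, reached via p.link.link
    }
}
```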
Removing Nodes
(Figure: removing node q, which follows node p.)
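Removing the node q that follows p is a single link change: p.link is redirected past q. The following sketch uses the same kind of minimal hypothetical Node class (fields data and link). Once nothing references q any more, Java's garbage collector reclaims it.

```java
class RemoveNodeDemo {
    // Hypothetical node class for this sketch
    static class Node {
        String data;
        Node link;

        Node(String data, Node link) {
            this.data = data;
            this.link = link;
        }
    }

    // Removes q = p.link by bypassing it; O(1) once p is known
    static void removeAfter(Node p) {
        Node q = p.link;
        if (q != null) {
            p.link = q.link; // p now links past q
            q.link = null;   // detach q from the list
        }
    }

    public static void main(String[] args) {
        Node p = new Node("P", new Node("Q", new Node("S", null)));
        removeAfter(p);
        System.out.println(p.link.data); // S
    }
}
```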
Traversing a List
(Figure: the traversal follows the links until it reaches null.)
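A traversal starts at the first node and follows link until it reaches null. The following sketch uses a hypothetical Node class. It also shows why random access is O(n) in a linked list: reaching position i means following i links. The get helper assumes i is a valid position.

```java
class TraverseDemo {
    // Hypothetical node class for this sketch
    static class Node {
        String data;
        Node link;

        Node(String data, Node link) {
            this.data = data;
            this.link = link;
        }
    }

    // Visits every node from head until the link is null; O(n)
    static int count(Node head) {
        int n = 0;
        Node p = head;
        while (p != null) {
            n++;
            p = p.link;
        }
        return n;
    }

    // Random access: follow i links from the head (assumes 0 <= i < length); O(i)
    static String get(Node head, int i) {
        Node p = head;
        for (int k = 0; k < i; k++) {
            p = p.link;
        }
        return p.data;
    }

    public static void main(String[] args) {
        Node head = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(count(head));  // 3
        System.out.println(get(head, 2)); // C
    }
}
```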
Double Linked Lists
(Figure: single linked list vs. double linked list. In the double linked list every node holds data, a successor link and a predecessor link; the first predecessor and the last successor are null.)
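The following sketches a double linked list node with predecessor and successor links, following the figure's labels. Inserting after a node p must update up to four links instead of two. In exchange, the list can be walked backward. DNode and the helper names are hypothetical.

```java
class DoubleLinkedDemo {
    // Hypothetical node with links in both directions
    static class DNode {
        String data;
        DNode predecessor;
        DNode successor;

        DNode(String data) {
            this.data = data;
        }
    }

    // Inserts a new node directly after p, fixing all affected links
    static DNode insertAfter(DNode p, String value) {
        DNode r = new DNode(value);
        DNode q = p.successor; // may be null
        r.predecessor = p;
        r.successor = q;
        p.successor = r;
        if (q != null) {
            q.predecessor = r;
        }
        return r;
    }

    // Walks backward from any node to the first one
    static String backwardFrom(DNode node) {
        StringBuilder sb = new StringBuilder();
        for (DNode n = node; n != null; n = n.predecessor) {
            sb.append(n.data);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        DNode a = new DNode("A");
        DNode c = insertAfter(a, "C");
        insertAfter(a, "B");                 // A <-> B <-> C
        System.out.println(backwardFrom(c)); // CBA
    }
}
```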
Back to Collections
- AbstractSequentialList and LinkedList
LinkedList
- An implementation example: see textbook
Sets
- Example task: examine whether a collection contains object o
- Solution using a List -> O(n) operation!
Sets
- Comparison to List
  - Set is designed to overcome the O(n) limitation
  - Contains unique elements
  - contains() / remove() operate in O(1) or O(log n), depending on the implementation
  - No get() method, no index access...
  - ...but an iterator can (still) be used to traverse the set
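The following sketch shows these Set properties with java.util.HashSet, a Set implementation whose contains() is expected O(1). java.util.TreeSet offers O(log n) instead. Duplicates are rejected, and there is no get(index). The for-each loop traverses the set through its iterator, in no guaranteed order. The helper toSet is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

class SetDemo {
    // Hypothetical helper: adds all items; duplicates are rejected by the set
    static Set<String> toSet(String... items) {
        Set<String> set = new HashSet<>();
        for (String s : items) {
            set.add(s); // returns false if s is already present
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = toSet("Apu", "Bob", "Apu", "Daria");
        System.out.println(set.size());          // 3
        System.out.println(set.contains("Bob")); // true
        // no get(i): traverse with the iterator (for-each) instead;
        // HashSet does not guarantee any particular order
        for (String name : set) {
            System.out.println(name);
        }
    }
}
```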
Back to Collections
- Interface Set
Hashing
How can the method contains() be implemented as an O(1) operation?
http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
Hashing
- How can the method contains() be implemented as an O(1) operation?
- Idea
  - Retrieving an object from an array can be done in O(1) if the index is known
  - Determine the index to store and retrieve an object from the object itself!
Hashing
- Determine the index ... from the object itself
- Example
  - Store the Strings Apu, Bob, Daria as a Set
  - Define a function H: String -> integer
    - Take the first character, A = 1, B = 2, ...
  - Store the names in a String array at position H(name)
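Hashing
- Apu: first character A, H(A) = 1
- Bob: first character B, H(B) = 2
- Daria: first character D, H(D) = 4
- ...
(Figure: the array holds Apu at index 1, Bob at index 2 and Daria at index 4; index 3 and the following slots are unused.)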
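Here is a sketch of the toy example. It assumes the mapping A = 1, B = 2, ... is computed as the upper-cased first character minus 'A' plus 1. It also assumes an array of length 27 (index 0 unused) so that every letter fits. The names h and store are hypothetical. Collisions are ignored here: a later name with the same first letter would simply overwrite an earlier one.

```java
class FirstLetterHash {
    // Hypothetical H(name): A -> 1, B -> 2, ..., Z -> 26 (assumes a leading letter A-Z)
    static int h(String name) {
        return Character.toUpperCase(name.charAt(0)) - 'A' + 1;
    }

    // Stores each name at position H(name); collisions overwrite (ignored in this sketch)
    static String[] store(String... names) {
        String[] table = new String[27]; // index 0 unused
        for (String name : names) {
            table[h(name)] = name;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] table = store("Apu", "Bob", "Daria");
        System.out.println(table[1] + " " + table[2] + " " + table[4]); // Apu Bob Daria
        System.out.println(table[3]);                                   // null (unused)
        // contains() becomes a single array look-up: O(1)
        System.out.println("Bob".equals(table[h("Bob")]));              // true
    }
}
```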
Hashing
- The function H(o) is called the hash code of the object o
- Properties of a hash code function
  - If a.equals(b), then H(a) == H(b)
  - BUT NOT NECESSARILY VICE VERSA
    - H(a) == H(b) does NOT guarantee a.equals(b)!
  - If H() has sufficient variation, it is very likely that different objects have different hash codes
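The "not vice versa" rule can be seen with real Java Strings. "Aa" and "BB" are different, yet both have String.hashCode() 2112, because 65·31 + 97 = 66·31 + 66 = 2112. The helper sameHash is illustrative.

```java
class HashContractDemo {
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // equal hash codes ...
        System.out.println(a.hashCode() + " " + b.hashCode()); // 2112 2112
        // ... but the objects are not equal
        System.out.println(a.equals(b));                       // false
        // the required direction: equal objects have equal hash codes
        String c = new String("Aa");
        System.out.println(a.equals(c) && sameHash(a, c));     // true
    }
}
```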
Hashing
- Additionally, an array is needed that has sufficient space to contain at least all elements
- The hash code must not address an index outside the array; this can easily be achieved by
  - $H_1(o) = H(o) \bmod n$
  - modulo function, n = array length
- The larger the array, the more $H_1$ varies!
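The following sketch maps a hash code into the valid index range 0..n-1. The slide glosses over one detail. Java's hashCode() may be negative, and Java's % operator keeps the sign of the left operand. This sketch therefore uses Math.floorMod (Java 8+) instead of a plain %. The method name index is illustrative.

```java
class IndexDemo {
    // H1(o) = H(o) mod n, always in 0..n-1 even for negative hash codes
    static int index(Object o, int n) {
        return Math.floorMod(o.hashCode(), n);
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);               // -2: not a valid array index
        System.out.println(Math.floorMod(-7, 5)); // 3
        int n = 11;
        for (String name : new String[] {"Apu", "Bob", "Daria"}) {
            System.out.println(name + " -> " + index(name, n));
        }
    }
}
```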
Hashing
- Back to the example: insert Abe
  - First character A, H(A) = 1
  - H(Apu) == H(Abe): this is called a collision
Solving Collisions
- Method 1: Don't use an array of objects, but an array of linked lists!
(Figure: each array slot contains the start of a linked list; Apu and Abe share the list at index 1, Bob and Daria are in their own lists, the remaining slots are unused.)
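This is a sketch of Method 1 (separate chaining). The hypothetical class ChainedStringSet keeps one java.util.LinkedList per array slot. It reuses the slides' toy first-letter hash, so Apu and Abe collide and end up in the same list. A real implementation would use hashCode() instead. The helper bucketLength exists only to make the collision visible.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical string set using separate chaining: one linked list per array slot
class ChainedStringSet {
    private final List<LinkedList<String>> buckets;

    ChainedStringSet(int n) {
        buckets = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Toy hash from the slides: A -> 1, B -> 2, ... (assumes a leading letter A-Z)
    private LinkedList<String> bucketFor(String s) {
        int h = Character.toUpperCase(s.charAt(0)) - 'A' + 1;
        return buckets.get(h % buckets.size());
    }

    // Returns false if s is already in the set
    boolean add(String s) {
        LinkedList<String> bucket = bucketFor(s);
        if (bucket.contains(s)) {
            return false;
        }
        bucket.add(s); // colliding elements simply share the same list
        return true;
    }

    // Only the one list at index H(s) mod n has to be searched
    boolean contains(String s) {
        return bucketFor(s).contains(s);
    }

    // Illustration only: number of elements sharing s's slot
    int bucketLength(String s) {
        return bucketFor(s).size();
    }

    public static void main(String[] args) {
        ChainedStringSet set = new ChainedStringSet(27);
        for (String name : new String[] {"Apu", "Abe", "Bob", "Daria"}) {
            set.add(name);
        }
        System.out.println(set.contains("Abe"));   // true
        System.out.println(set.bucketLength("Apu")); // 2: Apu and Abe collide
    }
}
```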
Solving Collisions
- Drawback
  - Objects must be wrapped in a node structure to provide links, which introduces a significant overhead
(Figure: Apu is wrapped into a Node holding Apu and a link.)
Solving Collisions
- Method 2
  - Iteratively apply different hash codes $H_0, H_1, H_2, \dots$ to object o until the collision is resolved
  - As long as the different hash codes are always used in the same order, the search is guaranteed to be consistent
(Figure: the array is probed with H0, H1, H2 in turn until a free slot is found.)
Solving Collisions
- The easiest hash code series, $H_{\mathrm{inc}}$: $H_0 = H$, $H_i = H_{i-1} + 1$ (i.e. $H_i = H + i$), each taken modulo the array length n
- http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/hash_tables.html
(Figure: H0, H1, H2 point to consecutive array slots.)
add
Example implementation of add(Object o) using H_inc (assume array A has length n, H as given above):
    index = H(o) % n
    while (A[index] != null)
        if o.equals(A[index]) break
        else index = (index + 1) % n
    end
    add element at position A[index]
contains
Example implementation of contains(Object o) using H_inc (assume array A has length n, H as given above):
    index = H(o) % n
    found = false
    while (A[index] != null)
        if o.equals(A[index])
            found = true
            break
        else index = (index + 1) % n
    end
    // found is true if the set contains object o
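This is a runnable Java sketch of the two pseudocode slides. The hypothetical class LinearProbingSet uses a fixed-size Object array, maps H(o) = o.hashCode() into the array with Math.floorMod, and probes with H_inc, i.e. (index + 1) mod n.

The pseudocode would loop forever on a completely full table. This sketch therefore bounds the search to n probes, and add() throws when no free slot is left. Rehashing would be the proper fix.

```java
// Hypothetical open-addressing set following the add/contains pseudocode (H_inc probing)
class LinearProbingSet {
    private final Object[] a;

    LinearProbingSet(int n) {
        a = new Object[n];
    }

    // index = H(o) mod n, non-negative even for negative hash codes
    private int h(Object o) {
        return Math.floorMod(o.hashCode(), a.length);
    }

    boolean add(Object o) {
        int index = h(o);
        for (int probes = 0; probes < a.length; probes++) {
            if (a[index] == null) {
                a[index] = o;               // free slot: store o here
                return true;
            }
            if (o.equals(a[index])) {
                return false;               // already in the set
            }
            index = (index + 1) % a.length; // H_inc: try the next slot
        }
        throw new IllegalStateException("table full: rehash needed");
    }

    boolean contains(Object o) {
        int index = h(o);
        for (int probes = 0; probes < a.length && a[index] != null; probes++) {
            if (o.equals(a[index])) {
                return true;
            }
            index = (index + 1) % a.length;
        }
        return false; // reached an empty slot or searched the whole table
    }

    public static void main(String[] args) {
        LinearProbingSet s = new LinearProbingSet(5);
        System.out.println(s.add(1));       // true, stored at index 1
        System.out.println(s.add(6));       // true, collides with 1 and moves to index 2
        System.out.println(s.contains(6));  // true
        System.out.println(s.contains(11)); // false
    }
}
```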
Analysis
- If there is no collision, contains() operates in O(1)
- If the set contains elements having the same hash code, there is a collision. Let dupmax be the maximum number of elements having the same hash code; then contains() operates in O(dupmax)
- If dupmax is near n, there is no gain in speed, since contains() operates in O(n)
A Real Hashcode
- JAVA provides a hash code for every object: method hashCode() in java.lang.Object
- For String, e.g., hashCode() is computed as
  - $s_0 \cdot 31^{n-1} + s_1 \cdot 31^{n-2} + \dots + s_{n-1}$
  - n = length of the string, $s_i$ = character at position i
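The String formula can be evaluated with Horner's rule, h = 31·h + s_i, which expands to the same polynomial. Java int arithmetic wraps on overflow, and String.hashCode() is specified with int arithmetic as well. This sketch therefore reproduces the library value exactly. The class and method names are illustrative.

```java
class StringHashDemo {
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with Horner's rule;
    // int overflow wraps around, matching Java's int arithmetic
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "Apu", "Daria", "a much longer string"}) {
            System.out.println(s.hashCode() == hash(s)); // true for each
        }
    }
}
```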
Rehashing a table
- What happens if the array is full?
- Create a new array, e.g. of double size, and insert all elements of the old table into the new table
  - Note: the elements won't keep their index, since the modulus (the array length) used for hashing has changed!
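This is a sketch of rehashing for a linear-probing table of Integer keys. The sketch allocates an array of double size and re-inserts every element, recomputing each index with the new length. Element 12 illustrates the note above: it sits at index 0 in the table of length 4 (12 mod 4 = 0) and moves to index 4 in the table of length 8 (12 mod 8 = 4). The helpers insert and rehash are hypothetical.

```java
import java.util.Arrays;

class RehashDemo {
    // Hypothetical helper: linear-probing insert; assumes a free slot exists
    static void insert(Integer[] table, Integer x) {
        int index = Math.floorMod(x.hashCode(), table.length);
        while (table[index] != null && !table[index].equals(x)) {
            index = (index + 1) % table.length;
        }
        table[index] = x;
    }

    // Doubles the capacity and re-inserts all elements; indices are recomputed
    static Integer[] rehash(Integer[] old) {
        Integer[] bigger = new Integer[old.length * 2];
        for (Integer x : old) {
            if (x != null) {
                insert(bigger, x);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[4];
        for (int x : new int[] {3, 6, 9, 12}) {
            insert(table, x);
        }
        System.out.println(Arrays.toString(table)); // [12, 9, 6, 3]
        table = rehash(table);
        System.out.println(Arrays.toString(table)); // [null, 9, null, 3, 12, null, 6, null]
    }
}
```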
Hashcode Summary
- A hash table provides the Set operations add() and contains() in O(1) if the hash code is chosen properly and the array allows for sufficient variation
- Speed is gained by using more memory
- If many collisions occur, a hash table might be slower than a list due to its overhead (computation of H, ...)