Data Structures and Algorithms - PowerPoint PPT Presentation

Slides: 41
Provided by: KevinPe5
Transcript and Presenter's Notes

1
Data Structures and Algorithms
2
XKCD
3
Outline
  • Analyzing algorithms
  • Designing Algorithms
  • Profiling
  • Heuristics
  • Ex) hash-based sequence alignment

4
Insertion Sort Pseudocode (review conventions:
arrays, indentation, loops, logic, etc.)
  • Input: array A[0..n-1]
  • Insertion-Sort(A)
  • 1 for j ← 1 to length[A]-1
  • 2   key ← A[j]
  • 3   // insert A[j] into the sorted sequence
    A[0..j-1]
  • 4   i ← j-1
  • 5   while i > -1 and A[i] > key
  • 6     A[i+1] ← A[i]
  • 7     i ← i-1
  • 8   A[i+1] ← key

for loop convention: iterative or counting
while loop convention: do while expression
is true
Cormen, Intro to Algs.
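The pseudocode above translates almost line for line into a runnable program. The following is a minimal Python sketch, not the course's own code; the function name `insertion_sort` is an assumption made for illustration.

```python
def insertion_sort(a):
    """Sort the list a in place (and return it), following the pseudocode."""
    for j in range(1, len(a)):
        key = a[j]
        # Insert a[j] into the already sorted prefix a[0..j-1].
        i = j - 1
        while i > -1 and a[i] > key:
            a[i + 1] = a[i]  # shift the larger element one slot to the right
            i -= 1
        a[i + 1] = key
    return a


print(insertion_sort([3, 2, 1]))
```

Everything left of position j is sorted before each pass, which is the same invariant the slide states.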
5
Insertion Sort
  • 1) Design algorithm (as opposed to
    Bubble Sort)
  • 2) Implement algorithm

Left of key is sorted; right of key is unsorted
  • 1 for j ← 1 to length[A]-1
  • 2   key ← A[j]
  • 3   // insert A[j] into the sorted sequence
    A[0..j-1]
  • 4   i ← j-1
  • 5   while i > -1 and A[i] > key
  • 6     A[i+1] ← A[i]
  • 7     i ← i-1
  • 8   A[i+1] ← key

6
Analyzing Algorithms
  • predicting resources that an algorithm requires
  • memory
  • communication bandwidth
  • logic gates
  • computational time (most often measured)
  • In other words, how many steps does Insertion
    Sort take to complete?

7
Analyzing Insertion Sort
  • time taken by Insertion Sort depends on the input:
    sorting 1000 numbers takes longer than sorting 3
  • can take different amounts of time to sort 2
    input sequences of the same size. Why?
  • in general, the time taken by an algorithm grows
    with the size of the input
  • describe the running time of a program as
    function of the size of input

8
Analyzing Insertion Sort
  • running time
  • function of number of steps executed
  • assume a constant amount of time is required to
    execute each line of pseudocode

9
Analyzing Insertion Sort
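  • Insertion-Sort(A)                     cost    times
  • 1 for j ← 1 to length[A]-1            $c_1$   $n$
  • 2   key ← A[j]                        $c_2$   $n-1$
  • 3   // insert A[j]                    0
  • 4   i ← j-1                           $c_4$   $n-1$
  • 5   while i > -1 and A[i] > key       $c_5$   $\sum_{j=1}^{n-1} t_j$
  • 6     A[i+1] ← A[i]                   $c_6$   $\sum_{j=1}^{n-1} (t_j - 1)$
  • 7     i ← i-1                         $c_7$   $\sum_{j=1}^{n-1} (t_j - 1)$
  • 8   A[i+1] ← key                      $c_8$   $n-1$
  • $t_j$ is the number of times the while test on line 5 runs for a given j
  • worst case (reverse-sorted input): $t_j = j+1$, so $\sum_{j=1}^{n-1} t_j = \frac{n(n+1)}{2} - 1$ and $\sum_{j=1}^{n-1} (t_j - 1) = \frac{n(n-1)}{2}$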

Algorithms, Cormen
10
Analyzing Insertion Sort
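  • $T(n) = c_1 n + c_2(n-1) + c_4(n-1) + c_5\left(\frac{n(n+1)}{2} - 1\right) + c_6\,\frac{n(n-1)}{2} + c_7\,\frac{n(n-1)}{2} + c_8(n-1)$
  • $= \left(\frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2}\right)n^2 + \left(c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8\right)n - (c_2 + c_4 + c_5 + c_8)$
  • $= k_1 n^2 + k_2 n - k_3$
  • $\Theta(n^2)$: asymptotic upper and lower bound
  • $O(n^2)$: asymptotic upper bound
  • Typical complexities
  • $O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n)$

Linear time: $O(n)$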
11
Insertion Sort Observations
  • numbers are sorted in place
  • numbers are rearranged within the array, with at
    most a constant number of them stored outside of
    the array at any time
  • Run time depends on input
  • the number of operations for the following 3 sets
    will vary greatly due to level of
    pre-sortedness
  • a descending sorted order will actually take more
    operations than a random ordering
  • algorithm complexity analysis allows us to place
    upper and lower asymptotic bounds for comparison

3 5 6 7 9 8 10 15 20 30 69
3 5 6 7 9 8 10 15 1 12 20
20 12 15 10 9 8 7 6 5 3 1
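To make the effect of pre-sortedness concrete, the sketch below runs insertion sort on the three sets above and counts how often an element is shifted (line 6 of the pseudocode). It is an illustrative Python sketch; the name `count_shifts` and the labels for the three sets are assumptions, not part of the original slides.

```python
def count_shifts(values):
    """Run insertion sort on a copy of values; return the number of shifts."""
    a = list(values)
    shifts = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i > -1 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            shifts += 1  # one execution of the shift line
        a[i + 1] = key
    return shifts


nearly_sorted = [3, 5, 6, 7, 9, 8, 10, 15, 20, 30, 69]
partly_sorted = [3, 5, 6, 7, 9, 8, 10, 15, 1, 12, 20]
mostly_descending = [20, 12, 15, 10, 9, 8, 7, 6, 5, 3, 1]

for name, data in [("nearly sorted", nearly_sorted),
                   ("partly sorted", partly_sorted),
                   ("mostly descending", mostly_descending)]:
    print(name, count_shifts(data))
```

The three sets need 1, 10, and 54 shifts respectively. The same number of elements can therefore cost very different amounts of work.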
12
Insertion Sort Profile
  • A = 3 2 1
  • Step  j  key  i   A[i]   A[i+1]
  • 1     1
  • 2     1  2
  • 3     1  2    0   3      2
  • 4     0 > -1 and 3 > 2 ? true
  • 5     1  2    0   3      3
  • A = 3 3 1
  • 6     1  2    -1  Undef  3
  • 4     -1 > -1 and Undef > 2 ? false
  • 7     1  2    -1  Undef  2
  • A = 2 3 1

1 for j ← 1 to length[A]-1
2   key ← A[j]
3   i ← j-1
4   while i > -1 and A[i] > key
5     A[i+1] ← A[i]
6     i ← i-1
7   A[i+1] ← key
13
What about naïve.pl?
snt = array of subject nucleotides
qnt = array of query nucleotides
for i = 0 to length(subject) - length(query)
  j = 0
  while (snt[i+j] == qnt[j])
    j = j+1
  if (j == length(query))
    found sequence at position i
end

Cost × times, line by line: c1, c2, c3 n, c4 n, c5 n × n, c6 n × n, c7 n × n, c8 n × n
⇒ $O(n^2)$
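For comparison, here is a brute-force search written as a small Python sketch. It is not the course's naïve.pl: the function name `naive_search`, the use of strings instead of arrays, and the explicit `j < m` bound in the inner loop are all assumptions of this sketch.

```python
def naive_search(subject, query):
    """Return every start index at which query occurs in subject."""
    hits = []
    n, m = len(subject), len(query)
    for i in range(n - m + 1):
        j = 0
        # The j < m bound is an addition of this sketch; it stops the
        # comparison once the whole query has been matched.
        while j < m and subject[i + j] == query[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


print(naive_search("ATGCCTGGGCT", "GG"))
```

The outer loop runs about n times and the inner loop up to m times per position, which gives the quadratic behaviour when query and subject are of similar size.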
14
Definitions
  • procedural programming languages tend to be
    action oriented (as opposed to Object Oriented
    Programming OOP)
  • subroutine: a collection of high-level
    programming language operations
  • procedure (in Pascal, does not return a value)
  • function (in Pascal, does return a value)

15
Machine Instructions: At the lowest level, every
program consists of primitive machine
instructions.
  move.L D0, 20004
Language Statements: High-level languages consist of
statements that perform one or more machine
instructions.
  i k 9
Subroutines: Subroutines consist of groups of language
statements.
  $sequence = print_formated_sequence(@qnts, $i);
Programs: Programs consist of groups
of subroutines.
C: A Software Engineering Approach, Darnell
16
Subroutines
  • programs are developed with layers of functions
  • lower-level functions perform simple operations
  • higher-level functions are created from
    lower-level functions
  • analogous to abbreviations for long and
    complicated sets of commands
  • defined once, but invoked many times
  • ease of change
  • modular and re-usable
  • enhanced reliability (complicated tasks broken
    into simpler ones)
  • improved readability
  • with low-level details of algorithm
    compartmentalized, an algorithm may be easier to
    read, understand, and modify
  • good rule of thumb if your subroutine spans
    more than 1 printed page, I would expect at least
    1 bug

17
Bioinformatics example
  • Optimal sequence alignment (allowing for gaps and
    substitutions in either query or subject sequence)

18
Heuristics
  • What do you do when faced with an NP-complete
    problem, or problem size where algorithm takes
    too long?
  • Example: want to compare 2 genomes (brute force)
  • naïve.pl: $O(n^2)$
  • $(3\times10^9)(3\times10^9)(1\times10^{-9}\ \text{s}) = 9\times10^9\ \text{s} \approx 285$ years
  • Alternative
  • hash k-tuples of nucleotides to a number, and
    compare numbers

19
Hash-Based Alignment
  • base-10 numbers
  • $5805 = 5\times10^3 + 8\times10^2 + 0\times10^1 + 5\times10^0$
  • k = 8
  • ATGCCTGGGCT
  • A=0, C=1, G=2, and T=3 (base-4 number)
  • ATGCCTGG $= 0\times4^7 + 3\times4^6 + 2\times4^5 + 1\times4^4 + 1\times4^3 + 3\times4^2 + 2\times4^1 + 2\times4^0 = 14714$
  • Now we can compare chunks of sequence much
    faster - speed increase by factor of 8
  • Can pre-compute hashes for entire genome, and
    only compare hashes
  • Premise for popular alignment tools BLAST,
    BLAT, and UIcluster
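The base-4 encoding and the idea of pre-computing hashes for the subject can be sketched in a few lines of Python. This is only an illustration of the principle; the names `encode_kmer`, `index_kmers`, and `find_seeds` are hypothetical, and real tools such as BLAST do considerably more than this.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Read a k-tuple of nucleotides as a base-4 number."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


def index_kmers(sequence, k):
    """Pre-compute the hash of every k-tuple: hash -> list of positions."""
    index = {}
    for pos in range(len(sequence) - k + 1):
        key = encode_kmer(sequence[pos:pos + k])
        index.setdefault(key, []).append(pos)
    return index


def find_seeds(subject, query, k=8):
    """Positions in subject where the first k bases of query occur."""
    index = index_kmers(subject, k)
    return index.get(encode_kmer(query[:k]), [])


print(encode_kmer("ATGCCTGG"))
print(find_seeds("ATGCCTGGGCT", "TGCCTGGG"))
```

`encode_kmer("ATGCCTGG")` gives 14714, matching the hand calculation, and a lookup then costs one number comparison instead of eight character comparisons.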

20
Heuristics
  • Usually a trade off
  • In sequence hashing example
  • accuracy is traded for speed
  • you cannot match/find sequences shorter than 8
    nucleotides
  • How do you find optimal k-tuple?
  • depends on question
  • empirically

21
End
22
Overly simple example of compartmentalizing
  • Count the number of nucleotides in a file.
  • open file
  • while there is more sequence
  • read a nucleotide
  • increment count
  • print nt count
  • close file
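The steps above map naturally onto two small subroutines. The Python sketch below uses an in-memory `io.StringIO` object in place of a real file so that it runs on its own; the helper names and the sample sequence are assumptions for illustration.

```python
import io


def read_nucleotides(handle):
    """Yield one nucleotide at a time from an open, file-like handle."""
    for line in handle:
        for nucleotide in line.strip():
            yield nucleotide


def count_nucleotides(handle):
    """Count nucleotides by reading them one by one."""
    count = 0
    for _ in read_nucleotides(handle):
        count += 1
    return count


# "open file" ... "close file": the with-block closes the handle on exit.
with io.StringIO("ATGC\nCTGG\nGCT\n") as handle:
    print("nt count:", count_nucleotides(handle))
```

Reading and counting are kept apart, so either part can change (for example, to skip header lines) without touching the other.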

23
Another Example (divide and conquer)
  • Find the average intron size for all human genes
  • Get human genome
  • Get genes
  • Find indices of exons/introns
  • Size = index2 - index1
  • Tabulate and average

24
Recursion
  • recursion: an object is recursive if it partially
    consists of, or is defined in terms of, itself
  • examples
  • mirrors
  • video camera pointed at a television
  • factorial function for non-negative integers
  • n!
  • a) 0! = 1
  • b) if n > 0, then n! = n(n-1)!
  • 3! = 3(2)! = 3(2(1!)) = 3(2(1(0!))) = 3(2(1(1)))
    = 6
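The two-part definition of the factorial carries over directly to code: rule (a) is the base case and rule (b) is the recursive call. A minimal Python sketch, with the function name `factorial` chosen for illustration:

```python
def factorial(n):
    """n! for a non-negative integer n, defined recursively."""
    if n < 0:
        raise ValueError("input must be a non-negative integer")
    if n == 0:
        return 1                  # rule a) 0! = 1
    return n * factorial(n - 1)   # rule b) n! = n(n-1)!


print(factorial(3))
```

The call `factorial(3)` unfolds exactly like the hand expansion above and returns 6.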

25
Recursion
  • power is in ability to define an infinite set of
    objects by a finite statement
  • tool for expressing a program recursively is the
    subroutine (procedure/function)
  • directly recursive: subroutine P contains a
    reference to itself
  • indirectly recursive: P contains a reference to
    another subroutine Q, which contains a (direct or
    indirect) reference to P

26
#!/usr/bin/perl
# simple example perl program to calculate the factorial of a number
# using (gasp) "Recursion"
# BUG found
print "Enter integer number to determine factorial";
$i = <STDIN>;     # get number
chomp($i);        # remove "newline"
$i = int $i;      # removes any decimals
if ($i < 0) {
  die("Error input must be positive integer");
}
$j = Fact($i);
print "($i)!=";
print "$j\n";
# end of program

sub Fact() {
  my $num = shift;   # How can num be N, and then N-1, then N-2, etc.????
  print "num = $num\n";
  $new_num = $num-1;
  if ($new_num == 0) {
    return(1);
  } else {
    $fact = $num * Fact($new_num);
    return($fact);
  }
}
27
Program Iterations or Profile
n = 5
tabraun@texas fact ./fact-test.pl
Enter integer number to determine factorial5
num = 5
num = 4
num = 3
num = 2
num = 1
(5)!=120
28
Recursion
  • Cut and paste () here
  • Cut and paste (Cut and paste () here) here
  • Cut and paste (Cut and paste (Cut and paste ()
    here) here) here
  • Etc.

29
Variable Scope
  • global variables: variables that are
    accessible/visible from any part of a program
  • local variables: accessible to a limited portion
    of the program
  • ensures that variables are not unintentionally
    manipulated
  • perl
  • variables are always global unless you specify
    otherwise
  • my $variable_name specifies a local variable
  • scope usually refers to blocks of code
  • for loop
  • while loop from insertion sort
  • example: scope.pl

30
#!/usr/bin/perl
$i = 5;
print "i=$i\n";
print "i=$i\n";
$i = 3;
print "i=$i\n";
print "i=$i\n";
31
Can we do better than Insertion Sort $O(n^2)$?
  • Merge-Sort(A,p,r)
  • 1 if p < r
  • 2   q ← ⌊(p+r)/2⌋
  • 3   Merge-Sort(A,p,q)
  • 4   Merge-Sort(A,q+1,r)
  • 5   Merge(A,p,q,r)
  • 6 return
  • Divide and conquer example: an array of
    length 1 is already sorted.
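The slide gives Merge-Sort but leaves Merge itself undefined. The Python sketch below fills in one possible Merge (copying the two sorted halves and merging them back), which is an assumption of this example rather than the version used in the course; p and r are inclusive indices as in the pseudocode.

```python
def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] back into a[p..r]."""
    left = a[p:q + 1]
    right = a[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left while it still has the smaller (or equal) element,
        # or when right is exhausted.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p=0, r=None):
    """Sort a[p..r] in place and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a


print(merge_sort([5, 2, 4, 6, 1, 3, 2, 6]))
```

The recursion stops at sub-arrays of length 1, so all real work happens in `merge`, which only ever combines runs that are already sorted.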

32
Merge-Sort Split Steps
[Figure: split tree for the input 5 2 4 6 1 3 2 6]
5 2 4 6 1 3 2 6
5 2 4 6 | 1 3 2 6
5 2 | 4 6 | 1 3 | 2 6
5 | 2 | 4 | 6 | 1 | 3 | 2 | 6
Merge-sort changes the problem from one of
sorting numbers, to one of simply combining
stacks of numbers that are already sorted. At
the leaf level of this graph, individual numbers
are already sorted (because a single number by
itself is sorted).
33
Merge-Sort Analysis
$O(n \log_2 n)$: can we do better?
34
Algorithmic Concepts
  • Greedy algorithms
  • always make the choice that looks best at the
    moment.
  • makes a locally optimal choice in the hope that
    this choice will lead to a globally optimal
    solution
  • do not always yield optimal solutions

35
Knapsack Problem
  • A thief finds n items
  • item i is worth vi dollars, and weighs wi pounds,
    where vi and wi are integers
  • thief wants to maximize value, but is limited to
    W pounds
  • What items should thief take?
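A small instance shows why the greedy choice can fail here. The Python sketch below compares a greedy strategy (take items in order of value per pound) with an exhaustive table-based solution. The item values and weights, the capacity of 50, and both function names are assumptions made for this example, and items are taken whole or not at all.

```python
def greedy_knapsack(items, capacity):
    """Take items by best value-per-pound while they still fit."""
    total = 0
    remaining = capacity
    by_ratio = sorted(items, key=lambda item: item[0] / item[1], reverse=True)
    for value, weight in by_ratio:
        if weight <= remaining:
            total += value
            remaining -= weight
    return total


def best_knapsack(items, capacity):
    """Best achievable value, via a table indexed by weight limit."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Go downwards so each item is used at most once.
        for limit in range(capacity, weight - 1, -1):
            best[limit] = max(best[limit], best[limit - weight] + value)
    return best[capacity]


# (value in dollars, weight in pounds); hypothetical data.
items = [(60, 10), (100, 20), (120, 30)]
print(greedy_knapsack(items, 50), best_knapsack(items, 50))
```

Greedy takes the 10 and 20 pound items for 160 dollars, while the best choice is the 20 and 30 pound items for 220 dollars.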

36
Knapsack
37
NP-Completeness and NP-Hard
  • polynomial-time algorithms
  • naïve, insertion-sort, merge-sort, fact
  • $O(n^k)$ for some constant k
  • Can all problems be solved in polynomial time?
  • no
  • for one important class of problems, called
    NP-Complete, no polynomial-time algorithm is known
  • these problems are considered intractable
  • valuable to know when a problem is NP-Complete so
    that you do not waste time attempting to develop
    an exact, efficient solution
  • approach is to look for an approximation of the solution

38
NP-Complete example
  • Traveling-salesman problem
  • a salesman must visit N cities
  • wants to visit every city exactly once
  • wants to minimize travel distance
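A brute-force solution simply tries every ordering of the cities, which is why it stops being practical very quickly: N cities give (N-1)! tours from a fixed start. The Python sketch below assumes cities are (x, y) points, straight-line distances, and a tour that returns to its starting city; none of these details are specified on the slide.

```python
from itertools import permutations
from math import dist


def shortest_tour(cities):
    """Try every ordering; return (length, tour) of the shortest round trip."""
    start, rest = cities[0], cities[1:]
    best_length, best_tour = float("inf"), None
    for order in permutations(rest):
        tour = (start, *order, start)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Four hypothetical cities on the corners of a unit square.
length, tour = shortest_tour([(0, 0), (1, 1), (0, 1), (1, 0)])
print(length, tour)
```

For the four corners of a unit square the shortest round trip has length 4. Adding cities multiplies the number of tours to check, so heuristics and approximations are used for realistic inputs.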

39
Dynamic Programming
  • like divide-and-conquer, DP solves problems by
    combining the solutions of subproblems
    (Programming refers to a tabular method, NOT
    writing computer code.)
  • D and C generally have independent subproblems
  • DP is most applicable when subproblems are not
    independent - i.e., D and C does more work than
    necessary, repeatedly solving the common
    subsubproblems
  • DP solves each subsubproblem once, and saves the
    results in a table.
  • work is avoided since the answer does not have to
    be recomputed every time the subsubproblem is
    encountered
  • Example
  • local sequence alignment: Smith-Waterman
    (its global counterpart is Needleman-Wunsch)
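The tabular method can be illustrated with the score computation of Smith-Waterman. The Python sketch below fills a table in which each cell reuses three previously computed neighbours. The scoring values (match +1, mismatch -1, gap -1) are assumptions for this example, and only the best score is returned, not the alignment itself.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b, using a DP table."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                0,                         # start a new local alignment
                table[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
            best = max(best, table[i][j])
    return best


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Each cell is computed once and then looked up by its neighbours, so the work is proportional to the table size rather than to the number of possible alignments.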

40
Assignment Debugging naïve.pl
  • Due
  • bug exists in algorithm
  • find input scenario where algorithm breaks
  • Assignment 1
  • Obtain naïve.pl (web)
  • Execute it (with your perl)
  • Alter input to determine bug
  • Submit a version of the program with the 2 input
    lines that illustrate the bug
  • @snt = (A, A, . )
  • @qnt = (A, T, .)
  • Describe a solution by inserting comments into
    the program
  • Submit your altered and commented program to icon