Title: Outline
1 Outline
- Quiz 4
- Lab 1 (Quiz 3) Solution
- Scoping
- Algorithm efficiency
- Sorting
- Hashes
- Review for midterm
- Programming assignment 3
2 Lab 1: A More Elegant Tester
Lab1
    #!/usr/bin/perl
    use strict;
    use warnings;

    # File: regex_tester.pl
    # Author: Jim Logan
    # Date: 6 October 2009
    #
    # Fully interactive version (i.e., no recompiles required) of a regular
    # expression tester based on a script by Fernando J. Pineda as presented
    # to the class of BINF623 by Jeff Solka on 10/5/09.
    # Particularly useful in an Eclipse environment using its cut and paste facility.

    # instructions for use
    print "\nAccepts keyboard entry of a regular expression and then permits\n";
    print "successive entry of strings to test that expression.\n";
    print "Square brackets in output indicate the text that matched the pattern.\n\n";
    print "Note: Depending upon the environment (e.g. Eclipse), you may be\n";
    print "able to cut and paste into both the \"Next expression\" and the\n";

    # initialization
    my $regex          = '/./';    # default regex to start and to demonstrate
    my $string         = 'This is a test string';
    my $input          = "";
    my $stripped_regex = "";

    while (1) {    # outer loop to sequence regular expressions
        print "\nCurrent regular expression: $regex\n";
        print "Enter a new expression to change or ENTER to continue without change.\n";
        print "(\"quit\" terminates the program)\n";
        print "New expression: ";
        $input = <STDIN>;
        chomp $input;
        if ($input =~ /^q/i) { exit; }    # "quit" (any entry starting with q) ends the program
        if ($input !~ /^$/) {             # a non-blank entry replaces the current regex
            $regex = $input;
        }
3 Lab 1: A More Elegant Tester (cont.)
        # The user includes the two slashes of a regular expression, but they are
        # stripped here so that the variable is just the pattern that will be
        # interpolated in /pattern/ context.
        $stripped_regex = substr($regex, 1, length($regex) - 2);

        while (1) {    # inner loop to sequence strings to test the expression
            print "\nCurrent test string: $string\n";
            print "Enter a new test string to change or ENTER to reset the regex.\n";
            print "New test string: ";
            $input = <STDIN>;
            chomp $input;
            if ($input =~ /^$/) {    # for a blank line, go back to set the expression
                last;
            }
            else {
                $string = $input;
            }
            # run the regex over the input
            if ($string =~ /$stripped_regex/) {
                print "$`\[$&\]$'\n";    # show the match in the context of the input
            }
            else {
                print("no match\n");
            }
        }
    }
    exit;    # never used
4 Lab 1 Solution
- 1. What is a pattern that matches the substring "world" occurring anywhere in the input string, e.g.
- hello cold cruel world
- hello world news tonight
- helloworld.pl is a script
- Solution: /world/
- 2. What is a pattern that matches the word "world" occurring anywhere in the input string, e.g.
- hello cold cruel world
- hello world news tonight
- but not
- helloworld.pl is a script
- Solution: /\bworld\b/
5 Lab 1 Solution
- 3. What is a pattern that matches the word "world" only if it occurs at the end of the string, i.e.
- hello cold cruel world
- but not
- next is world news tonight
- hello cold cruelworld
- Solution: /\bworld$/
- 4. What is a pattern that matches a string that starts with the word "hello" OR ends in the word "world", e.g.
- hello and good night
- that's all for tonight world
- Solution: /^hello\b|\bworld$/
6 Lab 1 Solution
- 5. What is a pattern that matches a string that starts with the word "hello" OR "bye", AND ends with the word "world", e.g.
- bye cold cruel world
- hello cold cruel world
- but not
- hello cold cruel world?
- hello cold cruelworld
- Solution: /^(hello|bye)\b.*\bworld$/
- 6. What is a pattern that matches the substring "world" occurring 1 or more times at the end of the line, e.g.
- This string ends in world
- This string ends in worldworld
- This string ends in worldworldworld
- Solution: /(world)+$/
7 Lab 1 Solution
- 7. What is a pattern that matches one or more backslashes immediately followed by one or more asterisks, e.g.
- \\\\**
- but not
- \\\\\
- Solution: /\\+\*+/
8 Lab 1 Solution
- 8. What is a pattern that matches any line of input that has the same word repeated two or more times in a row? In this problem, words can be considered to be sequences of letters a to z, A to Z, digits, and underscores. Whitespace between words may differ, e.g.
- Paris in the the spring
- I thought that that was the problem
- For this example you will need to use backreferences. A backreference is a reference to a string captured with parentheses. (Recall that in Perl, captured strings are referred to as $1, ..., $9.) In a regular expression, you can refer to captured strings while the pattern is being matched as \1, ..., \9. For example, /(AT)G(\1)/ matches the 5-character string ATGAT.
- Note: Strictly speaking, the inclusion of backreferences makes
- Solution: /\b(\S+)\b(\s+\1\b)+/
- Understanding this:
- \b     start at a word boundary (beginning of the letters)
- (\S+)  find a chunk of non-whitespace
- \b     until another word boundary (end of the letters)
- (\s+   separated by some whitespace
- \1     and that very same chunk again
- \b)    until another word boundary
- +      one or more sets of these
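The eight solution patterns can also be checked without the interactive tester. The script below is a minimal sketch: it runs each pattern over the example strings from the questions and reports any surprise. The patterns use ^ and $ anchors, | alternation and + / * quantifiers where the questions call for them; the question 7 strings and the "no repeats here" line are hypothetical additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case: question number, pattern, strings that should match,
# strings that should not match.
my @cases = (
    [ 1, qr/world/,                 ['hello cold cruel world', 'helloworld.pl is a script'], [] ],
    [ 2, qr/\bworld\b/,             ['hello world news tonight'], ['helloworld.pl is a script'] ],
    [ 3, qr/\bworld$/,              ['hello cold cruel world'],
                                    ['next is world news tonight', 'hello cold cruelworld'] ],
    [ 4, qr/^hello\b|\bworld$/,     ['hello and good night', "that's all for tonight world"], [] ],
    [ 5, qr/^(hello|bye)\b.*\bworld$/, ['bye cold cruel world', 'hello cold cruel world'],
                                    ['hello cold cruel world?', 'hello cold cruelworld'] ],
    [ 6, qr/(world)+$/,             ['This string ends in worldworldworld'], [] ],
    [ 7, qr/\\+\*+/,                ['\\\\**'], ['\\\\\\'] ],    # in single quotes, \\ is one backslash
    [ 8, qr/\b(\S+)\b(\s+\1\b)+/,   ['Paris in the the spring', 'I thought that  that was the problem'],
                                    ['no repeats here'] ],
);

my $failures = 0;
for my $case (@cases) {
    my ($q, $re, $yes, $no) = @$case;
    for my $s (@$yes) {
        next if $s =~ $re;
        print "Q$q: expected a match for '$s'\n";
        $failures++;
    }
    for my $s (@$no) {
        next unless $s =~ $re;
        print "Q$q: unexpected match for '$s'\n";
        $failures++;
    }
}
my $summary = $failures ? "$failures failure(s)\n" : "all patterns behave as expected\n";
print $summary;
```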
9 Be Careful With Scope
Scoping
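    #!/usr/bin/perl
    use strict;
    use warnings;

    my $x = 23;
    print "value in main body is $x \n";
    mysub($x);
    print "value in main body is $x \n";
    exit;

    sub mysub {
        print "value in subroutine is $x \n";    # this is the main body's $x
        $x = 33;                                 # so this changes the main body's $x
    }
- Output
- value in main body is 23
- value in subroutine is 23
- value in main body is 33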
10 Be Careful With Scope (cont.)
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $x = 23;
    print "value in main body is $x \n";
    mysub($x);
    print "value in main body is $x \n";
    exit;

    sub mysub {
        my ($x) = @_;    # a new $x, local to mysub, holding a copy of the argument
        $x = 33;
        print "value in subroutine is $x \n";
    }
- Output
- value in main body is 23
- value in subroutine is 33
- value in main body is 23
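The subroutine above protects the caller because my ($x) = @_ copies the argument. The elements of @_ themselves are aliases for the caller's variables, so assigning to $_[0] directly does change the caller. A small sketch of the difference (the subroutine names copy_and_change and change_in_place are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub copy_and_change {
    my ($x) = @_;    # copy the argument into a new lexical
    $x = 33;         # changes only the copy
    return;
}

sub change_in_place {
    $_[0] = 33;      # $_[0] is an alias for the caller's first argument
    return;
}

my $x = 23;
copy_and_change($x);
print "after copy_and_change: $x\n";    # still 23
change_in_place($x);
print "after change_in_place: $x\n";    # now 33
```

Passing a literal, as in change_in_place(5), would die with a "Modification of a read-only value attempted" error, since there is no variable to alias.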
11 Data Structures and Algorithm Efficiency
Algorithm Efficiency
- Algorithm is O(N^2), where N = size of the lists
- An inefficient way to compute intersections
    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();
    for my $i (@a) {
        for my $j (@b) {
            if ($i eq $j) {
                push @intersection, $i;
                last;
            }
        }
    }
    print "@intersection\n";
    exit;
- Output
- A C D G H I J K X Z
12 Data Structures and Algorithm Efficiency
- A better way to compute intersections
    my @a = qw/ A B C D E F G H I J K X Y Z /;
    my @b = qw/ Q R S A C D T U G H V I J K X Z /;
    my @intersection = ();
    # "mark" each item in @a
    my %mark = ();
    for my $i (@a) { $mark{$i} = 1 }
    # intersection: any "marked" item in @b
    for my $j (@b) {
        if (exists $mark{$j}) {
            push @intersection, $j;
        }
    }
    print "@intersection\n";
    exit;
version 1
version 2
13 Demonstration
- Unix commands
- /usr/bin/time
- head
- diff
- cmp
- wc -l list1 list2
-    24762 list1
-    12381 list2
-    37143 total
- /usr/bin/time intersect1.pl list1 list2 > out1
-    22.91 real   22.88 user   0.02 sys
- /usr/bin/time intersect2.pl list1 list2 > out2
-    0.06 real   0.05 user   0.00 sys
- Ratio of user times: 22.88 / 0.05 ≈ 458
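The same comparison can be made inside a single script. This is a rough sketch, not the intersect1.pl/intersect2.pl used in the demonstration: it assumes the core Time::HiRes module and builds hypothetical lists of 2000 items instead of reading list1 and list2, so the timings will differ from those above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical data: @a holds item1..item2000, @b holds item2, item4, ..., item4000,
# so exactly the even-numbered items up to item2000 are shared.
my $n = 2000;
my (@a, @b);
push @a, "item$_"             for 1 .. $n;
push @b, "item" . (2 * $_)    for 1 .. $n;

# Version 1: nested loops, up to N comparisons for each of the N items of @a.
my $t0 = time;
my @v1;
for my $i (@a) {
    for my $j (@b) {
        if ($i eq $j) { push @v1, $i; last; }
    }
}
my $t1 = time;

# Version 2: "mark" every item of @a in a hash, then one lookup per item of @b.
my %mark;
$mark{$_} = 1 for @a;
my @v2;
for my $j (@b) {
    push @v2, $j if exists $mark{$j};
}
my $t2 = time;

printf "version 1: %d items in %.3f s\n", scalar(@v1), $t1 - $t0;
printf "version 2: %d items in %.3f s\n", scalar(@v2), $t2 - $t1;
```

Doubling $n should roughly quadruple the time of version 1 but only roughly double that of version 2.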
14 Hashes and Efficiency
Hashes
- Hashes provide a very fast way to look up information associated with a set of scalar values (keys)
- Examples
- Count how many times each word appears in a file
- Also, whether or not a certain word appeared in a file
- Count how many times each codon appears in a DNA sequence
- Whether a given codon appears in a sequence
- How many times an item appears in a given list
- Intersections
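As a sketch of the codon examples, the script below counts codons in reading frame 1 of a short hypothetical sequence, then answers the "does this codon appear?" question with a single exists lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'ATGGCCATGGCCTTTATG';    # hypothetical example sequence

# Count how many times each codon appears (reading frame 1).
my %codon_count;
for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
    my $codon = substr($dna, $i, 3);
    $codon_count{$codon}++;
}
for my $codon (sort keys %codon_count) {
    print "$codon\t$codon_count{$codon}\n";
}

# Whether a given codon appears: one hash lookup, no scan of the sequence.
print "TAA present? ", (exists $codon_count{TAA} ? "yes" : "no"), "\n";
```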
15 Examples
- Write a subroutine get_intersection(\@a, \@b) that returns the intersection of two lists.
- Write a subroutine first_list_only(\@a, \@b) that returns the items that are in list @a but not in @b.
- Write a subroutine unique(@a) that returns the unique items in list @a (that is, removes the duplicates).
- Write a subroutine dups($n, @a) that returns a list of items that appear in @a at least $n times.
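One possible set of answers, each built on a hash in the same "mark, then look up" style as version 2 of the intersection code. This is a sketch, not the official solution; choices such as reporting each item only once and keeping the order of first appearance are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Items of @$b_ref that also occur in @$a_ref (in @$b_ref order, each reported once).
sub get_intersection {
    my ($a_ref, $b_ref) = @_;
    my %in_a;
    $in_a{$_} = 1 for @$a_ref;
    my %seen;
    return grep { $in_a{$_} && !$seen{$_}++ } @$b_ref;
}

# Items of @$a_ref that do not occur in @$b_ref.
sub first_list_only {
    my ($a_ref, $b_ref) = @_;
    my %in_b;
    $in_b{$_} = 1 for @$b_ref;
    return grep { !$in_b{$_} } @$a_ref;
}

# Remove duplicates, keeping the first occurrence of each item.
sub unique {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Items that appear at least $n times (each reported once, in order of first appearance).
sub dups {
    my ($n, @items) = @_;
    my %count;
    $count{$_}++ for @items;
    my %reported;
    return grep { $count{$_} >= $n && !$reported{$_}++ } @items;
}

my @a      = qw/ A B C D E F G H I J K X Y Z /;
my @b      = qw/ Q R S A C D T U G H V I J K X Z /;
my @codons = qw/ ATG TTT ATG GGC TTT ATG /;

my @common = get_intersection(\@a, \@b);
my @only_a = first_list_only(\@a, \@b);
my @uniq   = unique(@codons);
my @twice  = dups(2, @codons);
print "intersection:    @common\n";
print "first list only: @only_a\n";
print "unique:          @uniq\n";
print "at least twice:  @twice\n";
```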
16 Sorting
Sorting
- sort LIST -- returns the list sorted in string order
- sort BLOCK LIST -- compares according to BLOCK
- sort SUBNAME LIST -- compares according to the subroutine SUBNAME
17 Sorting: Our First Attempt
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@unsorted) = (17, 8, 2, 111);
    my (@sorted) = sort @unsorted;    # default: string order
    print "@unsorted \n";
    print "@sorted \n";
    exit;
- Output
- 17 8 2 111
- 111 17 2 8
18 The Comparison Operator
- 1. $a <=> $b returns 0 if they are equal, 1 if $a > $b, and -1 if $a < $b.
- 2. The cmp operator gives similar results for strings.
- 3. $a and $b are special global variables: do NOT declare them with my, and do NOT modify them.
19 Sorting Numerically
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@unsorted) = (17, 8, 2, 111);
    my (@sorted) = sort { $a <=> $b } @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
- Output
- 17 8 2 111
- 2 8 17 111
20 Sorting Using a Subroutine
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@unsorted) = (17, 8, 2, 111);
    my (@sorted) = sort numerically @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;

    sub numerically { $a <=> $b }
- Output
- 17 8 2 111
- 2 8 17 111
21 Sorting Descending
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@unsorted) = (17, 8, 2, 111);
    my (@reversesorted) = reverse sort numerically @unsorted;
    print "@unsorted \n";
    print "@reversesorted \n";
    exit;

    sub numerically { $a <=> $b }
- Output
- 17 8 2 111
- 111 17 8 2
22 Sorting DNA by Length
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sorting strings
    my @dna = qw/ TATAATG TTTT GT CTCAT /;
    # Sort @dna by length
    @dna = sort { length($a) <=> length($b) } @dna;
    print "@dna\n";    # Output: GT TTTT CTCAT TATAATG
    exit;
- Output
- GT TTTT CTCAT TATAATG
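When two strings have the same length, the comparison above returns 0, so it says nothing about their relative order. A sketch of a tie-breaker, using or so that the alphabetical comparison is consulted only when the lengths are equal (the extra sequence ACGT is a hypothetical addition to create a tie):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw/ TATAATG TTTT GT CTCAT ACGT /;
# Primary key: length; secondary key (only used when the lengths tie): string order.
my @by_length = sort { length($a) <=> length($b) or $a cmp $b } @dna;
print "@by_length\n";    # GT ACGT TTTT CTCAT TATAATG
```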
23 Sorting DNA by Number of Ts (Largest First)
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sorting strings
    my @dna = qw/ TATAATG TTTT GT CTCAT /;
    # tr/Tt// returns the number of T or t characters; $b before $a gives largest first
    @dna = sort { ($b =~ tr/Tt//) <=> ($a =~ tr/Tt//) } @dna;
    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT
    exit;
- Output
- TTTT TATAATG CTCAT GT
24 Sorting DNA by Number of Ts (Largest First) (Take 2)
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sorting strings
    my @dna = qw/ TATAATG TTTT GT CTCAT /;
    @dna = reverse sort { ($a =~ tr/Tt//) <=> ($b =~ tr/Tt//) } @dna;
    print "@dna\n";    # Output: TTTT TATAATG CTCAT GT
    exit;
- Output
- TTTT TATAATG CTCAT GT
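Both versions recount the Ts in a string every time it takes part in a comparison. A sketch that counts once per string, stores the counts in a hash, and then sorts on the stored values (the same idea as sorting a hash by value later in this lecture):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @dna = qw/ TATAATG TTTT GT CTCAT /;

# Count the Ts once per string.
my %t_count;
$t_count{$_} = ($_ =~ tr/Tt//) for @dna;

# Sort on the stored counts, largest first.
my @most_t_first = sort { $t_count{$b} <=> $t_count{$a} } @dna;
print "@most_t_first\n";    # TTTT TATAATG CTCAT GT
```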
25 Sorting Strings Without Regard to Case
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sort strings without regard to case
    my (@unsorted) = qw/ mouse Rat HUMAN eColi /;
    my (@sorted) = sort { lc($a) cmp lc($b) } @unsorted;
    print "@unsorted \n";
    print "@sorted \n";
    exit;
- Output
- mouse Rat HUMAN eColi
- eColi HUMAN mouse Rat
26 Sorting Hashes by Value
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (%sales_amount) = ( auto => 100, kitchen => 2000, hardware => 200 );
    sub bysales { $sales_amount{$b} <=> $sales_amount{$a} }    # largest amount first
    for my $dept (sort bysales keys %sales_amount) {
        printf "%s\t%4d\n", $dept, $sales_amount{$dept};
    }
    exit;
- Output
- kitchen 2000
- hardware 200
- auto 100
27 Review for Midterm BINF634
Midterm
- Material
- Tisdall Chapters 1-9
- Wall Chapter 5
- Lecture notes
- The exam will be open book and notes
- You cannot work together on it
- You cannot use outside material
- You will have the full period to take the midterm
- You will be asked to program
28 Some Example Questions
- Given two DNA fragments contained in $DNA1 and $DNA2, how can we concatenate these to make a third string $DNA3?
$DNA3 = "$DNA1$DNA2";   (equivalently, $DNA3 = $DNA1 . $DNA2;)
29 Some Example Questions
- What does this line of code do?
- $RNA =~ s/T/U/ig;
It substitutes U for every T within the string $RNA, case-insensitively (i) and globally (g).
30 Some Example Questions
- What does this statement do?
- $revcom =~ tr/ACGT/TGCA/;
It performs the mappings A → T, C → G, G → C, T → A all at once.
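Note that tr/ACGT/TGCA/ only complements the bases; a reverse complement also needs the string reversed. A minimal sketch (the subroutine name revcom is hypothetical; lowercase bases are handled too):

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;           # scalar context: reverses the characters
    $revcom =~ tr/ACGTacgt/TGCAtgca/;    # then complement each base
    return $revcom;
}

my $rc = revcom('AAACGT');
print "$rc\n";    # ACGTTT
```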
31 Some Example Questions
- What do these four lines do?
- @bases = ('A', 'C', 'G', 'T');
- $base1 = pop @bases;
- unshift(@bases, $base1);
- print "@bases\n\n";
They move the last base to the front of the array and print: T A C G
32 Some Example Questions
- What does this code snippet do if COND is true?
- unless (COND) {
-     # do something
- }
Nothing: the block runs only when COND is false.
33 Some Example Questions
- What does this code fragment do?
- $protein = join('', @protein);
Converts the array @protein into a scalar $protein with no space between the entries.
34 Some Example Questions
- What does this code fragment do?
- $myfile = 'myfile';
- open(MYFILE, ">$myfile");
Opens the file myfile with the filehandle MYFILE for writing.
35 Some Example Questions
- What does this code fragment do?
- while ($DNA =~ /a/ig) { $a++ }
Counts the occurrences of the letter a or A within the string $DNA (the count ends up in $a).
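The counter in this question is $a, which is also the special variable used by sort, so a named counter is safer. tr/// can also do the count in one step, since it returns the number of characters it matched. A sketch comparing the two on a hypothetical string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'AaCGTAAg';    # hypothetical sequence with four a/A characters

# The method from the question, with a named counter instead of $a.
my $count_loop = 0;
while ($DNA =~ /a/ig) { $count_loop++ }

# tr/// returns how many characters it matched (and does not change $DNA here).
my $count_tr = ($DNA =~ tr/aA//);

print "loop: $count_loop  tr: $count_tr\n";    # loop: 4  tr: 4
```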
36 Some Example Questions
- What is the effect of using the command
- use strict;
- at the beginning of your program?
It insists that your program declare all its variables (normally as my variables); using an undeclared variable becomes a compile-time error.
37 Some Example Questions
- What is contained in the reserved variable $0 and in the array @ARGV?
$0 contains the name of the program; @ARGV contains the command-line arguments for the program.
38 Some Example Questions
- What is the difference between pass by value and pass by reference?
In pass by value you provide a subroutine with a copy of your variable. In pass by reference you provide a subroutine with a pointer to your variable; in this manner the subroutine can change the contents of the variable.
39 Some Example Questions
- What is a pointer and what does it mean to dereference a pointer?
A pointer is an address in memory of a particular variable. Dereferencing a pointer is the act of obtaining the information that is stored at a particular pointer location.
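In Perl these pointers are called references: a backslash takes a reference, and @{...} or -> dereferences it. A short sketch of pass by reference (the subroutine name append_base is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The subroutine receives a reference to the caller's array, not a copy of it.
sub append_base {
    my ($bases_ref, $base) = @_;
    push @{$bases_ref}, $base;    # dereference to modify the caller's array
    return;
}

my @bases = ('A', 'C', 'G');
my $ref   = \@bases;              # take a reference with a backslash
append_base($ref, 'T');
print "@bases\n";                                            # A C G T
print "first element via the reference: $ref->[0]\n";        # A
print "length via the reference: ", scalar @{$ref}, "\n";    # 4
```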
40 Some Example Questions
- How do you invoke perl with the debugger?
perl -d followed by the program name
41 Some Example Questions
- Given an array @verbs, what is going on here?
- $verbs[rand @verbs]
rand expects a number, so it evaluates @verbs in scalar context, which gives the number of elements in @verbs. rand then generates a random number from 0 up to (but not including) that count, and using it as an array index truncates it to an integer, so the expression picks a random element of @verbs.
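A sketch of the idiom with a hypothetical @verbs, alongside the same index written out with an explicit int and scalar:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @verbs = qw/ run jump swim read /;

# rand @verbs is rand 4 here: a number in [0, 4), truncated when used as an index.
my $verb = $verbs[rand @verbs];

# The same idea written out explicitly.
my $index = int(rand(scalar @verbs));

print "$verb $verbs[$index]\n";
```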