Title: Varia
2 Lesson 3 (theory and exercises)
- Miscellaneous
- http://biochema.rug.ac.be/
- Review
- Demos (sequence retrieval, dot plot, pairwise alignment SW) during the lesson
- Installation
- Windows versus Linux
- Perl
- Review
- Arrays/Hashes/File I/O
- Exercises
3 Overview
- Fri 4/10: Introduction History, Database Biology, Sequence Formats
- Fri 11/10: Pairwise comparison, scoring, matrices
- Thu 17/10 (B10): Theory review (demo), Perl review, exercises
- Fri 18/10: Multiple alignment, basic and advanced database searching
- Thu: no practical session / exercises
- Fri 25/10: no class
- Thu: no practical session / exercises
- Fri 1/11: no class
- Fri 8/11: Phylogenetics, gene prediction, junk mining (RNA prediction)
- Fri 15/11: no class
- Fri 22/11: Protein structure, classification and engineering
4 Genetic Code Matrix
5 Overview
- Other similarity scoring matrices might be constructed from any property of amino acids that can be quantified:
- partition coefficients between hydrophobic and hydrophilic phases
- charge
- molecular volume
- Unfortunately, ...
6 Principles of Scoring Matrix Construction
- Amino acid   1978   1991
- A            100    100
- C             20     44
- D            106     86
- E            102     77
- F             41     51
- G             49     50
- H             66     91
- I             96    103
- K             56     72
- L             40     54
- M             94     93
- N            134    104
- P             56     58
- Q             93     84
- R             65     83
- S            120    117
- T             97    107
- V             74     98
7 Which matrix should I use?
- When comparing sequences that were not known in advance to be related, for example when database scanning:
- the default scoring matrix used is the BLOSUM62 matrix
- if one is restricted to using only PAM scoring matrices, then PAM120 is recommended for general protein similarity searches
- When using a local alignment method, Altschul suggests that three matrices should ideally be used: PAM40, PAM120 and PAM250. The lower PAM matrices will tend to find short alignments of highly similar sequences, while the higher PAM matrices will find longer, weaker local alignments.
8 Overview
       C K H V F C R V C I
       -------------------
    C  5 3 3 3 2 2 1 1 1 0
    K  4 4 3 3 2 1 1 1 0 0
    K  3 4 3 3 2 1 1 1 0 0
    C  4 3 3 3 2 2 1 1 1 0
    F  3 2 2 2 3 1 1 1 0 0
    C  4 2 2 2 2 2 1 1 1 0
    K  2 3 2 2 2 1 1 1 0 0
    C  2 1 1 1 1 2 1 0 1 0
    V  0 0 0 1 0 0 0 1 0 0
- Alignment 1
    C K H V F C R V C I
    C K K C F C - K C V
- Alignment 2
    C K H V F C R V C I
    C K K C F C K - C V
- Alignment 3
    C - K H V F C R V C I
    C K K C - F C - K C V
- Alignment 4
    C K H - V F C R V C I
    C K K C - F C - K C V
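The numbers in the matrix above can be reproduced with a short sketch. It assumes a score of 1 for each identical residue pair, 0 otherwise, and no gap penalty, and it fills the matrix from the bottom-right corner: each cell is its own match score plus the best score found anywhere below and to the right. The subroutine name match_matrix is hypothetical.

```perl
use strict;
use warnings;

# Fill the match matrix from the bottom-right corner.
# Cell (i, j) = 1 for a match (else 0) plus the best score found
# in any cell that lies below and to the right of it.
sub match_matrix {
    my ($top, $left) = @_;
    my @t = split //, $top;
    my @l = split //, $left;
    my @score;
    for (my $i = $#l; $i >= 0; $i--) {
        for (my $j = $#t; $j >= 0; $j--) {
            my $best = 0;
            for my $r ($i + 1 .. $#l) {
                for my $c ($j + 1 .. $#t) {
                    $best = $score[$r][$c] if $score[$r][$c] > $best;
                }
            }
            $score[$i][$j] = ($l[$i] eq $t[$j] ? 1 : 0) + $best;
        }
    }
    return @score;
}

my @score = match_matrix("CKHVFCRVCI", "CKKCFCKCV");
print join(" ", @$_), "\n" for @score;
```

The value in the top-left cell is the maximum number of matches any alignment of the two sequences can reach; the alignments listed on the slide are alternative paths through this matrix.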
10 Get Sequences
- Entrez
- Simple
- ZK822 (genomic)
- ZK822.4 (gene)
- Limits / Details
- E.g. all ion channels, complete CDS
- Batch-Entrez
- SRS: Sequence Retrieval System
- Swiss-Prot
11 SQL (Structured Query Language)
- SQL is very powerful even though data manipulation needs only 4 statements, sometimes referred to as CRUD:
- 1) Create - INSERT - to store new data
- 2) Read - SELECT - to retrieve data
- 3) Update - UPDATE - to change or modify data
- 4) Delete - DELETE - to delete or remove data
- SELECT ... FROM table WHERE ...
12 - SELECT ... FROM table WHERE ... (http://sqlcourse2.com/select2.html)
- ((ion channel[All Fields] AND "homo sapiens"[Organism]) AND ((((((1900[MDAT] : 3000[MDAT]) NOT gbdiv_est[PROP]) AND ((1900[MDAT] : 3000[MDAT]) NOT gbdiv_sts[PROP])) AND ((1900[MDAT] : 3000[MDAT]) NOT gbdiv_gss[PROP])) AND ((1900[MDAT] : 3000[MDAT]) NOT gbdiv_htg[PROP])) AND ((1900[MDAT] : 3000[MDAT]) NOT gbdiv_pat[PROP])))
14 - Perform multiple alignment
- BLOSUM62 12 -2
- Change gap opening and extension to get the correct alignment
- Goal: align two similar proteins
15 Extensions to basic dynamic programming method
- use similarity function in initialization step -> scoring tables
- use gap penalties
- constant gap penalty for gap > 1
- gap penalty proportional to gap size
- one penalty for starting a gap (gap opening penalty)
- different (lower) penalty for adding to a gap (gap extension penalty)
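The difference between a proportional and an opening/extension gap penalty can be sketched in a few lines. The subroutine names are hypothetical, and the convention used here (the opening penalty covers the first gap position, each further position costs the extension penalty) is an assumption; programs differ on this detail.

```perl
use strict;
use warnings;

# Linear model: penalty proportional to gap size.
sub linear_gap_penalty {
    my ($length, $per_position) = @_;
    return $length * $per_position;
}

# Affine model: one penalty for opening the gap, a lower one for each
# additional position (assumed convention, see the text above).
sub affine_gap_penalty {
    my ($length, $open, $extend) = @_;
    return 0 if $length <= 0;
    return $open + ($length - 1) * $extend;
}

for my $len (1, 2, 5) {
    printf "gap of %d: linear %d, affine %d\n",
        $len, linear_gap_penalty($len, 12), affine_gap_penalty($len, 12, 2);
}
```

With an opening penalty of 12 and an extension penalty of 2, one long gap is much cheaper than several short ones, which is usually the biologically more plausible outcome.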
16 Sequence comparison with dot matrices
- Goal: graphically display regions of similarity between two sequences (e.g., domains in common between two proteins of suspected similar function)
- Extremely useful!
- Remember: DNA is double stranded (plot again RCC)
18 Overview
- Window size changes with goal of analysis
- size of average exon
- size of average protein structural element
- size of gene promoter
- size of enzyme active site
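A windowed dot matrix can be sketched as follows. The subroutine name dot_matrix and the min_match threshold parameter are assumptions made for illustration; the sketch records a dot wherever two windows share at least that many identical positions.

```perl
use strict;
use warnings;

# Return a list of [i, j] pairs (0-based) where the window starting at i
# in $x and the window starting at j in $y share at least $min_match
# identical characters.
sub dot_matrix {
    my ($x, $y, $window, $min_match) = @_;
    my @dots;
    for my $i (0 .. length($x) - $window) {
        for my $j (0 .. length($y) - $window) {
            my $matches = 0;
            for my $k (0 .. $window - 1) {
                $matches++ if substr($x, $i + $k, 1) eq substr($y, $j + $k, 1);
            }
            push @dots, [$i, $j] if $matches >= $min_match;
        }
    }
    return @dots;
}

my @dots = dot_matrix("CKHVFCRVCI", "CKKCFCKCV", 3, 2);
print join(" ", map { "($_->[0],$_->[1])" } @dots), "\n";
```

A larger window with a stricter threshold suppresses random dots; a smaller window shows short local similarities, which is why the window size follows the goal of the analysis.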
21 Perl installation
- Perl
- Perl is available for various operating systems. To download Perl and install it on your computer, have a look at the following resources:
- www.perl.com (O'Reilly)
- Downloading Perl Software
- ActiveState: ActivePerl for Windows, as well as for Linux and Solaris
- ActivePerl binary packages
- CPAN
- PHPTriad
- contains Apache/PHP and MySQL: http://sourceforge.net/projects/phptriad
22 BioPerl installation
- Bioperl
- Download bioperl (bioperl-1.0.2.zip) from http://bioperl.org/Core/Latest/
- perl Makefile.PL
- (n)make (or nmake; download it from Microsoft: http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe)
- (n)make install
- For Bundle-BioPerl
- Download the bundle from CPAN: http://search.cpan.org/author/CRAFFI/Bundle-BioPerl-2.03/BioPerl.pm
- perl Makefile.PL
- (n)make (or nmake; download it from Microsoft: http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe)
- (n)make install
- DBI and DBD::mysql with PPM (the Perl package manager of ActivePerl)
23 What is Perl?
- Perl is a high-level scripting language
- Faster than sh or csh, slower than C
- No need for sed, awk, tr, wc, cut, ...
- Compiles at run-time
- Perl is a computer language that is
- Interpreted
- Loosely typed
- String/text oriented
- Capable of using multiple syntax formats
- In Perl, "there's more than one way to do it"
24 Why use Perl for bioinformatics?
- Ease of use by novice programmers
- Fast software prototyping
- Flexible language
- Compact code
- Powerful pattern matching via regular expressions ("Best Regular Expressions on Earth")
- Availability of Perl modules for bioinformatics and the Internet
- Available for Unix, PC, Mac
- Portability; well suited for CGI programming
- Open source: easy to extend and customize
- No licensing fees
- Some tasks are still better done with other languages (heavy computations / graphics).
- With Perl you can write simple programs fast, and it is also suitable for large and complex programs (yet it is not adequate for very large projects).
25 What bioinformatics tasks are suited to Perl?
- Sequence manipulation and analysis
- Parsing results of sequence analysis programs (Blast, Genscan, Hmmer etc.)
- Parsing database (e.g. GenBank) files
- Obtaining multiple database entries over the internet
26 General Remarks
- Perl is mostly a free-format language: add spaces, tabs or new lines wherever you want.
- For clarity, it is recommended to write each statement on a separate line, and to use indentation in nested structures.
- Comments: anything from the # sign to the end of the line is a comment. (There are no multi-line comments.)
- A Perl program consists of all of the Perl statements of the file taken collectively as one big routine to execute.
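27 What does a real Perl program look like?
    #!/usr/local/bin/perl
    print "Hello everyone\n";
- The first line (#!...) tells the system where to find perl; it is required when the script is executed directly.
- How to run it:
- 1. Save the text of your code as a file -- program.pl
- 2. Execute it: perl program.pl
- Output: Hello everyone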
28 2 + 2 = ?
- $ indicates a variable
    $a = 2; $b = 2; $c = $a + $b;
- ; ends every command
- = assigns a value to a variable
- or
    $c = 2 + 2;
- or
    $c = 2 * 2;
- or
    $c = 2 / 2;
- or
    $c = 2 ** 4;    # 2 ** 4 <-> 2^4 = 16
- or
    $c = 1.35 * 2 - 3 / (0.12 + 1);
29 OK, $c is 4. How do we know it?
    $c = 4; print "$c";
- print command
- "" brackets the output expression
    print "Hello \n";
- \n prints an end-of-line character (equivalent to pressing Enter)
- String concatenation
    print "Hello everyone\n";
    print "Hello" . " everyone" . "\n";
- Expressions and strings together
    print "2 + 2 = " . (2 + 2) . "\n";    # prints: 2 + 2 = 4
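30 Loops and cycles (for statement)
- Output all the numbers from 1 to 100:
    for ($n = 1; $n <= 100; $n += 1) { print "$n \n"; }
- 1. Initialization
    for ( $n = 1; ... ; ... ) { ... }
- 2. Increment
    for ( ... ; ... ; $n += 1 ) { ... }
- 3. Termination (the loop continues as long as the condition is satisfied)
    for ( ... ; $n <= 100; ... ) { ... }
- 4. Body of the loop - commands inside curly brackets
    for ( ... ; ... ; ... ) { ... }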
31 FOR + IF -- all the even numbers from 1 to 100
    for ($n = 1; $n <= 100; $n += 1) {
        if (($n % 2) == 0) {
            print "$n\n";
        }
    }
- Note: $a % $b -- Modulus -- remainder when $a is divided by $b
32 Text Processing Functions
- The substr function
- Definition
- The substr function extracts a substring out of a string and returns it. The function receives 3 arguments: a string value, a position in the string (counting from 0) and a length.
- Example
    $a = "university";
    $k = substr ($a, 3, 5);
- $k is now "versi"; $a remains unchanged.
- If length is omitted, everything to the end of the string is returned.
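33
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my ($sp_file, $line, $id, $ac, $de);
    $sp_file = "sp.txt";
    open (SP, $sp_file) || die "cannot open \"$sp_file\": $!";
    while ($line = <SP>) {
        chomp ($line);

        my $field = substr ($line, 0, 2);    # two-letter line code
        my $value = substr ($line, 5);       # rest of the line

        if ($field eq "ID") {
            $id = $value;
        }
        if ($field eq "AC") {
            $ac = $value;
        }
        # The slide is cut off here; the DE branch and the closing lines
        # are an assumed completion based on the declared variable $de.
        if ($field eq "DE") {
            $de = $value;
        }
    }
    close (SP);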
34 Text Processing Functions
- The split function
- The split function splits a string into a list of substrings according to the positions of a given delimiter. The delimiter is written as a pattern enclosed by slashes: /PATTERN/.
- Examples
    $string = "programming::course::for::bioinformatics";
    @list = split (/::/, $string);
- @list is now ("programming", "course", "for", "bioinformatics"); $string remains unchanged.
    $string = "protein kinase C\t450 Kilodaltons\t120 Kilobases";
    @list = split (/\t/, $string);    # \t indicates tab
- @list is now ("protein kinase C", "450 Kilodaltons", "120 Kilobases")
35 Text Processing Functions
- The join function
- The join function does the opposite of split. It receives a delimiter and a list of strings, and joins the strings into a single string, such that they are separated by the delimiter.
- Note that the delimiter is written inside quotes.
- Examples
    @list = ("programming", "course", "for", "bioinformatics");
    $string = join ("::", @list);
- $string is now "programming::course::for::bioinformatics"
    $name = "protein kinase C"; $mol_weight = "450 Kilodaltons"; $seq_length = "120 Kilobases";
    $string = join ("\t", $name, $mol_weight, $seq_length);
- $string is now "protein kinase C\t450 Kilodaltons\t120 Kilobases"
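As a small sketch of how split and join work together, the tab-separated record from the examples can be taken apart into named fields and put back together. The semicolon delimiter in the second step is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# One tab-separated record, as in the examples above.
my $record = "protein kinase C\t450 Kilodaltons\t120 Kilobases";

# split: string -> list of fields
my ($name, $mol_weight, $seq_length) = split(/\t/, $record);
print "name: $name\n";

# join: list of fields -> string (here with a different delimiter)
my $csv = join(";", $name, $mol_weight, $seq_length);
print "$csv\n";

# Splitting and re-joining with the same delimiter restores the original.
my $round_trip = join("\t", split(/\t/, $record));
print "round trip ok\n" if $round_trip eq $record;
```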
36 Regular Expressions
- Match to a sequence of characters
- The EcoRI restriction enzyme cuts at the consensus sequence GAATTC.
- To find out whether a sequence contains a restriction site for EcoRI, write:
    if ($sequence =~ /GAATTC/) {
        ...
    }
37 Regular Expressions
- Match to a character class
- Example
- The BstYI restriction enzyme cuts at the consensus sequence rGATCy, namely A or G in the first position, then GATC, and then T or C. To find out whether a sequence contains a restriction site for BstYI, write:
    if ($sequence =~ /[AG]GATC[TC]/) { ... }
- This will match all of AGATCT, GGATCT, AGATCC, GGATCC.
- Definition
- When a list of characters is enclosed in square brackets [], one and only one of these characters must be present at the corresponding position of the string in order for the pattern to match. You may specify a range of characters using a hyphen -.
- A caret ^ at the front of the list negates the character class.
- Examples
    if ($string =~ /[AGTC]/) { ... }            # matches any nucleotide
    if ($string =~ /[a-z]/) { ... }             # matches any lowercase letter
    if ($string =~ /chromosome[1-6]/) { ... }   # matches chromosome1, chromosome2 ... chromosome6
    if ($string =~ /[^xyzXYZ]/) { ... }         # matches any character except x, X, y, Y, z, Z
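Beyond a yes/no test, the same character-class pattern can report where the sites are. This is a minimal sketch; the subroutine name bstyi_sites and the test sequence are made up for illustration, and overlapping sites are not reported because the /g search resumes after each match.

```perl
use strict;
use warnings;

# Report every 1-based start position of a BstYI site (rGATCy) in $seq.
sub bstyi_sites {
    my ($seq) = @_;
    my @positions;
    while ($seq =~ /[AG]GATC[TC]/g) {
        push @positions, $-[0] + 1;   # $-[0] is the 0-based start of the match
    }
    return @positions;
}

my @sites = bstyi_sites("TTAGATCTCCGGATCCAA");
print "BstYI sites at: @sites\n";
```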
38 - We would like to find out whether the consensus sequence is contained (somewhere) in a given sequence $a.
- Without quantifiers
    if ($a =~ /ACCCCAGAGAGGTGT/) { ... }
- With quantifiers
    if ($a =~ /AC{4}(AG){3}(GT){2}/) { ... }
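A quick check that a quantified pattern accepts the same sequences as the spelled-out one. The quantified form below is an assumption: it is one way to write the consensus ACCCCAGAGAGGTGT with {n} quantifiers, and the two test sequences are made up.

```perl
use strict;
use warnings;

my $explicit   = qr/ACCCCAGAGAGGTGT/;
my $quantified = qr/AC{4}(AG){3}(GT){2}/;

# First sequence contains the consensus, second one has only three Cs.
my @tests = ("TTACCCCAGAGAGGTGTAA", "ACCCAGAGAGGTGT");
my @results;
for my $seq (@tests) {
    my $a_hit = ($seq =~ $explicit)   ? 1 : 0;
    my $b_hit = ($seq =~ $quantified) ? 1 : 0;
    push @results, "$a_hit$b_hit";
    print "$seq: explicit=$a_hit quantified=$b_hit\n";
}
```

Note that a quantifier applies only to the item directly before it: C{4} repeats the C, while (AG){3} needs the parentheses to repeat the pair.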
39 Regular Expressions
- Case-insensitive pattern matching
- To achieve case-insensitive pattern matching, add the i modifier after the closing slash of the regular expression.
- Example
- When searching for HTML tags, we preferably do a case-insensitive search, e.g.
    if ($doc =~ /<TABLE.*>/i)
- This would match <TABLE...>, <table>, <TAblE>, etc.
40 Alternation
- Alternation allows matching any one of several subexpressions. The alternative subexpressions are separated by vertical bar(s) |.
- Example 1
- extract all lines including either human, rat or mouse proteins
    if ($line =~ /HUMAN|RAT|MOUSE/)    # match $line against either HUMAN, RAT or MOUSE
- Example 2
- In the same file, let us now restrict our search to the ACM1 receptors in either human, rat or mouse.
    if ($line =~ /ACM1_(HUMAN|RAT|MOUSE)/)
- We enclosed the alternative subexpressions in parentheses (HUMAN|RAT|MOUSE) and added the receptor name prefix ACM1_ before them.
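The effect of the parentheses can be seen by running both forms over a few names. The entry names below are hypothetical examples in the PROTEIN_SPECIES style.

```perl
use strict;
use warnings;

# Hypothetical entry names in the "PROTEIN_SPECIES" style.
my @names = ("ACM1_HUMAN", "ACM1_PIG", "ACM2_RAT", "ACM1_MOUSE");

# The parentheses limit the alternation to the species part.
my @acm1 = grep { /ACM1_(HUMAN|RAT|MOUSE)/ } @names;
print "grouped:   @acm1\n";

# Without parentheses the pattern reads as ACM1_HUMAN or RAT or MOUSE,
# so ACM2_RAT matches as well.
my @loose = grep { /ACM1_HUMAN|RAT|MOUSE/ } @names;
print "ungrouped: @loose\n";
```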
41 Anchoring a pattern to the beginning or end of a string
- To force matching of your pattern to the beginning of the string, write a caret ^ as the first character of the regular expression.
- To force the matching to the end of the string, write a dollar sign $ as the last character of the regular expression.
- To print the "description" line, which starts with DE, we write:
    #!/usr/local/bin/perl
    my $sp_file = "sources/sp_entry";
    open (SP, $sp_file) || die "cannot open \"$sp_file\": $!";
    while (my $line = <SP>) {
        if ($line =~ /^DE/) {
            print $line;
        }
    }
- Result
- DE   MUSCARINIC ACETYLCHOLINE RECEPTOR M1.
- Note: if we omitted the caret from the regular expression,
    if ($line =~ /DE/)
- then every line containing DE anywhere would be printed as well.
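The difference between the anchored and the unanchored pattern shows up as soon as another line happens to contain the letters DE. The three lines below are hypothetical, written in the two-letter line-code style of a Swiss-Prot entry.

```perl
use strict;
use warnings;

# Hypothetical lines in the style of a Swiss-Prot entry.
my @lines = (
    "ID   ACM1_HUMAN     STANDARD;      PRT;   460 AA.",
    "DE   MUSCARINIC ACETYLCHOLINE RECEPTOR M1.",
    "KW   G-PROTEIN COUPLED RECEPTOR; DESENSITIZATION.",
);

my @anchored   = grep { /^DE/ } @lines;   # only lines that start with DE
my @unanchored = grep { /DE/ }  @lines;   # any line containing DE anywhere

print scalar(@anchored), " line(s) match /^DE/\n";
print scalar(@unanchored), " line(s) match /DE/\n";
```

Here the KW line is picked up by the unanchored pattern because of the word DESENSITIZATION.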
42 Regex
- O'Reilly book: Mastering Regular Expressions (2nd edition)
- Regular Expressions Tutorial
43 Substitutions
44 Translations
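45 GC
    sub gc_content {
        my $seq = shift;
        print "seq: ", $seq, "\n";
        my $win = shift;
        print "length ", length($seq), "\n";
        # slide a window of $win bases over the sequence, one base at a time
        for (my $i = 0; $i <= length($seq) - $win; $i++) {
            my $segment = substr($seq, $i, $win);
            # tr returns the number of characters it matched
            my $gc_count = ($segment =~ tr/GCgc/GCgc/);
            print $i + 1, "\t", $segment, "\t", $gc_count, "\n";
        }
    }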
47 Arrays
- Definitions
- A scalar variable contains a scalar value: one number or one string. A string might contain many words, but Perl regards it as one unit.
- An array variable contains a list of scalar data: a list of numbers, a list of strings, or a mixed list of numbers and strings. The order of elements in the list matters.
- Syntax
- Array variable names start with an @ sign.
- You may use in the same program a variable named $var and another variable named @var, and they will mean two different, unrelated things.
- Example
- Assume we have a list of numbers which were obtained as a result of some measurement. We can store this list in an array variable as follows:
    @msr = (3, 2, 5, 9, 7, 13, 16);
48 The foreach construct
- The foreach construct iterates over a list of scalar values (e.g. those contained in an array) and executes a block of code for each of the values.
- Example
    foreach $i (@some_array) {
        statement_1;
        statement_2;
        statement_3;
    }
- Each element in @some_array is aliased to the variable $i in turn, and the block of code inside the curly brackets is executed once for each element.
- The variable $i (or give it any other name you wish) is local to the foreach loop and regains its former value upon exiting the loop.
- Remark: if no loop variable is given, the special variable $_ is used.
49 Binary assignment operators
- A shorthand for
    $k = $k - 2;
- is
    $k -= 2;
- Similarly, you may use
    $k += 2;    # same as $k = $k + 2;
    $k *= 2;    # same as $k = $k * 2;
    $k /= 2;    # same as $k = $k / 2;
- or even
    $k .= "some string";    # same as $k = $k . "some string";
- These are called binary assignment operators, and are very useful in iterative (looping) constructs.
50 Examples for using the foreach construct - cont.
- Calculate sum of all array elements
    #!/usr/local/bin/perl
    @msr = (3, 2, 5, 9, 7, 13, 16);
    $sum = 0;
    foreach $i (@msr) {
        $sum += $i;
    }
    print "sum is $sum\n";
51 Accessing individual array elements
- Individual array elements may be accessed by indicating their position in the list (their index).
- Example
    @msr = (3, 2, 5, 9, 7, 13, 16);
- index:  0  1  2  3  4  5   6
- value:  3  2  5  9  7  13  16
- First element: $msr[0] (here has the value of 3),
- Third element: $msr[2] (here has the value of 5),
- and so on.
52 The sort function
- The sort function receives a list of variables (or an array) and returns the sorted list.
    @array2 = sort (@array1);
    #!/usr/local/bin/perl
    @countries = ("Israel", "Norway", "France", "Argentina");
    @sorted_countries = sort (@countries);
    print "ORIG: @countries\n", "SORTED: @sorted_countries\n";
- Output
- ORIG: Israel Norway France Argentina
- SORTED: Argentina France Israel Norway
    #!/usr/local/bin/perl
    @numbers = (1, 2, 4, 16, 18, 32, 64);
    @sorted_num = sort (@numbers);
    print "ORIG: @numbers \n", "SORTED: @sorted_num \n";
- Output
- ORIG: 1 2 4 16 18 32 64
- SORTED: 1 16 18 2 32 4 64
- Note: by default, sort compares the elements as strings, which is why 16 and 18 come before 2.
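The second output above is in string order, not numeric order. As a sketch of the alternative, sort accepts a comparison block; with the numeric comparison operator <=> the same list comes out in numeric order.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 4, 16, 18, 32, 64);

my @as_strings = sort @numbers;                 # default: string comparison
my @ascending  = sort { $a <=> $b } @numbers;   # numeric, ascending
my @descending = sort { $b <=> $a } @numbers;   # numeric, descending

print "STRING  @as_strings\n";
print "NUMERIC @ascending\n";
print "REVERSE @descending\n";
```

Inside the block, $a and $b are the two elements being compared; they are set by sort itself and need no declaration.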
53 The push and shift functions
- The push function adds a variable or a list of variables to the end of a given array.
- Example
    $a = 5;
    $b = 7;
    @array = ("David", "John", "Gadi");
    push (@array, $a, $b);
- @array is now ("David", "John", "Gadi", 5, 7)
- The shift function removes the first element of a given array and returns this element.
- Example
    @array = ("David", "John", "Gadi");
    $k = shift (@array);
- @array is now ("John", "Gadi"); $k is now "David"
- Note that after both the push and shift operations the given array @array is changed!
54 How can I know the length of a given array?
- You have three options:
- Assign the array variable to a scalar variable, as in the previous slide. This is not recommended, because the code is confusing.
- Use the scalar function. Example:
    $x = scalar (@array);    # $x now contains the number of elements in @array
- Use the special variable $#array_name to get the index value of the last element of @array_name. Example:
    @fruits = ("apple", "orange", "banana", "melon");
    $a = $#fruits;        # $a is now 3
    $b = $#fruits + 1;    # $b is now 4, i.e. the number of elements in @fruits
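The three options can be placed side by side in a short sketch; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my @fruits = ("apple", "orange", "banana", "melon");

my $implicit   = @fruits;          # array assigned to a scalar
my $explicit   = scalar(@fruits);  # same value, but the intent is visible
my $last_index = $#fruits;         # index of the last element

print "elements: $explicit, last index: $last_index\n";

# $#array is convenient as a loop bound when the index itself is needed.
for my $i (0 .. $#fruits) {
    print "$i: $fruits[$i]\n";
}
```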
55 File input / output
- Opening a filehandle
- In order to use a filehandle other than STDIN, STDOUT and STDERR, the filehandle needs to be opened. The open function opens a file or device and associates it with a filehandle.
- It returns a true (nonzero) value upon success and undef otherwise.
- Examples
- open a filehandle for reading:
    open (SOURCE_FILE, "filename");
- or
    open (SOURCE_FILE, "<filename");
- open a filehandle for writing:
    open (RESULT_FILE, ">filename");
- open a filehandle for appending:
    open (LOGFILE, ">>filename");
56 File input / output
- Closing a filehandle
- When you are finished with a filehandle, you may close it with the close function. The close function closes the file or device associated with the filehandle.
- Example
    close (MY_FILE_HANDLE);
- Filehandles are automatically closed when the program exits, or when the filehandle is reopened.
57 File input / output
- The die function
- Sometimes the open function fails. For example, opening a file for input might fail because the file does not exist, and opening a file for output might fail because you do not have write permission for the file. A Perl program will nevertheless use the filehandle, and unless warnings are enabled it will not tell you that all input and output activities are actually meaningless.
- Therefore, it is recommended to explicitly check the result of the open command, and if it fails to print an error message and exit the program.
- This is easily done using the die function.
- Example
    my $k = open (FILEHANDLE, "filename");
    unless ($k) {
        die ("cannot open file filename: $!");
    }
- In case the file "filename" cannot be opened, the argument of die will be printed on the screen and the program will exit. $! is a special variable that contains the respective error message sent by the operating system.
- A shorthand:
    open (FILEHANDLE, "filename") || die "cannot open file filename: $!";
58 Using filehandles for writing
- Example
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    open (OUTF, ">out_file") || die "cannot open out_file: $!";
    open (LOGF, ">>log_file") || die "cannot open log_file: $!";
    print OUTF "Here is my program output\n";
    print LOGF "First task of my program completed\n";
    print "Nice, isn't it?\n";    # will be printed on the screen
    close (OUTF);
    close (LOGF);
59 Using filehandles for reading (1/3)
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $infile = "CEACAM3.txt";
    my ($line1, $line2, $line3);
    open (FH, $infile) || die "cannot open \"$infile\": $!";
    $line1 = <FH>;    # read first line
    print $line1;     # process line (here we only print it)
    $line2 = <FH>;    # read next line
    print $line2;     # process line (here we only print it)
    $line3 = <FH>;    # read next line
    print $line3;     # process line (here we only print it)
    close (FH);
60 Using filehandles for reading (2/3)
- When <FILEHANDLE> is assigned to an array variable, all lines up to the end of the file are read at once. Each line becomes a separate element of the array.
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $infile = "CEACAM3.txt";
    open (FH, $infile) || die "cannot open \"$infile\": $!";
    my @lines = <FH>;
    chomp (@lines);    # chomp each element of @lines
    close (FH);
    # to process the lines you might wish to iterate
    # over the @lines array with a foreach loop:
    my $line;
    foreach $line (@lines) {
        # process line. here we just print it.
        print "$line\n";
    }
61 Using filehandles for reading (3/3)
- Using a while loop, read one line at a time and assign it to a scalar variable, as long as a line could be read (at end-of-file <FH> returns undef, which ends the loop).
- Note that a blank line read from the file will not end the loop, since it still contains the terminating \n.
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $infile = "CEACAM3.txt";
    open (FH, $infile) || die "cannot open \"$infile\": $!";
    my $line;
    while ($line = <FH>) {    # or, in one line: while (my $line = <FH>) {
        chomp ($line);
        print "$line\n";      # process line. here we just print it.
    }
    close (FH);
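The behaviour of the while loop on blank lines can be tried without any file on disk. This sketch is a variation on the slides, not their method: it uses a lexical filehandle with the three-argument form of open, and it reads from an in-memory string that stands in for CEACAM3.txt.

```perl
use strict;
use warnings;

# Assumption: a small in-memory "file" stands in for CEACAM3.txt so the
# sketch runs without any file on disk.
my $text = "first line\n\nthird line\n";

open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my ($total, $blank) = (0, 0);
while (my $line = <$fh>) {
    chomp($line);
    $total++;
    $blank++ if $line eq "";   # a blank line was read, the loop goes on
}
close($fh);

print "lines read: $total, blank lines: $blank\n";
```

The blank second line is counted and the loop continues to the third line; only the end of the input stops it.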
62 Hashes
- Definition
- A hash variable contains a collection of key/value pairs, arranged such that you can easily use any key to find its associated value. The order of the key/value pairs in the hash is not important.
- Hashes are also called associative arrays.
- Hash variable names start with a % sign.
- You may use in the same program a variable named $var, another variable named @var, and a third variable named %var, and they will mean three different, unrelated things.
63 A Hash Is a Lookup Table
- A hash is a lookup table.
- We use a key to find an associated value.
    my %translate;
    $translate{'atg'} = 'M';
    $translate{'taa'} = '*';
    $translate{'ctt'} = 'K';    # oops
    $translate{'ctt'} = 'L';    # fixed
    print $translate{'atg'};
- Getting All Keys
    keys %translate
- Removing Key, Value Pairs
    delete $translate{'taa'};
    keys %translate
- Initializing From a List
    %translate = ( 'atg' => 'M', 'taa' => '*', 'ctt' => 'L', 'cct' => 'P', );
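A lookup table like this is enough to translate a short DNA string codon by codon. The sketch below makes several assumptions: the stop codon is written as '*', codons missing from the table become 'X', and the subroutine name translate_dna is hypothetical. Only the four codons from the slide are in the table.

```perl
use strict;
use warnings;

# Tiny lookup table; only the four codons from the slide are included.
my %translate = (
    'atg' => 'M',
    'taa' => '*',
    'ctt' => 'L',
    'cct' => 'P',
);

# Translate a DNA string codon by codon; unknown codons become 'X'
# (the 'X' placeholder is an assumption of this sketch).
sub translate_dna {
    my ($dna) = @_;
    my $protein = "";
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = lc substr($dna, $i, 3);
        $protein .= exists $translate{$codon} ? $translate{$codon} : 'X';
    }
    return $protein;
}

print translate_dna("ATGCTTCCTTAA"), "\n";
```

Trailing bases that do not fill a complete codon are ignored.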
64
    %AA1 = ("TGT" => "Cys",
            "TGC" => "Cys",
            "GAT" => "Asp",
            "GAC" => "Asp",
            "GAA" => "Glu",
            "GAG" => "Glu",
            "TTT" => "Phe");
65 - Accessing individual hash elements
- Whereas array elements are accessed by their (numerical) index, hash elements (values) are accessed by their keys.
- Example
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my (%prices, $s, $t);
    %prices = ("shirt" => 45,
               "pullover" => 90,
               "trousers" => 120,
               "socks" => 15);
    $s = $prices{"shirt"};
    $t = $prices{"trousers"};
66 - Adding an element to a hash
- Simply assign a value to an individual hash element, e.g.
    $prices{"coat"} = 250;
- ("coat", 250) will be added to the %prices hash
- Deleting an element from a hash
- use the delete function, e.g.
    delete $prices{"coat"};
- Checking whether a hash is empty
    if (%hash_name) { ... }    # will be false if the hash is empty
- Using a Hash for Counting
    $number_of_nuc{"g"}++;
- Using a Hash for Eliminating Duplicates
    $genbank{$accession} = 1;
- In this case, the keys in the hash are what's important. The values may be irrelevant.
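Both idioms, counting and eliminating duplicates, fit in a short sketch. The sequence and the accession numbers below are made up for illustration.

```perl
use strict;
use warnings;

# Counting: one hash entry per nucleotide, incremented for each occurrence.
my %number_of_nuc;
$number_of_nuc{$_}++ for split //, "gattacagg";
print "$_: $number_of_nuc{$_}\n" for sort keys %number_of_nuc;

# Eliminating duplicates: only the keys matter, every value is simply 1.
# The accession numbers below are made up for illustration.
my @accessions = ("X00001", "X00002", "X00001", "X00003", "X00002");
my %genbank;
$genbank{$_} = 1 for @accessions;
my @unique = sort keys %genbank;
print "unique: @unique\n";
```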
67 The keys function
- The keys function yields a list of all the current keys in a given hash.
- Example
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my %prices = ("shirt" => 45, "pullover" => 90, "trousers" => 120, "socks" => 15);
    my @items = keys (%prices);
    print "ITEMS: @items\n";
- Result: ITEMS: pullover shirt socks trousers (the order in which keys are returned is not guaranteed)
68 Iterating over all hash elements using the keys function
- Example - printing all keys and values of the %prices hash
    my @items_list = keys (%prices);
    foreach my $item (@items_list) {
        print "$item $prices{$item} NIS\n";
    }
- or, shorter
    foreach my $item (keys (%prices)) {
        print "$item $prices{$item} NIS\n";
    }
- Result: pullover 90 NIS, shirt 45 NIS, socks 15 NIS, trousers 120 NIS
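Because a hash does not keep its pairs in any particular order, a predictable listing needs an explicit sort. A minimal sketch with the same prices, once ordered by item name and once by price:

```perl
use strict;
use warnings;

my %prices = ("shirt" => 45, "pullover" => 90, "trousers" => 120, "socks" => 15);

# Alphabetical by item name.
my @by_name = sort keys %prices;

# Cheapest first: compare the values that belong to the keys.
my @by_price = sort { $prices{$a} <=> $prices{$b} } keys %prices;

foreach my $item (@by_price) {
    print "$item: $prices{$item} NIS\n";
}
```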
69 The values function
- The values function yields a list of all the current values in a given hash.
- Example
    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my %prices = ("shirt" => 45, "pullover" => 90, "trousers" => 120, "socks" => 15);
    my @EURO = values (%prices);
    print "PRICES: @EURO\n";
- Result: PRICES: 90 45 15 120
70 Exercises http://biochema.rug.ac.be/
- Which genes are involved in the PRADER-WILLI SYNDROME?
- How many different human PDEs (phosphodiesterases) are available in GenBank?
- How big is the anthrax genome and how many genes are present?
- Which of the 4 sequences (seq1/2/3/4)
- contains a hexokinase signature ([LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF])
- How many of them?
- Where (hint)?
- Write a program (random.pl) to generate 10 random sequences of 1000 bp and write them to a file in FASTA format
- Find the answer in ultimate-sequence.txt
- (hint: use %AA1 to perform translation(s))
- Which restriction enzyme has the longest recognition site?
71 - >SEQ1
- MGNLFENCTHRYSFEYIYENCTNTTNQCGLIRNVASSIDVFHWLDVYIST
TIFVISGILNFYCLFIALYT YYFLDNETRKHYVFVLSRFLSSILVIISL
LVLESTLFSESLSPTFAYYAVAFSIYDFSMDTLFFSYIMIS
LITYFGVVHYNFYRRHVSLRSLYIILISMWTFSLAIAIPLGLYEAASNSQ
GPIKCDLSYCGKVVEWITCS LQGCDSFYNANELLVQSIISSVETLVGSL
VFLTDPLINIFFDKNISKMVKLQLTLGKWFIALYRFLFQMT
NIFENCSTHYSFEKNLQKCVNASNPCQLLQKMNTAHSLMIWMGFYIPSAM
CFLAVLVDTYCLLVTISILK SLKKQSRKQYIFVVVRLSAAILIALCIII
IQSTYFIDIPFRDTFAFFAVLFIIYDFSILSLLGSFTGVAM
MTYFGVMRPLVYRDKFTLKTIYIIAFAIVLFSVCVAIPFGLFQAADEIDG
PIKCDSESCELIVKWLLFCI ACLILMGCTGTLLFVTVSLHWHSYKSKKM
GNVSSSAFNHGKSRLTWTTTILVILCCVELIPTGLLAAFGK
SESISDDCYDFYNANSLIFPAIVSSLETFLGSITFLLDPIINFSFDKRIS
KVFSSQVSMFSIFFCGKR
- >SEQ2
- MLDDRARMEA AKKEKVEQIL AEFQLQEEDL KKVMRRMQKE
MDRGLRLETH EEASVKMLPT YVRSTPEGSE VGDFLSLDLG
GTNFRVMLVK VGEGEEGQWS VKTKHQMYSI PEDAMTGTAE
MLFDYISECI SDFLDKHQMK HKKLPLGFTF SFPVRHEDID
KGILLNWTKG FKASGAEGNN VVGLLRDAIK RRGDFEMDVV
AMVNDTVATM ISCYYEDHQC EVGMIVGTGC NACYMEEMQN
VELVEGDEGR MCVNTEWGAF GDSGELDEFL LEYDRLVDES
SANPGQQLYE KLIGGKYMGE LVRLVLLRLV DENLLFHGEA
SEQLRTRGAF ETRFVSQVES DTGDRKQIYN ILSTLGLRPS
TTDCDIVRRA CESVSTRAAH MCSAGLAGVI NRMRESRSED
VMRITVGVDG SVYKLHPSFK ERFHASVRRL TPSCEITFIE
SEEGSGRGAA LVSAVACKKA CMLGQ
- >SEQ3
- MESDSFEDFLKGEDFSNYSYSSDLPPFLLDAAPCEPESLEINKYFVVIIY
VLVFLLSLLGNSLVMLVILY SRVGRSVTDVYLLNLALADLLFALTLPIW
AASKVTGWIFGTFLCKVVSLLKEVNFYSGILLLACISVDRY
LAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLIFRKTIYPPYVSPV
CYEDMGNNTANWRMLLRILP QSFGFIVPLLIMLFCYGFTLRTLFKAHMG
QKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTWVIQET
CERRNDIDRALEATEILGILHSCLNPLIYAFIGQKFRHGLLKILAIHGLI
SKDSLPKDSRPSFVGSSSGH TSTTL
- >SEQ4
- MEANFQQAVK KLVNDFEYPT ESLREAVKEF DELRQKGLQK
NGEVLAMAPA FISTLPTGAE TGDFLALDFG GTNLRVCWIQ
LLGDGKYEMK HSKSVLPREC VRNESVKPII DFMSDHVELF
IKEHFPSKFG CPEEEYLPMG FTFSYPANQV SITESYLLRW
TKGLNIPEAI NKDFAQFLTE GFKARNLPIR IEAVINDTVG
TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE QMNQIPKLAG
KCTGDHMLIN MEWGATDFSC LHSTRYDLLL DHDTPNAGRQ
IFEKRVGGMY LGELFRRALF HLIKVYNFNE GIFPPSITDA
WSLETSVLSR MMVERSAENV RNVLSTFKFR FRSDEEALYL
WDAAHAIGRR AARMSAVPIA SLYLSTGRAG KKSDVGVDGS
LVEHYPHFVD MLREALRELI GDNEKLISIG IAKDGSGIGA
ALCALQAVKE KKGLA MEANFQQAVK KLVNDFEYPT ESLREAVKEF
DELRQKGLQK NGEVLAMAPA FISTLPTGAE TGDFLALDFG
GTNLRVCWIQ LLGDGKYEMK HSKSVLPREC VRNESVKPII
DFMSDHVELF IKEHFPSKFG CPEEEYLPMG FTFSYPANQV
SITESYLLRW TKGLNIPEAI NKDFAQFLTE GFKARNLPIR
IEAVINDTVG TLVTRAYTSK ESDTFMGIIF GTGTNGAYVE
QMNQIPKLAG KCTGDHMLIN MEWGATDFSC LHSTRYDLLL
DHDTPNAGRQ IFEKRVGGMY LGELFRRALF HLIKVYNFNE
GIFPPSITDA WSLETSVLSR MMVERSAENV RNVLSTFKFR
FRSDEEALYL WDAAHAIGRR AARMSAVPIA SLYLSTGRAG
KKSDVGVDGS LVEHYPHFVD MLREALRELI GDNEKLISIG
IAKDGSGIGA ALCALQAVKE KKGLA
72 - >ultimate-sequence
- ACTCGTTATGATATTTTTTTTGAACGTGAAAATACTTTTCGTGCTATGGA
AGGACTCGTTATCGTGAAGTTGAACGTTCTGAATGTATGCCTCTTGAAAT
GGAAAATACTCATTGTTTATCTGAAATTTGAATGGGAATTTTATCTACAA
TGTTTTATTCTTACAGAACATTAAATTGTGTTATGTTTCATTTCACATTT
TAGTAGTTTTTTCAGTGAAAGCTTGAAAACCACCAAGAAGAAAAGCTGGT
ATGCGTAGCTATGTATATATAAAATTAGATTTTCCACAAAAAATGATCTG
ATAAACCTTCTCTGTTGGCTCCAAGTATAAGTACGAAAAGAAATACGTTC
CCAAGAATTAGCTTCATGAGTAAGAAGAAAAGCTGGTATGCGTAGCTATG
TATATATAAAATTAGATTTTCCACAAAAAATGATCTGATAA
73 - my %AA1 = (
- 'UUU','F',
- 'UUC','F',
- 'UUA','L',
- 'UUG','L',
- 'UCU','S',
- 'UCC','S',
- 'UCA','S',
- 'UCG','S',
- 'UAU','Y',
- 'UAC','Y',
- 'UAA','*',
- 'UAG','*',
- 'UGU','C',
- 'UGC','C',
- 'UGA','*',
- 'UGG','W',
-
- 'CUU','L',