Roadmap - PowerPoint PPT Presentation


Slides: 32

Provided by: Unive48


Title: Roadmap

1
Roadmap

The topics
basic concepts of molecular biology
Gene, protein
Central dogma of molecular biology
PCR, DNA sequencing
Elements of Perl
overview of the field
biological databases and database searching
sequence alignments
phylogenetics
structure prediction
microarray data analysis

2
Programming and Perl for Bioinformatics, Part I
3
A Taste of Perl: print a message

perltaste.pl: Greet the entire world.
#!/usr/bin/perl
# greet the entire world
$x = 6e9;
print "Hello world!\n";
print "All $x of you!\n";

- command interpretation header (#!/usr/bin/perl)
- a comment (# ...)
- variable assignment statement ($x = 6e9;)
- function calls (output statements) (print ...)
4
Basic Syntax and Data Types

Whitespace doesn't matter to Perl. One can write
all statements on one line.
All Perl statements end in a semicolon (;), just
like C.
Comments begin with # and Perl ignores
everything after the # until end of line.
Example: # this is a comment
Perl has three basic data types:
scalar
array (list)
associative array (hash)

5
Scalars

Scalar variables begin with $ followed by an
identifier.
Example: $this_is_a_scalar
An identifier is composed of upper or lower case
letters, numbers, and underscore '_'. Identifiers
are case sensitive (like all of Perl).
$progname = "first_perl";
$numOfStudents = 4;
= sets the content of $progname to be the string
"first_perl" and $numOfStudents to be the integer 4.

6
Scalar Values

Numerical Values
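integer: 5, 3, 0, -307
floating point: 6.2e9, -4022.33
hexadecimal/octal: 0xd4f, 0477
binary: 0b011011
NOTE: Perl has no separate integer and floating-point
variable types. A scalar holds either kind of number,
and Perl converts between native integers and
double-precision floating-point values as needed.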
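
The literal forms listed on this slide can be checked with a short script. This is a minimal sketch; the variable names are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;

# The same kinds of numeric literals listed on the slide.
my $hex    = 0xd4f;      # hexadecimal literal
my $octal  = 0477;       # octal literal (note the leading zero)
my $binary = 0b011011;   # binary literal
my $big    = 6.2e9;      # floating point in exponent notation

# All four are ordinary numbers once stored in a scalar.
print "$hex $octal $binary $big\n";   # prints: 3407 319 27 6200000000
```

A leading zero changes the meaning of a literal: 0477 is octal (319 in decimal), not four hundred seventy-seven.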

7
Do the Math

Mathematical operators work pretty much as you
would expect:
4+7
6*4
43-27
256/12
2/(3-5)
Example:
#!/usr/bin/perl
print "4+5\n";
print 4+5 , "\n";
print "4+5=" , 4+5 , "\n";
$myNumber = 88;
Note: use commas to separate multiple items in a
print statement.

What will be the output?
4+5
9
4+5=9
8
Scalar Values

String values
Example:
$day = "Monday";
print "Happy Monday!\n";
print "Happy $day!\n";
print 'Happy Monday!\n';
print 'Happy $day!\n';
Double-quoted: interpolates (replaces variable
name/control character with its value)
Single-quoted: no interpolation done (as-is)

What will be the output?
Happy Monday!<newline>
Happy Monday!<newline>
Happy Monday!\n
Happy $day!\n
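
The difference between the two quoting styles can be made visible by storing each string and inspecting it. A minimal sketch, with $double and $single as illustrative names that are not in the slides:

```perl
use strict;
use warnings;

my $day = "Monday";

my $double = "Happy $day!\n";   # $day and \n are both interpolated
my $single = 'Happy $day!\n';   # kept as-is: a dollar sign, a backslash and an n

print $double;                  # prints: Happy Monday! followed by a newline
print $single, "\n";            # prints: Happy $day!\n
print length($double), " ", length($single), "\n";   # prints: 14 13
```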
9
String Manipulation

Concatenation
$dna1 = "ACTGCGTAGC";
$dna2 = "CTTGCTAT";
Juxtapose in a string assignment or print
statement:
$new_dna = "$dna1$dna2";
Use the concatenation operator . (dot):
$new_dna = $dna1 . $dna2;
Substring
$dna = "ACTGCGTAGC";
$exon1 = substr($dna,2,5);   # TGCGT
(offset 2 counted from 0, length 5)
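
Both concatenation styles and the substr call from this slide, put together as a runnable sketch. The names $by_interpolation and $by_operator are illustrative only.

```perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to concatenate.
my $by_interpolation = "$dna1$dna2";
my $by_operator      = $dna1 . $dna2;

# substr(STRING, OFFSET, LENGTH): the offset is counted from 0.
my $exon1 = substr($dna1, 2, 5);

print "$by_operator\n";   # prints: ACTGCGTAGCCTTGCTAT
print "$exon1\n";         # prints: TGCGT
```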
10
Substitution

DNA transcription: T -> U
Substitution operator s///
$dna = "GATTACATACACTGTTCA";
$rna = $dna;
$rna =~ s/T/U/g;   # GAUUACAUACACUGUUCA
=~ is a binding operator indicating to examine the
contents of $rna for a match pattern.
Ex: Start with $dna = "gaTtACataCACTgttca"
and do the same as above. What will be the output?

11
Example

transcribe.pl
$dna = "gaTtACataCACTgttca";
$rna = $dna;
$rna =~ s/T/U/g;
print "DNA: $dna\n";
print "RNA: $rna\n";
Does it do what you expect? If not, why not?
Patterns in substitution are case-sensitive! What
can we do?
Convert all letters to upper/lower case
(preferred when possible).
If we want to retain mixed case, use the
transliteration/translation operator tr///:
$rna =~ tr/tT/uU/;   # replace all t by u, all T
by U
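
The three approaches discussed on this slide can be compared side by side. This is a sketch; $naive, $upper and $mixed are illustrative names, not from the slides.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# 1. Plain substitution: only uppercase T is replaced.
(my $naive = $dna) =~ s/T/U/g;

# 2. Convert to upper case first, then substitute.
(my $upper = uc $dna) =~ s/T/U/g;

# 3. Transliteration: t -> u and T -> U, the case of each letter is retained.
(my $mixed = $dna) =~ tr/tT/uU/;

print "$naive\n";   # prints: gaUtACataCACUgttca
print "$upper\n";   # prints: GAUUACAUACACUGUUCA
print "$mixed\n";   # prints: gaUuACauaCACUguuca
```

The form (my $copy = $original) =~ ... copies the string and modifies the copy in one statement, so $dna itself stays unchanged.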

12
Case conversion

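$string = "acCGtGcaTGc";
Upper case
$dna = uc($string);   # ACCGTGCATGC
or $dna = uc $string;
or $dna = "\U$string";
Lower case
$dna = lc($string);   # accgtgcatgc
or $dna = "\L$string";
Sentence case
$dna = ucfirst(lc($string));   # Accgtgcatgc
or $dna = "\u\L$string";
(ucfirst alone changes only the first character:
ucfirst($string) gives AcCGtGcaTGc)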
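
A quick check of the case-conversion functions. Note that ucfirst by itself only upper-cases the first character and leaves the rest untouched, so sentence case needs lc as well. The variable names below are illustrative.

```perl
use strict;
use warnings;

my $string = "acCGtGcaTGc";

my $upper      = uc $string;           # ACCGTGCATGC
my $lower      = lc $string;           # accgtgcatgc
my $first_only = ucfirst $string;      # AcCGtGcaTGc (rest left as it was)
my $sentence   = ucfirst lc $string;   # Accgtgcatgc
my $escaped    = "\u\L$string";        # same result, using escapes in a string

print join("\n", $upper, $lower, $first_only, $sentence, $escaped), "\n";
```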

13
Reverse Complement

5'- A C G T C T A G C . . . . G C A T -3'
3'- T G C A G A T C G . . . . C G T A -5'
5'- A T G C . . . . G C T A G A C G T -3'
Reverse: reverses a string
$string = "ACGTCTAGC";
$string = reverse($string);   # "CGATCTGCA"
Complementation: use the transliteration operator
$string =~ tr/ACGT/TGCA/;
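
The two steps combine into a reverse complement. A minimal sketch using the slide's example string; $revcomp is an illustrative name.

```perl
use strict;
use warnings;

my $string = "ACGTCTAGC";

# Step 1: reverse the characters. scalar() forces string reversal;
# in list context reverse would reverse a list instead.
my $revcomp = scalar reverse $string;   # CGATCTGCA

# Step 2: complement each base in place.
$revcomp =~ tr/ACGT/TGCA/;

print "$revcomp\n";   # prints: GCTAGACGT
```

The result matches the third strand in the diagram on this slide, which ends in G C T A G A C G T.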

14
More on String Manipulation

String length
length($dna)
Index: position of a substring within a string
index STR,SUBSTR,POSITION
index($strand, $primer, 2)
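
A short sketch of length and index. The values of $strand and $primer are made up for illustration. index returns the zero-based position of the first match at or after POSITION, or -1 when there is no match.

```perl
use strict;
use warnings;

my $strand = "ACTGCGTAGCTGCA";   # hypothetical sequence
my $primer = "TGC";              # hypothetical primer

my $len     = length $strand;               # 14
my $first   = index($strand, $primer);      # 2: first occurrence
my $from_2  = index($strand, $primer, 2);   # 2: search starts at position 2
my $from_3  = index($strand, $primer, 3);   # 10: next occurrence
my $missing = index($strand, "GGG");        # -1: not found

print "$len $first $from_2 $from_3 $missing\n";   # prints: 14 2 2 10 -1
```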

15
Flow Control

Conditional Statements
parts of code executed depending on truth value
of a logical statement
truth (logical) values in Perl:
false: 0, 0.0, 0e0, "", undef (default: "")
true: anything else (default: 1)
($a, $b) = (75, 83);
if ( $a < $b ) {
    $a = $b;
    print "Now a = b!\n";
}
Compact (one-line) form; the braces are still required:
if ( $a > $b ) { print "Yes, a > b!\n"; }

16
Comparison Operators
17
Logical Operators
18
if/else/elsif

allows for multiple branching/outcomes
$randDNA = "";
while ( length($randDNA) < 200 ) {
    $a = rand();
    if ( $a < 0.25 ) {
        $randDNA .= "A";
    } elsif ( $a < 0.50 ) {
        $randDNA .= "C";
    } elsif ( $a < 0.75 ) {
        $randDNA .= "G";
    } else {
        $randDNA .= "T";
    }
}
print $randDNA;

19
Conditional Loops

while ( statement ) { commands }
repeats commands until statement is no longer
true
do { commands } while ( statement );
same as while, except commands are executed at least
once
NOTE the ; after the while statement!!
Short-circuiting commands: next and last
next: jumps to end, do next iteration
last: jumps out of the loop completely
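
A small sketch of next, last and do/while. The list @reads and the "STOP" marker are invented for illustration, and push (which appends to an array) is a standard built-in not otherwise covered on these slides.

```perl
use strict;
use warnings;

my @reads = ("ACGT", "", "GGCA", "STOP", "TTTT");   # hypothetical input
my @kept;

my $i = 0;
while ( $i < @reads ) {
    my $read = $reads[$i++];
    next if $read eq "";        # skip empty entries, continue with the next one
    last if $read eq "STOP";    # leave the loop completely
    push @kept, $read;
}
print "@kept\n";                # prints: ACGT GGCA

# do/while runs its body once before the condition is tested.
my $runs = 0;
do { $runs++ } while ( $runs < 0 );
print "$runs\n";                # prints: 1
```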

20
while

Example
while ($alive) {
    if ($needs_nutrients) {
        print "Cell needs nutrients\n";
    }
}
Any problem?

21
for and foreach loops

Execute a code loop a specified number of times,
or for a specified list of values.
for and foreach are identical: use whichever you
want.
Incremental loop (C style)
for ( $i=0; $i < 50; $i++ ) {
    $x = $i*$i;
    print "$i squared is $x.\n";
}
Loop over list (foreach loop)
foreach $name ( "Billy", "Bob", "Edwina" ) {
    print "$name is my friend.\n";
}

22
Basic Data Types

Perl has three basic data types
scalar
array (list)
associative array (hash)

23
Arrays

An array (list) is an ordered group of scalar
values.
@ is used to refer to the entire array.
Example:
(1,2,3)   # array of three values: 1, 2, and 3
("one","two","three")   # array of 3 values:
"one", "two", "three"
@names = ("mary", "tom", "mark", "john", "jane");
$names[1];   # ? tom (indices start at 0)
@names[1..4];   # slice: ("tom", "mark", "john", "jane")
24
Basic Data Types

Perl has three basic data types
scalar
array (list)
associative array (hash)

25
More on Arrays

@a = ();   # empty list
@b = (1,2,3);   # three numbers
@c = ("Jan","Joe","Marie");   # three strings
@d = ("Dirk",1.92,46,"20-03-1977");   # a mixed
list
Variables and sublists are interpolated in a list
@b = ($a,$a+1,$a+2);   # variable interpolation
@c = ("Jan",("Joe","Marie"));   # list
interpolation
@d = ("Dirk",1.92,46,(),"20-03-1977");   # empty
list interpolation
@e = ( @b, @c );   # same as (1,2,3,"Jan","Joe","Marie")
when $a is 1
Practical construction operators ($x..$y)
@x = (1..6);   # same as (1, 2, 3, 4, 5, 6)
@y = (1.2..4.2);   # same as (1, 2, 3, 4): the range
operator truncates its operands to integers
@z = (2..5,8,11..13);   # same as
(2,3,4,5,8,11,12,13)
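
List flattening and the range operator can be checked by counting and printing the resulting elements. A minimal sketch reusing the slide's lists:

```perl
use strict;
use warnings;

my @c = ("Jan", ("Joe", "Marie"));              # nested list is flattened
my @d = ("Dirk", 1.92, 46, (), "20-03-1977");   # empty list adds nothing
my @y = (1.2 .. 4.2);                           # operands are truncated to integers
my @z = (2..5, 8, 11..13);

print scalar(@c), " ", scalar(@d), "\n";   # prints: 3 4
print "@y\n";                              # prints: 1 2 3 4
print "@z\n";                              # prints: 2 3 4 5 8 11 12 13
```

An array interpolated in double quotes, as in "@y", is printed with its elements separated by spaces.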

26
Array Manipulations

reverse: Reverses the order of array elements
@a = (1, 2, 3);
@b = reverse @a;   # @b = (3, 2, 1)
split: Splits a string into a list/array
$line = "John Smith 28";
($first, $last, $age) = split /\s/, $line;
$DNA = "ACGTTTGA";
@DNA = split ('', $DNA);
join: Joins a list/array into a string
$gene = join "", ($exon1, $exon3);
$name = join "-", ("Zhong", "Hui");
scalar: Returns the number of elements in
@array
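
The four functions on this slide work together naturally. A minimal sketch; @bases, $count and $rejoined are illustrative names, and \s+ is used instead of \s so that runs of several spaces are treated as one separator.

```perl
use strict;
use warnings;

my $line = "John Smith 28";
my ($first, $last, $age) = split /\s+/, $line;

my @bases    = split //, "ACGTTTGA";     # one element per character
my $count    = scalar @bases;            # number of elements
my $rejoined = join "", reverse @bases;  # reverse the list, glue it back together
my $name     = join "-", ("Zhong", "Hui");

print "$last, $first ($age)\n";   # prints: Smith, John (28)
print "$count $rejoined\n";       # prints: 8 AGTTTGCA
print "$name\n";                  # prints: Zhong-Hui
```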

27
Exercise

Determine the frequency of nucleotides
$dna = "gaTtACataCACTgttca";
?

28
Ex: Determine the frequency of nucleotides

$dna = "gaTtACataCACTgttca";
$dna = uc($dna);   # GATTACATACACTGTTCA
$count_A = 0;
$count_C = 0;
$count_G = 0;
$count_T = 0;
@dna = split '', $dna;
foreach $base (@dna) {
    if ($base eq 'A') { $count_A++; }
    elsif ($base eq 'C') { $count_C++; }
    elsif ($base eq 'G') { $count_G++; }
    elsif ($base eq 'T') { $count_T++; }
    else { print "error!\n"; }
}
print "count of A = $count_A \n";
print "count of C = $count_C \n";
print "count of G = $count_G \n";
print "count of T = $count_T \n";
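
The same count can also be kept in a hash, the third basic data type listed earlier. This is an alternative sketch, not the solution from the slides; the hash name %count is illustrative.

```perl
use strict;
use warnings;

my $dna = uc "gaTtACataCACTgttca";

# One counter per nucleotide, keyed by the base letter.
my %count = (A => 0, C => 0, G => 0, T => 0);

foreach my $base (split //, $dna) {
    if (exists $count{$base}) {
        $count{$base}++;
    } else {
        print "error: unexpected base $base\n";
    }
}

foreach my $base (sort keys %count) {
    print "count of $base = $count{$base}\n";
}
# prints:
# count of A = 6
# count of C = 4
# count of G = 2
# count of T = 6
```

With a hash, the chain of if/elsif tests collapses into a single lookup, and adding another symbol only requires one more key.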

29
Filehandles

File I/O (input/output): reading from/writing to
files
Files are represented in Perl by a filehandle
variable
(for clarity, usually written as a bare word in
UPPERCASE)
Open a file on a filehandle using the open
function
for reading (input)
open INFILE, "< datafile.txt";
or open (INFILE, "< datafile.txt");
for writing (output), overwriting the file
open OUTFILE, "> output";
for appending to the end of the file
open OUTFILE, ">> output";
Close a file on a filehandle (lowercase close: Perl
is case sensitive)
close (OUTFILE);
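
The three open modes can be tried in one round trip. This sketch uses the three-argument form of open with a lexical filehandle and an explicit error check, a commonly used alternative to the bareword style on the slide. The file name output_example.txt is hypothetical and the file is deleted at the end.

```perl
use strict;
use warnings;

my $file = "output_example.txt";   # hypothetical file name

# Write (overwrites any existing file).
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "ACGT\n";
close($out);

# Append to the end of the same file.
open($out, '>>', $file) or die "Cannot append to $file: $!";
print $out "TTGA\n";
close($out);

# Read everything back, one array element per line.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines = <$in>;
close($in);

print scalar(@lines), " lines: ", @lines;   # prints "2 lines: ACGT", then "TTGA"
unlink $file;                               # remove the example file
```

Checking the return value of open with "or die" reports a missing or unwritable file immediately instead of failing silently later.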

30
Special Filehandles

Special files that are always open
STDIN (standard input)
input from command window, read only
STDOUT (standard output)
output to command window, write only
print STDOUT "Have fun with Perl!\n";
or just
print "Have fun with Perl!\n";

31
Input from Filehandles

Angle bracket input operator <>
reads one line of input (up to newline/carriage
return)
from STDIN
print "Enter name of protein: ";
$line = <STDIN>;
chomp $line;   # removes \n from end of line
print "\nYou entered $line.\n";
from a file
open (INPUT, "aminos.txt");
$amino1 = <INPUT>;
$amino2 = <INPUT>;
chomp ($amino1, $amino2);
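
When the number of lines is not known in advance, the input operator is usually placed in a while loop, which stops at end of file. A sketch under stated assumptions: aminos_example.txt is a hypothetical stand-in for aminos.txt, created by the script itself so the example runs on its own.

```perl
use strict;
use warnings;

my $file = "aminos_example.txt";   # hypothetical stand-in for aminos.txt

# Create a small input file for the demonstration.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "alanine\nglycine\nserine\n";
close($out);

# Read it back line by line until end of file.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @aminos;
while ( my $line = <$in> ) {
    chomp $line;             # remove the trailing newline
    push @aminos, $line;
}
close($in);
unlink $file;

print join(", ", @aminos), "\n";   # prints: alanine, glycine, serine
```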

32
sequences.fasta

>gi|145536|gb|L04574.1| Escherichia coli DNA polymerase III chi subunit (holC) gene, complete cds
TAACGGCGAAGAGTAATTGCGTCAGGCAAGGCTGTTATTGCCGGATGCGGCGTGAACGCCTTATCCGACC
TACACAGCACTGAACTCGTAGGCCTGATAAGACACAACAGCGTCGCATCAGGCGCTGCGGTGTATACCTG
ATGCGTATTTAAATCCACCACAAGAAGCCCCATTTATGAAAAACGCGACGTTCTACCTTCTGGACAATGA
CACCACCGTCGATGGCTTAAGCGCCGTTGAGCAACTGGTGTGTGAAATTGCCGCAGAACGTTGGCGCAGC
GGTAAGCGCGTGCTCATCGCCTGTGAAGATGAAAAGCAGGCTTACCGGCTGGATGAAGCCCTGTGGGCGC
GTCCGGCAGAAAGCTTTGTTCCGCATAATTTAGCGGGAGAAGGACCGCGCGGCGGTGCACCGGTGGAGAT
CGCCTGGCCGCAAAAGCGTAGCAGCAGCCGGCGCGATATATTGATTAGTCTGCGAACAAGCTTTGCAGAT
TTTGCCACCGCTTTCACAGAAGTGGTAGACTTCGTTCCTTATGAAGATTCTCTGAAACAACTGGCGCGCG
AACGCTATAAAGCCTACCGCGTGGCTGGTTTCAACCTGAATACGGCAACCTGGAAATAATGGAAAAGACA
TATAACCCACAAGATATCGAACAGCCGCTTTACGAGCACTGGGAAAAGCAGGGCTACTTTAAGCCTAATG
GCGATGAAAGCCAGGAAAGTTTCTGCATCATGATCCCGCCGCCGAA
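
A file in this format can be read with the filehandle techniques from the previous slides: the line starting with > is the header, and the remaining lines are pieces of one sequence. The sketch below is not from the slides; it writes a tiny made-up record to a hypothetical file example.fasta so that it runs on its own, and it assumes the file holds a single record.

```perl
use strict;
use warnings;

my $file = "example.fasta";   # hypothetical file with one made-up record

open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out ">example_seq hypothetical record\nACGTAC\nGGTTAA\n";
close($out);

open(my $in, '<', $file) or die "Cannot read $file: $!";
my ($header, $sequence) = ("", "");
while ( my $line = <$in> ) {
    chomp $line;
    if ( $line =~ /^>/ ) {
        $header = substr($line, 1);   # drop the leading >
    } else {
        $sequence .= $line;           # join the wrapped sequence lines
    }
}
close($in);
unlink $file;

print "$header\n";                            # prints: example_seq hypothetical record
print "$sequence (", length($sequence), " bases)\n";   # prints: ACGTACGGTTAA (12 bases)
```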