Title: Perl
1 Perl
2 Conceptual Biology
- H. sapiens did not create the genetic code, but they did invent the transistor
- Biological life is not optimized: the modern synthesis
- Nature vs. Nurture
- What are the best ways to understand the important differences that make the difference?
3 A Molecular Primer
- Hierarchy of the eukaryote
  - Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA
- Put simply: DNA → RNA → Protein
4 The Building Blocks
- DNA is composed of four building blocks
  - Nucleic acids, nucleotides, bases
  - Adenine, Cytosine, Guanine, Thymine
- RNA also has four building blocks
  - Adenine, Cytosine, Guanine, Uracil
- Proteins are composed of 20 building blocks
  - Amino acids, residues
  - Fragments of proteins are called peptides
- DNA, RNA and proteins are polymers
5
Code  Nucleic Acid(s)  w/ Sugar   w/P
A     Adenine          Adenosine  Adenylic Acid
C     Cytosine         Cytidine   Cytidylic Acid
G     Guanine          Guanosine  Guanylic Acid
T     Thymine          Thymidine  Thymidylic Acid
U     Uracil           Uridine    Uridylic Acid

Code  Nucleic Acid
M     A or C (amino)
R     A or G (purine)
W     A or T (weak)
S     C or G (strong)
Y     C or T (pyrimidine)
K     G or T (keto)
V     A or C or G
H     A or C or T
D     A or G or T
B     C or G or T
N     A, G, C, T (any)
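The ambiguity codes in the table above map naturally onto regular-expression character classes. The following is a minimal sketch, assuming a helper named `iupac_to_regex` (a hypothetical name, not a Perl built-in) that turns a pattern written in these codes into a regex string:

```perl
use strict;
use warnings;

# IUPAC nucleotide codes from the table above, mapped to the bases they cover.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'U',
    M => 'AC',  R => 'AG',  W => 'AT',  S => 'CG',  Y => 'CT',  K => 'GT',
    V => 'ACG', H => 'ACT', D => 'AGT', B => 'CGT', N => 'ACGT',
);

# iupac_to_regex is a hypothetical helper name chosen for this sketch.
sub iupac_to_regex {
    my ($pattern) = @_;
    my $regex = '';
    for my $code (split //, uc $pattern) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        my $bases = $IUPAC{$code};
        # A single base stays literal; several bases become a character class.
        $regex .= length($bases) > 1 ? '[' . $bases . ']' : $bases;
    }
    return $regex;
}

my $regex = iupac_to_regex('MWN');
print "MWN becomes $regex\n";
print "CTG matches MWN\n"        if 'CTG' =~ /\A$regex\z/;
print "ACG does not match MWN\n" if 'ACG' !~ /\A$regex\z/;
```

Here `CTG` matches because C is covered by M, T by W, and G by N, while `ACG` fails at the second position because C is not covered by W.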
6
DNA  DNA  DNA  RNA
A    T    →    A
C    G    →    C
G    C    →    G
C    G    →    C
T    A    →    U
T    A    →    U
M    K    →    M
W    W    →    ?
N    N    →    N
C    G    →    C
C    G    →    C
T    A    →    U
Y    R    →    ?
B    V    →    ?
N    N    →    N
K    M    →    ?
S    S    →    S
T    A    →    U
T    A    →    U
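The second column of the table above pairs every code, including the ambiguity codes, with its complement (M with K, Y with R, B with V, while W, S and N pair with themselves). A minimal sketch of that mapping, assuming a helper named `complement` (a hypothetical name) and using Perl's `tr///` operator:

```perl
use strict;
use warnings;

# complement() is a hypothetical helper name chosen for this sketch.
# W, S and N are their own complements, so tr/// leaves them untouched.
sub complement {
    my ($dna) = @_;
    (my $partner = uc $dna) =~ tr/ACGTMKRYBVDH/TGCAKMYRVBHD/;
    return $partner;
}

my $strand = 'ACGCTTMWNCCTYBNKSTT';    # first column of the table, read top to bottom
print "$strand\n", complement($strand), "\n";
```

Applying `complement` twice returns the original strand, which is a quick sanity check on the translation lists.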
7
- One Dimensional
- Two Dimensional
- Three Dimensional
9
DNA  DNA  DNA  RNA
A    T    →    A
T    A    →    U
G    C    →    G
C    G    →    C
T    A    →    U
T    A    →    U
M    K    →    M
W    W    →    ?
N    N    →    N
C    G    →    C
C    G    →    C
T    A    →    U
Y    R    →    ?
B    V    →    ?
N    N    →    N
K    M    →    ?
S    S    →    S
T    A    →    U
T    A    →    U
11
One-Letter Code  Amino Acid                   Three-Letter Code
A                Alanine                      Ala
B                Aspartic acid or Asparagine  Asx
C                Cysteine                     Cys
D                Aspartic acid                Asp
E                Glutamic acid                Glu
F                Phenylalanine                Phe
G                Glycine                      Gly
H                Histidine                    His
I                Isoleucine                   Ile
K                Lysine                       Lys
L                Leucine                      Leu
M                Methionine                   Met
N                Asparagine                   Asn
P                Proline                      Pro
Q                Glutamine                    Gln
R                Arginine                     Arg
S                Serine                       Ser
T                Threonine                    Thr
V                Valine                       Val
W                Tryptophan                   Trp
X                Unknown                      Xxx
Y                Tyrosine                     Tyr
Z                Glutamic acid or Glutamine   Glx
12
Reading the DNA sequence ATGCTTMWNCCTYBNKSTT three bases at a time, starting at the first base:

DNA codon  RNA codon(s)                    Amino acid(s)
ATG        AUG                             Met (Start)
CTT        CUU                             Leu
MWN        AA?, AU?, CA?, CU?              Asn, Lys, Ile, Met, His, Gln, Leu
CCT        CCU                             Pro
YBN        UU?, UG?, UC?, CU?, CG?, CC?    Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro
KST        UCU, UGU, GCU, GGU              Ser, Cys, Ala, Gly
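The expansion of an ambiguous codon into its possible amino acids can be checked mechanically. The sketch below is one way to do it, not the method used on the slide: it builds the standard genetic code from a 64-letter string in UCAG order and assumes a helper named `possible_residues` (a hypothetical name) that reports one-letter amino acid codes, with `*` standing for a stop codon.

```perl
use strict;
use warnings;

# RNA versions of the IUPAC codes (U in place of T).
my %IUPAC_RNA = (
    A => 'A',   C => 'C',   G => 'G',   U => 'U',
    M => 'AC',  R => 'AG',  W => 'AU',  S => 'CG',  Y => 'CU',  K => 'GU',
    V => 'ACG', H => 'ACU', D => 'AGU', B => 'CGU', N => 'ACGU',
);

# Standard genetic code, listed in UCAG order at each codon position.
my @order       = qw(U C A G);
my $amino_acids = 'FFLLSSSSYY**CC*W' . 'LLLLPPPPHHQQRRRR'
                . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %CODON;
my $i = 0;
for my $x (@order) {
    for my $y (@order) {
        for my $z (@order) {
            $CODON{"$x$y$z"} = substr($amino_acids, $i++, 1);
        }
    }
}

# possible_residues() is a hypothetical helper: it returns the sorted
# one-letter codes of every residue an ambiguous DNA codon could encode.
sub possible_residues {
    my ($dna_codon) = @_;
    (my $rna = uc $dna_codon) =~ tr/T/U/;
    die "Expected a 3-letter codon\n" unless $rna =~ /\A[ACGUMRWSYKVHDBN]{3}\z/;
    my ($first, $second, $third) = map { [ split //, $IUPAC_RNA{$_} ] } split //, $rna;
    my %seen;
    for my $x (@$first) {
        for my $y (@$second) {
            for my $z (@$third) {
                $seen{ $CODON{"$x$y$z"} } = 1;
            }
        }
    }
    return join '', sort keys %seen;
}

for my $codon (qw(ATG CTT MWN CCT YBN KST)) {
    print "$codon\t", possible_residues($codon), "\n";
}
```

For `MWN` this yields H, I, K, L, M, N and Q, that is His, Ile, Lys, Leu, Met, Asn and Gln.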
13
Reading the same DNA sequence three bases at a time, starting at the second base:

DNA codon  RNA codon(s)                        Amino acid(s)
TGC        UGC                                 Cys
TTM        UUA, UUC                            Phe, Leu
WNC        A?C, U?C                            Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys
CTY        CUC, CUU                            Leu
BNK        U?U, U?G, C?U, C?G, G?U, G?G        Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln, Val, Ala, Asp, Glu, Gly
STT        GUU, CUU                            Val, Leu
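The two codon listings differ only in where reading starts. A minimal sketch of splitting a sequence into codons for a chosen reading frame, assuming a helper named `codons_in_frame` (a hypothetical name) whose frame argument is an offset of 0, 1 or 2:

```perl
use strict;
use warnings;

# codons_in_frame() is a hypothetical helper; $frame is an offset of 0, 1 or 2.
# Leftover bases at the end that do not fill a codon are dropped.
sub codons_in_frame {
    my ($dna, $frame) = @_;
    my $shifted = substr($dna, $frame);
    my @codons  = $shifted =~ /(.{3})/g;
    return @codons;
}

my $dna = 'ATGCTTMWNCCTYBNKSTT';
for my $frame (0 .. 2) {
    my @codons = codons_in_frame($dna, $frame);
    print "Frame ", $frame + 1, ": @codons\n";
}
```

Frame 1 begins with the start codon ATG, and frame 2 begins with TGC.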
15 Lecture II
- Part II: One-Dimensional Strings
16 Hello World
- A few perls of wisdom
- Concatenating Sequences
- Making a reverse complement
- Read sequences from data files
17 Every journey starts with a first 10bp
#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out

# First, storing DNA in a variable called $DNA
$DNA = 'CGGGCTATTC';

# Next, print the DNA onto the screen
print $DNA;

# Finally, specifically tell the program to end
exit;
21 Concatenating DNA Fragments
#!/usr/bin/perl -w
# Store DNA in 2 variables
$DNA1 = 'AGTGCGTCGCTAG';
$DNA2 = 'ACCGCATGCATTG';

# Using string interpolation
$DNA3 = "$DNA1$DNA2";
print "$DNA3\n\n";

# Using the dot operator
$DNA3 = $DNA1 . $DNA2;
print "$DNA3\n\n";

# Passing a list to print
print $DNA1, $DNA2, "\n";
exit;
22 Transcription: DNA to RNA
#!/usr/bin/perl -w
$DNA = 'ACGACTGCACGATCGTACG';

# Print the DNA onto the screen
print "$DNA\n\n";

# Transcribe the DNA->RNA by substituting all Ts with Us
$RNA = $DNA;
$RNA =~ s/T/U/g;

# Print the result to the screen
print "Here is the result of DNA->RNA\t$RNA\n\n";
exit;
23
$RNA =~ s/T/U/g;
- $RNA: variable
- =~: binding operator
- s: substitute operator
- /: delimiters that separate the parts of the operator
- T: pattern to be replaced
- U: replacement text for the matched pattern
- g: pattern modifier
  - g: globally
  - i: case insensitive
  - m: multiline
  - s: single line
  - x: permit comments
  - o: compile only once, for speed
  - e: treat replacement as Perl code
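A few of the modifiers listed above are easier to see in action. The following sketch uses made-up sequences (they are illustrative assumptions, not data from the lecture) to show `g`, `i`, `e` and `x`:

```perl
use strict;
use warnings;

my $dna = 'acgACTgcaT';

# g + i: replace every T or t; s/// returns the number of replacements made.
my $rna   = $dna;
my $count = ($rna =~ s/t/U/gi);

# Without g, only the first match is replaced.
my $first_only = $dna;
$first_only =~ s/t/U/i;

# e: the replacement is evaluated as Perl code,
# here turning each run of Ns into its length.
my $masked = 'ACNNNNGT';
$masked =~ s/(N+)/length($1)/ge;

# x: whitespace and comments inside the pattern are ignored.
my $spaced = 'ACGT';
$spaced =~ s/ T   # thymine
            /U/gx;

print "$rna ($count replacements)\n$first_only\n$masked\n$spaced\n";
```

Because `s///` changes its target in place, each example copies the sequence into a new variable first, just as the transcription script copies `$DNA` into `$RNA`.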
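24 Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

# Print the starting DNA onto the screen
print "Here is the starting DNA\t$DNA\n\n";

# Calculate the reverse complement, first copying the reversed DNA
# into a new variable called $revcom
$revcom = reverse $DNA;

# Substitute all bases by their complement.
# Caution: these substitutions run one after another, so every A that
# the first line turns into T is turned back into A by the second
# (and likewise for C and G). The result is not the reverse complement.
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/C/G/g;
$revcom =~ s/G/C/g;

print "$revcom\n";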
25 Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

# Print the starting DNA onto the screen
print "Here is the starting DNA\t$DNA\n\n";

# Calculate the reverse complement, first copying the reversed DNA
# into a new variable called $revcom
$revcom = reverse $DNA;

# Substitute all bases by their complement in a single pass
$revcom =~ tr/ACGTacgt/TGCAtgca/;

print "$revcom\n";
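To see why the `tr///` version is needed, the two approaches can be wrapped in subroutines and compared on the same input. This is a sketch; `revcom` and `naive_revcom` are names chosen here for illustration.

```perl
use strict;
use warnings;

# Single-pass translation: each base is replaced exactly once.
sub revcom {
    my ($dna) = @_;
    my $result = reverse $dna;    # scalar assignment reverses the string
    $result =~ tr/ACGTacgt/TGCAtgca/;
    return $result;
}

# Sequential substitutions: later steps undo earlier ones.
sub naive_revcom {
    my ($dna) = @_;
    my $result = reverse $dna;
    $result =~ s/A/T/g;
    $result =~ s/T/A/g;
    $result =~ s/C/G/g;
    $result =~ s/G/C/g;
    return $result;
}

my $dna = 'ACGTCAGTCGAGCT';
print "tr///: ", revcom($dna), "\n";
print "s///:  ", naive_revcom($dna), "\n";
```

The sequential version ends up containing only A and C, because every T and G it creates is overwritten by a later substitution. Taking the reverse complement twice with `revcom` gives back the original sequence.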
26 Reading Data from Files
Sample Data in FASTA Format
>NM_012345 Sample Data Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVL
ISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
27 Reading Files
#!/usr/bin/perl -w
# The filename of the file containing the sequence data
$proteinFilename = 'NM_012345.pep';

# Open the file, and associate a filehandle with it
open(PROTEINFILE, '<', $proteinFilename)
    or die "Cannot open $proteinFilename: $!";

# Read from the filehandle with the input operator
# (in scalar context this reads only the first line)
$muppetProtein = <PROTEINFILE>;

# Print the protein file
print "Here is the protein\t$muppetProtein\n\n";
exit;
29 Let's try this again
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
open(PROTEINFILE, '<', $proteinFilename)
    or die "Cannot open $proteinFilename: $!";

$muppetProtein = <PROTEINFILE>;
print "Here is the first line\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the second line\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the third line\t$muppetProtein\n\n";

close PROTEINFILE;
exit;
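Reading one line per statement does not scale to files of unknown length. A common alternative is a `while` loop over the filehandle. The sketch below assumes a helper named `read_fasta` (a hypothetical name) and, so that it runs without a file on disk, reads the sample record from an in-memory filehandle rather than from `NM_012345.pep`:

```perl
use strict;
use warnings;

# read_fasta() is a hypothetical helper for a single-record FASTA file.
# It returns the header text (without '>') and the joined sequence.
sub read_fasta {
    my ($fh) = @_;
    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;                      # drop the trailing newline
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(.*)/) {
            $header = $1;
        }
        else {
            $line =~ s/\s+//g;            # remove stray whitespace
            $sequence .= $line;
        }
    }
    return ($header, $sequence);
}

my $sample = <<'END';
>NM_012345 Sample Data Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVL
ISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
END

# Opening a reference to a scalar gives an in-memory filehandle.
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
my ($header, $sequence) = read_fasta($fh);
close $fh;

print "Header:   $header\n";
print "Sequence: $sequence\n";
```

With a real file, the `open` call would take the filename instead of `\$sample`; the loop itself stays the same.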
30 Using Arrays to Read Files
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';

# Open the file
open(PROTEINFILE, '<', $proteinFilename)
    or die "Cannot open $proteinFilename: $!";

# Read the sequence data from the file, and store it in
# the array variable @protein
@protein = <PROTEINFILE>;

# Print the protein onto the screen
print @protein;

close PROTEINFILE;
exit;
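Once the lines are in an array, the usual next step is to strip the newlines, drop the header, and join the rest into one string. A minimal sketch of that step, again using an in-memory filehandle as a stand-in for the real file (an assumption made so the example is self-contained):

```perl
use strict;
use warnings;

my $text = <<'END';
>NM_012345 Sample Data Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVL
ISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
END

open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;      # list context: one array element per line
close $fh;

chomp @lines;           # chomp works on every element of an array

# Keep only the lines that are not FASTA headers, then join them.
my @sequence_lines = grep { !/^>/ } @lines;
my $protein        = join '', @sequence_lines;

print scalar(@lines), " lines read\n";
print length($protein), " residues\n";
```

Reading in list context pulls the whole file into memory at once, which is convenient for small sequence files but may not suit very large ones.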
31 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Now print each element of the array
print "\nFirst element: ", $bases[0];
print "\nSecond element: ", $bases[1];
print "\nThird element: ", $bases[2];
print "\nFourth element: ", $bases[3];
32 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Now print each element of the array in a row
print "\nHere are all of the bases: ", @bases;
# This prints out: Here are all of the bases: ACGT

# But you can print them out with spaces in between
print "\nHere they are with spaces: ", "@bases";
33 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to take an element off of the end
$base1 = pop @bases;
print "Here's the last element: ", $base1, "\n\n";

# The other elements still remain
print "\nHere are the remaining elements: ", "@bases";
34 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to take an element off of the front
$base2 = shift @bases;
print "Here's the first element: ", $base2, "\n\n";

# The other elements still remain
print "\nHere are the remaining elements: ", "@bases";
35 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how you put an element at the beginning of an array.
# Our example will put the last element at the beginning.
$base1 = pop @bases;
unshift(@bases, $base1);
print "Here's the last element put first: ", "@bases\n\n";
36 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how you put an element at the end of an array.
# Our example will put the first element at the end.
$base1 = shift @bases;
push(@bases, $base1);
print "Here's the first element put last: ", "@bases\n\n";
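The pop/unshift and shift/push pairs shown above amount to rotating the array by one position. As a sketch, they can be wrapped in two small subroutines (`rotate_left` and `rotate_right` are names chosen here for illustration) that work on a copy and leave the original array alone:

```perl
use strict;
use warnings;

# Move the first element to the end (shift, then push).
sub rotate_left {
    my @items = @_;
    push @items, shift @items if @items;
    return @items;
}

# Move the last element to the front (pop, then unshift).
sub rotate_right {
    my @items = @_;
    unshift @items, pop @items if @items;
    return @items;
}

my @bases = ('A', 'C', 'G', 'T');
my @left  = rotate_left(@bases);
my @right = rotate_right(@bases);

print "Original:      @bases\n";
print "Rotated left:  @left\n";
print "Rotated right: @right\n";
```

Because `@_` is copied into `@items`, `@bases` still holds A C G T after both calls.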
37 Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to reverse an array
@reverse = reverse @bases;

# Here's how to get the length
print scalar(@bases), "\n\n";

# Here's how to insert an element at an arbitrary place
splice(@bases, 2, 0, 'X');
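`splice` takes an offset, a length, and an optional replacement list, so the same function can insert, remove, or replace elements. A short sketch that records the array after each step (the snapshot variable names are illustrative):

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Insert: length 0 removes nothing, so 'X' is added before index 2.
splice(@bases, 2, 0, 'X');
my $after_insert = "@bases";

# Remove: take 2 elements starting at index 1; splice returns them.
my @removed = splice(@bases, 1, 2);
my $after_remove = "@bases";

# Replace: a negative offset counts from the end of the array.
splice(@bases, -1, 1, 'U');
my $after_replace = "@bases";

print "After insert:  $after_insert\n";
print "Removed:       @removed\n";
print "After remove:  $after_remove\n";
print "After replace: $after_replace\n";
```

Unlike `reverse`, which returns a new list, `splice` changes the array in place.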
38 Arrays
# Arrays can be evaluated as lists and scalars
@bases = ('A', 'C', 'G', 'T');

# Here's how to print the array
print "@bases\n";

# Here's how to assign it to a scalar (gives the number of elements)
$a = @bases;
print $a;

# Here's how to assign an array to a list (gives the first element)
($a) = @bases;
print $a;
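The difference between the last two assignments is context: the parentheses on the left-hand side decide whether the array is asked for its size or for its elements. A small sketch that puts the common cases side by side (variable names are illustrative):

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $count           = @bases;     # scalar context: number of elements
my ($first)         = @bases;     # list context: first element
my ($one, $two)     = @bases;     # list context: first two elements
my $last_index      = $#bases;    # index of the last element
my $joined          = "@bases";   # interpolation: elements joined by spaces

print "Count:      $count\n";
print "First:      $first\n";
print "First two:  $one $two\n";
print "Last index: $last_index\n";
print "Joined:     $joined\n";
```

When the context is not obvious, for example inside a `print` argument list, `scalar(@bases)` forces the count explicitly.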