Title: Introduction to Bioinformatics
1Introduction to Bioinformatics
2Genetic Material
- DNA (deoxyribonucleic acid) is the genetic material
- Information stored in DNA
- the basis of inheritance
- distinguishes living things from nonliving things
- Genes
- various units that govern living things' characteristics at the genetic level
3Nucleotides
- Genes themselves contain their information as a specific sequence of nucleotides found in DNA molecules
- Only four different bases are used in DNA molecules
- Guanine (G)
- Adenine (A)
- Thymine (T)
- Cytosine (C)
- Each base is attached to a phosphate group and a deoxyribose sugar to form a nucleotide.
- The only thing that makes one nucleotide different from another is which nitrogenous base it contains
(Figure: a nucleotide, with labels Base, P (phosphate), and Sugar)
4Nucleoside
5Nucleotides
- Complicated genes can be many thousands of nucleotides long
- All of an organism's genetic instructions, its genome, can be maintained in millions or even billions of nucleotides
6Orientation
- Strings of nucleotides can be attached to each other to make long polynucleotide chains
- 5' (5 prime) end
- The end of a string of nucleotides with a 5' carbon not attached to another nucleotide
- 3' (3 prime) end
- The other end of the molecule with an unattached 3' carbon
7(Figure: nucleotide sugar ring with its carbons labeled 1 to 5)
8Base Pairing
- Structure of DNA
- Double helix
- Paper by Watson and Crick in 1953
- Information content on one of those strands is essentially redundant with the information on the other
- Not exactly the same; it is complementary
- Base pair
- G paired with C (G ≡ C)
- A paired with T (A = T)
10Base Pairing
- Reverse complements
- The 5' end of one strand corresponds to the 3' end of its complementary strand, and vice versa
- Example
- one strand: 5'-GTATCC-3'
- the other strand: 3'-CATAGG-5', which reads 5'-GGATAC-3'
- Upstream: sequence features that are 5' to a particular reference point
- Downstream: sequence features that are 3' to a particular reference point
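The reverse-complement rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the names `COMPLEMENT` and `reverse_complement` are made up for this example, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Watson-Crick pairing for DNA: G pairs with C, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    Assumes seq is written 5' to 3' and holds only A, C, G, T.
    """
    # Walk the strand backwards so the result reads 5' to 3' as well.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# The slide's example: 5'-GTATCC-3' pairs with 5'-GGATAC-3'.
print(reverse_complement("GTATCC"))  # GGATAC
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.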
11DNA Structure
12DNA Structure
- Let's see what Watson and Crick said about their discovery
13Chromosome
- Threadlike "packages" of genes and other DNA in
the nucleus of a cell
15Chromosome
- Different kinds of organisms have different numbers of chromosomes
- Humans
- 23 pairs
- 46 in all
16Central Dogma of Molecular Biology
- DNA: information storage
- Protein: functional unit, such as an enzyme
- Gene: instructions needed to make a protein
- Central dogma
17Central Dogma of Molecular Biology
- RNA (ribonucleic acid)
- Single-stranded polynucleotide
- Bases
- A
- G
- C
- U (uracil), instead of T
- Transcription
- A → A, G → G, C → C, T → U
- Let's see what Crick said about his proposal
(Figure: sugar comparison, with DNA labeled H and RNA labeled OH)
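The base mapping above (A → A, G → G, C → C, T → U) can be sketched as follows. This is a minimal sketch that assumes the input is the coding (sense) strand written 5' to 3', so the RNA has the same letters with U in place of T. The function name `transcribe` is made up for this example.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence matching a DNA coding strand.

    Assumes the coding (sense) strand is given 5' to 3'; the RNA is
    then identical except that uracil (U) replaces thymine (T).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGACCGTG"))  # AUGACCGUG
```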
20DNA Replication (DNA → DNA)
21DNA Replication (DNA → DNA)
22DNA Replication Animation
Courtesy of Rob Rutherford, St. Olaf University
23Transcription (DNA → RNA)
- Messenger RNA (mRNA)
- carries information to be translated
- Ribosomal RNA (rRNA)
- the working spine of the ribosome
- Transfer RNA (tRNA)
- the decoder keys that will translate nucleic
acids to amino acids
24 Transcription Animation
Courtesy of Rob Rutherford, St. Olaf University
25Peptides and Proteins
- mRNA → Sequence of amino acids connected by peptide bonds
- Amino acid sequence
- Peptide: < 30-50 amino acids
- Protein: longer peptide
28Genetic Code: Codon
- Codon
- 3-base RNA sequence
(Figure: codon table, with the stop codons and the start codon marked)
29List of Amino Acids
- Amino acid          Symbol   Codons
- A  Alanine          Ala      GCN
- C  Cysteine         Cys      UGU, UGC
- D  Aspartic Acid    Asp      GAU, GAC
- E  Glutamic Acid    Glu      GAA, GAG
- F  Phenylalanine    Phe      UUU, UUC
- G  Glycine          Gly      GGN
- H  Histidine        His      CAU, CAC
- I  Isoleucine       Ile      AUU, AUC, AUA
- K  Lysine           Lys      AAA, AAG
- L  Leucine          Leu      UUA, UUG, CUN
- (N = any of the four bases U, C, A, G)
30List of Amino Acids
- Amino acid          Symbol   Codons
- M  Methionine       Met      AUG
- N  Asparagine       Asn      AAU, AAC
- P  Proline          Pro      CCN
- Q  Glutamine        Gln      CAA, CAG
- R  Arginine         Arg      CGN, AGA, AGG
- S  Serine           Ser      UCN, AGU, AGC
- T  Threonine        Thr      ACN
- V  Valine           Val      GUN
- W  Tryptophan       Trp      UGG
- Y  Tyrosine         Tyr      UAU, UAC
- (N = any of the four bases U, C, A, G)
- 20 one-letter codes; B, J, O, U, X, and Z are not used
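The codon assignments listed on these two slides can be held in a lookup table. The sketch below builds the standard genetic code from a compact 64-letter string (bases ordered U, C, A, G, with `*` marking a stop) and counts how many codons map to each amino acid. The names `CODON_TABLE` and `codons_per_residue` are made up for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one letter per codon, with codons enumerated in
# the order UUU, UUC, UUA, UUG, UCU, ... ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map each 3-base RNA codon to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid (or a stop).
codons_per_residue = Counter(CODON_TABLE.values())

print(CODON_TABLE["AUG"])        # M (the start codon)
print(codons_per_residue["L"])   # 6 (UUA, UUG, and the four CUN codons)
print(codons_per_residue["*"])   # 3 stop codons
```

Only Met (M) and Trp (W) have a single codon; every other amino acid has between two and six.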
31Codon and Reading Frame
- 4 nucleotide letters → 4^3 = 64 triplet possibilities
- 20 (< 64) standard amino acids
- Wobbling 3rd base
- Redundant → Resistant to mutation
- Reading frame: linear sequence of codons in a gene
- Open Reading Frame (ORF): a potential protein-coding region of DNA sequence
- a reading frame that begins with a start codon and ends at a stop codon
- a series of codons in a DNA sequence uninterrupted by the presence of a stop codon
32Open Reading Frame
- Given a nucleotide sequence
- What to begin with? ATG
- How many reading frames? 6
- 3 forward and 3 backward
- Example: ATGACCGTGGGCTCTTAA
- ATG ACC GTG GGC TCT TAA → M T V G S (stop)
- TGA CCG TGG GCT CTT AA → (stop) P W A L
- GAC CGT GGG CTC TTA A → D R G L L
- Figure out the three backward reading frames
- In random sequence, a stop codon will follow a Met within about 20 amino acids on average (3 of the 64 codons are stops)
- Substantially longer ORFs are often genes or parts of them
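The six reading frames can be enumerated mechanically. The sketch below is one possible way to do it, assuming the standard genetic code and an input made only of A, C, G, and T. It translates the three forward frames of the strand and the three forward frames of its reverse complement, writing `*` for a stop codon. The names `translate` and `six_frames` and the frame labels (`+1` to `-3`) are made up for this example.

```python
from itertools import product

# Standard genetic code on DNA codons, bases ordered T, C, A, G;
# "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; ignore leftovers."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, usable, 3))


def six_frames(seq: str) -> dict:
    """Return translations of the 3 forward and 3 backward reading frames."""
    seq = seq.upper()
    backward = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(backward[offset:])
    return frames


frames = six_frames("ATGACCGTGGGCTCTTAA")
print(frames["+1"])  # MTVGS*
print(frames["+2"])  # *PWAL
print(frames["+3"])  # DRGLL
```

The entries `-1`, `-2`, and `-3` hold the backward frames that the slide leaves as an exercise. A simple ORF finder would then look in each frame for an M followed by a long run with no `*`.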
33Translation (RNA → Protein)
34 Translation Animation
Courtesy of Rob Rutherford, St. Olaf University
35Gene Expression
- Gene expression
- Process of using the information stored in DNA to make an RNA molecule and then a corresponding protein
- Cells control gene expression by
- reliably distinguishing between those parts of an organism's genome that correspond to the beginnings of genes and those that do not
- determining which genes code for proteins that are needed at any particular time.
36Promoter
- The probability (P) that a string of nucleotides will occur by chance alone, if all nucleotides are present at the same frequency: P = (1/4)^n, where n is the string's length
- Promoter sequences
- Sequences recognized by RNA polymerases as being associated with a gene
- Example
- Prokaryotic RNA polymerases scan along DNA looking for a specific set of approximately 13 nucleotides marking the beginning of genes
- 1 nucleotide that serves as a transcriptional start site
- 6 that are 10 nucleotides 5' to the start site, and
- 6 more that are 35 nucleotides 5' to the start site
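To put a number on the chance-occurrence formula, the sketch below evaluates P = (1/4)^n for the roughly 13 promoter nucleotides mentioned above. It also estimates how many chance matches to expect in a genome of a given length. The genome length of 4,600,000 bp is only an illustrative assumption (roughly the size of an E. coli genome), and the function names are made up for this example.

```python
from fractions import Fraction


def chance_probability(n: int) -> Fraction:
    """P = (1/4)^n: chance that one specific string of n nucleotides
    occurs at a given position, assuming equal base frequencies."""
    return Fraction(1, 4) ** n


def expected_matches(genome_length: int, n: int) -> float:
    """Expected number of chance matches on one strand.

    Counts one trial per possible start position. Assumes independent,
    equally likely bases, which real genomes only approximate.
    """
    positions = max(genome_length - n + 1, 0)
    return float(positions * chance_probability(n))


print(chance_probability(13))           # 1/67108864
# Illustrative genome length (assumption): about 4.6 million bp.
print(expected_matches(4_600_000, 13))
```

Under these assumptions the expected count is well below one, which is why a 13-nucleotide signal is specific enough to mark gene starts in a genome of that size.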
37Gene Regulation
- Regulatory proteins
- Capable of binding to a cell's DNA near the promoter of the genes
- Control gene expression in some circumstances but not in others
- Positive regulation
- binding of regulatory proteins makes it easier for an RNA polymerase to initiate transcription
- Negative regulation
- binding of the regulatory proteins prevents transcription from occurring
38Promoter and Regulatory Example
39Gene Structure
40Exons and Introns
41Exons and Introns Example
42General sequence of steps in the formation of
eukaryotic mRNA
Courtesy of Ben King, Jackson Lab
43Protein Structure and Function
- Genes encode the recipes for proteins
44Protein Structure and Function
- Proteins are amino acid polymers
45Proteins: Molecular Machines
- Proteins in your muscles allow you to move: myosin and actin
46Proteins: Molecular Machines
- Enzymes (digestion, catalysis)
- Structure (collagen)
47Proteins: Molecular Machines
- Signaling (hormones, kinases)
- Transport (energy, oxygen)
48Protein Structures
49Information Flow in Nucleated Cell
50Point Mutation Example: Sickle-cell Disease
- Wild-type hemoglobin
- DNA
- 3'----CTT----5'
- mRNA
- 5'----GAA----3'
- Normal hemoglobin
- ------Glu------
- Mutant hemoglobin
- DNA
- 3'----CAT----5'
- mRNA
- 5'----GUA----3'
- Mutant hemoglobin
- ------Val------
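The slide's example can be traced step by step in code. The sketch below assumes the DNA triplets shown are on the template strand written 3' to 5', so the mRNA (5' to 3') is the base-by-base complement with U in place of T. Only the two codons used on the slide are included in the lookup, and all names are made up for this example.

```python
# Template DNA base -> mRNA base (complement, with U instead of T).
TEMPLATE_TO_MRNA = str.maketrans("ACGT", "UGCA")

# Only the two codons from the slide; not a full genetic code.
CODON_TO_AMINO_ACID = {"GAA": "Glu", "GUA": "Val"}


def transcribe_template(template_3_to_5: str) -> str:
    """mRNA (5' to 3') made from a template strand written 3' to 5'."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_MRNA)


for label, template in [("wild type", "CTT"), ("mutant", "CAT")]:
    codon = transcribe_template(template)
    print(label, template, codon, CODON_TO_AMINO_ACID[codon])
# wild type CTT GAA Glu
# mutant CAT GUA Val
```

A single changed base in the DNA (T to A in the middle position) is enough to swap glutamic acid for valine in the protein.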
51Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
52Thinking about the Human Genome
- 50% is high copy number repeats
- About 10% is transcribed
- (made into RNA)
- Only 1.5% actually codes for protein
- 98.5%: "junk DNA"
53Thinking about the Human Genome
- 3.2 × 10^9 bp
- If each base were one mm long
- 2000 miles, across the center of Africa
- Average gene about 30 meters long
- About 270 meters between genes
- Once spliced, the message would only be about 1 meter long
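The scale analogy is a unit conversion, which the short sketch below checks. It takes the slide's figures as given (3.2 × 10^9 bp, one base per millimeter) and converts in both directions. The constant and function names are made up for this example.

```python
GENOME_BP = 3_200_000_000   # 3.2 x 10^9 base pairs, from the slide
MM_PER_METER = 1_000
MM_PER_KM = 1_000_000
KM_PER_MILE = 1.609344


def length_km(bases: int) -> float:
    """Length in kilometers if each base were 1 mm long."""
    return bases / MM_PER_KM


def meters_to_bases(meters: float) -> int:
    """Number of bases in a stretch of the given length at 1 mm per base."""
    return round(meters * MM_PER_METER)


genome_km = length_km(GENOME_BP)
genome_miles = genome_km / KM_PER_MILE
print(genome_km)             # 3200.0
print(round(genome_miles))   # 1988, i.e. about 2000 miles

print(meters_to_bases(30))   # 30000 bases in an average gene
print(meters_to_bases(270))  # 270000 bases between genes
print(meters_to_bases(1))    # 1000 bases in a spliced message
```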