Title: CS 177 DNA, RNA, protein overview
DNA, RNA, Mutations, Amino acids, protein structure
DNA, RNA, protein overview
Questions about the genome in an organism:
- How much DNA, how many nucleotides?
- How many genes are there?
- What types of proteins appear to be coded by these genes?
Questions about the proteome:
- What proteins are present?
- Where are they?
- When are they present, and under what conditions?
DNA, RNA, protein overview
Lecture 2:
- DNA and its components
- RNA and its components
- Mutations
- Amino acids, review of protein structure
DNA overview
DNA: deoxyribonucleic acid
4 bases: A, T, C, G
- Purines (C5N4H4): adenine (A), guanine (G)
- Pyrimidines (C4N2H4): thymine (T), cytosine (C)
Nucleoside: base + sugar (deoxyribose)
Nucleotide: base + sugar + phosphate
Numbering of carbons?
Linking nucleotides
Hydrogen bonds: N-H···N and N-H···O
Linking nucleotides: the 3'-OH of one nucleotide is linked to the 5'-phosphate of the next nucleotide
What next?
[Figure: structures of thymine, adenine, cytosine, and guanine]
Base pairing
Base pairing (Watson-Crick):
- A/T (2 hydrogen bonds)
- G/C (3 hydrogen bonds)
Always pairing a purine with a pyrimidine yields a constant width
DNA base composition: A + G = T + C (Chargaff's rule)
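The Watson-Crick pairing rules imply Chargaff's rule for double-stranded DNA. The following is a minimal Python sketch of that reasoning, assuming the duplex is described by one strand and that the partner strand is built purely from the A/T and G/C pairing rules. The function names and the example sequence are illustrative and not part of the lecture.

```python
from collections import Counter

# Watson-Crick partners in DNA: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_composition(strand):
    """Count the bases over both strands of the duplex defined by `strand`."""
    partner = "".join(PAIR[base] for base in strand)
    return Counter(strand + partner)

def obeys_chargaff(counts):
    """Chargaff's rule: A = T and G = C, and therefore A + G = T + C."""
    return (counts["A"] == counts["T"]
            and counts["G"] == counts["C"]
            and counts["A"] + counts["G"] == counts["T"] + counts["C"])

counts = duplex_composition("ATCGCAATCAGCTAGGTT")
print(dict(counts), obeys_chargaff(counts))
```

A single strand on its own generally fails the check, which is why the rule is a statement about double-stranded DNA.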
DNA conventions
1. DNA is a right-handed helix
2. The 5' end is to the left by convention
5'-ATCGCAATCAGCTAGGTT-3'   sense (forward)
3'-TAGCGTTAGTCGATCCAA-5'   antisense (reverse)
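The strand conventions can be checked with a short Python sketch. It assumes sequences are plain uppercase strings over A, T, C, G written 5' to 3'; the helper names are illustrative.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ATCG", "TAGC")

def complement(strand):
    """Partner strand read in the same left-to-right order (so 3' to 5')."""
    return strand.translate(COMPLEMENT)

def reverse_complement(strand):
    """Partner strand rewritten 5' to 3', as it is usually reported."""
    return complement(strand)[::-1]

sense = "ATCGCAATCAGCTAGGTT"
print("5'-" + sense + "-3'   sense")
print("3'-" + complement(sense) + "-5'   antisense")
print("5'-" + reverse_complement(sense) + "-3'   antisense, written 5' to 3'")
```

Writing the antisense strand 5' to 3' requires reversing it, which is why sequence tools usually offer a reverse complement rather than a plain complement.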
DNA structure
Some more facts
1. Forces stabilizing DNA structure: Watson-Crick H-bonding and base stacking (planar aromatic bases overlap geometrically and electronically → energy gain)
2. Genomic DNAs are large molecules:
   - Escherichia coli: 4.7 x 10^6 bp, 1 mm contour length
   - Human: 3.2 x 10^9 bp, 1 m contour length
3. Some DNA molecules (plasmids) are circular and have no free ends; so are mtDNA and bacterial DNA (only one circular chromosome)
4. An average gene of 1000 bp can code for an average protein of about 330 amino acids
5. The percentage of non-coding DNA varies greatly among organisms
Organism        Base pairs      Genes               Non-coding DNA
small virus     4 x 10^3        3                   very little
typical virus   3 x 10^5        200                 very little
bacterium       5 x 10^6        3000                10 - 20%
yeast           1 x 10^7        6000                > 50%
human           3.2 x 10^9      30,000?             99%
amphibians      < 80 x 10^9     ?                   ?
plants          < 900 x 10^9    23,000 - > 50,000   > 99%
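The contour lengths and the gene-to-protein size estimate follow from simple arithmetic. The sketch below assumes a rise of 0.34 nm per base pair (the usual textbook value for B-form DNA, not stated in the lecture) and ignores start and stop codons; the function names are illustrative.

```python
# Assumption: B-form DNA rises about 0.34 nm per base pair along the helix axis.
RISE_PER_BP_NM = 0.34

def contour_length_m(base_pairs):
    """Approximate end-to-end length of stretched-out DNA, in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9

def coding_capacity(gene_bp):
    """Number of complete codons (amino acids) that gene_bp bases can encode."""
    return gene_bp // 3

for name, bp in [("Escherichia coli", 4.7e6), ("human", 3.2e9)]:
    print(f"{name}: {contour_length_m(bp) * 1000:.1f} mm")
print("codons in a 1000 bp gene:", coding_capacity(1000))
```

Under these assumptions the results come out at roughly 1.6 mm and 1.1 m, the same order of magnitude as the rounded figures in the list, and 333 codons for a 1000 bp gene.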
3 major types of RNA
- messenger RNA (mRNA): template for protein synthesis
- transfer RNA (tRNA): adaptor molecules that decode the genetic code
- ribosomal RNA (rRNA): catalyzes the synthesis of proteins
RNA: ribonucleic acid
4 bases: A, U, C, G
Base interactions in RNA
Base pairing:
- U/A/(T) (2 hydrogen bonds)
- G/C (3 hydrogen bonds)
RNA base composition: A + G ≠ U + C, so Chargaff's rule does not apply (RNA usually prevails as a single strand)
RNA structure:
- usually single stranded
- many self-complementary regions → RNA commonly exhibits an intricate secondary structure (relatively short, double helical segments alternated with single stranded regions)
- complex tertiary interactions fold the RNA in its final three dimensional form
- the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking)
RNA structure
Primary structure
Secondary structure
- single stranded region: formed by unpaired nucleotides
- duplex: double helical RNA (A-form with 11 bp per turn)
- hairpin: duplex bridged by a loop of unpaired nucleotides
- internal loop: nucleotides not forming Watson-Crick base pairs
- bulge loop: unpaired nucleotides in one strand, other strand has contiguous base pairing
- junction: three or more duplexes separated by single stranded regions
- pseudoknot: tertiary interaction between bases of a hairpin loop and outside bases
RNA structure
[Figure: primary, secondary, and tertiary structure of an RNA molecule, with structural elements labeled A to G]
RNA structure
How to predict RNA secondary/tertiary structure?
Probing RNA structure experimentally
- physical methods (single crystal X-ray diffraction, electron microscopy)
- chemical and enzymatic methods
- mutational analysis (introduction of specific mutations to test change in some function or protein-RNA interaction)
Thermodynamic prediction of RNA structure
- RNA molecules obey the laws of thermodynamics, therefore it should be possible to deduce RNA structure from its sequence by finding the conformation with the lowest free energy
- Pros: only one sequence required, no difficult experiments, does not rely on alignments
- Cons: thermodynamic data are experimentally determined but not always accurate; possible interactions of RNA with solvent, ions, and proteins
Comparative determination of RNA structure
- basic assumption: the secondary structure of a functional RNA will be conserved in the evolution of the molecule (at least more conserved than the primary structure); when a set of homologous sequences has a certain structure in common, this structure can be deduced by comparing the structures possible from their sequences
- Pros: very powerful in finding secondary structure, relatively easy to use, only sequences required, not affected by interactions of the RNA and other molecules
- Cons: a large number of sequences is preferred, structural constraints in fully conserved regions cannot be inferred, extremely variable regions cause problems with alignment
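Real free-energy minimization needs experimentally determined energy parameters. As a much simpler stand-in, the sketch below scores a fold by its number of Watson-Crick base pairs and finds the maximum with dynamic programming (the Nussinov approach). Counting pairs instead of summing free energies is a deliberate simplification, and the minimum hairpin loop size of 3 unpaired bases is an assumption; neither comes from the lecture.

```python
# Assumption: only Watson-Crick pairs count (no G/U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    best[i][j] holds the optimum for the subsequence seq[i..j]; at least
    min_loop unpaired bases must separate the two partners of a pair.
    """
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1 and 2: base i or base j stays unpaired.
            score = max(best[i + 1][j], best[i][j - 1])
            # Case 3: i pairs with j and encloses the interior.
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)
            # Case 4: the interval splits into two independent parts.
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1] if n else 0

print(max_base_pairs("GGGAAAUCCC"))
```

The example sequence can form a three-pair stem closed by a four-base loop. Production tools replace the pair count with measured stacking and loop energies, which is where the accuracy concerns listed under "Cons" arise.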
Amino acids/proteins
The central dogma of modern biology: DNA → RNA → protein
Getting from DNA to protein: two parts
1. Transcription, in which a short portion of chromosomal DNA is used to make an RNA molecule small enough to leave the nucleus
2. Translation, in which the RNA code is used to assemble the protein at the ribosome
The genetic code
- The code problem: 4 nucleotides in RNA, but 20 amino acids in proteins
- Bases are read in groups of 3 (a codon)
- The code consists of 64 codons (4^3 = 64)
- All codons are used in protein synthesis: 20 amino acids, 3 stop codons
- AUG (methionine) is the start codon (also used internally)
- The code is non-overlapping and punctuation-free
- The code is degenerate (but NOT ambiguous): each amino acid is specified by at least one codon
- The code is universal (virtually all organisms use the same code)
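The counting argument behind the genetic code can be reproduced in a few lines of Python. The sketch builds the standard code from a compact 64-letter string (one-letter amino acid codes, `*` for stop, codons ordered with U, C, A, G at each position); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
# Degeneracy: how many codons specify each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print("codons:", len(CODON_TABLE))
print("stop codons:", stop_codons)
print("amino acids:", len(degeneracy))
print("codons per amino acid:", dict(degeneracy))
```

Because the table is a dictionary, each codon maps to exactly one amino acid (not ambiguous), while most amino acids appear as the value of several codons (degenerate).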
The genetic code
[Figure: codon table, with the start codon AUG marked]
- Amino acids specified by a single codon: methionine and tryptophan
- Five amino acids specified by four codons each: proline, threonine, valine, alanine, glycine
Amino acids
Hydrophobic: G, A, V, L, I, M, F, W, P
Amino acids
Hydrophilic: S, T, C, Y, N, Q, E, D, K, R, H
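The two groups of amino acids can be written as sets and used to summarize a sequence. This is a minimal Python sketch that uses exactly the two-way grouping from these slides (other hydrophobicity scales assign some residues differently); the function name and example peptides are hypothetical.

```python
# Grouping taken from the slides, in one-letter amino acid codes.
HYDROPHOBIC = set("GAVLIMFWP")
HYDROPHILIC = set("STCYNQEDKRH")

def hydrophobic_fraction(protein):
    """Fraction of recognized residues that fall in the hydrophobic group."""
    residues = [aa for aa in protein.upper()
                if aa in HYDROPHOBIC or aa in HYDROPHILIC]
    if not residues:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)

print(hydrophobic_fraction("GAVLSTKR"))
```

A sliding-window version of the same fraction is a common first step when looking for hydrophobic stretches such as candidate transmembrane helices.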
Reading frames
Reading frame (also open reading frame): the stretch of triplet sequence of DNA that potentially encodes a protein. The reading frame is designated by the initiation or start codon and is terminated by a stop codon.
- mutations within an open reading frame that delete or add nucleotides can disrupt the reading frame (frameshift mutation)
Wildtype: CAG AUG AGG UCA GGC AUA GAG
          gln met arg ser gly ile glu
Mutant:   CAG AUG AGU CAG GCA UAG AG
          gln met ser gln ala stop
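The wildtype and mutant example can be reproduced in Python. The sketch translates from the first base, as the slide does, and stops at the first stop codon; the `translate` helper and the compact codon-table construction are illustrative choices, not part of the lecture.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG codon order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate complete codons from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

wildtype = "CAGAUGAGGUCAGGCAUAGAG"
# Deleting one G from the third codon (AGG) shifts every downstream codon.
mutant = wildtype[:8] + wildtype[9:]

print(translate(wildtype))  # one-letter codes for gln met arg ser gly ile glu
print(translate(mutant))    # one-letter codes for gln met ser gln ala
```

The single-base deletion changes every codon after the deletion site and brings a UAG stop codon into frame, so the mutant product is both altered and truncated.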
Mutations
Mutation: any heritable change in DNA
Sources of mutation
- Spontaneous mutations: mutations occur for unknown reasons
- Induced mutations: exposure to a substance (mutagen) known to cause mutations, e.g. X-rays, UV light, free radicals
Mutations may influence one or several base pairs
a) Nucleotide substitutions (point mutations)
   1) Transitions (Pu ↔ Pu, Py ↔ Py)
   2) Transversions (Pu ↔ Py)
   In-class exercise: How many transition and transversion events are possible?
   - 2 transitions: T ↔ C, A ↔ G
   - 4 transversions: T ↔ A, T ↔ G, C ↔ A, C ↔ G
b) Insertions or deletions (indels)
   - one to many bases can be involved
   - frequently associated with repeated sequences (hot spots)
   - lead to a frameshift in protein-coding genes, except when N = 3X (the number of inserted or deleted bases is a multiple of three)
   - also caused by insertion of transposable elements into genes
Weighting of mutation events plays an important role in phylogenetic analyses (model of sequence evolution)
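The answer to the in-class exercise can be checked by enumeration. The Python sketch below treats each substitution as an unordered pair of bases, which is the counting used in the exercise; the function name is illustrative.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a, b):
    """Classify a substitution between two different DNA bases."""
    if a == b:
        raise ValueError("not a substitution: both bases are identical")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

pairs = list(combinations("ACGT", 2))  # the 6 unordered base pairs
transitions = [p for p in pairs if substitution_type(*p) == "transition"]
transversions = [p for p in pairs if substitution_type(*p) == "transversion"]

print(len(transitions), transitions)
print(len(transversions), transversions)
```

Although transversions outnumber transitions two to one among the possible changes, models of sequence evolution often weight the two classes differently, which is the weighting mentioned in the last line of the slide.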
Mutations
Mutations may influence phenotype
a) Silent (or synonymous) substitution
   - nucleotide substitution without amino acid change
   - no effect on phenotype
   - mostly third codon position
   - other possible silent substitutions: changes in non-coding DNA
b) Replacement substitution
   - causes amino acid change
   - neutral: protein still functions normally
   - missense: protein loses some functions (e.g. sickle cell anemia: mutation in β-globin)
c) Sense/nonsense substitution
   - sense: involves a change from a termination codon to one that codes for an amino acid
   - nonsense: creates a premature termination codon
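The three categories can be told apart by comparing the amino acids encoded before and after a substitution. This Python sketch assumes the standard genetic code and single-codon inputs; `classify_substitution` is a hypothetical helper. The GAG to GUG example is the commonly cited sickle-cell change (glutamate to valine); the lecture names the disease but not the codon, so treat that example as an outside fact.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG codon order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(codon_before, codon_after):
    """Label a codon change using the categories from the slide."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"      # creates a premature termination codon
    if before == "*":
        return "sense"         # termination codon now codes for an amino acid
    return "replacement"       # amino acid change (neutral or missense)

for before, after in [("GAG", "GAA"), ("GAG", "GUG"), ("UAC", "UAG"), ("UAG", "UAC")]:
    print(before, "->", after, classify_substitution(before, after))
```

Whether a replacement is neutral or missense depends on protein function and cannot be decided from the codons alone, so the sketch reports both as "replacement".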
Mutation rates: a measure of the frequency of a given mutation per generation
- mutation rates are usually given for specific loci (e.g. sickle cell anemia)
- the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000
- range varies from 1 in 10,000 to 1 in 10,000,000,000
- every human has about 30 new mutations involving nucleotide substitutions
- mutation rate is about twice as high in male as in female meiosis
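The figure of about 30 new substitutions per person can be reproduced with a one-line estimate. The sketch assumes the quoted rate is per nucleotide site per generation and that it is applied to a haploid genome of 3.2 x 10^9 bp (the human genome size quoted earlier in the lecture); both readings are assumptions, since the slide does not state its units.

```python
# Assumptions: rate is per site per generation; genome size is the haploid value.
RATE_PER_SITE_PER_GENERATION = 1e-8   # "1 per 100,000,000"
HAPLOID_GENOME_BP = 3.2e9

def expected_new_substitutions(rate, sites):
    """Expected number of new substitutions = per-site rate x number of sites."""
    return rate * sites

estimate = expected_new_substitutions(RATE_PER_SITE_PER_GENERATION,
                                      HAPLOID_GENOME_BP)
print(round(estimate))
```

The result, about 32, is consistent with the "about 30" on the slide; counting both parental genome copies would double it.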
Mutations
A single amino acid substitution in a protein causes sickle-cell disease
Review of protein structure
Making a polypeptide chain
Review of protein structure
Primary structure
Proteins are chains of amino acids joined by peptide bonds
The structure of two amino acids
The N-Cα-C sequence is repeated throughout the protein, forming the backbone
The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds determine the conformation of the protein backbone
The R side chains also play an important structural role
Review of protein structure
Secondary structure
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- each R side group has a limited volume to occupy and a limited number of interactions with other R side groups
[Figure: β sheet and α helix]
Secondary structure
Other secondary structure elements (no standardized classification)
- random coil
- loop
- others (e.g. 3_10 helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements that apply to all proteins (e.g. helix, sheet) there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function
Secondary structure
Structural classification of proteins (SCOP)
- Class 1: mainly alpha
- Class 2: mainly beta
- Class 3: alpha/beta
- Class 4: few secondary structures
Secondary structure
Alternative SCOP
- Class α: only α helices
- Class β: antiparallel β sheets
- Class α/β: mainly β sheets with intervening α helices
- Class α+β: mainly segregated α helices with antiparallel β sheets
- Membrane structure: hydrophobic α helices with membrane bilayers
- Multidomain: contain more than one class
Review of protein structure
Q: If we have all the psi and phi angles in a protein, do we then have enough information to describe the 3-D structure?
Tertiary structure
- The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
- The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, hydrogen bonding, hydrophobic interactions, van der Waals forces) between the side chains
- Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability
- If a protein consists of only one polypeptide chain, this level then describes the complete structure
Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure
- Fibrous proteins have an elongated structure with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, hydrogen-bond, and hydrophobic interactions.
Example: haemoglobin (4 subunits)
Structure displays
Common displays are (among others) cartoon, spacefill, and backbone
[Figure: spacefill, backbone, and cartoon displays]
Summary protein structure
Next week
First quiz
Lecture 1
- Bioinformatics definitions
- The human genome project
Lecture 2
- DNA structure
- RNA structure
- Mutations
- Amino acids
- Proteins