Title: Biology for Bioinformatics
1Biology for Bioinformatics
2The Big Picture of Biology
3What is Life?
- NASA: life is a self-sustaining set of chemical reactions capable of reproducing similar copies of itself.
- We can separate this into:
  - chemistry: movement of electrons between atoms
  - reproduction, which immediately leads to natural selection: offspring that are better at surviving and reproducing end up taking over in future generations.
4Chemistry
- Life is applied chemistry. All living systems are based on the interactions of chemical compounds, the sharing of electrons between atoms.
- Life occurs in cells, with a membrane separating the inside from the outside.
  - The membrane is impermeable to almost everything (but not gases or water).
  - Other molecules enter or leave using specific channels.
- Homeostasis: maintaining a constant internal state despite external changes. Homeostasis requires:
  - Metabolism: capture matter and energy from the outside world, and use it to maintain, grow, and reproduce.
  - Irritability: detect and respond to environmental changes.
5All life on Earth is similar
- Current belief: life originated on Earth at least 3.5 billion years ago, not long after the Earth's surface cooled.
- There may have been many forms of life early in our history, many semi-independent origins, but we believe all life on Earth today can be traced to a single common ancestor, sometimes referred to as the Last Universal Common Ancestor (LUCA).
- All organisms are made of the same molecules in similar structures: DNA is used for instructions and heredity, proteins do the necessary work, and cells are surrounded by lipid membranes.
- Many seemingly arbitrary decisions, such as the handedness of molecules and the genetic code, are essentially identical in all organisms that have been studied.
- Individuals are born and then die, but each individual's life comes from previous life. Life is an unbroken chain of living cells extending back more than 3 billion years to our original common ancestor.
- From the DNA's point of view, individuals exist merely as temporary carriers of the DNA.
6Reproduction
- All the information needed to produce a living thing is coded in its DNA.
  - This is a well-supported belief, but as is usual in biology, there is some fuzziness around the edges.
- To reproduce, organisms replicate their DNA, then use the DNA instructions to create a new organism.
  - For microorganisms, this usually means growing large and then splitting into 2 halves, each of which gets a copy of the DNA.
  - The process is fancier in multicellular organisms.
7Evolution by Natural Selection
- Offspring resemble their parents because the offspring are built from their parents' genes.
- Random changes in the DNA (mutations) occur at a slow but steady rate. This produces a lot of variation within a species.
- Some members of a species are more fit (better able to survive and reproduce) than other members of the species. This is natural selection: the more fit individuals are selected by Nature to reproduce more than the less fit individuals.
  - This can also happen by artificial selection, where a human decides which individuals will be allowed to reproduce.
- The genes from the more fit individuals will slowly take over the species.
- Thus, the genes within a species slowly change (or occasionally, change rapidly).
- However, most mutations have no effect on fitness, and all organisms contain large numbers of DNA positions that are different from other members of their species.
8DNA Sharing
- An important consideration: DNA is traded between different organisms, so innovations can spread very widely.
  - This is the reason antibiotic resistance is so widespread among pathogenic bacteria.
- Higher organisms use a formal sexual mechanism: each offspring gets half its DNA from each parent. Also, DNA is almost always confined within a species.
- Bacteria share DNA less formally, with small segments being passed around by several different mechanisms, often between species.
  - Cross-species transfer is called lateral or horizontal transfer.
- This process is quite widespread, and I have heard estimates that up to 1/3 of all genes in bacteria have been transferred in from another species, as opposed to having come from the common ancestor by vertical transfer (regular parent-to-offspring descent).
9Diversity of Life
- Lots of different ways to make a living
- Several million different species
- Species: a group of similar organisms that can reproduce with each other but not with others.
  - Easy to define in sexually reproducing organisms, but not in bacteria
- Speciation: one species splits into two different species very easily.
  - Isolate two groups so they can't mate with each other.
  - Different random mutations quickly cause differences in sexual attractiveness and fertility.
  - When brought back together, the two groups no longer want to mate with each other, or they can't produce fertile offspring.
  - They are now two different species.
- Phylogeny: the branching pattern of descent of a species, starting at the last universal common ancestor.
  - A binary tree: each parent node has 2 offspring nodes. Of course, any given species may be extinct.
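As a rough illustration of the "binary tree" idea, a phylogeny can be held in nested pairs. This is only a sketch: the grouping used here is a simplified example chosen for illustration, and the helper names `leaves` and `to_newick` are assumptions, not part of any library.

```python
# A leaf is a group name (str); an internal node is a pair (left, right).
# The topology below is a simplified illustration, not a published tree.
tree = ("Bacteria", ("Archaea", ("Plants", ("Fungi", "Animals"))))


def leaves(node):
    """Return the names at the tips of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node  # binary tree: exactly two offspring per parent
    return leaves(left) + leaves(right)


def to_newick(node):
    """Write the tree in a parenthesized text form (Newick style)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


print(leaves(tree))
print(to_newick(tree) + ";")
```

Each pair of parentheses in the printed form corresponds to one parent node and its two offspring.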
10Basic Division of Life
- Prokaryotes: simple cells with no internal compartments; especially, no separate nucleus that contains the DNA.
- Eukaryotes: more complex cells with internal compartments and membranes, with the DNA contained in the nucleus, a special membrane-bound compartment.
- Viruses: have no metabolism, aren't composed of cells, and are parasites that use cells to reproduce.
  - Called bacteriophage, or just phage, in bacteria
  - Conventional theory: viruses are escaped bits of cellular machinery
  - But viruses often have genes with no homologues in living cells
11Major Sub-divisions
- Prokaryotes: Eubacteria (common bacteria found everywhere) and Archaea (special forms found in extreme conditions such as hot, high-salt, or acidic environments).
- Eukaryotes: Protists (single-celled), Fungi (digest their food externally), Plants (produce food from sunlight), Animals (move under their own power for part of their lives).
  - "Protist" is a catch-all group, containing many different lineages that are no more related than animals and plants are. Also, multicellular seaweeds are considered protists.
  - In contrast, plants, animals and fungi each seem to have had a single common ancestor.
13Older Trees
Above is Darwin's original tree of life, from his 1837 notebook.
To the right is a tree from Haeckel (1866).
14Another View
I just like this one. I don't know its origin; I found it on a creationist web site.
15Ring of Life
- This is rather speculative, based on the idea that eukaryotes arose from a fusion of a bacterium and an archaean.
  - Bacteria contributed the basic metabolic pathways.
  - Archaea contributed the information-handling system.
16Biological Molecules
17Molecules in the Cell
- The most common molecule in cells is water, which is the universal solvent that all the other molecules are dissolved in.
- Various small ions (dissolved salts) keep the cell in osmotic balance.
  - The main positively charged ions are sodium (Na+), potassium (K+), magnesium (Mg2+), and calcium (Ca2+).
  - The main negative ions are chloride (Cl-), bicarbonate (HCO3-), and phosphate (PO4 3-).
- Four main classes of macromolecule: nucleic acids, proteins, polysaccharides, and lipids. These molecules are usually in the form of polymers: long chains of similar subunits, which are called monomers.
- Miscellaneous small molecules that act as helpers (co-factors) in enzymatic reactions. Many of these are vitamins: co-factors we humans can't synthesize for ourselves.
18Carbohydrates
- Sugars and starches: saccharides.
- The name "carbohydrate" comes from the approximate composition: a ratio of 1 carbon to 2 hydrogens to 1 oxygen (CH2O). For instance, the sugar glucose is C6H12O6.
- Carbohydrates are composed of rings of 5 or 6 carbons, with alcohol (-OH) groups attached. This makes most carbohydrates water-soluble.
- Carbohydrates are used for energy production and storage, and for structure.
  - Glucose, a simple 6-carbon sugar, is the primary fuel source for most living things. It is broken down by the process of glycolysis.
  - Starches are glucose polymers, used to store fuel.
  - Structural carbohydrates include cellulose (another glucose polymer) and chitin, the outer coating of insects and many fungi.
19Lipids
- Lipids are the main non-polar component of cells. Mostly hydrocarbons: carbon and hydrogen.
- They are used primarily for energy storage and in cell membranes.
- Energy storage: triglycerides (fats). Composed of glycerol attached to 3 fatty acid molecules. Fatty acids are long chains of carbon and hydrogen. Double bonds kink the chains and lower the melting temperature.
- Cell membranes are composed primarily of phospholipids. These have 2 fatty acids attached to glycerol, plus a phosphate-containing polar head group.
  - The heads stick into the water outside the membrane, while the non-polar tails stay in the hydrophobic interior of the membrane. This acts as a waterproof coat that keeps most other molecules from passing through the membrane. The membrane consists of 2 layers of phospholipids: the lipid bilayer.
20Proteins
- The most important type of macromolecule.
- Roles:
  - Structure: collagen in skin, keratin in hair, crystallin in the eye.
  - Enzymes: all metabolic transformations (building up, rearranging, and breaking down of organic compounds) are done by enzymes, which are proteins.
  - Transport: oxygen in the blood is carried by hemoglobin; everything that goes in or out of a cell (except water and a few gases) is carried by proteins.
  - Also nutrition (egg yolk), hormones, defense, movement
- Proteins are composed of linear chains of amino acids.
- There are 20 different kinds of amino acids in proteins. Each one has a functional group (the "R group") attached to it.
- Different R groups give the 20 amino acids different properties, such as charged (+ or -), polar, hydrophobic, etc.
- The different properties of a protein come from the arrangement of the amino acids.
21Protein Structure
- A polypeptide is one linear chain of amino acids. A protein may contain one or more polypeptides. Proteins also sometimes contain small helper molecules such as heme.
- Each gene codes for one polypeptide.
- After the polypeptides are synthesized by the cell, they spontaneously fold up into a characteristic conformation which allows them to be active. The proper shape is essential for active proteins. For most proteins, the amino acid sequence itself is all that is needed to get proper folding.
- Proteins fold up because they form hydrogen bonds between amino acids. The need for hydrophobic amino acids to be away from water also plays a big role. Similarly, the charged and polar amino acids need to be near each other.
- The joining of polypeptide subunits into a single protein also happens spontaneously, for the same reasons.
- Enzymes are usually roughly globular, while structural proteins are usually fiber-shaped. Proteins that transport materials across membranes have a long segment of hydrophobic amino acids that sits in the hydrophobic interior of the membrane.
22Nucleic Acids
- Only 2 types: DNA and RNA
- Both DNA and RNA are linear chains of nucleotides.
- DNA: 2 chains running anti-parallel, twisted together into a double helix
- RNA: usually 1 chain of nucleotides, with secondary structure caused by base pairing between nucleotides on the same strand.
23Nucleotides
- Each nucleotide has 3 parts: sugar, phosphate, base.
- Sugar is ribose (RNA) or deoxyribose (DNA).
  - Bases are attached to the 1' carbon of the sugar.
- Base (sometimes called nitrogenous base) is purine or pyrimidine.
  - Purines: 2 carbon-nitrogen rings, adenine (A) or guanine (G)
  - Pyrimidines: 1 carbon-nitrogen ring, cytosine (C), thymine (T) (DNA only), uracil (U) (RNA only)
- In the backbone, nucleotides are bonded together between the phosphate on the 5' carbon and the -OH on the 3' carbon.
  - Thus each nucleic acid has a free 5' phosphate on one end and a free 3' -OH on the other.
  - This is used to write the polarity of the molecule: each nucleotide chain has a 5' end and a 3' end.
- DNA has -H on the 2' carbon of the sugar; RNA has -OH.
  - This difference makes DNA more stable and allows it to form a regular double helix structure.
24Base Pairing
- A bonds with T (or U); G bonds with C. Held together by hydrogen bonds.
  - A-T has 2 hydrogen bonds; G-C has 3. This makes G-C stronger and more stable at high temperatures.
- In DNA, 2 antiparallel chains are held together by this pairing.
  - Implies that the amount of A = the amount of T, and G = C, in DNA.
- One characteristic of genomes is their GC content: the percentage of G and C. This can vary from about 20% to 70%. Eukaryotes generally have GC contents around 40%. Also, there are large-scale variations in GC content along the length of chromosomes, called isochores, which may be the result of horizontal gene transfer.
- RNA is usually single-stranded and held in a folded conformation by base pairing within the RNA molecule, e.g. tRNA.
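GC content as described above is simple to compute. The sketch below is illustrative only: the function names `gc_content` and `windowed_gc` and the window parameters are assumptions made for this example. The windowed version gives a crude view of how GC content varies along a sequence.

```python
def gc_content(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list:
    """GC content of successive windows, to show variation along a sequence."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


print(gc_content("ATGCGCGATATTAGC"))
print(windowed_gc("GGGGCCCCAAAATTTT", window=8, step=8))
```

On real chromosomes the windows would be thousands of bases long; the tiny values here are only for display.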
25Genetic Information Processing
26Central Dogma of Molecular Biology
- Concerns the flow of information in the cell.
  - DNA is long-term information storage.
  - RNA is produced from individual genes when needed by the cell.
  - Protein is the actual usable product of each gene.
27Replication
- Main enzyme: DNA polymerase. Several other enzymes are also involved (see below).
- Replication is semiconservative:
  - The DNA helix is opened up and unwound by a helicase.
  - Each old strand gets a new strand built on it.
- DNA polymerase can only add bases to the 3' -OH group on a pre-existing nucleic acid that is base-paired with the template strand it is copying. This means that DNA synthesis starts with the enzyme primase synthesizing a short RNA primer. DNA polymerase then adds bases to this primer.
- DNA polymerase can only add new bases to the 3' end, so one strand is synthesized continuously (leading strand) and the other is built up of short fragments: discontinuous synthesis on the lagging strand.
  - The short (100-1000 bp) DNA fragments, called Okazaki fragments, are built in the opposite direction of fork movement and then ligated together (by DNA ligase).
- In eukaryotes, the whole process starts at several points on each chromosome and goes in both directions. It takes about 8 hr to complete.
- In bacteria (which have circular chromosomes), there is a single origin of replication, with replication proceeding in both directions and meeting at the opposite side of the circular chromosome.
28Replication
29Transcription
- Transcription is making an RNA copy of a short region of DNA.
- Only part of the DNA is transcribed. A transcribed region is called a transcription unit, which is approximately equivalent to "gene".
  - Most transcription units code for proteins.
  - However, some code for functional RNAs that never get translated into proteins (RNA genes).
- When transcription starts, the DNA double helix is unwound and only one strand is used as a template for the RNA.
  - The template DNA strand is called the antisense strand, and the other DNA strand, not used in transcription, is called the sense strand. This is because the sense strand has the same base sequence as the RNA transcript.
- Genes are oriented from 5' to 3' based on transcription direction (even though the template DNA is read 3' to 5'). Thus, the 5' end of a gene is where transcription starts. "Upstream" and "downstream" also relate to this direction.
- In the scientific literature, only the sense strand is written, with the 5' end on the left.
  - The antisense strand is implied.
  - Sequences are written as DNA (using T) and not RNA (using U).
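The relationship between the sense strand, the antisense (template) strand, and the RNA can be checked with a few lines of code. This is a minimal sketch; the function names and the example sequence are made up for illustration. All sequences are written 5' to 3'.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(sense: str) -> str:
    """Template strand for a given sense strand (reverse complement)."""
    return sense.translate(COMPLEMENT)[::-1]


def transcribe_from_sense(sense: str) -> str:
    """Shortcut: the RNA has the sense-strand sequence with U in place of T."""
    return sense.replace("T", "U")


def transcribe_from_template(template: str) -> str:
    """What the polymerase does: build the complement of the template.

    The template is read 3' to 5', so the RNA comes out 5' to 3'.
    """
    return template.translate(COMPLEMENT)[::-1].replace("T", "U")


sense = "ATGGCCTAA"
template = antisense(sense)
print(template)
print(transcribe_from_sense(sense))
print(transcribe_from_template(template))
```

Both routes give the same RNA, which is why only the sense strand needs to be written down.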
30Transcription Process
- The primary enzyme used for transcription is RNA polymerase.
- RNA polymerase binds to a promoter sequence just upstream from the transcription start point, with the help of several proteins called transcription factors.
  - Some transcription factors are used for all transcriptions, but others are very specific for cell type, hormonal stimulus, developmental time, etc.
- RNA polymerase then moves in the 3' direction (downstream along the gene), adding new RNA nucleotides to the growing RNA molecule.
  - New bases are always added to the 3' end of the growing RNA molecule.
- In prokaryotes, transcription ends at a specific terminator sequence.
- In eukaryotes, there is no definite transcription terminator, but the RNA molecules are cut off at a poly-A addition site (part of RNA processing).
31Gene Regulation
- What makes cells within an organism different from each other is which genes are being expressed and which are not: gene regulation.
- Most of the control of gene expression occurs at the point of transcription.
- Transcription regulation is based on interactions between transcription factors (proteins) and DNA sequences near the gene.
  - Transcription factors are trans-acting: they diffuse freely through the cell and affect any DNA sequence they can bind to.
  - In contrast, DNA sequences near the gene are cis-acting: they can only affect transcription of the gene they are next to (and not, for example, the same gene on the other homologous chromosome).
- Types of cis-acting sequence:
  - Promoters: several short regions within 100 bp of the transcription start, especially the TATA box, which are all similar to TATAAA.
  - Enhancers: can be up to several kilobases from the gene, either upstream or downstream, and in either orientation. Increase transcription level.
  - Silencers: similar to enhancers, but with the opposite effect.
- Regulatory sequences are short consensus sequences: imperfect variants on a common sequence.
- Genes are also affected by the region of the chromosome they are in: some areas are highly condensed and unable to be transcribed (depending on cell type).
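Because regulatory sequences are imperfect variants of a consensus, searching for them means tolerating mismatches. The sketch below scans for near-matches to TATAAA; the function name `find_consensus`, the mismatch threshold, and the example sequences are assumptions for illustration, and real promoter prediction uses more elaborate scoring.

```python
def find_consensus(seq: str, consensus: str = "TATAAA", max_mismatches: int = 1):
    """Return (position, matched text, mismatch count) for each near-match."""
    hits = []
    k = len(consensus)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions where the window differs from the consensus.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


print(find_consensus("GGCTATAAAGGC"))   # contains an exact TATAAA
print(find_consensus("CCTATATAGG"))     # contains a one-mismatch variant
```

Raising `max_mismatches` finds more variants but also more chance matches, which is the basic trade-off in any consensus search.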
32RNA Processing
- In prokaryotes, transcription and translation are essentially simultaneous: translation of the messenger RNA starts before transcription is completed.
- In eukaryotes, transcription occurs in the nucleus (where the DNA is), and translation occurs in the cytoplasm. This de-coupling of transcription and translation requires several steps specific to eukaryotes: RNA processing.
- The initial RNA molecule produced by transcription is called a primary transcript. It is an exact copy of the DNA. Before it can be translated into protein, it must be processed, then transported to the cytoplasm. RNA processing has 3 steps:
  - Splicing out of introns, which are non-protein-coding regions in the middle of protein-coding genes. Most eukaryotic genes are interrupted by introns, which make up as much as 99% of the gene in some cases. Exons are the regions of genes that code for protein. The primary transcript contains introns, but spliceosomes (RNA/protein hybrids) splice out the introns. There are signals on the RNA for this, but it can vary between tissues (alternative splicing).
  - 5' cap: a 7-methyl guanine linked 5' to 5' with the first nucleotide of the RNA.
  - 3' poly-A tail: several hundred adenosines added to the 3' end. The signal for poly-A marks the end of the gene, but transcription continues past this without having a definite end point. All except histone genes have poly-A. Stability of the mRNA is the probable reason for it.
- After processing, the RNA is called messenger RNA, and it gets transported to the cytoplasm.
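The three processing steps can be mimicked on a string. This is a toy sketch only: the intron coordinates are supplied by hand rather than detected, the cap is represented by the made-up text marker "m7G-", and the tail is kept short for display (real tails run to several hundred adenosines). The example intron starts with GU and ends with AG, the usual splice-site signals.

```python
def splice(primary: str, introns) -> str:
    """Remove introns given as (start, end) pairs, 0-based, end exclusive."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(primary[pos:start])  # keep the exon before this intron
        pos = end                         # skip over the intron
    exons.append(primary[pos:])           # final exon
    return "".join(exons)


def process(primary: str, introns, tail_length: int = 20) -> str:
    """Splice, then add a placeholder 5' cap marker and a poly-A tail."""
    return "m7G-" + splice(primary, introns) + "A" * tail_length


# exon 1 + intron (GU...AG) + exon 2; a hypothetical transcript
primary = "AUGGCC" + "GUAAGUCCCAG" + "UUUUAA"
print(splice(primary, [(6, 17)]))
print(process(primary, [(6, 17)], tail_length=10))
```

Alternative splicing would correspond to calling `splice` with a different set of intron coordinates for the same primary transcript.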
33Intron Splicing and RNA Processing Overview
34Translation
- After transcription, the messenger RNA molecules are translated into polypeptides. That is, the base sequence of the mRNA is used as a code to construct an entirely different molecule, the polypeptide.
- The polypeptide is synthesized from N-terminus to C-terminus, named for the free -NH2 and -COOH groups on the terminal amino acids of the polypeptide. The polypeptide is collinear with the mRNA: the N-terminus of the polypeptide corresponds to the 5' end (beginning) of the mRNA. This corresponds to the ribosome moving down the messenger RNA from the 5' end to the 3' end.
- Translation is performed by the ribosome, a protein/RNA hybrid structure.
- Each group of 3 RNA bases is a codon. Each codon codes for a specific amino acid.
- The ribosome starts translation at a start codon.
  - There are untranslated regions (UTRs) at both ends of the mRNA.
  - The codons used as start codons also occur internally, where they simply code for an amino acid within the polypeptide.
  - In eukaryotes, translation starts at the first AUG in the messenger RNA and goes to the first stop codon. (So, only one polypeptide per messenger RNA.)
  - In bacteria (but not archaea, which are like eukaryotes in this), AUG, GUG, and UUG can all be used as start codons.
- The ribosome then moves down the mRNA, adding one new amino acid for each codon.
- Translation stops when the ribosome reaches a stop codon.
- Most mRNA molecules are translated multiple times.
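The eukaryotic rule described above (start at the first AUG, stop at the first in-frame stop codon) can be sketched as follows. The codon table is the standard genetic code, built from the usual compact 64-letter string in UCAG order; the function name `translate` and the example mRNA are made up for illustration, and bacterial alternative start codons are not handled.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# 5' UTR "GG", then AUG GCC AUG UAA, then 3' UTR "GG"
print(translate("GGAUGGCCAUGUAAGG"))
```

The second AUG in the example falls inside the reading frame, so it is read as an ordinary methionine rather than as a new start.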
35More on Translation
- A key actor in translation is transfer RNA: short RNA molecules that act as adapters between codons on the mRNA and the amino acids.
- The ribosome holds the growing polypeptide chain attached to a transfer RNA, and it also holds a transfer RNA carrying the next amino acid.
- At each step in the synthesis process, the ribosome catalyzes the transfer of the growing polypeptide to the next amino acid.
36Genetic Code
- Three bases of DNA or RNA code for 1 amino acid: a codon.
- Since there are 4 bases, there are 4^3 = 64 codons. 61 of these code for amino acids, while the last 3 are stop codons that end the translation process.
- Most amino acids have more than 1 possible codon: the code is degenerate. Most variation is in the third position of the codon.
- Nearly all organisms use the same code, with minor variations mostly in mitochondria and chloroplasts.
  - Mitochondria often use a slightly altered genetic code.
- All translations start with methionine (N-formyl methionine in bacteria), regardless of which start codon is used (only AUG in eukaryotes).
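The counts above (64 codons, 61 sense codons, 3 stops) and the degeneracy of the code can be verified by enumeration. This sketch assumes the standard genetic code, written as the usual compact 64-letter string in TCAG order; variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" = stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

codons = ["".join(c) for c in product(BASES, repeat=3)]
table = dict(zip(codons, AMINO_ACIDS))

stop_codons = sorted(c for c, aa in table.items() if aa == "*")
# Number of codons per amino acid (the degeneracy of the code).
degeneracy = Counter(aa for aa in table.values() if aa != "*")

print(len(codons))               # 4**3 codons in total
print(stop_codons)
print(sum(degeneracy.values()))  # codons that specify an amino acid
print(degeneracy.most_common())
```

Only methionine (M) and tryptophan (W) have a single codon; leucine, serine and arginine have six each.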
37Reading Frames
- Codons are groups of 3 bases. Since translation can start at any nucleotide, the same region of DNA can be read in 3 ways, starting one base apart. Each of these 3 modes is a reading frame.
- The DNA might also be read on the opposite strand, giving a total of 6 possible reading frames.
- Genes occur in open reading frames (ORFs), areas where there are no stop codons. Genes end at the first stop codon that exists in their reading frame.
- 3 out of every 64 codons are stop codons, so large open reading frames are rare in random, unselected DNA. Since genes are under selection pressure, most long open reading frames contain genes.
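The six reading frames and the rarity of long open stretches can be illustrated with a short sketch. The function names are made up for this example, and the probability estimate assumes random DNA in which all 64 codons are equally likely and independent, which real genomes only approximate.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Map (strand, offset) to the list of codons read in that frame."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = [
                s[i:i + 3] for i in range(offset, len(s) - 2, 3)
            ]
    return frames


def longest_open_run(codons) -> int:
    """Length, in codons, of the longest stretch without a stop codon."""
    best = run = 0
    for codon in codons:
        run = 0 if codon in STOPS else run + 1
        best = max(best, run)
    return best


def prob_open(n_codons: int) -> float:
    """Chance that n random codons contain no stop: (61/64)**n."""
    return (61 / 64) ** n_codons


for frame, codons in six_frames("ATGAAATAGCCC").items():
    print(frame, codons, longest_open_run(codons))
print(prob_open(100))
```

Under these assumptions a stop-free run of 100 codons turns up by chance less than 1% of the time, which is why long ORFs are good gene candidates.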
38Protein folding
- After they have been synthesized, most proteins fold spontaneously to the most stable (lowest energy) configuration.
- Some proteins are assisted by chaperone proteins, which also assist in recovery from heat shock by causing re-folding to the proper configuration.
  - Thus, chaperone proteins are also often called heat shock proteins. (Actually, these proteins were first discovered in Drosophila as proteins synthesized in large amounts when the flies were given a heat shock.)
- However, predicting protein structure from the amino acid sequence is (so far) an unsolved and very difficult problem in biochemistry.
39Post-translational modification
- Various chemical modifications occur on many proteins.
  - Glycosylation: adding sugars. Occurs in the ER and the Golgi apparatus. Mostly for proteins that are secreted, or on the outside of the plasma membrane, or inside lysosomes. Large blocks of sugars are added. The resulting proteins are called glycoproteins.
  - Phosphorylation: adding phosphates. An important way to activate various enzymes, especially for turning genes on and off. On serine, threonine, or tyrosine.
  - Adding lipids, so proteins get anchored to a membrane. Various names depending on which lipid is added: for example, myristoylation, prenylation, palmitoylation, etc. The resulting proteins are called lipoproteins.
  - Others as well.
- Cleavage. Often the N-terminal Met is removed. Other regions can also be removed: the middle region of insulin, removal of signal peptides.
40Localization
- How do proteins get to the proper location in the cell?
- Polypeptides often contain signal sequences that cause the protein to end up in the proper organelle, or be secreted, or become embedded in the membrane. Often this is a leader sequence (or signal sequence) at the N-terminus that is then removed.
- The best-known signal directs proteins into the ER, for insertion into a membrane or for secretion outside the cell: about 20 mostly hydrophobic amino acids at the N-terminus of the polypeptide. A Signal Recognition Particle (RNA/protein hybrid) recognizes this during translation and guides ribosomes to the rough ER, where translation finishes.
- There are also signals for the nucleus, lysosome, and mitochondria. Some are internal to the protein and not removed.
41A Few Odds and Ends
42Operons
- In eukaryotes, each messenger RNA contains a single gene. Genes are scattered randomly throughout the genome, with no grouping of related genes.
  - Monocistronic: having only 1 gene on an mRNA.
- In prokaryotes, genes that make different parts of the same structure or metabolic pathway are often grouped together and transcribed as a single unit. Several different proteins are independently translated from the same mRNA molecule. This group of genes is called an operon.
  - Polycistronic: having several genes co-transcribed onto the same mRNA.
43Exceptions in Prokaryotes
- In addition to the 20 regular amino acids, two other amino acids coded in the DNA have been found: selenocysteine and pyrrolysine. Each is encoded by a stop codon (UGA for selenocysteine, UAG for pyrrolysine), with other bases around it used to signal that it is to be interpreted as an amino acid and not a stop.
- Bacteria have been seen (rarely) to use several other start codons, including CTG, ATA, ATC, and ATT.
- Regardless of which start codon is used, all bacteria (NOT Archaea) use N-formyl methionine as the first amino acid in the polypeptide.
- RNA editing is a process by which certain messenger RNAs are altered by adding, deleting, or altering certain bases. It seems rare and (so far) confined to eukaryotes (including mitochondria and chloroplasts).
44Reverse Transcription
- A few exceptions to the Central Dogma exist.
- Most importantly, some RNA viruses, called retroviruses, make a DNA copy of themselves using the enzyme reverse transcriptase. The DNA copy incorporates into one of the chromosomes and becomes a permanent feature of the genome. The DNA copy inserted into the genome is called a provirus. This represents a flow of information from RNA to DNA.
- Closely related to retroviruses are retrotransposons, sequences of DNA that make RNA copies of themselves, which then get reverse-transcribed into DNA that inserts into new locations in the genome. Unlike retroviruses, retrotransposons always remain within the cell. They lack genes to make the protein coat that surrounds viruses.
- Some viruses use RNA for their genome, and directly copy it into more RNA without any DNA intermediate. The enzyme involved is called a replicase, or RNA-dependent RNA polymerase.