Title: Bioinformatics
1 Bioinformatics Essentials
Stephanie Tatem Murphy, smurphy_at_bcc.ctc.edu
2 DNA
Protein
Model organisms
PNSADADNDFEDRL RAGLCDHDKEVQGL QVRCAVUEEHMHK KQQEFE
NIRLDAQRL EFFAYIFQKEHMKR
ATGCATTTCGGT TTACGCCATATA GCTCGGGAATCA TGCATCGATCG
A GTAGCTAGCTAG
3 What is Bioinformatics?
TGT AAT AGT TAT ATT TTC ATT ATA AAT TGT GTT TGT
AGA CAT CAT AAA TTT AAA ACA TGG CTT TTT AAC
CTG ATA AAT CCT ACG AAT ATT TGT AAT AGT TAT GTT
ATT GCA GTA AGT ACC GTT TGT ATT ATA AAT TGT GTT
CTG
Which genes are turned off and then on?
Courtesy of Dr. Young Moo Lee UC Davis
4 Human Genome Program, U.S. Department of Energy,
Genomics and Its Impact on Medicine and Society:
A 2001 Primer, 2001
5 Genome, Transcriptome, Proteome
6 Fundamental Dogma
Although a few databases already exist to
distribute molecular information,
the post-genomic era will need many more to
collect, manage, and publish the coming flood of
new findings.
Levels: DNA, RNA, Proteins, Pathways, Phenotypes, Populations
Existing databases: GenBank, EMBL, DDBJ, SwissPROT, PIR, PDB, Map Databases
Areas needing databases: Development? Gene Expression? Metabolism?
Regulatory Pathways? Neuroanatomy? Clinical Data? Biodiversity?
Molecular Epidemiology? Comparative Genomics?
Bob Robbins, http://www.esp.org/rjr/canberra.pdf
7 Gene a b c d e
DNA base sequence: ATGGCCCTGTGGATGCGCCTCCTGCCCCTG..
DNA base sequence -> recipe for amino acids
Amino acid sequence: Met Ala Leu Trp Met Arg Leu Leu Pro Leu
Amino acid sequence -> protein -> trait
Art by Yelena Ponirovskaya
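The "recipe" idea on this slide can be sketched in a few lines of Python. This is only an illustration, not a full translator: the codon table below is an assumption-limited subset containing just the codons that occur in the slide's example, and the function name `translate` is made up for this sketch.

```python
# Partial codon table: only the codons that occur in the slide's example.
# A complete translator would need all 64 codons of the standard genetic code.
CODON_TO_AMINO_ACID = {
    "ATG": "Met",
    "GCC": "Ala",
    "CTG": "Leu",
    "CTC": "Leu",
    "TGG": "Trp",
    "CGC": "Arg",
    "CCC": "Pro",
}


def translate(dna):
    """Translate a DNA coding sequence into three-letter amino acid names."""
    amino_acids = []
    # Step through complete codons only; 1-2 trailing bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3].upper()
        # Codons missing from the partial table are reported as "???".
        amino_acids.append(CODON_TO_AMINO_ACID.get(codon, "???"))
    return amino_acids


slide_sequence = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTG"
print(" ".join(translate(slide_sequence)))
# prints: Met Ala Leu Trp Met Arg Leu Leu Pro Leu
```

Reading the bases three at a time is the whole trick: each triplet (codon) maps to one amino acid, which is why the 30 bases shown give 10 amino acids.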
8 The Biology Project, University of Arizona http://www.biology.arizona.edu
DNA activity (RFLP, Inheritance) http://www.biology.arizona.edu/human_bio/activities/blackett/introduction.html
DNA replication fork http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/03t.html
DNA base pairing http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/08t.html
DNA translation http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/10t.html
The Genetic Code http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/12t.html
http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/13t.html
DNA transcription http://www.biology.arizona.edu/molecular_bio/problem_sets/nucleic_acids/15t.html
9 Bioinformatics: a Definition
bio informatics: bioinformatics is conceptualizing biology in
terms of molecules and applying informatics
techniques to understand and organise the
information associated with these molecules, on a
large scale. In short, bioinformatics is a
management information system for molecular
biology and has many practical applications.
As submitted to the Oxford English Dictionary.
What is Bioinformatics? N. M. Luscombe, et al., Yale University,
Method Inform Med 4/2001
10 Bioinformatics: a Definition
The field of science in which biology, computer
science, and information technology merge into
a single discipline.
NCBI, Aug 2001
11 What's in a name?
Genome Mapping
Protein Analysis, Proteomics
Multiple Sequence Alignment
Database Homology Searching
3D Modeling
Life Science Informatics
Homology Modeling, Docking
Sequence Analysis
Sample Registration and Tracking
Intellectual Property Auditing
Common Visual Interfaces
Integrated Data Repositories
12 Bioinformatics Needs
- Multidisciplinary teams: biologists, mathematicians, computer scientists, laboratory technicians
- Users and developers to use and create:
  - scalable database infrastructure
  - standards to control vocabulary and annotation
  - new ways of visualizing, analyzing and searching data
  - new ways of delivering information, tools and results
- Faster and larger computer systems
13 Demo Bioinformatics Company
Onconomics Corporation http://www.bscs.org/onco/default.htm
From the nonprofit BSCS (Biological Sciences Curriculum Study)
14 Growth of Bioinformatics
50 yrs ago: Computer Programming; DNA, Protein Structure
20 yrs ago: Personal Computers / Internet; PCR
Last 10 yrs: w.w.w.; Human Genome Project
Now: All fields use computers (art, law, communication); Biological Research
Bioinformatics Computer Skills, www.oreilly.com
15 Why informatics?
- Large size of data sets
- Allow students to ask questions of data
- Integrate current research into classroom
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
16 >100,000 species are represented in GenBank
all species 128,941
viruses 6,137
bacteria 31,262
archaea 2,100
eukaryota 87,147
17 The most sequenced organisms in GenBank
Homo sapiens 10.7 billion bases
Mus musculus 6.5b
Rattus norvegicus 5.6b
Danio rerio 1.7b
Zea mays 1.4b
Oryza sativa 0.8b
Drosophila melanogaster 0.7b
Gallus gallus 0.5b
Arabidopsis thaliana 0.5b
Table 2-2 Page 18
Updated 8-12-04 GenBank release 142.0
18 Online datasets for all the Life Sciences
- Environment and Ecology
- Population http://www.prb.org
- Water http://www.waterontheweb.org/ http://www.neptune.washington.edu/
- Geography http://nhd.usgs.gov/ http://data.geocomm.com/
- Chemistry
- Physics
- Biology
- Anatomy
- Physiology
- Earth http://www.dlese.org/educators/usingdata.html
- Agriculture
- Nutrition
- Plant http://allometra.com/ath_fasta_mpss.shtml
19 Why use Bioinformatics?
Data mining requires a testable hypothesis. One can be
generated about the function or structure of a gene
or protein by identifying similar sequences in
better-characterized organisms.
Bioinformatics also helps uncover phylogenetic
relationships and evolutionary patterns.
www.tigr.org
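As a rough illustration of "identifying similar sequences", the sketch below ranks candidate sequences by how many short subsequences (k-mers) they share with a query. This is a deliberately crude stand-in and an assumption of this example: real search tools rely on sequence alignment and statistical scoring. The function names, gene names, and sequences are all made up.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Shared k-mers divided by total distinct k-mers (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def most_similar(query, database, k=3):
    """Return the name of the database entry most similar to the query."""
    return max(database, key=lambda name: jaccard(query, database[name], k))


# Made-up sequences standing in for genes from a better-characterized organism.
database = {
    "gene_a": "ATGGCCCTGTGA",
    "gene_b": "TTTTAAAACCCC",
}
query = "ATGGCCCTGTGG"
print(most_similar(query, database))  # prints: gene_a
```

If `gene_a` had a known function, the match would suggest a hypothesis about the query's function, which would then still need to be tested in the lab.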
20 What is Bioinformatics?
N. M. Luscombe, et al., Yale University, Method Inform Med 4/2001
21 Biotechnology: Did You or Will You Ever?
Ride in a car? Genetically engineered micro-organisms
will someday be used to extract oil from rocks.
Micro-organisms that break down oil spills are
already in use.
Drink tap water? Genetically engineered micro-organisms
will someday be used to attract and filter out harmful
substances from drinking water.
Have a dog or cat? Vaccines for a number of pet diseases
such as rabies will be improved by genetic engineering.
Wear brightly colored clothes? Many clothing dyes can be
made less expensively with biotechnology, and will
last longer.
Take vitamins? Vitamins can be made more potent and
less expensive with biotechnology.
Go to the bathroom? Micro-organisms are already an
important part of sewage treatment; genetic engineering
will produce bacteria that are more efficient at
breaking down wastes.
22 What Good is Recombinant DNA?
People with diabetes need to take a drug called
insulin. In the past, this drug was extracted and
purified from ground-up animal glands. It takes
several pounds of cow or pig glands to produce a
fraction of an ounce of insulin.
Today, the DNA with the instructions for making
insulin can be spliced into a plasmid, and the
insulin is then produced by bacteria. It's faster,
easier, and cheaper this way.
http://www.chourave.ch/init/kid/cartoon-00.html
There are still many technical problems to be
solved. Not all gene splices work, and some that
do may fail over time.
There are also social and environmental concerns
about biotechnology. Some people fear we will
upset the balance of nature if genetically
engineered organisms escape. Others fear that
recombinant DNA will be used to influence human
size, race, or intelligence.
The best way for people to enjoy the benefits and
avoid the problems is to stay informed and up to
date about what's happening in biotechnology.
23 How Do You Make Recombinant DNA?
First, you need to isolate a specific bit of DNA
with the instructions you want. To do this, you
use restriction enzymes that break up DNA strands
in specific places.
After you have DNA fragments, you sort them by
size, using a gel. DNA is loaded onto the top of
the gel, and then electricity is passed through
it. This causes the DNA pieces to migrate down,
and the small pieces travel further than the
large pieces.
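The cutting and sorting steps described above can be mimicked on a text string. The sketch below is a simplification under stated assumptions: it models one strand of linear DNA only, uses the EcoRI recognition site GAATTC (cut after the G) as the example enzyme, and runs on a made-up sequence. The function names `digest` and `run_gel` are invented for this illustration.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear DNA string at every occurrence of a recognition site.

    cut_offset is the position inside the site where the cut falls
    (1 means G^AATTC, the EcoRI pattern). Only one strand is modeled.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])  # whatever is left after the last cut
    return fragments


def run_gel(fragments):
    """Order fragments by length, shortest first.

    On a gel the shortest pieces travel furthest, so this is the order
    of the bands from the far end of the gel back toward the wells.
    """
    return sorted(fragments, key=len)


sample = "AAGAATTCTTTTTTGAATTCCC"  # made-up sequence with two EcoRI sites
bands = run_gel(digest(sample))
print([len(fragment) for fragment in bands])  # prints: [3, 7, 12]
```

Two recognition sites give three fragments, and joining the fragments back together in their original order reproduces the starting sequence.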
Next, you need to add the DNA fragment into a
host. In most research, the host is a plasmid, a
ring of DNA found in some bacteria.
The host DNA has to be exposed to restriction
enzymes to make split ends that will attach to
the fragment. After you mix the new and host DNA
fragments, you need to add enzymes that will glue
them together.
24 How Do You Make Recombinant DNA?
If you used a plasmid as a host, you need to put
it back into a bacterium. When the bacterium
replicates itself, it will copy the new DNA too.
A small population of gene-spliced bacteria can
develop into a large population in just a few
days.
http://www.gene.com/gene/research/biotechnology
25 What is an Enzyme?
Enzymes are molecules that speed up biological
reactions.
For example, the enzyme carbonic anhydrase
enables red blood cells to pick up and dump
carbon dioxide 1 million times faster than they
could without it.
Some characteristics of enzymes:
Enzymes increase the rate of a chemical reaction.
Enzymes are highly specific. Like a wrench that
will only fit a 5/16-inch bolt, each enzyme
generally works with only a particular kind of
molecule.
Enzymes aren't used up by the reaction.
They're not permanently changed as a result of it,
so a single enzyme can act thousands of
times.
An enzyme increases the odds that two molecules
will meet, so an enzyme is a matchmaker.
26 Why try to Design Better Enzymes?
Enzymes are fragile. They lose their shape
(denature) if the temperature or acidity goes up
even a little. They also denature in alcohol or
oils.
This is a drag! If you're adding an enzyme to a
laundry detergent, you'd like it to function in
hot water, with bleach!
As we understand more and more about DNA and how
it is decoded, we can rewrite the instructions
for making some enzymes. By altering their
shapes, we may be able to make enzymes that are
sturdier and able to function under harsher
conditions. We may even be able to invent some
completely new enzymes!
27 Examples of Enzymes
Subtilisin: This enzyme is added to laundry
detergent. It breaks down proteins (like yucky
egg yolk stains or gross dried blood) into tiny
fragments that can be rinsed away from the fibers
of the cloth.
Papain: This enzyme breaks up proteins, and is
extracted from the papaya fruit. It's now added
to contact lens cleaner solution to help
dissolve away gross crusty things from soft
contact lenses.
Ceredase: Several thousand people in the United
States have Gaucher disease (low levels of a
crucial enzyme that dissolves fatty deposits in
the liver, spleen and bone marrow). They suffer
from bone pain, fractures, swelling and bleeding.
Ceredase is a variation of the enzyme, produced
in the laboratory, which can be used to treat
the disease.
Vianain: Originally derived from pineapples, this
enzyme offers hope to burn victims. It helps
prepare burned areas for skin grafts by safely
dissolving damaged skin layers that would
otherwise have to be removed surgically.
28 Journals and Books
Public Library of Science, Open Access Journals http://www.plosbiology.org
International Society for Computational Biology, Book Reviews http://www.iscb.org/bioinformaticsBooks.shtml
Free Journals
- Biotechniques http://www.BioTechniques.com
- Genomeweb http://www.genomeweb.com
Books
- The Cartoon Guide to Genetics, Larry Gonick and Mark Wheelis, ISBN 0062730991, Harper 1983
- Introduction to Bioinformatics, Arthur Lesk, http://www.oup.com/uk/lesk/bioinf, ISBN 0199251967, Oxford 2002
- Fundamental Concepts of Bioinformatics, Dan Krane and Michael Raymer, ISBN 0805346333, Benjamin Cummings 2003
- Discovering Genomics, Proteomics, and Bioinformatics, A. Campbell and L. Heyer, ISBN 0805347224, Benjamin Cummings 2002
- Understanding Biotechnology, George Acquaah, ISBN 0130945005, Pearson Prentice Hall 2004
- Understanding Biotechnology, A. Borem, F. Santos, D. Bowen, ISBN 0131010115, Pearson Prentice Hall 2003
29 Human Genome Project
http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/index.shtml
Genomics and Its Impact on Science and Society:
The Human Genome Project and Beyond
U.S. Department of Energy Genome Programs http://doegenomes.org
30 National Center for Biotechnology Information
32 A user's guide to the human genome
Nature Genetics www.nature.com/ng/ vol 32, pg 1-79, 01 Sep 2002
- Introduction: putting it together
- Question 8: How can one find all the members of a human gene family?
- Question 12: How does a user find characterized mouse mutants corresponding to human genes?
- Web resources: Internet resources featured in this guide
33 Get Schooled for Bioinformatics
- Biology
  - Know basics; have a sense of biological experimentation
- Computer Science
  - Programming
    - C, C++, Perl, Java, SAS, CGI
  - Database construction; UNIX, Linux
  - Algorithm design
- Math/Statistics
  - Probability, Experimental design
- Ethics
- Core Bioinformatics
  - LIMS
  - EST clustering
  - Sequence analysis and annotation
34 Fundamental Dogma
Although a few databases already exist to
distribute molecular information,
the post-genomic era will need many more to
collect, manage, and publish the coming flood of
new findings.
Biological Research: To enable the discovery of
new biological insights as well as create a
global perspective from which unifying principles
in biology can be discerned.
NCBI, Aug 2001
35 Ultra Conserved element
- Only 6 SNPs
- mouse, rat, human
- TGATCCCGGACTCTATGAATTATTGATGAGATATGAGCGTTGATTTCCCCTTTCAG
- GATGCAAACTCCATTATATTGTTAAAATGGCGATTTAATCGTTGAGAATAGCTTTG
- GTGTGGGTTTTTTCCCCCAACTCATTTGCGCCTCCTTCCTTTTCATTTAACTCTCT
- TAATTAAATCCTTTAACAGATTTTAATCACTTTTTGGAG
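Counting the positions where species differ is a simple column-by-column comparison once the sequences are aligned. The sketch below is a minimal illustration: the three short sequences are made up (the slide shows only one sequence, not the per-species versions), and the function name `variable_sites` is hypothetical.

```python
def variable_sites(aligned):
    """Return 0-based column indices where the aligned sequences differ."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*...) walks the alignment one column (position) at a time.
    columns = zip(*aligned.values())
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]


# Made-up, already-aligned toy sequences; not the real element from the slide.
toy_alignment = {
    "human": "TGATCCCGGACT",
    "mouse": "TGATCCCGGACT",
    "rat":   "TGATACCGGACA",
}
print(variable_sites(toy_alignment))  # prints: [4, 11]
```

A region counts as highly conserved when this list is very short relative to the length of the sequence, as with the 6 differences reported on the slide.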