Title: Introduction to Bioinformatics
1Introduction to Bioinformatics
- Lecturer Dr. Yael Mandel-Gutfreund
- Teaching Assistants
- Oleg Rokhlenko
- Ydo Wexler
http://webcourse.cs.technion.ac.il/236523
2What is Bioinformatics?
3Course Objectives
- To introduce the bioinformatics discipline
- To make the students familiar with the major biological questions that can be addressed by bioinformatics tools
- To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)
4Course Structure and Requirements
- Class Structure
- Each class (except the first one) will be divided into two parts: a lecture (in the lecture room) and a training lab (in the computer lab)
- For the Training Lab the class will be divided into 2 groups.
- Each group will meet every second week, starting from the second week.
- The work in the Training Labs will be in pairs.
- Lab assignments will be submitted at the end of each lab.
- Preparing yourself for the lab: a tutorial including home self-exercises and their answers will be posted on the web a week before the lab
- 2. A final home exam
5Grading
- 30% lab assignments
- 70% final exam
6Literature list
- Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
- Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
- Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced Reading
- Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
7Course syllabus
8What is Bioinformatics?
9What is Bioinformatics?
The field of science in which biology, computer science, and information technology merge to form a single discipline.
Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
10From purely lab-based science to an information science
Bioinformatics = Bio + Informatics
11Central Paradigm in Molecular Biology
Gene (DNA) -> mRNA -> Protein
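The flow from gene to mRNA to protein can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: it assumes the input is the coding (sense) strand, uses the standard genetic code, and ignores introns and untranslated regions. The function names `transcribe` and `translate` are hypothetical, and the example fragment is the first ten codons of the beta globin (HBB) coding region that appears later in these slides.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Assumption: first ten codons of the HBB coding region, starting at ATG.
gene = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)
print(protein)
```

The translated fragment matches the start of the beta globin protein sequence shown on the "Healthy Individual" slide.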
12Genome
- Chromosomal DNA of an organism
- Coding and non-coding DNA
- Genome size and number of genes do not necessarily determine organism complexity
13Transcriptome
- Complete collection of all possible mRNAs (including splice variants) of an organism.
- Regions of an organism's genome that get transcribed into messenger RNA.
- Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.
14Proteome
- The complete collection of proteins that can be produced by an organism.
- Can be studied either as a static (sum of all proteins possible) or a dynamic (all proteins found at a specific time point) entity
15From DNA to Genome
Timeline, 1955 to 1985: first protein sequence; Watson and Crick DNA model; first protein structure
16Timeline, 1990 to 2000: first bacterial genome (Haemophilus influenzae, 1995); yeast genome; first human genome draft
17The Human Genome Project
- Initiated in 1986, completed in 2003
- Project goals were to
- identify all the genes in human DNA,
- determine the sequences of the 3 billion chemical base pairs that make up human DNA,
- store this information in databases,
- improve tools for data analysis and develop new tools,
- address the ethical, legal, and social issues that may arise from the project.
18Human Genome Project
Timeline (axis 1985 to 2000), events in chronological order:
- USA Department of Energy announces project
- International Human Genome Organization founded
- Low resolution linkage map published
- Celera Genomics founded
- First working drafts published
- Project successfully completed
19The Human Genome Project
- Initiated in 1986, completed in 2003
- How did we do?
- identify all the genes in human DNA ? ?
- determine the sequences of the 3 billion chemical base pairs that make up human DNA ? ? ?
- store this information in databases ? ? ?
- improve tools for data analysis and develop new tools ? ? ?
- address the ethical, legal, and social issues that may arise from the project ?
20What makes us human?
21How human are chimps?
Perhaps not surprising! Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.
22Complete Genomes
- 1994: 0
- 1995: 1
- 2004: 234
- 2005: 303 (eukaryotes 24, bacteria 240, archaea 39)
23The post-genomics era
What's next?
Annotation
Comparative genomics
Structural genomics
Functional genomics
Goal: to understand the functional networks of a living cell
24Open reading frames
Functional sites
Annotation
Structure, function
25CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ...... .............. TGAAAAACGTA
26CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ................................. ............
.. TGAAAAACGTA
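Annotating a raw sequence like the one above often starts with a search for open reading frames (ORFs): stretches that begin with a start codon (ATG) and continue in frame to a stop codon (TAA, TAG, or TGA). The following is a minimal sketch under simplifying assumptions: it scans only the forward strand, reports every ATG that reaches an in-frame stop, and does not model introns. The function name `find_orfs`, the `min_codons` parameter, and the toy input sequence are hypothetical and are not taken from the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each forward-strand ORF.

    start is the 0-based index of the A in ATG; end is the index just
    past the stop codon. ORFs with fewer than min_codons codons before
    the stop are skipped.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                break
    return orfs


# Toy sequence (an assumption for illustration only).
toy = "CCATGCAATATGGATGAAA"
for start, end, seq in find_orfs(toy):
    print(start, end, seq)
```

Real gene finders also scan the reverse complement and weigh codon usage and other signals; this sketch only shows the start-to-stop reading logic.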
27Comparative genomics
28Chimps and Us
29Comparative genomics
30- Researchers have learned a great deal about the
function of human genes by examining their
counterparts in simpler model organisms such as
the mouse.
Conservation of IGFALS (insulin-like growth factor binding protein, acid labile subunit) between human and mouse.
31Functional genomics
32Understanding the function of genes and other
parts of the genome
33Functional genomics
34A network of interactions can be built for all proteins in an organism
A large network of 8184 interactions among 4140 S. cerevisiae proteins
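One common way to represent such a network in code is an undirected graph stored as an adjacency map, built from a list of pairwise interactions. The sketch below is illustrative only: the protein names P1 to P4 and the interaction list are made-up placeholders, not data from the yeast network on the slide, and `build_network` is a hypothetical helper name.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


# Placeholder interaction pairs (hypothetical, for illustration only).
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(pairs)

# The degree of a protein is its number of interaction partners.
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)
print(degrees)
print("most connected protein:", hub)
```

With a representation like this, questions such as "which proteins have the most partners" become simple lookups over the map.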
35Structural genomics
Assign structure to all proteins encoded in a
genome
36Protein Structure
37Resources and Databases
- The different types of data are collected in databases
- Sequence databases
- Structural databases
- Databases of Experimental Results
- All databases are connected
38Database Types
Sequence databases
- General: GenBank, EMBL, PIR, Swiss-Prot
- Special: TF binding sites, promoters, genomes
Structure databases
- General: PDB
- Special: specific protein families, folds
Databases of experimental results
- Co-expressed genes, protein-protein interactions, etc.
39Sequence databases
- Gene database
- Genome database
- SNPs database
- Disease related mutation database
40What can we learn about a Gene
41mRNA, full length, EST
42EST
- Expressed Sequence Tags
- Partial copies of mRNA found within a particular cell
- Can be used to identify genic regions, splicing patterns of genes, etc.
43Different transcripts can be related to the same
gene!
44Gene database
- Gives insight into gene function
- Alternative splicing of genes
- Alternative patterns of exons included to create the gene product
- EST
45Genome Databases
- Data organized by species
- Clones assembled into contiguous pieces (contigs) or whole chromosomes
- Information on non-coding regions
- Relativity
46Genome Browsers
- Annotation adds value to sequence
- Easy walk through the genome
- Comparative genomics
47Genome Browsers
- Ensembl Genome Browser http://www.ensembl.org
- UCSC Genome Browser http://genome.ucsc.edu/
- WormBase http://www.wormbase.org/
- AceDB http://www.acedb.org/
- Comprehensive Microbial Resource http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
- FlyBase http://flybase.bio.indiana.edu/
48beta globin
50RefSeq
- Set of mRNA sequences curated at NCBI
- Many experimentally validated
- Some partially validated via ESTs
- Some computationally predicted
56SNP database
- Single Nucleotide Polymorphisms (SNPs)
- Single base difference at a single position between two individuals of the same species
- Play an important role in differentiation and disease
57Sickle Cell Anemia
- Due to a single nucleotide substitution (A to T), which causes valine to be incorporated instead of glutamic acid in hemoglobin
Image source http://www.cc.nih.gov/ccc/ccnews/nov99/
58Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
59Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
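A single-base difference of this kind can be located programmatically by comparing two aligned sequences position by position and then translating the affected codon. The sketch below makes several assumptions: both fragments have the same length and reading frame, each is the first ten codons of the HBB coding region starting at ATG, and the sickle fragment carries the A to T substitution in the seventh codon (GAG to GTG), which is the position where the two protein sequences on these slides differ (E versus V). The helper names `point_differences` and `codon_change` are hypothetical.

```python
# Standard genetic code for DNA codons, in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption: first ten codons of the coding region, starting at ATG.
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
SICKLE = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"


def point_differences(seq_a, seq_b):
    """List (0-based position, base in seq_a, base in seq_b) mismatches."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def codon_change(seq_a, seq_b, position):
    """Describe the codon containing position in both sequences.

    The codon number is 1-based and counts the start codon as codon 1.
    """
    start = position - position % 3
    codon_a = seq_a[start:start + 3]
    codon_b = seq_b[start:start + 3]
    return (start // 3 + 1, codon_a, CODON_TABLE[codon_a],
            codon_b, CODON_TABLE[codon_b])


diffs = point_differences(NORMAL, SICKLE)
for position, base_a, base_b in diffs:
    number, codon_a, aa_a, codon_b, aa_b = codon_change(NORMAL, SICKLE, position)
    print(f"base {position + 1}: {base_a}>{base_b}, "
          f"codon {number}: {codon_a} ({aa_a}) -> {codon_b} ({aa_b})")
```

Under these assumptions the script reports one substitution, in codon 7 when the start codon is counted, changing glutamic acid (E) to valine (V).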
60Disease Databases
- Genes are involved in disease
- Many diseases are well studied
- Description of diseases and what is known about
them is stored
OMIM - Online Mendelian Inheritance in Man
62Structure Databases
- 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
- 3-D data is available due to techniques such as NMR and X-ray crystallography
65Databases of Experimental Results
- Data such as experimental microarray images and expression data
- Clustering information
- Metabolic pathways, protein-protein interaction data
66PubMed
Literature Databases
http://www.ncbi.nlm.nih.gov/PubMed
Service of the National Library of Medicine
- MEDLINE publication database
- Over 17,000 journals
- 15 million citations since 1950
67Putting it All Together
- Each Database contains specific information
- Like biological systems, these databases are interrelated
68PROTEIN: PIR, SWISS-PROT
DISEASE: LocusLink, OMIM, OMIA
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
MOTIFS: BLOCKS, Pfam, Prosite
GENOMIC DATA: GenBank, DDBJ, EMBL
ESTs: dbEST, UniGene
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
PATHWAY: KEGG, COG
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
69Entrez NCBI Engine
- Entrez is the integrated, text-based search and
retrieval system used at NCBI for the major
databases, including PubMed, Nucleotide and
Protein Sequences, Protein Structures, Complete
Genomes, Taxonomy, and others.
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar
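Besides the web form, Entrez can also be queried from a program through NCBI's E-utilities, which are not mentioned on the slide and are introduced here as an assumption about how one might automate a search. The sketch below only builds an ESearch request URL and does not send it; `build_esearch_url` is a hypothetical helper, and the accession NM_000518 is the beta globin mRNA used elsewhere in these slides.

```python
from urllib.parse import urlencode

# ESearch endpoint of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build an ESearch URL for an Entrez database and a search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "?" + urlencode(params)


# Example: search the Nucleotide database for the beta globin mRNA.
url = build_esearch_url("nucleotide", "NM_000518")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) should return the matching record identifiers; NCBI's usage guidelines on request rates apply to any real use.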
70Entrez NCBI Engine
71- General Bioinformatic Webpages
- USA National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
- European Bioinformatics Institute: www.ebi.ac.uk
- ExPASy Molecular Biology Server: www.expasy.org
- Israeli National Node: inn.org.il
http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm