Title: Research in Computational Genomics
Slide 1: Research in Computational Genomics
Mar Albà
Evolutionary Genomics Group
Research Unit on Biomedical Informatics
Universitat Pompeu Fabra
UPC, April 1, 2005
Slide 2
1. The genetic information
2. The human genome project
3. Genomics techniques and research
Slide 3: 1. The genetic information
Slide 4: The genetic information: inheritance
1865 - Mendel
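Slide 5: DNA is the hereditary material
1928 - Griffith: the transforming principle (pneumonia bacteria, infection of mice)
- deadly bacteria: mice die
- non-deadly bacteria: mice live
- boiled deadly bacteria: mice live
- boiled deadly bacteria + non-deadly bacteria: mice die
1944 - Avery, MacLeod, McCarty: DNA is the transforming principle
- transforming extract treated with DNAse: mice live
- transforming extract treated with protease: mice die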
Slide 6: DNA structure
1953 - Watson and Crick discover the structure of DNA
1953 - Rosalind Franklin: X-ray diffraction image of DNA
Slide 7: DNA structure: antiparallel double helix
nucleotides
A adenine, G guanine, C cytosine, T thymine
base pairs: C-G, A-T
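The pairing rules above, together with the antiparallel arrangement of the two strands, can be sketched in a few lines of Python. This is a minimal illustration only; the function name `reverse_complement` is a hypothetical helper and not part of the original slides.

```python
# Watson-Crick base pairing from the slide: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5'->3' direction."""
    # Complement every base, and reverse the order because the two
    # strands of the double helix run in opposite directions.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check of the pairing table.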
Slide 8: RNA
- single strand
- uracil instead of thymine
- contains ribose instead of deoxyribose
- base pairs: A-U, C-G
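As a small sketch of the DNA/RNA difference listed above: if we assume the input is the coding (sense) strand of a gene, the RNA copy has the same sequence with U in place of T. The function names below are hypothetical helpers introduced only for illustration.

```python
# RNA base pairing from the slide: A-U and C-G.
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def transcribe(coding_strand: str) -> str:
    """RNA copy of a DNA coding strand (assumption: sense strand given)."""
    return coding_strand.upper().replace("T", "U")


def rna_complement(rna: str) -> str:
    """Base-by-base complement of an RNA sequence (order not reversed)."""
    return "".join(RNA_COMPLEMENT[base] for base in rna.upper())


print(transcribe("ATGGCC"))      # AUGGCC
print(rna_complement("AUGGCC"))  # UACCGG
```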
Slide 9: Proteins
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQES
KPVQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERI
EKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDL
FIPSANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSP
ESEQFRADHPFLFLIKHNPTNTIVYFGRYWSP
Slide 10: Proteins are made of amino acids
amino acid
Slide 11: 20 amino acids
Slide 12: Proteins: amino acid chain
Peptide bond
Slide 14: DNA replication
Slide 15: Transcription
The transcription of a gene may be off or on, depending on the cell type and conditions.
Slide 16: Translation
Slide 17: Translation
Slide 18: Genetic code
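The genetic code can be written down compactly as a lookup table from codons to amino acids. The sketch below builds the standard code from a 64-letter string (codons ordered with the bases T, C, A, G) and translates a DNA coding sequence until the first stop codon. The names `CODON_TABLE` and `translate` are illustrative assumptions, and the standard code is assumed throughout.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks the three stop codons (TAA, TAG, TGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

The table has 64 entries for only 20 amino acids plus stop, which is why several codons map to the same amino acid.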
Slide 19: DNA cloning
DNA ligase
vector with insert
DNA fragments
Vectors (replicating DNA)
transformation of bacteria
extraction
amplification
Slide 20: DNA sequencing
resulting partial labelled fragments
Slide 21: DNA sequencing
Slide 22: 2. The human genome project
Slide 23: The human genome project
1953 - Discovery of the DNA double helix by Watson and Crick
1995 - Haemophilus influenzae genome
2001 - The first draft of the human genome is published, covering approximately 94% of the genome (Public Consortium and Celera)
2003 - Human genome sequence complete
Slide 24: 2001 Draft of the human genome
15 February 2001
Slide 25: IMIM (Institut Municipal d'Investigacions Mèdiques, Barcelona) participates in the annotation of the human genome
Josep Abril and Roderic Guigó
Slide 26: Human genome: 3,000,000,000 nucleotides
Slide 27: Human chromosomes
Slide 28: What's in the human genome?
parasitic repetitive elements
gene coding part (2%)
gene non-coding part
microsatellites
DNA long repeats
Slide 29: Gene structure
Slide 30: Comparison with other genomes
Slide 31: Gene catalogue
30,000 genes
10,000 already known (cDNA)
- Gene prediction programmes
- Homology to other species
- ESTs (expressed sequence tags)
- The functions of approximately half of the genes are not known!
Slide 32: Parasitic repetitive elements
Nature, Feb. 15, 2001
Slide 33: Parasitic repetitive elements: retrotransposition
cytoplasm
Slide 34: 3. Genomics techniques and research
Slide 35: Genomics
- bioinformatics
- genome sequencing and annotation
- functional genomics
- systems biology
Slide 37: Genome sequencing and annotation
Slide 38: Exponential growth of DNA sequences
Slide 39: How many genomes?
Slide 40: Recently sequenced eukaryotic genomes
A. gossypii
A. gambiae
A. mellifera
C. intestinalis
T. rubripes
R. norvegicus
Slide 41: How long does it take to sequence a genome?
- bacteria: 1 day
- fungus: 1 week
- insect: 1-2 months
- mammal: 1-2 years
Slide 42: Gene prediction
- DNA coding for protein sequences (exons) only accounts for 2% of the human genome
- Information we can use:
- splice site signals
- statistics of coding sequences
Diagram labels: gene, exons, protein
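Real gene prediction programmes combine splice site signals and coding statistics in probabilistic models. As a much simpler illustration of scanning DNA for a coding signal, the sketch below looks for open reading frames (an ATG followed in the same frame by a stop codon) on the forward strand only. It ignores introns and the reverse strand, so it is a toy example rather than a real gene finder; `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list:
    """Return (start, end) index pairs of ATG...stop stretches.

    Only the three forward reading frames are scanned. `min_codons`
    is the minimum number of codons before the stop codon, counting
    the ATG itself. `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


sequence = "CCATGGCCTAAGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])  # 2 11 ATGGCCTAA
```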
Slide 43: Sequence similarity
- To predict genes we can also use sequence similarity searches to known proteins
alignment of protein sequences
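To give an idea of what aligning two protein sequences involves, here is a minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity; real similarity searches use amino acid substitution matrices and faster heuristic methods.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b under a simple scoring scheme."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j residues of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


# Two short peptides differing at one position (illustrative input).
print(global_alignment_score("QIKDLL", "QIKELL"))  # 4
```

Only the score is returned here; recovering the alignment itself would require keeping the full table and tracing back through it.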
Slide 44: Microbial Genomes at NCBI
http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html
National Center for Biotechnology Information, National Institutes of Health
Slide 45: Functional annotation of all genes in a genome
Slide 46: Ensembl Genome Browser
http://www.ensembl.org
European Bioinformatics Institute
Slide 47: Ensembl Genome Browser
Slide 49: ENCODE (NIH): Encyclopedia Of DNA Elements
- exhaustive analysis of 1% of the human genome
- identification of functional elements
- development and comparison of different computational methods
http://www.genome.gov/Pages/Research/ENCODE/
2003-
Slide 50: HapMap (Haplotype Map)
Variability map (single nucleotide polymorphisms, SNPs) in populations from Africa, Asia and the USA.
It will help identify genes involved in complex disease, by association with particular haplotypes.
haplotype variants
SNPs
http://www.hapmap.org/
2002-
Slide 51: Environmental Genome Shotgun Sequencing of the Sargasso Sea
J. Craig Venter et al. Science, Vol. 304, Issue 5667, 66-74, 2 April 2004
- 1.045 billion base pairs
- 1800 genomic species
- 148 previously unknown bacterial phylotypes
Slide 52: Functional genomics
Slide 53: DNA microarrays: high-throughput analysis of gene transcription
Slide 54: ChIP-chip: analysis of protein-binding DNA fragments
- cross-link protein and DNA
- immunoprecipitation
- eliminate protein
- hybridize with DNA
Slide 55: Protein-protein interactions: yeast two-hybrid
Slide 56: Protein interaction networks
Slide 58: Systems biology
- Development of mathematical methods to model the behaviour of biological systems, including all elements in the system and their interactions.
Slide 60: National Center for Biotechnology Information (USA) http://www.ncbi.nlm.nih.gov
European Bioinformatics Institute (UK) http://www.ebi.ac.uk
Slide 61: Acknowledgements
Grup de Recerca en Informàtica Biomèdica: Ferran Sanz
Grup de Genòmica Computacional: Roderic Guigó
Universitat Pompeu Fabra
www.imim.es/grib
Genòmica Computacional GRIB