http://creativecommons.org/licenses/by-sa/2.5/ca/ - PowerPoint PPT Presentation

About This Presentation

Slides: 66
Provided by: franciso3

Transcript and Presenter's Notes



1
http://creativecommons.org/licenses/by-sa/2.5/ca/
http://tinyurl.com/3cw4ql
2
CAN SCIENTISTS CURE CANCER WITH COMPUTERS?
  • February 12th, 2008
  • Francis Ouellette francis_at_oicr.on.ca
  • Associate Director, Informatics and Biocomputing,
    Ontario Institute for Cancer Research

3
CAN SCIENTISTS CURE CANCER WITH COMPUTERS?
  • NO

4
Take two bytes and call me in the morning!
5
CAN SCIENTISTS CURE CANCER WITHout COMPUTERS?
  • NO

6
Byte my Genes
  • Using computers to understand our DNA

7
Bioinformatics
  • Computational biology
  • Biocomputing
  • Theoretical biology
  • Biometry
  • Statistical Genomics

8
What is Bioinformatics?
  • Think Pair Share!

9
Bioinformatics is about integrating biological
themes with the help of computer tools
and biological databases, and gaining new
knowledge about the system under study.
10
National Center for Biotechnology Information
(NCBI)
http://www.ncbi.nlm.nih.gov
11
Computers
Laboratory
Maytag cycle
12
The problem: not reinventing the wheel!
  • Pegasys: a workflow management tool
  • Atlas: a data warehouse
  • Already available: Apollo, NCBI toolkit

Diagram labels: Apollo (http://www.fruitfly.org/annot/apollo/, BDGP-EBI), Atlas, Pegasys, gamexml parser, ASN.1, FASTA file
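The diagram above mentions a parser and FASTA files. As an illustration only (this is not the Pegasys or Atlas code, which the slides do not show), here is a minimal FASTA reader. In FASTA format each record is a header line starting with ">" followed by one or more sequence lines.

```python
import io
from typing import Iterator, TextIO


def parse_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


example = io.StringIO(">seq1 demo\nACGT\nGGCC\n>seq2\nTTAA\n")
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

It is a generator, so large sequence files are read one record at a time rather than loaded into memory all at once.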
13
http://www.cytoscape.org/
14
BLAST Result
  • Basic
  • Local
  • Alignment
  • Search
  • Tool
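BLAST is a heuristic: it finds short exact "word" matches and extends them, which approximates a full local alignment far faster. The exact local-alignment recurrence it approximates is Smith-Waterman. Below is a minimal score-only sketch of that recurrence. The scoring values (match +2, mismatch -1, linear gap -2) are arbitrary assumptions chosen for illustration. Real BLAST searches use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    cols = len(b) + 1
    prev = [0] * cols  # dynamic-programming row for the previous character of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            # Flooring at 0 lets a new local alignment start at any cell.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only two rows are kept in memory. That is enough for the score, but recovering the alignment itself would need the full matrix and a traceback.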

15
Comparative Analysis in Biology
Jim Ostell
Human
Dog
16
Image: http://upload.wikimedia.org/wikipedia/en/5/5b/Evolution_pl.png
Created by Jerry Crimson Mann, 06:25, 2 August 2005 (UTC).
17
Comparative Analysis of Genes
Jim Ostell
Human  638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast  657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli 584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642
Colon cancer gene sequence
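A common way to summarize a protein alignment like the one above is percent identity. The sketch below computes it for the three rows shown on this slide. Columns where either sequence has a gap ("-") are skipped. That is only one of several conventions, and choosing it here is an assumption.

```python
# Aligned rows copied from the slide; '-' marks a gap.
ALIGNMENT = {
    "Human": "RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC",
    "Yeast": "RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC",
    "E.coli": "RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA",
}


def percent_identity(s: str, t: str) -> float:
    """Fraction of gap-free aligned columns in which s and t have the same residue."""
    if len(s) != len(t):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(s, t) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    same = sum(1 for x, y in columns if x == y)
    return same / len(columns)


for other in ("Yeast", "E.coli"):
    pid = percent_identity(ALIGNMENT["Human"], ALIGNMENT[other])
    print(f"Human vs {other}: {pid:.1%}")
```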
18
Mark Boguski, NCBI
Comparative Analysis of Genomes
"What is true for E. coli is true for the elephant."
Jacques Monod, 1972
19
Comparative Genomics Humans vs Rodents
Chris Ponting http://www.stats.ox.ac.uk/hein/HumanGenome/Ponting1.ppt
Human and mouse c-kit mutations show similar
phenotypes. The utility of mouse as a biomedical
model for human disease is enhanced when
mutations in orthologous genes give similar
phenotypes in both organisms. In a visually
striking example of this, the same pattern of
hypopigmentation is seen in (a) a patient with
the piebald trait and (b) a mouse with dominant
spotting, both resulting from heterozygous
mutations of the c-kit proto-oncogene.
20
Why is there Bioinformatics?
Fiona Brinkman
Sequencing technology!
  • Lots of new sequences being added
  • Automated sequencers
  • Genome Projects
  • EST sequencing
  • Microarray studies
  • Proteomics
  • Metagenomics (Metagenomics describes the
    functional and sequence-based analysis of the
    collective microbial genomes contained in an
    environmental sample)
  • Whole genome sequencing and WGAS (whole genome
    association studies)
  • Patterns in datasets that can only be analyzed
    using computers

21
High Throughput sequencing
John McPherson
  • Illumina/Solexa GA
  • 25-35 bases
  • 40,000,000 - 60,000,000 reads
  • 1,500,000,000 bases (0.25x genome coverage)
  • 3 day run time
  • OICR has two of these, with 4 on order.

22
Next-generation Sequencing
John McPherson
  • Applied Biosystems SOLiD
  • 25-35 bases
  • 80,000,000 reads
  • 2,500,000,000 bases (0.4x genome coverage)
  • 3 day run time
  • OICR has two of these, with 4 on order.
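The coverage figures on these two slides are average sequencing depth: total bases sequenced divided by genome size. The quoted 0.25x and 0.4x match a divisor of about 6 billion bases, i.e. a diploid human genome. That divisor is our inference, not something the slides state, and the sketch below labels it as an assumption.

```python
# Assumption (inferred, not stated on the slides): genome size taken as a diploid
# human genome of ~6,000,000,000 bases (2 x ~3,000,000,000).
DIPLOID_HUMAN_BASES = 6_000_000_000


def total_bases(reads: int, read_length: int) -> int:
    """Bases produced by a run of `reads` reads, each `read_length` bases long."""
    return reads * read_length


def coverage(bases: float, genome_bases: float = DIPLOID_HUMAN_BASES) -> float:
    """Average depth: how many times, on average, each genome position is sequenced."""
    return bases / genome_bases


runs = {
    "Illumina/Solexa GA": 1_500_000_000,
    "Applied Biosystems SOLiD": 2_500_000_000,
}
for name, bases in runs.items():
    print(f"{name}: {coverage(bases):.2f}x")

# 60,000,000 reads of 25 bases reproduce the 1.5 billion-base Illumina figure.
print(total_bases(60_000_000, 25))
```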

23
Ramp up April 2008
John McPherson
24 billion nucleotides in 3 days: about 1 human
genome per day
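The "about 1 genome per day" figure is simple arithmetic. 24 billion nucleotides over 3 days is 8 billion per day. The sketch assumes, as an inference rather than a stated fact, a diploid human genome of about 6 billion bases, which is consistent with the per-instrument coverage figures on the previous slides.

```python
# Assumption: one "human genome" = a diploid ~6,000,000,000 bases.
DIPLOID_HUMAN_BASES = 6_000_000_000


def genome_equivalents_per_day(nucleotides: float, days: float,
                               genome_bases: float = DIPLOID_HUMAN_BASES) -> float:
    """Genome-sized amounts of sequence (1x-coverage equivalents) produced per day."""
    return nucleotides / days / genome_bases


rate = genome_equivalents_per_day(24_000_000_000, 3)
print(f"{rate:.2f} genome equivalents per day")
```

Under this assumption the rate comes out slightly above one genome per day, in line with the slide's round figure.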
24
Genomes (number of base pairs)
  • 1971 First published DNA sequence: 12
  • 1977 Bacteriophage φX174: 5,375
  • 1982 Bacteriophage λ: 48,502
  • 1992 Saccharomyces cerevisiae chromosome III: 316,613
  • 1995 Haemophilus influenzae: 1,830,138
  • 1996 Saccharomyces cerevisiae: 12,068,000
  • 1998 Caenorhabditis elegans: 97,000,000
  • 2000 Drosophila melanogaster: 120,000,000
  • 2001 Homo sapiens (draft): 2,600,000,000
  • 2003 Homo sapiens: 2,850,000,000

25
  • GenBank doubles every 14 months

(from the National Center for Biotechnology
Information)
That is faster than Moore's law (computer power doubling
every 20 months!)
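To see why the two doubling times quoted above matter, compound them over the same period. After t months, a quantity that doubles every T months has grown by a factor of 2^(t/T). The five-year window below is an arbitrary choice for illustration.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Factor by which a quantity grows after `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


window = 60  # months; an arbitrary five-year window for illustration
genbank = growth_factor(window, 14)    # GenBank doubling time from the slide
computing = growth_factor(window, 20)  # computer-power doubling time from the slide
print(f"GenBank x{genbank:.1f}, computing power x{computing:.1f}, "
      f"ratio x{genbank / computing:.1f}")
```

The ratio keeps growing with the window, so data outpaces raw computing power more each year.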
26
About Sequences ...
ACGT
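To a computer, a DNA sequence like the ones on the next slides is just a string over the four-letter alphabet A, C, G, T. A minimal sketch that counts bases and computes GC content is shown below. It rejects ambiguity codes such as N, which is a simplifying assumption. The line breaks and spaces in the printed sequences are ignored.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, int]:
    """Count A, C, G and T in a DNA string, ignoring whitespace."""
    cleaned = "".join(seq.split()).upper()
    counts = Counter(cleaned)
    unexpected = set(counts) - set("ACGT")
    if unexpected:
        raise ValueError(f"non-ACGT characters: {sorted(unexpected)}")
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    comp = base_composition(seq)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


print(base_composition("GCAGCGCACGACAGCTGTGC"))
print(f"GC content: {gc_fraction('GCAGCGCACGACAGCTGTGC'):.0%}")
```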
27
1000 base pairs
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTC
GCTTGCGAAA GCATCGAGTACCGCTACAGAGCCAACCCGGTGGACAAAC
TCGAAGTCATTGTGGACCGAA TGAGGCTCAATAACGAGATTAGCGACCT
CGAAGGCCTGCGCAAATATTTCCACTCCTTCC CGGGTGCTCCTGAGTTG
AACCCGCTTAGAGACTCCGAAATCAACGACGACTTCCACCAGT GGGCCC
AGTGTGACCGCCACACTGGACCCCATACCACTTCTTTTTGTTATTCTTAA
ATAT GTTGTAACGCTATGTAATTCCACCCTTCATTACTAATAATTAGCC
ATTCACGTGATCTCA GCCAGTTGTGGCGCCACACTTTTTTTTCCATAAA
AATCCTCGAGGAAAAGAAAAGAAAAA AATATTTCAGTTATTTAAAGCAT
AAGATGCCAGGTAGATGGAACTTGTGCCGTGCCAGAT TGAATTTTGAAA
GTACAATTGAGGCCTATACACATAGACATTTGCACCTTATACATATAC A
CACAAGACAAAACCAAAAAAAATATGACTCTACAAGAATCTGATAAATTT
GCTACCAAG GCCATTCATGCCGGTGAACATGTGGACGTTCACGGTTCCG
TGATCGAACCCATTTCTTTG TCCACCACTTTCAAACAATCTTCTCCAGC
TAACCCTATCGGTACTTACGAATACTCCAGA TCTCAAAATCCTAACAGA
GAGAACTTGGAAAGAGCAGTTGCCGCTTTAGAGAACGCTCAA TACGGGT
TGGCTTTCTCCTCTGGTTCTGCCACCACCGCCACAATCTTGCAATCGCTT
CCT CAGGGCTCCCATGCGGTCTCTATCGGTGATGTGTACGGTGGTACCC
ACAGATACTTCACC AAAGTCGCCAACGCTCACGGTGTGGAAACCTCCTT
CACTAACGATTTGTTGAACGATCTA CCTCAATTGATAAAGGAAAACACC
AAATTGGTCTGGATCGAAACCCCAACCAACCCAACT
28
2,000 base pairs
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCTC
GCTTGCGAAA GCATCGAGTACCGCTACAGAGCCAACCCGGTGGACAAAC
TCGAAGTCATTGTGGACCGAA TGAGGCTCAATAACGAGATTAGCGACCT
CGAAGGCCTGCGCAAATATTTCCACTCCTTCC CGGGTGCTCCTGAGTTG
AACCCGCTTAGAGACTCCGAAATCAACGACGACTTCCACCAGT GGGCCC
AGTGTGACCGCCACACTGGACCCCATACCACTTCTTTTTGTTATTCTTAA
ATAT GTTGTAACGCTATGTAATTCCACCCTTCATTACTAATAATTAGCC
ATTCACGTGATCTCA GCCAGTTGTGGCGCCACACTTTTTTTTCCATAAA
AATCCTCGAGGAAAAGAAAAGAAAAA AATATTTCAGTTATTTAAAGCAT
AAGATGCCAGGTAGATGGAACTTGTGCCGTGCCAGAT TGAATTTTGAAA
GTACAATTGAGGCCTATACACATAGACATTTGCACCTTATACATATAC A
CACAAGACAAAACCAAAAAAAATATGACTCTACAAGAATCTGATAAATTT
GCTACCAAG GCCATTCATGCCGGTGAACATGTGGACGTTCACGGTTCCG
TGATCGAACCCATTTCTTTG TCCACCACTTTCAAACAATCTTCTCCAGC
TAACCCTATCGGTACTTACGAATACTCCAGA TCTCAAAATCCTAACAGA
GAGAACTTGGAAAGAGCAGTTGCCGCTTTAGAGAACGCTCAA TACGGGT
TGGCTTTCTCCTCTGGTTCTGCCACCACCGCCACAATCTTGCAATCGCTT
CCT CAGGGCTCCCATGCGGTCTCTATCGGTGATGTGTACGGTGGTACCC
ACAGATACTTCACC AAAGTCGCCAACGCTCACGGTGTGGAAACCTCCTT
CACTAACGATTTGTTGAACGATCTA CCTCAATTGATAAAGGAAAACACC
AAATTGGTCTGGATCGAAACCCCAACCAACCCAACT
GCAGCGCACGACAGCTGTGCTATCCCGGCGAGCCCGTGGCAGAGGACCT
CGCTTGCGAAA GCATCGAGTACCGCTACAGAGCCAACCCGGTGGACAAA
CTCGAAGTCATTGTGGACCGAA TGAGGCTCAATAACGAGATTAGCGACC
TCGAAGGCCTGCGCAAATATTTCCACTCCTTCC CGGGTGCTCCTGAGTT
GAACCCGCTTAGAGACTCCGAAATCAACGACGACTTCCACCAGT GGGCC
CAGTGTGACCGCCACACTGGACCCCATACCACTTCTTTTTGTTATTCTTA
AATAT GTTGTAACGCTATGTAATTCCACCCTTCATTACTAATAATTAGC
CATTCACGTGATCTCA GCCAGTTGTGGCGCCACACTTTTTTTTCCATAA
AAATCCTCGAGGAAAAGAAAAGAAAAA AATATTTCAGTTATTTAAAGCA
TAAGATGCCAGGTAGATGGAACTTGTGCCGTGCCAGAT TGAATTTTGAA
AGTACAATTGAGGCCTATACACATAGACATTTGCACCTTATACATATAC
ACACAAGACAAAACCAAAAAAAATATGACTCTACAAGAATCTGATAAATT
TGCTACCAAG GCCATTCATGCCGGTGAACATGTGGACGTTCACGGTTCC
GTGATCGAACCCATTTCTTTG TCCACCACTTTCAAACAATCTTCTCCAG
CTAACCCTATCGGTACTTACGAATACTCCAGA TCTCAAAATCCTAACAG
AGAGAACTTGGAAAGAGCAGTTGCCGCTTTAGAGAACGCTCAA TACGGG
TTGGCTTTCTCCTCTGGTTCTGCCACCACCGCCACAATCTTGCAATCGCT
TCCT CAGGGCTCCCATGCGGTCTCTATCGGTGATGTGTACGGTGGTACC
CACAGATACTTCACC AAAGTCGCCAACGCTCACGGTGTGGAAACCTCCT
TCACTAACGATTTGTTGAACGATCTA CCTCAATTGATAAAGGAAAACAC
CAAATTGGTCTGGATCGAAACCCCAACCAACCCAACT
29
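What about size?
  • Small gene: 2,000 base pairs, 1 page of 2,000 bp
  • Small virus: 5,000 base pairs, 2.5 pages
  • Small bacterial genome: 1,000,000 base pairs, 500 pages, 5 cm of paper
  • Large bacterial genome: 5,000,000 base pairs, 2,500 pages, 25 cm of paper
  • Yeast genome: 13,000,000 base pairs, 6,500 pages, 65 cm of paper
  • Fruit fly genome: 180,000,000 base pairs, 90,000 pages, 900 cm of paper
  • Human genome: 3,000,000,000 base pairs, 1,500,000 pages, 1,500 cm of paper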
Printing all of the nucleotide sequences at NCBI
would now produce a stack of paper 9.5 km high.
30
Top Ten Challenges for Bioinformatics
Chris Burge, Ewan Birney, Jim Fickett. Genome
Technology, issue No. 17, January, 2002
  • Precise, predictive model of transcription
    initiation and termination: ability to predict
    where and when transcription will occur in a
    genome
  • Precise, predictive model of RNA
    splicing/alternative splicing: ability to predict
    the splicing pattern of any primary transcript in
    any tissue
  • Precise, quantitative models of signal
    transduction pathways: ability to predict
    cellular responses to external stimuli
  • Determining effective protein-DNA, protein-RNA
    and protein-protein recognition codes
  • Accurate ab initio protein structure prediction
  • Rational design of small molecule inhibitors of
    proteins
  • Mechanistic understanding of protein evolution:
    understanding exactly how new protein functions
    evolve
  • Mechanistic understanding of speciation:
    molecular details of how speciation occurs
  • Continued development of effective gene
    ontologies: systematic ways to describe the
    functions of any gene or protein
  • Education: development of appropriate
    bioinformatics curricula for secondary,
    undergraduate and graduate education

31
1- Precise, predictive model of transcription
initiation and termination: ability to predict
where and when transcription will occur in a
genome
http://tinyurl.com/2t9c6y
  • Understanding the parts list is critical for
    biologists to plan their experiments and to grasp
    the context of the biological problem they are
    working with.
  • Understanding how these parts differ between
    healthy and cancerous cells is also critical.
  • Knowing what these parts are is obviously very
    important.

9 - Continued development of effective gene
ontologies: systematic ways to describe the
functions of any gene or protein
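Gene ontologies such as the Gene Ontology (GO) organize function terms in a directed acyclic graph. A gene annotated to a specific term is implicitly annotated to all of that term's ancestors. The sketch below shows that upward propagation on a hypothetical, heavily simplified mini-ontology. The term names and links are illustrative assumptions, not an extract of GO. Real GO terms carry GO:nnnnnnn identifiers and several relation types.

```python
# Hypothetical mini-ontology (illustration only): term -> list of parent terms.
PARENTS: dict[str, list[str]] = {
    "mismatch repair": ["DNA repair"],
    "DNA repair": ["DNA metabolic process", "response to DNA damage"],
    "DNA metabolic process": ["biological_process"],
    "response to DNA damage": ["biological_process"],
    "biological_process": [],
}


def ancestors(term: str) -> set[str]:
    """All terms reachable by following parent links upward (the term itself excluded)."""
    seen: set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # a term can be reached along several paths
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


# A gene annotated to "mismatch repair" is implicitly annotated to all of these.
print(sorted(ancestors("mismatch repair")))
```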
32
http://tinyurl.com/2t9c6y
  • 1- Precise, predictive model of transcription
    initiation and termination: ability to predict
    where and when transcription will occur in a
    genome

33
4- Determining effective protein-DNA,
protein-RNA and protein-protein recognition codes
  • Formalizing data is something bioinformatics
    people like to do.
  • There are hundreds of databases that capture
    protein-protein interactions, as well as
    protein-RNA, protein-DNA and protein-small
    molecule interactions.
  • Understanding and capturing this information for
    healthy and cancerous cells is also necessary.
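Interaction data of this kind is usually handled as a graph, with molecules as nodes and interactions as edges. Network tools such as Cytoscape, mentioned earlier, work on this representation. Below is a minimal adjacency-map sketch. The protein names and interaction pairs are hypothetical placeholders, not real data.

```python
from collections import defaultdict
from typing import Iterable


def build_network(pairs: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected interaction network as an adjacency map: protein -> set of partners."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)  # interactions are treated as symmetric,
        graph[b].add(a)  # so each pair is recorded in both directions
    return dict(graph)


# Hypothetical interaction pairs with placeholder names.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(interactions)

# Degree = number of distinct partners; here the "hub" is simply the
# protein with the most partners.
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)
print(f"most connected: {hub} ({degree[hub]} partners)")
```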

34
Christopher Hogue
35
Christopher Hogue
36
Christopher Hogue and Gary Bader
37
3 - Precise, quantitative models of signal
transduction pathways: ability to predict
cellular responses to external stimuli
  • Pathways are the end product of gene expression:
    they are the result of complexes coming together,
    networks of all of the cell's parts and their
    coordinated orchestration into the expression of
    a biological state.
  • When pathways break down, cells die or become
    very sick.
  • Cancer can be studied by examining the pathways of
    cells gone bad.
  • Formalization of pathway data is very
    complicated, but it is being done.
  • There are several database projects whose goal
    is to represent our biological knowledge of
    pathways.
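One of the simplest ways to turn a pathway diagram into something that predicts a response to a stimulus is a Boolean network. Each component is on or off, and each is updated by a logical rule over its inputs. This is a qualitative stand-in; the quantitative models the slide calls for are typically differential-equation based. The cascade, node names and rules below are hypothetical.

```python
# Hypothetical cascade: stimulus -> receptor -> kinase -> transcription factor,
# with an inhibitor that blocks the kinase. Rules are illustrative assumptions.
RULES = {
    "receptor": lambda s: s["stimulus"],
    "kinase": lambda s: s["receptor"] and not s["inhibitor"],
    "transcription_factor": lambda s: s["kinase"],
}


def simulate(stimulus: bool, inhibitor: bool, steps: int = 5) -> dict[str, bool]:
    """Synchronously apply all rules for a fixed number of steps and return the final state."""
    state = {"stimulus": stimulus, "inhibitor": inhibitor,
             "receptor": False, "kinase": False, "transcription_factor": False}
    for _ in range(steps):
        # Every rule reads the previous state, so all nodes update simultaneously.
        state = {**state, **{node: rule(state) for node, rule in RULES.items()}}
    return state


for stim, inhib in [(True, False), (True, True), (False, False)]:
    final = simulate(stim, inhib)
    print(f"stimulus={stim}, inhibitor={inhib} -> "
          f"transcription factor active: {final['transcription_factor']}")
```

Five steps are more than enough for the signal to travel the three links of this toy cascade. Larger networks would need a check for reaching a steady state or a cycle.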

38
Reactome http://reactome.org/
  • The goal of Reactome is to develop a curated resource of core
    pathways and reactions in human biology.
  • Understanding these in normal and cancerous cells
    will provide insights into the biology of cancer.
  • Databases like this one are labor-intensive and
    require the input of bioinformaticians and
    biologists alike.

39
(No Transcript)
40
Pathways are inter-linked
Signalling pathway
Genetic network
STIMULUS
Metabolic pathway
41
10- Education: development of appropriate
bioinformatics curricula for secondary,
undergraduate and graduate education
  • In the last 5 years, a number of new programs,
    new courses and several workshops in
    bioinformatics have been offered here in Ontario,
    in Canada and worldwide.
  • There is still a critical need for more
    bioinformaticians.
  • We need to continue supporting many of the
    existing programs.

42
10- Development of appropriate bioinformatics
curricula for secondary, undergraduate and
graduate education
  • In the last 5 years, a number of programs, courses
    and workshops have been established in Ontario,
    in Canada and around the world.
  • There is still a shortage of skilled
    bioinformatics people worldwide.
  • There is still a need for bioinformatics
    workshops.
  • http://bioinformatics.ca

43
http://bioinformatics.ca
44
New 2008 CBW workshops
http://bioinformatics.ca
  • Putting the Web to Work: Tools to Accelerate Life
    Science Research
  • Interpreting Gene Lists from OMICS Studies
  • Informatics on High Throughput Sequencing Data
  • Systems Network Biology
  • Essential Statistics in Biology: Getting the
    Numbers Right

45
(No Transcript)
46
Doing science in a reproducible, predictable,
repeatable and efficient way will require:
  • Open Source
  • Public and private sector
  • New business models
  • Open Access
  • All biomolecular data
  • Clinical data
  • Scientific publications
  • Methods will need to be represented and
    delivered in a way that will allow anybody to
    reproduce, use and modify.

47
Open Access to Data
  • DNA sequences in GenBank
  • It is now part of scientific culture and
    expectation to submit DNA and protein sequences
    to GenBank. The same is now expected for gene
    expression data, protein structures and protein
    interactions.
  • What motivated people to put their sequences
    in GenBank was not the good of humankind, but
    rather the publishers and the funding agencies.

48
Open Access has to be mandated
  • It now is, by:
  • CIHR
  • Genome Canada
  • NIH
  • Wellcome/MRC
  • OICR
  • It also needs buy-in from the universities and all
    provincial funding agencies,
  • and from the presidents and deans of academic and publicly
    funded research institutions.

49
Open Access of Publications Definition
  • An open access publication is a peer-reviewed
    publication that can be downloaded online, free
    of charge, by any user worldwide.

50
How do Grantees Make Pubs Open Access?
  • Publish in an Open Access journal, or in a journal that
    provides delayed Open Access (e.g. within 6 months).
  • Publish anywhere - but self-archive. Put the
    peer-reviewed manuscript in PubMed Central and/or
    an institutional repository within (e.g.) 6
    months of publication.

51
http://www.cihr-irsc.gc.ca/e/32005.html
52
http://bioinformatics.ca/links_directory/
The Bioinformatics Links Directory features
curated links to molecular resources, tools and
databases.
53
Bioinformatics Links Directory
54
Things not on the Top 10 list
  • Whole Genome Association studies.
  • Systems biology and data integration
  • High throughput genome sequencing and the
    consequences of that data.
  • Information technology challenges
  • Health informatics

55
Rationale - OICR Blueprint
56
(No Transcript)
57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
OiCR
62
PowerPoint slides were taken from these people:
  • Fiona Brinkman, SFU
  • Mark Boguski, NCBI/NIH
  • Jim Ostell, NCBI/NIH
  • Andy Baxevanis, NHGRI/NIH
  • Christopher Hogue
  • Gary Bader, University of Toronto
  • Chris Ponting, Oxford University

63
http://www.oicr.on.ca/research/ouellette.htm
64
Funding provided by the Government of Ontario
65
(No Transcript)