Slides: 19
Provided by: Pasc151
Transcript and Presenter's Notes

Title: Reference Genome Project


1
Reference Genome Project
ZFIN
2
(No Transcript)
3
Purpose
  • Provide comprehensive annotation for 12
    genomes:
  • Arabidopsis thaliana
  • Caenorhabditis elegans
  • Danio rerio
  • Dictyostelium discoideum
  • Drosophila melanogaster
  • Escherichia coli
  • Gallus gallus
  • Homo sapiens
  • Mus musculus
  • Rattus norvegicus
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe
  • These organisms were selected because they:
  • are established model organisms with published
    experimental data
  • have a genome database
  • have experienced GO curators

4
Complete genome annotation
  • Breadth: every gene in the 12 genomes should be
    annotated
  • Depth: every gene should be annotated to the
    highest level of knowledge
  • The group has agreed that depth of annotation
    is best assessed by the curator annotating the
    gene.
  • - If a gene has fewer than 5-10 papers, it
    makes sense to read and annotate all papers
  • - If a gene has a lot of literature, the
    preferred strategy is to look at a recent review
    to make sure all important primary literature is
    captured; more recent papers should also be read

5
Metrics assessing breadth and depth of
annotations
Need GFF3 files for all Ref Genomes!
  • Breadth
  • Number of genes (protein coding and functional
    RNAs based on SO)
  • Number of genes with some functional annotation
  • Number of genes with functional annotation based
    on experiments using that organism
  • Number of genes with function inferred by
    sequence similarity
  • Number of genes with function inferred by
    electronic annotations
  • Number of genes for which there is no available
    information (root/ND annotations)
  • Depth
  • Number of papers linked to a gene
  • Number of papers used to produce functional
    annotation
  • Number of papers read but for which no new
    annotations were produced.
  • Ratio of deepest annotation to leaf node to
    measure granularity and use of the ontology (Suzi)

Depth needs to be assessed by curators!!
http://gocwiki.geneontology.org/index.php/Metrics_breath_and_depth_of_annotations
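
The breadth counts listed on this slide could be tallied roughly as follows. This is a minimal sketch, not the project's actual tooling: the function name `breadth_metrics` and the input shape (a list of gene IDs plus `(gene_id, evidence_code)` pairs) are assumptions made for illustration. The evidence codes are standard GO codes, but grouping them into the slide's categories this way is also an assumption.

```python
from collections import defaultdict

# Standard GO evidence codes; mapping them onto the slide's
# breadth categories is an assumption of this sketch.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS", "ISO", "ISA", "ISM"}
ELECTRONIC = {"IEA"}
NO_DATA = {"ND"}


def breadth_metrics(all_genes, annotations):
    """Count genes per breadth category.

    all_genes:   iterable of gene IDs (hypothetical input format)
    annotations: iterable of (gene_id, evidence_code) pairs
    """
    genes = list(all_genes)
    codes_by_gene = defaultdict(set)
    for gene_id, evidence in annotations:
        codes_by_gene[gene_id].add(evidence)

    def count_with(codes):
        # Genes that have at least one annotation using one of these codes.
        return sum(1 for g in genes if codes_by_gene.get(g, set()) & codes)

    return {
        "total_genes": len(genes),
        # Any annotation other than ND counts as "some functional annotation".
        "annotated": sum(
            1 for g in genes if codes_by_gene.get(g, set()) - NO_DATA
        ),
        "experimental": count_with(EXPERIMENTAL),
        "sequence_similarity": count_with(SEQUENCE_SIMILARITY),
        "electronic": count_with(ELECTRONIC),
        # Genes whose only annotations are ND (root) annotations.
        "no_data": sum(
            1 for g in genes
            if codes_by_gene.get(g) and codes_by_gene[g] <= NO_DATA
        ),
        # Genes with no annotation record at all.
        "unannotated": sum(1 for g in genes if not codes_by_gene.get(g)),
    }


# Made-up gene IDs, for illustration only.
example = breadth_metrics(
    ["g1", "g2", "g3", "g4", "g5"],
    [("g1", "IDA"), ("g1", "IEA"), ("g2", "ISS"), ("g3", "IEA"), ("g4", "ND")],
)
```

A gene can fall into more than one category (for example, both experimental and electronic), so the category counts need not sum to the total. Depth is left out because the slide says it should be assessed by curators.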
6
Figures for Reference Genome genes, completely
fictitious data
Figures for Whole Genome genes, completely
fictitious data
From Ruth (word doc sent earlier this week)
7
Mike Cherry
8
(No Transcript)
9
Measuring Information Content
Organism                  Distance to leaf  Information Content  Coverage       Pubs per gene  Terms per gene
                          All    Ref        All     Ref          All    Ref     All    Ref     All    Ref
Arabidopsis thaliana      4.29   3.74       11.53   12.01        17.06  28.28   2.02   2.20    2.88   3.46
Caenorhabditis elegans    4.74   4.37       11.40   11.90        27.98  48.94   1.73   3.23    4.72   9.51
Danio rerio               4.35   4.01       11.01   11.66        19.34  33.56   1.65   2.36    6.12   8.10
Dictyostelium discoideum  4.40   3.46       11.37   12.77        25.84  41.56   1.89   2.46    4.44   6.41
Drosophila melanogaster   3.56   3.03       12.86   13.37        33.29  55.69   3.12   5.34    4.26   6.55
H. sapiens                3.65   3.65       12.96   12.87        30.80  73.71   6.01   6.01    4.65   6.82
Mus musculus              4.83   3.50       10.90   13.37        32.44  84.21   2.95   10.69   5.47   13.01
R. norvegicus             4.08   3.26       11.56   13.10        36.27  77.81   2.29   5.71    5.07   8.00
S. pombe                  3.82   3.05       11.94   12.96        44.15  59.54   3.13   4.01    6.91   17.55
Saccharomyces cerevisiae  3.22   2.63       12.88   14.08        37.42  62.84   3.04   5.67    4.65   6.82
Chris Mungall
10
Priorities: Selection of curation targets
  • Genes that, when mutated, cause a disease
  • Not included: "upregulated in cancer x",
    "interacts with tumor suppressor y", and other
    weak evidence
  • Disease gene lists: OMIM, RGD disease portal;
    first group: neurological diseases
  • Other lists: list of common genes between
    human, fly and zebrafish that were being used as
    a test case for PATO annotations; many were not
    in OMIM; revisit??
  • Current status: trying to focus on genes with
    the broadest interest; however, these often lack
    orthologs in yeast, E. coli, etc., so we need to
    balance these factors.

11
Orthologs
  • Curators for each database are responsible for
    identifying orthologs of the disease gene
  • Available tools: YOGY, InParanoid, OrthoMCL,
    TreeFam, HomoloGene
  • Sequence analysis by curators

12
Software
  • Google spreadsheet
    - shared by all curators
    - each database keeps track of putative orthologs
    - each database records the curation status for
      each gene
  • Software requirements
    - Ensure consistent use of identifiers
    - Allow loading of MOD reports
    - Track that no ortholog was found
    - Provide reports to focus curation effort
    - Record that curation is 'comprehensive' as of
      a certain date
    - Allow a 1:many relation between human gene and
      MOD ortholog
    - Record orthology determination method
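
The software requirements above could be modelled with a small record structure like the one below. This is a sketch under stated assumptions, not the tool the group built: the class names `TargetGene` and `OrthologRecord`, their fields, and the "PREFIX:local_id" identifier convention are all hypothetical. It covers the 1:many relation, the "no ortholog found" case, the orthology determination method, the 'comprehensive' date, and a simple report of what still needs curation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class OrthologRecord:
    mod: str                                    # model organism database, e.g. "ZFIN"
    gene_id: Optional[str]                      # None means "searched, no ortholog found"
    method: Optional[str] = None                # orthology determination method
    comprehensive_as_of: Optional[date] = None  # set once curation is judged comprehensive


@dataclass
class TargetGene:
    human_gene_id: str
    # One human gene may map to many MOD orthologs (1:many).
    orthologs: List[OrthologRecord] = field(default_factory=list)

    def add_ortholog(self, mod: str, gene_id: str, method: str) -> None:
        # Assumed convention for consistent identifiers: "PREFIX:local_id".
        if ":" not in gene_id:
            raise ValueError(f"expected PREFIX:local_id, got {gene_id!r}")
        self.orthologs.append(OrthologRecord(mod, gene_id, method))

    def mark_no_ortholog(self, mod: str) -> None:
        # Record explicitly that a search was done and nothing was found.
        self.orthologs.append(OrthologRecord(mod, None))

    def mark_comprehensive(self, mod: str, gene_id: str, when: date) -> None:
        for rec in self.orthologs:
            if rec.mod == mod and rec.gene_id == gene_id:
                rec.comprehensive_as_of = when
                return
        raise KeyError((mod, gene_id))

    def pending(self) -> List[Tuple[str, str]]:
        # Report: orthologs found but not yet marked comprehensive.
        return [
            (r.mod, r.gene_id)
            for r in self.orthologs
            if r.gene_id is not None and r.comprehensive_as_of is None
        ]


# All identifiers below are made up for illustration.
target = TargetGene("HUMAN:hypothetical-1")
target.add_ortholog("ZFIN", "ZFIN:hypothetical-a", "TreeFam")
target.add_ortholog("ZFIN", "ZFIN:hypothetical-b", "TreeFam")
target.mark_no_ortholog("SGD")
target.mark_comprehensive("ZFIN", "ZFIN:hypothetical-a", date(2007, 1, 1))
```

Storing "no ortholog found" as its own record, rather than as an absent entry, keeps "not yet searched" distinguishable from "searched and found nothing".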

13
http://rails-dev.bioinformatics.northwestern.edu:24000/index.html
14
Annotation Progress
Organism         Genes with ortholog  Genes curated  Curated genes with publications
A. thaliana      32                   99             32
C. elegans       65                   46             99
D. discoideum    40                   41             26
D. melanogaster  48                   50             67
D. rerio         87                   90             26
H. sapiens       44                   98
M. musculus      98                   84             94
R. norvegicus    96                   100            55
S. cerevisiae    30                   100            99
S. pombe         34                   81             33
  • Curation software will be able to generate that
    information
  • We would like to display the list of selected
    genes, the list of identified orthologs, the
    curation status and a way to access annotations
    (graphs)

15
Annotation Consistency: Comparing annotations
16
(No Transcript)
17
Ontology development
Number of SourceForge requests in the "Reference
Genome" group
18
Outreach: publicizing the reference genome effort
  • Several suggestions
  • GO newsletter (already has the gene of the
    quarter): could add diseases
  • NCBI/OMIM could display/advertise genes with
    annotations
  • Take advantage of user requests that fit nicely
    in the initiative
  • Set up a reference genome wiki page showing which
    genes are coming up for annotation, which could
    also be used by researchers to suggest target
    genes
  • Make a page on the GO website that would include
    the disease genes we are curating and the gene of
    the quarter articles
  • Special display in AmiGO
  • Provide annotations in a separate file
  • Mark disease genes specifically in MODs

http://gocwiki.geneontology.org/index.php/Outreach_publicizing_the_project_and_developing_a_web_presence