Reference Genome Project - PowerPoint PPT Presentation

Provided by: Pasc151

Transcript and Presenter's Notes

Title: Reference Genome Project


1
Reference Genome Project
ZFIN
2
Purpose
  • Provide comprehensive annotation for 12 genomes:
  • Arabidopsis thaliana
  • Caenorhabditis elegans
  • Danio rerio
  • Dictyostelium discoideum
  • Drosophila melanogaster
  • Escherichia coli
  • Gallus gallus
  • Homo sapiens
  • Mus musculus
  • Rattus norvegicus
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe
  • These organisms were selected because they:
    - are established model organisms with published experimental data
    - have a genome database
    - have experienced GO curators

3
Questions for Advisors
  • What criteria should we use to collect and
    prioritize genes for the reference genomes?
  • What measures would be effective for assessing
    progress on the reference genome projects?
  • What level of accuracy and detail should we aim for in annotating
    individual genes, and what coverage of all functional elements
    across the genome?

4
Complete genome annotation
  • Breadth: every gene in the 12 genomes should be annotated
  • Depth: every gene should be annotated to the highest level of
    experimental knowledge available in that organism
  • The group has agreed that depth of annotation is best assessed by
    the curator annotating the gene:
    - If a gene has fewer than 5-10 papers, it makes sense to read and
      annotate all of them
    - If a gene has a large literature, the preferred strategy is to
      start from a recent review to make sure all important primary
      literature is captured, and then read the more recent papers
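The depth strategy above can be sketched as a simple paper-selection rule. This is an illustrative sketch only, not the group's procedure: the `Paper` record (identifier, publication date, review flag) is hypothetical, and the threshold of 10 is an assumption taken from the upper end of the slide's "5-10 papers" range, since the group leaves that judgment to the curator.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical cut-off taken from the upper end of the slide's "5-10 papers";
# in practice the curator decides.
SMALL_LITERATURE_THRESHOLD = 10


@dataclass
class Paper:
    pmid: str          # hypothetical identifier field
    published: date
    is_review: bool


def papers_to_read(papers: List[Paper],
                   threshold: int = SMALL_LITERATURE_THRESHOLD) -> List[Paper]:
    """Select papers following the small/large literature strategy."""
    by_date = sorted(papers, key=lambda p: p.published)
    if len(papers) < threshold:
        # Small literature: read and annotate every paper.
        return by_date
    reviews = [p for p in by_date if p.is_review]
    if not reviews:
        # No review to start from: fall back to reading everything.
        return by_date
    latest_review = reviews[-1]  # by_date is sorted, so the last review is the newest
    # Large literature: start from the latest review, then read the primary
    # papers published after it.
    newer = [p for p in by_date
             if not p.is_review and p.published > latest_review.published]
    return [latest_review] + newer
```

The fallback for a large literature with no review is itself an assumption; a curator might instead pick the most cited papers.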

5
Metrics for assessing breadth and depth of annotations
  • Breadth
    - Number of genes (protein-coding genes and functional RNAs, based
      on SO)
    - Number of genes with some functional annotation
    - Number of genes with functional annotation based on experiments
      in that organism
    - Number of genes with function inferred by sequence similarity
    - Number of genes with function inferred by electronic annotation
    - Number of genes for which no information is available (root/ND
      annotations)
  • Depth
    - Number of papers linked to a gene
    - Number of papers used to produce functional annotations
    - Number of papers read that produced no new annotations
    - Ratio of deepest annotation to leaf node, to measure granularity
      and use of the ontology (Suzi)

http://gocwiki.geneontology.org/index.php/Metrics_breath_and_depth_of_annotations
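The breadth and depth counts listed above could be computed from GAF-style annotation rows. This is a minimal sketch under stated assumptions: each annotation is a `(gene_id, go_id, evidence_code, reference)` tuple, the grouping of GO evidence codes into the slide's categories is assumed rather than taken from the project, and the function names and the "PMID:" prefix convention for literature references are hypothetical. The granularity ratio is left out because it needs the ontology graph.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed mapping of GO evidence codes onto the slide's categories.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS", "ISO", "ISA", "ISM"}
ELECTRONIC = {"IEA"}
NO_DATA = {"ND"}

Annotation = Tuple[str, str, str, str]  # (gene_id, go_id, evidence_code, reference)


def breadth_metrics(all_genes: List[str],
                    annotations: Iterable[Annotation]) -> Dict[str, int]:
    codes: Dict[str, Set[str]] = {}
    for gene, _go_id, code, _ref in annotations:
        codes.setdefault(gene, set()).add(code)

    def genes_where(test) -> int:
        return sum(1 for g in all_genes if test(codes.get(g, set())))

    return {
        "genes": len(all_genes),
        # Any annotation other than ND counts as "some functional annotation".
        "some_functional_annotation": genes_where(lambda c: bool(c - NO_DATA)),
        "experimental": genes_where(lambda c: bool(c & EXPERIMENTAL)),
        "sequence_similarity": genes_where(lambda c: bool(c & SEQUENCE_SIMILARITY)),
        "electronic": genes_where(lambda c: bool(c & ELECTRONIC)),
        # Annotated, but only with ND (no biological data available).
        "no_information": genes_where(lambda c: bool(c) and c <= NO_DATA),
    }


def depth_metrics(gene_id: str, linked_papers: Iterable[str],
                  annotations: Iterable[Annotation],
                  papers_read: Iterable[str]) -> Dict[str, int]:
    # Assumes literature references carry a "PMID:" prefix; GO_REF
    # references (e.g. for IEA) are therefore not counted as papers.
    used = {ref for g, _go, _code, ref in annotations
            if g == gene_id and ref.startswith("PMID:")}
    return {
        "papers_linked": len(set(linked_papers)),
        "papers_used_for_annotation": len(used),
        "papers_read_without_new_annotation": len(set(papers_read) - used),
    }
```

A gene with no annotation rows at all is counted in "genes" but not in "no_information", which here means an explicit ND annotation; whether unannotated genes belong in that bucket is a choice for the metrics group.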
6
Figures for Reference Genome genes (completely fictitious data)
Figures for Whole Genome genes (completely fictitious data)
7
Mike Cherry
8
(No Transcript)
9
Measuring Information Content
Chris Mungall
10
Information Content from an Ontology Perspective
11
Priorities: selection of curation targets
  • Genes that, when mutated, cause a disease
  • Not included: genes supported only by "upregulated in cancer X",
    "interacts with tumor suppressor Y", or other weak evidence
  • Disease gene lists: OMIM; RGD disease portal (first group:
    neurological diseases)
  • Other lists: genes common to human, fly and zebrafish that were
    used as a test case for PATO annotations; many were not in OMIM
    (revisit?)
  • Current status: trying to focus on genes of the broadest interest;
    however, these often lack orthologs in yeast, E. coli, etc., so
    these factors need to be balanced.
  • ADVICE: HOW TO BALANCE?

12
Orthologs
  • Curators for each database are responsible for identifying
    orthologs of the selected genes (currently prioritized by the OMIM
    disease set)
  • Available tools: YOGY, InParanoid, OrthoMCL, TreeFam, HomoloGene
  • Sequence analysis by curators
  • REFERENCE GENOME MEETING WED/THURS: major discussion topic

13
Software
  • Google spreadsheet:
    - shared by all curators
    - each database keeps track of putative orthologs
    - each database records the curation status for each gene
  • Software requirements:
    - Ensure consistent use of identifiers
    - Allow loading of MOD reports
    - Track that no ortholog was found
    - Provide reports to focus curation effort
    - Record that curation is 'comprehensive' as of a certain date
    - Allow a 1:many relation between a human gene and its MOD orthologs
    - Record the orthology determination method
  • Software group: currently in development
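One way to model the requirements above is a small in-memory tracker. This is a hypothetical sketch, not the software group's design: the class and function names, the identifier patterns, and any example IDs are assumptions. It records a 1:many relation between a human gene and MOD orthologs, the orthology method for each call, an explicit "no ortholog found" entry (`gene_id=None`), a per-database "comprehensive as of" date, and a report of genes still awaiting curation.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# Hypothetical identifier patterns used to enforce consistent IDs.
ID_PATTERNS = {
    "ZFIN": re.compile(r"^ZDB-GENE-\d{6}-\d+$"),
    "MGI": re.compile(r"^MGI:\d+$"),
}


@dataclass
class OrthologCall:
    database: str            # MOD reporting the call, e.g. "ZFIN"
    gene_id: Optional[str]   # None records "no ortholog found"
    method: str              # orthology determination method


@dataclass
class TargetGene:
    human_gene_id: str
    # 1:many: one human gene may have several orthologs in the same MOD.
    orthologs: List[OrthologCall] = field(default_factory=list)
    # Per-database date from which curation counts as 'comprehensive'.
    comprehensive_as_of: Dict[str, date] = field(default_factory=dict)

    def add_ortholog(self, call: OrthologCall) -> None:
        pattern = ID_PATTERNS.get(call.database)
        if call.gene_id is not None and pattern and not pattern.match(call.gene_id):
            raise ValueError(f"inconsistent {call.database} identifier: {call.gene_id}")
        self.orthologs.append(call)

    def mark_comprehensive(self, database: str, when: date) -> None:
        self.comprehensive_as_of[database] = when


def pending_report(targets: List[TargetGene], database: str) -> List[str]:
    """Human genes with a known ortholog in `database` not yet marked comprehensive."""
    return [t.human_gene_id for t in targets
            if any(c.database == database and c.gene_id is not None
                   for c in t.orthologs)
            and database not in t.comprehensive_as_of]
```

Keying the "comprehensive" date by database keeps each MOD's status independent, since curation of the same human gene finishes at different times in different databases.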

14
http://rails-dev.bioinformatics.northwestern.edu2
4000/index.html
15
Annotation Progress
  • The curation software will be able to generate this information
  • We would like to display the list of selected genes, the list of
    identified orthologs, the curation status, and a way to access
    annotations (graphs)

16
Annotation Consistency: comparing annotations
17
(No Transcript)
18
Ontology development
Number of SourceForge requests in the "Reference Genome" group
19
Outreach: publicizing the reference genome effort
  • Several suggestions:
  • GO newsletter (already features the gene of the quarter): could
    add diseases
  • NCBI/OMIM could display/advertise genes with annotations
  • Take advantage of user requests that fit nicely into the initiative
  • Set up a reference genome wiki page showing which genes are coming
    up for annotation, which researchers could also use to suggest
    target genes
  • Make a page on the GO website that would include the disease genes
    we are curating and the gene of the quarter articles
  • Special display in AmiGO
  • Provide annotations in a separate file
  • Mark disease genes specifically in MODs

http://gocwiki.geneontology.org/index.php/Outreach_publicizing_the_project_and_developing_a_web_presence
20
Goals for Upcoming Year
  • Continued curation: at least 250 additional genes
  • Review priorities for target selection
  • Software and database implemented
  • Increased visibility:
    - Web presence
    - Paper
    - Integration with GO database
    - Meetings
  • Metrics established and tracked

21
Questions for Advisors
  • What criteria should we use to collect and
    prioritize genes for the reference genomes?
  • What measures would be effective for assessing
    progress on the reference genome projects?
  • What level of accuracy and detail should we aim for in annotating
    individual genes, and what coverage of all functional elements
    across the genome?