Title: Reference Genome Project
1Reference Genome Project
ZFIN
2Purpose
- Provide comprehensive annotation for 12 genomes
  - Arabidopsis thaliana
  - Caenorhabditis elegans
  - Danio rerio
  - Dictyostelium discoideum
  - Drosophila melanogaster
  - Escherichia coli
  - Gallus gallus
  - Homo sapiens
  - Mus musculus
  - Rattus norvegicus
  - Saccharomyces cerevisiae
  - Schizosaccharomyces pombe
- These organisms were selected because they
  - are established model organisms with published experimental data
  - have a genome database
  - have experienced GO curators
3Questions for Advisors
- What criteria should we use to collect and prioritize genes for the reference genomes?
- What measures would be effective for assessing progress on the reference genome projects?
- What level of accuracy and detail should we aim for in individual genes, and what coverage of all functional elements across the genome?
4Complete genome annotation
- Breadth: every gene in the 12 genomes should be annotated
- Depth: every gene should be annotated to the highest level of experimental knowledge in that organism
- The group has agreed that depth of annotation is best assessed by the curator annotating the gene.
  - If a gene has fewer than 5-10 papers, it makes sense to read and annotate all papers
  - If a gene has a lot of literature, the preferred strategy is to look at a recent review to make sure all important primary literature is captured; papers more recent than the review should also be read
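The two-case reading strategy above can be sketched as a small function. This is only an illustration, not a tool used by the project: the `Paper` class, the `papers_to_read` function, the cutoff of 10 (the upper end of "5-10 papers"), and the fallback of reading everything when no review is available are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Paper:
    paper_id: str  # hypothetical identifier, e.g. a PubMed ID
    year: int


def papers_to_read(
    papers: List[Paper],
    review: Optional[Paper] = None,
    cited_by_review: FrozenSet[str] = frozenset(),
    small_literature_cutoff: int = 10,  # assumption: upper end of "5-10 papers"
) -> List[Paper]:
    """Pick the reading list for one gene under the two-case strategy."""
    if len(papers) < small_literature_cutoff or review is None:
        # Small literature (or, as an assumption, no review available):
        # read and annotate everything.
        chosen = list(papers)
    else:
        # Large literature: the primary papers the review cites,
        # plus anything published after the review.
        chosen = [
            p for p in papers
            if p.paper_id in cited_by_review or p.year > review.year
        ]
        # The review itself is read as the starting point.
        if review not in chosen:
            chosen.append(review)
    return sorted(chosen, key=lambda p: (p.year, p.paper_id))
```

In practice the curator still judges depth, as the slide says; a function like this could at most propose a starting reading list.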
5Metrics: assessing breadth and depth of annotations
- Breadth
  - Number of genes (protein coding and functional RNAs based on SO)
  - Number of genes with some functional annotation
  - Number of genes with functional annotation based on experiments using that organism
  - Number of genes with function inferred by sequence similarity
  - Number of genes with function inferred by electronic annotations
  - Number of genes for which there is no available information (root/ND annotations)
- Depth
  - Number of papers linked to a gene
  - Number of papers used to produce functional annotation
  - Number of papers read but for which no new annotations were produced.
  - Ratio of deepest annotation to leaf node to measure granularity and use of the ontology (Suzi)
http://gocwiki.geneontology.org/index.php/Metrics_breath_and_depth_of_annotations
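As a rough sketch, the breadth counts listed above could be computed from (gene, evidence code) pairs. The function name `breadth_metrics`, the input shape, and the grouping of GO evidence codes into categories are assumptions for this example. It also assumes every annotation belongs to a single organism's gene set, so "experimental" here stands in for "experiments using that organism". The depth metrics are not covered.

```python
from typing import Dict, Iterable, Tuple

# Assumed grouping of GO evidence codes into the slide's categories.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})
SEQUENCE_SIMILARITY = frozenset({"ISS", "ISO", "ISA", "ISM"})
ELECTRONIC = frozenset({"IEA"})
NO_DATA = frozenset({"ND"})


def breadth_metrics(
    genes: Iterable[str],
    annotations: Iterable[Tuple[str, str]],
) -> Dict[str, int]:
    """Count genes per breadth category from (gene, evidence_code) pairs."""
    codes_by_gene = {gene: set() for gene in genes}
    for gene, code in annotations:
        # Annotations for genes outside the gene list are ignored.
        if gene in codes_by_gene:
            codes_by_gene[gene].add(code)

    def genes_with(codes: frozenset) -> int:
        return sum(1 for found in codes_by_gene.values() if found & codes)

    return {
        "genes": len(codes_by_gene),
        # "Some functional annotation": at least one code other than ND.
        "annotated": sum(1 for found in codes_by_gene.values() if found - NO_DATA),
        "experimental": genes_with(EXPERIMENTAL),
        "sequence_similarity": genes_with(SEQUENCE_SIMILARITY),
        "electronic": genes_with(ELECTRONIC),
        # "No available information": annotated, but only with ND.
        "no_information": sum(
            1 for found in codes_by_gene.values() if found and found <= NO_DATA
        ),
    }
```

A gene can fall into several categories at once (for example, both experimental and electronic), so the counts are not expected to sum to the number of genes.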
6Figures for Reference Genome genes (completely fictitious data)
Figures for Whole Genome genes (completely fictitious data)
7Mike Cherry
9Measuring Information Content
Chris Mungall
10Information Content from an Ontology Perspective
11Priorities: Selection of curation targets
- Genes that, when mutated, cause a disease
  - Not included: "upregulated in cancer x", "interacts with tumor suppressor y", and other weak evidence
- Disease gene lists
  - OMIM
  - RGD disease portal
  - first group: neurological diseases
- Other lists
  - list of common genes between human, fly and zebrafish that were being used as a test case for PATO annotations; many were not in OMIM (revisit??)
- Current status: trying to focus on genes with the broadest interest; however, these often lack orthologs in yeast, E. coli, etc., so we need to balance these factors.
- ADVICE: HOW TO BALANCE?
12Orthologs
- Curators for each database are responsible for identifying orthologs of the selected gene (currently prioritized by OMIM disease set)
- Available tools
  - YOGY
  - InParanoid
  - OrthoMCL
  - TreeFam
  - HomoloGene
- Sequence analysis by curators
- REFERENCE GENOME MEETING WED-THURS: major discussion topic
13Software
- Google spreadsheet
  - shared by all curators
  - each database keeps track of putative orthologs
  - each database records the curation status for each gene
- Software requirements
  - Ensure consistent use of identifiers
  - Allow loading of MOD reports
  - Track that no ortholog was found
  - Provide reports to focus curation effort
  - Record that curation is 'comprehensive' as of a certain date
  - Allow a 1:many relation between human gene and MOD ortholog
  - Record orthology determination method
- Software Group currently developing
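Several of the requirements above map naturally onto a small record type. The sketch below is an illustration only, not the software group's design: `OrthologRecord`, `CurationTracker`, the `PREFIX:LOCAL_ID` identifier check, and all gene identifiers in the usage note are hypothetical. Loading of MOD reports is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OrthologRecord:
    human_gene: str                            # the selected human gene
    database: str                              # the MOD making the call
    ortholog: Optional[str] = None             # None records "no ortholog found"
    method: Optional[str] = None               # orthology determination method
    comprehensive_as_of: Optional[date] = None # date curation was judged complete


class CurationTracker:
    """Holds one record per (human gene, MOD ortholog) pair, so 1:many is allowed."""

    def __init__(self) -> None:
        self._records: List[OrthologRecord] = []

    @staticmethod
    def _check_identifier(identifier: str) -> None:
        # Assumed convention for consistent identifiers: PREFIX:LOCAL_ID.
        prefix, sep, local = identifier.partition(":")
        if not (prefix and sep and local):
            raise ValueError(f"expected PREFIX:LOCAL_ID, got {identifier!r}")

    def add(self, record: OrthologRecord) -> None:
        self._check_identifier(record.human_gene)
        if record.ortholog is not None:
            self._check_identifier(record.ortholog)
            if not record.method:
                raise ValueError("an ortholog needs a determination method")
        self._records.append(record)

    def orthologs(self, human_gene: str) -> List[str]:
        return [r.ortholog for r in self._records
                if r.human_gene == human_gene and r.ortholog is not None]

    def databases_without_ortholog(self, human_gene: str) -> List[str]:
        return [r.database for r in self._records
                if r.human_gene == human_gene and r.ortholog is None]

    def pending(self) -> List[OrthologRecord]:
        # Report to focus curation effort: orthologs not yet marked comprehensive.
        return [r for r in self._records
                if r.ortholog is not None and r.comprehensive_as_of is None]
```

For example, adding `OrthologRecord("HUMAN:geneA", "ZFIN", "ZFIN:geneA1", "InParanoid")` and `OrthologRecord("HUMAN:geneA", "SGD")` records one zebrafish ortholog and an explicit "no ortholog found" for yeast; `pending()` then lists the zebrafish gene until a `comprehensive_as_of` date is set.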
14http://rails-dev.bioinformatics.northwestern.edu:24000/index.html
15Annotation Progress
- Curation software will be able to generate that information
- We would like to display the list of selected genes, the list of identified orthologs, the curation status and a way to access annotations (graphs)
16Annotation Consistency: Comparing annotations
18Ontology development
Number of SourceForge requests in the "Reference Genome" group
19Outreach: publicizing the reference genome effort
- Several suggestions
  - GO newsletter (already have the gene of the quarter): could add diseases
  - NCBI/OMIM could display/advertise genes with annotations
  - Take advantage of user requests that fit nicely in the initiative
  - Set up a reference genome wiki page showing which genes are coming up for annotation, which could also be used by researchers to suggest target genes
  - Make a page on the GO website that would include the disease genes we are curating and the gene of the quarter articles
  - Special display in AmiGO
  - Provide annotations in a separate file
  - Mark disease genes specifically in MODs
http://gocwiki.geneontology.org/index.php/Outreach_publicizing_the_project_and_developing_a_web_presence
20Goals for Upcoming Year
- Continued curation
  - at least 250 additional genes
- Review priorities for target selection
- Software and database implemented
- Increased visibility
- Web presence
- Paper
- Integration with GO database
- Meetings
- Metrics established and tracked
21Questions for Advisors
- What criteria should we use to collect and prioritize genes for the reference genomes?
- What measures would be effective for assessing progress on the reference genome projects?
- What level of accuracy and detail should we aim for in individual genes, and what coverage of all functional elements across the genome?