Reference Genome Project - PowerPoint PPT Presentation

Slides: 20

Provided by: Pasc151

Learn more at: https://wiki.geneontology.org

Transcript and Presenter's Notes

Title: Reference Genome Project

1
Reference Genome Project
ZFIN
2
Purpose

- Provide comprehensive annotation for 12 genomes:
Arabidopsis thaliana
Caenorhabditis elegans
Danio rerio
Dictyostelium discoideum
Drosophila melanogaster
Escherichia coli
Gallus gallus
Homo sapiens
Mus musculus
Rattus norvegicus
Saccharomyces cerevisiae
Schizosaccharomyces pombe

These organisms were selected because they:
- are established model organisms with published experimental data
- have a genome database
- have experienced GO curators

3
Questions for Advisors

What criteria should we use to collect and
prioritize genes for the reference genomes?
What measures would be effective for assessing
progress on the reference genome projects?
What level of accuracy and detail should we aim for
in individual genes, and what coverage of all
functional elements across the genome?

4
Complete genome annotation

Breadth: every gene in the 12 genomes should be
annotated
Depth: every gene should be annotated to the highest
level of experimental knowledge in that organism

The group has agreed that depth of annotation
is best assessed by the curator annotating the
gene.
- If a gene has fewer than 5-10 papers, it
makes sense to read and annotate all papers
- If a gene has a lot of literature, the
preferred strategy is to look at a recent review
to make sure all important primary literature is
captured, and then to read the more recent papers

5
Metrics assessing breadth and depth of
annotations

Breadth
Number of genes (protein coding and functional
RNAs based on SO)
Number of genes with some functional annotation
Number of genes with functional annotation based
on experiments using that organism
Number of genes with function inferred by
sequence similarity
Number of genes with function inferred by
electronic annotations
Number of genes for which there is no available
information (root/ND annotations)
Depth
Number of papers linked to a gene
Number of papers used to produce functional
annotation
Number of papers read but for which no new
annotations were produced.
Ratio of deepest annotation to leaf node to
measure granularity and use of the ontology (Suzi)

http://gocwiki.geneontology.org/index.php/Metrics_breath_and_depth_of_annotations
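
The breadth metrics listed on this slide could be tallied from a flat list of annotations roughly as follows. This is only a sketch, not the project's actual tooling. The mapping from each metric to a set of GO evidence codes is an assumption, as are the function name `breadth_metrics` and the input shape of (gene, evidence code) pairs. It also assumes the annotations all come from the organism's own database, and it does not check gene types against SO.

```python
from collections import defaultdict

# Assumed grouping of GO evidence codes for this sketch.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS", "ISO", "ISA", "ISM"}
ELECTRONIC = {"IEA"}
NO_DATA = {"ND"}  # root/ND annotations: no information available


def breadth_metrics(all_genes, annotations):
    """Count genes in each breadth category.

    all_genes:   iterable of gene identifiers (hypothetical IDs here)
    annotations: iterable of (gene_id, evidence_code) pairs
    """
    genes = set(all_genes)
    codes_by_gene = defaultdict(set)
    for gene_id, code in annotations:
        if gene_id in genes:
            codes_by_gene[gene_id].add(code)

    def genes_with(codes):
        # Genes that have at least one annotation with one of these codes.
        return sum(1 for g in genes if codes_by_gene.get(g, set()) & codes)

    return {
        "genes": len(genes),
        # At least one annotation that is not a root/ND placeholder.
        "some_functional_annotation": sum(
            1 for g in genes if codes_by_gene.get(g, set()) - NO_DATA
        ),
        "experimental": genes_with(EXPERIMENTAL),
        "sequence_similarity": genes_with(SEQUENCE_SIMILARITY),
        "electronic": genes_with(ELECTRONIC),
        # Annotated, but only with ND.
        "no_information": sum(
            1 for g in genes
            if codes_by_gene.get(g) and codes_by_gene[g] <= NO_DATA
        ),
    }


# Made-up data, in the spirit of the fictitious figures on the next slide.
genes = ["g1", "g2", "g3", "g4"]
annotations = [("g1", "IDA"), ("g1", "IEA"), ("g2", "ISS"), ("g3", "ND")]
metrics = breadth_metrics(genes, annotations)
print(metrics)
```

A gene can fall into several categories at once (here `g1` counts as both experimental and electronic), so the categories are not expected to sum to the gene total.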
6
Figures for Reference Genome genes, completely
fictitious data
Figures for Whole Genome genes, completely
fictitious data
7
Mike Cherry
8
(No Transcript)
9
Measuring Information Content
Chris Mungall
10
Information Content from an Ontology Perspective
11
Priorities: Selection of curation targets

Genes that, when mutated, cause a disease
Not included: "upregulated in cancer x",
"interacts with tumor suppressor y", and other
weak evidence
Disease gene lists:
- OMIM
- RGD disease portal
First group: neurological diseases
Other lists: the list of genes common to human,
fly and zebrafish that was being used as a test
case for PATO annotations; many were not in OMIM.
Revisit?
Current status: trying to focus on genes with
the broadest interest; however, these often lack
orthologs in yeast, E. coli, etc., so we need to
balance these factors.
ADVICE: HOW TO BALANCE?

12
Orthologs

Curators for each database are responsible for
identifying orthologs of the selected gene
(currently prioritized by OMIM disease set)
Available tools:
- YOGY
- InParanoid
- OrthoMCL
- TreeFam
- HomoloGene
Sequence analysis by curators
REFERENCE GENOME MEETING WED/THURS: major
discussion topic

13
Software

Google spreadsheet:
- shared by all curators
- each database keeps track of putative orthologs
- each database records the curation status for each gene
Software requirements:
- Ensure consistent use of identifiers
- Allow loading of MOD reports
- Track that no ortholog was found
- Provide reports to focus curation effort
- Record that curation is 'comprehensive' as of a certain date
- Allow a 1:many relation between human gene and MOD ortholog
- Record orthology determination method
Software Group currently developing
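
Several of the software requirements on this slide can be illustrated with a small data model: the 1:many relation, tracking that no ortholog was found, recording the determination method, and the 'comprehensive as of' date. This is a hedged sketch only, not the Software Group's design. The class names, field names, gene identifiers and the date are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class OrthologRecord:
    mod: str                       # model organism database, e.g. "ZFIN"
    gene_id: Optional[str]         # None records that no ortholog was found
    method: str                    # orthology determination method
    comprehensive_as_of: Optional[date] = None  # set once curation is complete


@dataclass
class TargetGene:
    human_gene_id: str
    # One human gene may map to many MOD orthologs (1:many).
    orthologs: List[OrthologRecord] = field(default_factory=list)

    def add_ortholog(self, mod, gene_id, method):
        record = OrthologRecord(mod, gene_id, method)
        self.orthologs.append(record)
        return record

    def pending(self):
        """Orthologs that exist but are not yet marked comprehensive."""
        return [r for r in self.orthologs
                if r.gene_id is not None and r.comprehensive_as_of is None]


# Hypothetical identifiers, for illustration only.
target = TargetGene("HUMAN:example-1")
a = target.add_ortholog("ZFIN", "ZFIN:example-a", "InParanoid")
b = target.add_ortholog("ZFIN", "ZFIN:example-b", "InParanoid")
target.add_ortholog("SGD", None, "sequence analysis by curator")  # none found
a.comprehensive_as_of = date(2007, 1, 1)  # hypothetical date

print([r.gene_id for r in target.pending()])  # ['ZFIN:example-b']
```

Storing "no ortholog found" as an explicit record, rather than leaving the entry out, keeps a completed negative search distinguishable from a search that has not been done yet. A `pending()` report like this is one way to focus curation effort.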

14
http://rails-dev.bioinformatics.northwestern.edu:24000/index.html
15
Annotation Progress

Curation software will be able to generate that
information
We would like to display the list of selected
genes, the list of identified orthologs, the
curation status and a way to access annotations
(graphs)

16
Annotation Consistency: Comparing annotations
17
(No Transcript)
18
Ontology development
Number of SourceForge requests in the "Reference
Genome" group
19
Outreach: publicizing the reference genome effort

Several suggestions:
- GO newsletter (already has the gene of the quarter): could add diseases
- NCBI/OMIM could display/advertise genes with annotations
- Take advantage of user requests that fit nicely in the initiative
- Set up a reference genome wiki page showing which genes are coming up for annotation, which could also be used by researchers to suggest target genes
- Make a page on the GO website that would include the disease genes we are curating and the gene of the quarter articles
- Special display in AmiGO
- Provide annotations in a separate file
- Mark disease genes specifically in MODs

http://gocwiki.geneontology.org/index.php/Outreach_publicizing_the_project_and_developing_a_web_presence
20
Goals for Upcoming Year

Continued curation
at least 250 additional genes
Review priorities for target selection
Software and database implemented
Increased visibility
Web presence
Paper
Integration with GO database
Meetings
Metrics established and tracked

21
Questions for Advisors

What criteria should we use to collect and
prioritize genes for the reference genomes?
What measures would be effective for assessing
progress on the reference genome projects?
What level of accuracy and detail should we aim for
in individual genes, and what coverage of all
functional elements across the genome?
