Title: Unlocated Arthropod genes and ways to find them
1 Unlocated Arthropod genes and ways to find them
- Many bug genes are hard to find
- - Daphnia's many tandems were lost for a bit
- Duplicate genes, a bane and a boon
- Genome tile expression picks out many more
April 2008
Don Gilbert
Genome Informatics Lab, Biology Dept., Indiana University
gilbertd_at_indiana.edu
2 Environ Stresses find Novels
- Novel Daphnia genes show under stress
- Novel Drosophila species genes are missed by prediction
3 Duplicate genes are common
- Daphnia surpasses C. elegans for rich tandem gene set.
- Bugs have many tandem genes
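As an illustration of what "tandem genes" means operationally, here is a minimal sketch that flags runs of neighbouring genes from the same family on one scaffold. The `Gene` record, its field names, and the rule "adjacent in gene order with the same family label" are assumptions made for this example; they are not the definition used in the analyses cited in these slides.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical gene record: family is a gene-family label, or None if unassigned.
Gene = namedtuple("Gene", "gene_id scaffold start end family")


def tandem_clusters(genes):
    """Return lists of gene ids that form tandem runs.

    A run is two or more genes that are consecutive in position order
    on the same scaffold and share the same family label.
    """
    # Order genes along each scaffold so that neighbours are adjacent in the list.
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    clusters = []
    # groupby only merges consecutive items, which is exactly the adjacency test.
    for (_, family), run in groupby(ordered, key=lambda g: (g.scaffold, g.family)):
        run = list(run)
        if family is not None and len(run) >= 2:
            clusters.append([g.gene_id for g in run])
    return clusters
```

A stricter version might also cap the distance between neighbours or allow a few intervening genes; this sketch leaves both out to stay short.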
4 Duplicates confuse Finders
- Prediction errors are common in duplicate gene regions.
- None of 13 predictors found all 4 tandems of this Dwil P450 cluster, but each gene was properly predicted by at least one of them.
5 Duplicates find Errors
The prediction cline is an artifact of Dmel training.
Retraining with Dmoj removes it.
- Duplicates solve a prediction dilemma in Drosophila.
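One way to test for a cline of this kind is to regress a per-species prediction measure on each species' distance from the training species and compare the slope before and after retraining. The sketch below assumes that reading of "cline" (the slide does not define the measure), and the function name and inputs are hypothetical.

```python
def cline_slope(distances, values):
    """Least-squares slope of a per-species measure against distance.

    distances: distance of each species from the training species
               (any consistent unit; an assumption of this sketch).
    values:    the per-species measure, e.g. predicted gene count.
    A slope near zero means no trend with distance from the training species.
    """
    n = len(distances)
    if n != len(values) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    return sxy / sxx
```

If retraining on a different species removes an artifact, the slope computed from the retrained predictions should move toward zero; a real biological trend would persist.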
6 Odorant genes concur
Curation of Drosophila Obp genes also removes the prediction cline.
Vieira et al. (2007), and my own further analysis, recovered genes using Psi-Blast trained on species Obp genes.
Computational errors are significantly more common in the Far- and Mid-mel groups.
Obp genes show no overall gain/loss across groups.
7 Tile expression finds genes
Daphnia tile expression with gene finding calls
26 coding bases over the genome, compared to 17
from gene predictions, or 5,000 - 10,000 new
genes.
Manak et al. (2006), with Dros. mel., also found 24 CDS/genome, up from 18 CDS/genome in the reference gene set.
Computational tools need to mature; gene finding is preliminary.
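The comparison on this slide comes down to two interval operations: totalling the bases covered by a set of calls, and listing expressed regions that no existing prediction overlaps. A minimal sketch follows, assuming intervals are given as `(scaffold, start, end)` tuples with 1-based inclusive coordinates. That layout and the function names are illustrative choices, not the format of the tiling data.

```python
def _by_scaffold(intervals):
    """Group (scaffold, start, end) tuples into {scaffold: [(start, end), ...]}."""
    grouped = {}
    for scaffold, start, end in intervals:
        grouped.setdefault(scaffold, []).append((start, end))
    return grouped


def covered_bases(intervals):
    """Count bases covered by at least one interval, counting overlaps once."""
    total = 0
    for spans in _by_scaffold(intervals).values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:      # overlapping or touching: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current block
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


def novel_expressed_regions(expressed, predicted):
    """Return expressed intervals that overlap no predicted interval."""
    predicted_by_scaffold = _by_scaffold(predicted)
    novel = []
    for scaffold, start, end in expressed:
        hits = predicted_by_scaffold.get(scaffold, [])
        if not any(start <= p_end and p_start <= end for p_start, p_end in hits):
            novel.append((scaffold, start, end))
    return novel
```

The overlap test is a simple all-against-all scan per scaffold, which is fine for a sketch; genome-scale data would call for sorted sweeps or an interval index.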
8 Summary: Locating novel genes
- More genes are expressed in unusual environs, and are specific. Use many environmental, developmental and tissue conditions to see the range of genes via expression. Understand the limits of gene homology.
- Duplicate genes are common, a problem, and an aid to finding genes. Examine duplicate genes carefully. Tools that distinguish these can be used to find paralogs missed by traditional methods.
- Near-species training reduces errors and spurious effects. Use same-species and near-species data as much as possible in preparing automated annotations. Be aware of and control for informant species-distance as a source of bias.
- Genome-wide tile expression finds more genes. As an alternative to EST studies, it has values and drawbacks. Computational methods need to improve to use this data well.
9 Genome maps on your laptop
- Genome data sets that I use are available for your computer.
- Includes GMOD GBrowse software in a ready-to-run bundle
- http://eugenes.org/gmod/genomeview-package2008/
- This is fully configured for Intel-MacOSX 10.5, others need further installation.
- See http://www.gmod.org/GBrowse
- Map data (large) are at ftp://eugenes.org/eugenes/gbrowse/databases/
- - daphnia_pulex: Daphnia genome data from wfleabase.org
- - nasonia: Wasp gene predictions, homology, EST
- - tribcas: Tribolium basic gene set from NCBI genomes
- - drospege: 12 Drosophila genomes
- - drosmel: Dros. mel rel 5.5 genome with Affymetrix transcriptome data
10 End note
- Acknowledgements
- - I am grateful for support from NSF (DBI-0640462) and the NIH, including a TeraGrid award, which made this work possible.
- - Daphnia sequencing and portions of the analyses were provided by DOE Joint Genome Institute and in collaboration with the Daphnia Genomics Consortium (DGC).
- References
- - Gilbert, 2007. New and old genes in Drosophila genomes. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
- - Gilbert, 2007. Daphnia gene duplicates. http://wfleabase.org/genome-summaries/gene-duplicates/
- - Gilbert, 2008. Tandem genes lost found. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
- - Manak, J.R. et al., 2006. .. unannotated transcription in Dros. mel. Nature Genetics, doi:10.1038/ng1875
- - Vieira, F.G. et al., 2007. .. analysis of the Odorant-Binding genes in Drosophila genomes. Genome Biology, doi:10.1186/gb-2007-8-11-r235