The Gene Wiki, from a BioRDF-naïve perspective

Slides: 21

Provided by: ASU12

1
The Gene Wiki, from a BioRDF-naïve perspective
W3C / HCLSIG BioRDF Subgroup, November 17, 2008
2
Patterns of gene annotation
How do we efficiently annotate the function of
the 25,000 genes in the mammalian genome?
Goal: Genome-wide functional genomics
$P(k) \sim k^{-a}$
44% of genes in Entrez Gene have zero linked
references. Over 75% have five or fewer linked
references.
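
As a rough illustration of the statistics on this slide, the sketch below computes the fraction of genes with zero and with five or fewer linked references, and estimates the exponent a of a power law P(k) ~ k^(-a) with a least-squares line on log-log axes. The input list `ref_counts` and both function names are hypothetical. The slides do not say how the figures were derived, and a log-log fit is only a crude estimator.

```python
import math
from collections import Counter


def annotation_summary(ref_counts):
    """Return (fraction with zero references, fraction with five or fewer)."""
    n = len(ref_counts)
    zero = sum(1 for k in ref_counts if k == 0) / n
    five_or_fewer = sum(1 for k in ref_counts if k <= 5) / n
    return zero, five_or_fewer


def power_law_exponent(ref_counts):
    """Estimate a in P(k) ~ k^(-a) from a straight-line fit of
    log P(k) against log k, using only genes with k >= 1.
    Assumes at least two distinct reference counts are present."""
    hist = Counter(k for k in ref_counts if k >= 1)
    total = sum(hist.values())
    xs = [math.log(k) for k in hist]
    ys = [math.log(c / total) for c in hist.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return -slope  # the fitted slope is -a


# Made-up counts for illustration only (not Entrez Gene data).
example_counts = [0] * 44 + [1] * 40 + [10] * 16
zero_frac, low_frac = annotation_summary(example_counts)
```

With real data, `ref_counts` would hold one entry per gene, giving the number of references linked to that gene.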
3
The Long Tail of Knowledge

Traditional media revolves around the Short Head:
a small number of publishers putting out lots of
content.
Web 2.0 media revolves around community-generated
content: a huge population of individuals, each
generating a (relatively) small amount of content.

The Short Head: Newspapers, TV/Hollywood, Consumer
Reports, Olympics, Encyclopedia Britannica
The Long Tail: Blogs, YouTube, Amazon reviews,
American Idol, Wikipedia
Community intelligence
4
The Long Tail of encyclopedias

Wiki: a website that allows the visitors
themselves to easily add, remove, and otherwise
edit and change available content, typically
without the need for registration.
Wikipedia: the free encyclopedia that anyone can
edit.

"An expert-led investigation carried out by Nature
revealed numerous errors in both
encyclopaedias, but among 42 entries tested, the
difference in accuracy was not particularly
great: the average science entry in Wikipedia
contained around four inaccuracies; Britannica,
about three."
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
5
Advantages of a Gene Wiki
1) Existing gene portals are great for structured
content, but a wiki is suited for summarizing
unstructured content
Entrez Gene
Wikipedia
Unstructured content allows for free-text,
images, diagrams, photos, etc.
6
Advantages of a Gene Wiki
2) Wiki articles enable two-way communication of
information, encouraging contributions and edits
from the community.
Dec 18, 2002
Jan 3, 2004
Dec 11, 2004
May 6, 2006
Wikipedia is rarely the last place you look, but
is often a good first place for an overview.
7
Gene stubs

Active MCB community at WP had already developed
650 gene articles
Can we accelerate this process through stub
creation?
In total, created 7500 new articles and edited
650 previously existing articles.

8
Why Wikipedia?

Critical mass of articles to which and from which
we could link gene pages
Critical mass of editors who were experienced in
wiki-related issues (fighting vandalism,
copyediting, governance)
Active group of molecular biologists at the MCB
WikiProject (http://en.wikipedia.org/wiki/WP:MCB)
Alternatives considered
Home-built wiki
Citizendium (citizendium.org)

9
Gene wiki usage
Currently have 9000 gene pages or stubs at
Wikipedia
50% of all edits to gene pages are to
newly-created pages
Gene Wiki pages are highly ranked at Google,
ensuring critical mass of users and editors
10
Positive feedback loop
[Figure: feedback loop linking Gene Wiki page utility, number of readers, and number of editors]
11
25k gene-specific review articles?

Reelin: 33 editors, 221 edits since July 2002
Heparin: 175 editors, 320 edits since June 2003
AMPK: 44 editors, 84 edits since March 2004
RNAi: 232 editors, 708 edits since October 2002

Hyperlinks to related concepts
12
Gene Wiki activity

Steady (and growing?) edit rate over time

13
Gene Wiki article growth
http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/gene-wiki-top-2500-20081114
14
Welcome to the semantic web

The main concern with plaintext-on-Wikipedia is
that it's not an effective way to truly exploit
the long tail, since you're going to end up with
this massive plaintext disaster that will require
human collating (redundant work- just get it
right the first time).
- public-semweb-lifesci mailing list

15
Primary emphases

Providing useful content: scientists will not
find or contribute to a wiki unless it is already
useful
Instant feedback: wikis allow changes to be
effective immediately, without approval or
intermediary (e.g., corrections/additions to
NCBI/Ensembl?)
Emphasis on contributors, not data miners:
emphasize getting data in, not getting it out,
since complex protocols encourage
nonparticipation (e.g., MIAME)
Critical mass: what will differentiate the Gene
Wiki from the many other wiki efforts that are
stagnant?

16
Secondary emphases

Reliability and accuracy: do open and uncurated
data models produce trustworthy content?
Synergy with existing resources: how can the Gene
Wiki make the growth of traditional annotation
more efficient?
Enabling semantic queries/structure: how can we
structure unstructured content for data mining?
(Semantic MediaWiki? NLP?)
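
One possible reading of "structuring unstructured content" is to pull the key/value parameters out of a wiki template and restate them as subject-predicate-object triples. The sketch below is a minimal illustration under stated assumptions: the template name `Gene box`, its fields, and both function names are hypothetical and are not taken from the Gene Wiki or from Semantic MediaWiki. The parser also assumes no nested templates and no pipes inside values.

```python
import re


def parse_template(wikitext, template_name):
    """Extract 'key = value' parameters from the first
    {{template_name | ...}} call found in the wikitext.
    Simplification: nested templates and piped links are not handled."""
    pattern = r"\{\{\s*" + re.escape(template_name) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, flags=re.DOTALL)
    if match is None:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def to_triples(subject, params):
    """Turn template parameters into (subject, predicate, object) tuples."""
    return [(subject, key, value) for key, value in params.items()]


# Hypothetical article text mixing free text with one structured template.
article = (
    "Free-text introduction written by contributors.\n"
    "{{Gene box\n"
    "| symbol = RELN\n"
    "| organism = Homo sapiens\n"
    "}}\n"
    "More free text follows.\n"
)
fields = parse_template(article, "Gene box")
triples = to_triples(fields.get("symbol", "unknown"), fields)
```

The free text stays untouched for human readers, while the template fields become a small machine-readable layer that a data miner could query.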

17
Idealized information flow
Long tail scientific contributions
Direct semantic annotation by scientists
Wikipedia
Semantic structure
NCBI
Ensembl

Authoritative annotation databases
18
Figure to scale?
Long tail scientific contributions
Semantic structure
19
Summary

Goal: create a resource that complements existing
tools rather than competing with them.
Primary emphasis will always be on maximizing
community participation.
How do we structure the unstructured
contributions?

20
Acknowledgements
Serge Batalov, Jason Boyer, Jennifer Floyd, Yue
Hu, Jon Huss, Jeff Janes, Camilo Orozco, Steve
Su, Julia Turner, Chunlei Wu, David Delano, James
Goodale, Phil McClurg, Richard Trager
Faramarz Valafar, SDSU; Tim Vickers, Washington
Univ
Michael Cooke, Pete Schultz
Funding: NIGMS, NIH; Novartis Research Foundation
