The Gene Wiki, from a BioRDF-naïve perspective

1
The Gene Wiki, from a BioRDF-naïve perspective
W3C / HCLSIG BioRDF Subgroup, November 17, 2008
2
Patterns of gene annotation
How do we efficiently annotate the function of
the 25,000 genes in the mammalian genome?
Goal: Genome-wide functional genomics
$P(k) \sim k^{-a}$
44% of genes in Entrez Gene have zero linked
references. Over 75% have five or fewer linked
references.
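As a rough illustration of the power-law form on this slide, the exponent a can be estimated from a histogram of per-gene reference counts by fitting a straight line in log-log space. The sketch below assumes that k is the number of linked references per gene (the slide does not define it), uses synthetic counts rather than Entrez Gene data, and the function name is hypothetical. Genes with k = 0 are left out because the power law is undefined there, and a log-log least-squares fit is only a crude estimator.

```python
import math

def fit_power_law_exponent(counts_by_k):
    """Estimate a in P(k) ~ k^(-a) by least squares on (log k, log P(k)).

    counts_by_k maps k (assumed: linked references per gene, k >= 1)
    to the number of genes with exactly k references.
    """
    total = sum(counts_by_k.values())
    points = [
        (math.log(k), math.log(n / total))
        for k, n in counts_by_k.items()
        if k > 0 and n > 0
    ]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The fitted slope is -a, so negate it.
    return -sxy / sxx

# Synthetic histogram that follows k^(-2) exactly (not real data).
synthetic = {k: 10000.0 * k ** -2.0 for k in range(1, 51)}
a_hat = fit_power_law_exponent(synthetic)
print(round(a_hat, 6))
```

On real counts the tail is noisy, so a maximum-likelihood fit would usually be preferred over this regression.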
3
The Long Tail of Knowledge
  • Traditional media revolves around the Short Head:
    a small number of publishers putting out lots of
    content
  • Web 2.0 media revolves around community-generated
    content: a huge population of individuals each
    generating a (relatively) small amount of content

The Short Head: Newspapers, TV/Hollywood, Consumer
Reports, Olympics, Encyclopedia Britannica
The Long Tail: Blogs, YouTube, Amazon reviews,
American Idol, Wikipedia
Community intelligence
4
The Long Tail of encyclopedias
  • Wiki: a website that allows the visitors
    themselves to easily add, remove, and otherwise
    edit and change available content, typically
    without the need for registration.
  • Wikipedia: the free encyclopedia that anyone can
    edit.

An expert-led investigation carried out by Nature
revealed numerous errors in both
encyclopaedias, but among 42 entries tested, the
difference in accuracy was not particularly
great: the average science entry in Wikipedia
contained around four inaccuracies; Britannica,
about three.
http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008
5
Advantages of a Gene Wiki
1) Existing gene portals are great for structured
content, but a wiki is suited for summarizing
unstructured content
Entrez Gene
Wikipedia
Unstructured content allows for free-text,
images, diagrams, photos, etc.
6
Advantages of a Gene Wiki
2) Wiki articles enable two-way communication of
information, encouraging contributions and edits
from the community.
[Figure panels dated Dec 18, 2002; Jan 3, 2004; Dec 11, 2004; May 6, 2006]
Wikipedia is rarely the last place you look, but
is often a good first place for an overview.
7
Gene stubs
  • The active MCB community at Wikipedia had already
    developed 650 gene articles
  • Can we accelerate this process through stub
    creation?
  • In total, we created 7,500 new articles and edited
    650 previously existing articles.

8
Why Wikipedia?
  • Critical mass of articles to which and from which
    we could link gene pages
  • Critical mass of editors who were experienced in
    wiki-related issues (fighting vandalism,
    copyediting, governance)
  • Active group of molecular biologists at the MCB
    WikiProject (http://en.wikipedia.org/wiki/WP:MCB)
  • Alternatives considered:
      • Home-built wiki
      • Citizendium (citizendium.org)

9
Gene wiki usage
Currently there are 9,000 gene pages or stubs at
Wikipedia
50% of all edits to gene pages are to
newly created pages
Gene Wiki pages are highly ranked at Google,
ensuring a critical mass of users and editors
10
Positive feedback loop
[Diagram: feedback loop linking Gene Wiki page utility, number of readers, and number of editors]
11
25k gene-specific review articles?
  • Reelin: 33 editors, 221 edits since July 2002
  • Heparin: 175 editors, 320 edits since June 2003
  • AMPK: 44 editors, 84 edits since March 2004
  • RNAi: 232 editors, 708 edits since October 2002

Hyperlinks to related concepts
12
Gene Wiki activity
  • Steady (and growing?) edit rate over time

13
Gene Wiki article growth
http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations/gene-wiki-top-2500-20081114
14
Welcome to the semantic web
  • "The main concern with plaintext-on-Wikipedia is
    that it's not an effective way to truly exploit
    the long tail, since you're going to end up with
    this massive plaintext disaster that will require
    human collating (redundant work- just get it
    right the first time)."
    (from the public-semweb-lifesci mailing list)

15
Primary emphases
  • Providing useful content: scientists will not
    find or contribute to a wiki unless it is already
    useful
  • Instant feedback: wikis allow changes to be
    effective immediately, without approval or
    intermediary (e.g., corrections/additions to
    NCBI/Ensembl?)
  • Emphasis on contributors, not data miners:
    emphasize getting data in, not getting it out,
    since complex protocols encourage
    nonparticipation (e.g., MIAME)
  • Critical mass: what will differentiate the Gene
    Wiki from the many other wiki efforts that are
    stagnant?

16
Secondary emphases
  • Reliability and accuracy: do open and uncurated
    data models produce trustworthy content?
  • Synergy with existing resources: how can the Gene
    Wiki make the growth of traditional annotation
    more efficient?
  • Enabling semantic queries/structure: how can we
    structure unstructured content for data mining?
    (Semantic MediaWiki? NLP?)
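
One way to picture what "structure" could mean here is a set of subject-predicate-object statements that can be filtered by pattern. The sketch below is only an illustration of that idea in plain Python, not the Gene Wiki's or Semantic MediaWiki's actual data model: the `Triple` type, the `match` helper, the `ex:` predicates, and the `GENE_X` placeholder are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

def match(store: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in store:
        if ((s is None or t.subject == s)
                and (p is None or t.predicate == p)
                and (o is None or t.obj == o)):
            yield t

# Hypothetical statements that a structured gene page might expose.
store = [
    Triple("gene:RELN", "ex:encodes", "protein:Reelin"),
    Triple("gene:RELN", "ex:describedBy", "wikipedia:Reelin"),
    Triple("gene:GENE_X", "ex:describedBy", "wikipedia:GENE_X"),  # placeholder
]

# Query: which wiki pages describe some gene?
pages = sorted(t.obj for t in match(store, p="ex:describedBy"))
print(pages)
```

Free text on a wiki page supports no such query directly, which is why the slide raises Semantic MediaWiki and NLP as possible routes to this kind of structure.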

17
Idealized information flow
[Diagram labels: Long tail scientific contributions; Direct semantic annotation by scientists; Wikipedia; Semantic structure; Authoritative annotation databases (NCBI, Ensembl)]
18
Figure to scale?
Long tail scientific contributions
Semantic structure
19
Summary
  • Goal: create a resource that complements existing
    tools rather than competing with them.
  • Primary emphasis will always be on maximizing
    community participation.
  • How do we structure the unstructured
    contributions?

20
Acknowledgements
Serge Batalov, Jason Boyer, Jennifer Floyd, Yue Hu,
Jon Huss, Jeff Janes, Camilo Orozco, Steve Su,
Julia Turner, Chunlei Wu, David Delano, James Goodale,
Phil McClurg, Richard Trager
Faramarz Valafar, SDSU; Tim Vickers, Washington
Univ
Michael Cooke, Pete Schultz
Funding: NIGMS, NIH; Novartis Research Foundation