Intersection of Semantic Web and Life Sciences


Title: Intersection of Semantic Web and Life Sciences


1
Intersection of Semantic Web and Life Sciences
  • Kei Cheung
  • Yale Center for Medical Informatics

Genomics and Bioinformatics (MBB 452a), November
2, 2005
2
Outline
  • Introduction
  • Overview of RDF and LSID
  • Semantic web applications
  • Connotea
  • Piggy Bank
  • YeastHub

3
Two scientific/technological endeavors that have
impacted the world greatly in the past 15 years
  • Human Genome Project (HGP)
  • International collaboration that began in 1990
    and completed in 2003
  • Understand the blueprint of life (moon-landing of
    the nineties)
  • Sequence the entire human genome
  • World Wide Web (WWW)
  • It was born in 1989/1990 at CERN (developed by
    Tim Berners-Lee)
  • Revolutionized information access and sharing over
    the Internet (Gutenberg's printing press)
  • Web browsers (e.g., IE, Netscape, Firefox)

4
Relationship between HGP and WWW
  • HGP transformed life sciences into an information
    science, as large amounts of data have been
    generated, which need to be stored and analyzed
  • GenBank, EMBL, and DDBJ have recently reached a
    milestone of 100 billion bases from > 165,000
    organisms
  • PubMed has > 300,000 articles from > 150 life
    sciences journals
  • WWW has become the most popular medium for life
    scientists to distribute, access, share, and
    integrate different types of biological data over
    the Internet
  • As of 2005, there are 719 publicly available
    databases listed in NAR molecular biology
    database compilation

5
Spider-Man: Spidey science gets a genetic makeover
http://www.genomenewsnetwork.org/articles/05_02/spiderman.php
6
Spider-Man (Tim Berners-Lee): Weaving the Web
7
Semantic Web
  • "The Semantic Web is an extension of the current
    web in which information is given well-defined
    meaning, better enabling computers and people to
    work in cooperation." -- Tim Berners-Lee, James
    Hendler, Ora Lassila, The Semantic Web,
    Scientific American, May 2001
  • It provides a common framework that allows data
    to be shared and reused across application,
    enterprise, and community boundaries
  • It is based on the Resource Description Framework
    (RDF), which integrates a variety of applications
    using XML for syntax and URIs for naming.

8
Semantic Web for Life Sciences (TBL, Bio-IT World
Conference, May 2005)
  • "also the people involved in the Semantic Web
    pushing it along are also excited about getting
    involved in the life sciences; it's one of those
    areas that affect humankind, finding drugs,
    curing AIDS and cancer, etc. There seems to be a
    huge energy, and lots of practical technical
    reasons why this area is crying out to be one of
    the flagship areas that the Semantic Web really
    takes off"

9
Data → Information → Knowledge
Navarro JD, Niranjan V, Peri S, Jonnalagadda CK,
Pandey A. (2003) From biological databases to
platforms for biomedical discovery. Trends
Biotechnol. (6):263-8.
10
Problem with the Current WWW
11
Problem with the Current Web
12
Keyword Search: regulatory variation mammals
13
Data Heterogeneity
  • Lack of standard detailed description of
    resources
  • Data are exposed in different ways
  • Programmatic interfaces
  • Web forms or pages
  • FTP directory structures
  • Data are presented in different ways
  • Structured text (e.g., tab delimited format and
    XML format)
  • Free text
  • Binary (e.g., images)

14
Data Heterogeneity (cont'd)
  • Nomenclature problem
  • Gene/protein names (based on phenotype, sequence,
    function, organisms, etc.)
  • Armadillo (fruit flies) vs. β-catenin (mice)
  • PSM1 (human) = PSM2 (yeast); PSM1 (yeast) = PSM2
    (human)
  • Sonic Hedgehog
  • ID proliferation
  • Different ID schemes: 1OF1 (PDB ID) and P06478
    (SwissProt ID) correspond to Herpes Thymidine
    Kinase
  • Lexical variation: GO:1234, GO1234, GO-1234
  • Synonyms vs. homonyms
  • Dopamine receptor D2: DRD2, DRD-2, D2
  • PSA: prostate specific antigen,
    puromycin-sensitive aminopeptidase, psoriatic
    arthritis, pig serum albumin
  • "Biologists would rather share their toothbrush
    than a gene name." "Gene nomenclature is beyond
    redemption," said Michael Ashburner
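
The lexical-variation problem above can be illustrated with a small normalizer. This is only a sketch: the canonical form ("GO:" followed by the digits) and the set of accepted separators are assumptions made for illustration, not a rule stated in the slides.

```python
import re

# Assumed canonical form: "GO:" + digits. Accepted separators are an assumption.
_GO_PATTERN = re.compile(r"^GO[:\-_ ]?(\d+)$", re.IGNORECASE)


def normalize_go_id(raw: str) -> str:
    """Map lexical variants such as 'GO1234' or 'GO-1234' to 'GO:1234'."""
    match = _GO_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable GO-style identifier: {raw!r}")
    return f"GO:{match.group(1)}"


variants = ["GO:1234", "GO1234", "GO-1234"]
# All three spellings collapse to a single canonical identifier.
normalized = {normalize_go_id(v) for v in variants}
print(normalized)
```

Normalizing spelling only addresses lexical variation; synonyms and homonyms (such as the several meanings of PSA) still need a curated mapping.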

15
From Web to Semantic Web (cont'd)
  • Human processing → Machine processing
  • Use of Metadata
  • Free text description → ontological description
  • HTML → XML → RDF or its extensions
  • Vision → implementation

16
HTML Example
Readme
1 1 0 0 1 1
1 2 0 0 2 0
1 3 1 2 2 0
1 4 1 2 1 0
1 5 1 2 1 1
1 6 1 2 1 0
17
XML Example
What is XML?
  • eXtensible Markup Language
  • It is self describing
  • It is hierarchical
  • It is human- and computer-readable
  • It is a World Wide Web Consortium (W3C) standard
  • It can be validated using a DTD or XML Schema
  • There is a large base of supporting software
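
As a minimal sketch of what "self describing" and "hierarchical" mean in practice, the following reads a small XML record with Python's standard library. The element and attribute names (gene, name, organism, id) are invented for illustration and do not come from any of the bio-XML formats mentioned in this talk.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; tag and attribute names are assumptions for illustration.
xml_text = """
<gene id="g1">
  <name>PIM1</name>
  <organism>human</organism>
</gene>
""".strip()

root = ET.fromstring(xml_text)

# Each child element carries its own label (the tag), so the data can be
# turned into a dictionary without knowing column positions in advance.
record = {child.tag: child.text for child in root}
record["id"] = root.get("id")
print(record)
```

Compare this with the bare numbers of the HTML example: there the meaning of each column has to come from a separate Readme.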

18
Proliferation of Bio-XML Formats
Reasoning (machine intelligence)
19
XML Representation of Proteomics Data
AGML
HUP-ML
20
RDF Representation
21
Resource Description Framework (RDF)
  • It is a standard data model (directed labeled
    graph) for representing information (metadata)
    about resources in the World Wide Web
  • In general, it can be used to represent
    information about things that can be identified
    (using URIs) on the Web
  • It is intended to provide a simple way to make
    statements (descriptions) about Web resources

22
RDF Statement
  • An RDF statement consists of:
  • Subject: resource identified by a URI
  • Predicate: property (as defined in a name space
    identified by a URI)
  • Object: property value (literal) or a resource

For example, the dbSNP Website is a subject,
creator is a predicate, NCBI is an object. A
resource can be described by multiple statements.
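
A minimal sketch of the subject/predicate/object idea in Python follows. The Statement type and the describe helper are hypothetical names introduced here; the URIs are the dbSNP and Dublin Core ones used on the next slide.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # resource identified by a URI
    predicate: str  # property identified by a URI
    obj: str        # a literal value or another resource


DC = "http://purl.org/dc/elements/1.1/"
DBSNP = "http://www.ncbi.nlm.nih.gov/SNP"

# The same resource described by two statements.
statements = [
    Statement(DBSNP, DC + "creator", "http://www.ncbi.nlm.nih.gov"),
    Statement(DBSNP, DC + "language", "en"),
]


def describe(resource, stmts):
    """Collect every property/value pair stated about one resource."""
    return {s.predicate: s.obj for s in stmts if s.subject == resource}


print(describe(DBSNP, statements))
```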
23
Graphical and XML Representation
Graph:
  http://www.ncbi.nlm.nih.gov/SNP
    -- http://purl.org/dc/elements/1.1/creator --> http://www.ncbi.nlm.nih.gov
    -- http://purl.org/dc/elements/1.1/language --> en
RDF/XML:
  <?xml version="1.0"?>
  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ex="http://www.example.org/terms">
    <rdf:Description rdf:about="http://www.ncbi.nlm.nih.gov/SNP">
      <dc:creator rdf:resource="http://www.ncbi.nlm.nih.gov"></dc:creator>
      <dc:language>en</dc:language>
    </rdf:Description>
  </rdf:RDF>
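
The RDF/XML on this slide can be read back into statements with a few lines of standard-library Python. This is a sketch that handles only the simple rdf:Description pattern used here, not the full RDF/XML syntax; the function name extract_triples is an assumption, and a dedicated RDF library would be the usual choice in practice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ncbi.nlm.nih.gov/SNP">
    <dc:creator rdf:resource="http://www.ncbi.nlm.nih.gov"/>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(text):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    triples = []
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes tags as "{namespace}local"; dropping the
            # braces yields the full predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # The object is either a resource URI or a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in extract_triples(rdf_xml):
    print(triple)
```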
24
Life Sciences Identifiers (LSIDs)
  • URL vs. URI vs. URN
  • URL: http://www.gleaners.org/faq.html
  • URI: http://www.gleaners.org/faq.html#Q04
  • URN: www.gleaners.org/faq.html#Q04
  • LSID is a form of URN

25
Problems of URIs
  • The web server referenced by the URL may be
    broken or become unavailable
  • The syntax of the URL may change over time as the
    underlying data retrieval program evolves
  • The data returned by a URL may change over time
    as the underlying database contents change.

26
LSID Format and Examples
  • URN:LSID:namespace:database:object_id:revision_id
  • Examples
  • URN:LSID:ncbi.nlm.nih.gov:genbank:AF271072
  • URN:LSID:chemacx.cambridgesoft.com:ACX:CAS967582:1
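
A minimal sketch of splitting an LSID into its parts, assuming the fields are separated by colons and that the revision is optional. The field names follow the format line on this slide; parse_lsid and the LSID type are hypothetical helpers, not part of any LSID toolkit.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    namespace: str
    database: str
    object_id: str
    revision_id: Optional[str]  # assumed optional: absent in the GenBank example


def parse_lsid(text: str) -> LSID:
    """Split 'URN:LSID:namespace:database:object_id[:revision_id]'."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {text!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:]):
        raise ValueError(f"empty field in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("URN:LSID:ncbi.nlm.nih.gov:genbank:AF271072"))
```

Because the name carries no location, resolving it to actual data is a separate step left to a resolver.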

27
LSID (cont'd)
  • Globalness: An LSID is a name with global scope
    that does not imply a location. It has the same
    meaning everywhere.
  • Uniqueness: The same LSID will never be assigned
    to two different objects.
  • Persistence: It is intended that the lifetime of
    an LSID be permanent.
  • Scalability: LSIDs can be assigned to any data
    element that might conceivably be available on
    the network, for hundreds of years.
  • Legacy Support: The LSID naming scheme must
    permit the support of existing legacy naming
    systems.
  • Extensibility: Any scheme for LSIDs must permit
    future extensions to the scheme.
  • Independence: It is solely the responsibility of
    a name issuing authority to determine conditions
    under which it will issue a name.
  • Resolution: A URN will not impede resolution,
    i.e., translation to a URL.

28
Semantic Web Applications
  • Connotea (on-line management of web resources)
  • Piggy bank (semantic web browser)
  • YeastHub (yeast genome data integration)

30
Connotea Online Reference Management Service
(Nature Publishing Group) www.connotea.org
  • To keep links to the articles/websites of your
    interest
  • To discover new articles and websites through
    sharing your links with other users
  • It is web-accessible

31
TBL's original vision of the Web
  • Active vs. passive
  • Collaborative vs. authoritative
  • Decentralized vs. centralized
  • Semantic vs. syntactic

32
Connotea Online Reference Management Service
(Nature Publishing Group)
33
ALFRED Population Sample
34
Connotea (ALFRED Example)
35
ALFRED Example
36
Google Earth Example
37
Data Integration Using RDF
[Figure: statements from three sources merged into a unified view]
  • GenBank: humanhemoglobin derives from
    atagccgtacctgcgagtctagaagct
  • Gene Ontology: humanhemoglobin is a
    oxygentransportprotein
  • Protein Data Bank: humanhemoglobin has 3D
    structure (structure shown as an image)
  • Unified view: all three statements attached to a
    single humanhemoglobin node
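
The unified view on this slide can be sketched as a merge of statements that share a subject. The three lists below mirror the labels in the figure; the direction of "derives from" and the placeholder used for the 3D structure are assumptions, since the transcript does not preserve the drawing.

```python
from collections import defaultdict

genbank = [("humanhemoglobin", "derives from", "atagccgtacctgcgagtctagaagct")]
gene_ontology = [("humanhemoglobin", "is a", "oxygentransportprotein")]
# The structure is an image on the slide; this value is a placeholder.
protein_data_bank = [("humanhemoglobin", "has 3D structure", "<structure>")]


def unified_view(*sources):
    """Merge triples from several sources, grouping them by subject."""
    view = defaultdict(dict)
    for triples in sources:
        for subject, predicate, obj in triples:
            view[subject].setdefault(predicate, set()).add(obj)
    return view


view = unified_view(genbank, gene_ontology, protein_data_bank)
for predicate, objects in view["humanhemoglobin"].items():
    print("humanhemoglobin", predicate, sorted(objects))
```

The merge works only because all three sources use the same name for the protein, which is exactly what shared identifiers such as URIs or LSIDs are meant to guarantee.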
38
Piggy Bank
  • http://simile.mit.edu/piggy-bank
  • It is an extension to the Firefox Web browser
  • It turns the Firefox Web browser into a Semantic
    Web browser
  • It supports tagging and links to Google Maps

39
RDF is the Common Currency
40
Piggy Bank (Data Integration Example)
[Figure: data flow into Piggy Bank]
  • TRIPLES (expression data) → D2RQ → RDF expression
    dataset → import
  • HubMed (keyword search) → RDF bibliographic
    information → import
  • Piggy Bank plugin: browse/query
41
TRIPLES Expression Data in RDF
42
Piggy Bank (PIM1 Gene)
43
Semantic Bank
44
YeastHub
45
YeastHub Team
Kei Cheung
Mark Gerstein
Andrew Smith
Kevin Yip
Andy Masiar
Remko de Knikker
46
RDF Technologies
  • Description of data source using Rich Site
    Summary (RSS)
  • Data Conversion into RDF
  • Relational Database to RDF (D2RQ)
  • Tabular-RDF-Conversion
  • RDF Database (Sesame)
  • RDF-based query languages

47
Rich Site Summary (RSS)
[Figure: a user (application) reaches resources through
the YeastHub resource; some resources provide RSS and
others do not (labeled "RSS" and "No RSS")]
48
Resource Description (Use of Dublin Core Metadata)
49
RDF Metadata Example (RSS 1.0)
50
Data Conversion and Integration
51
RDF Modeling of Tabular Data
52
Tabular-RDF Data Conversion
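
As a rough sketch of the idea (not YeastHub's actual converter), each row of a tab-delimited table can become a subject and each remaining column a property. The base namespace, column names, and row values below are all invented for illustration.

```python
import csv
import io

BASE = "http://www.example.org/yeast/"  # hypothetical namespace

# Hypothetical tab-delimited data; the values are made up.
table = (
    "id\tlocalization\tessential\n"
    "gene1\tnucleus\tyes\n"
    "gene2\tcytoplasm\tno\n"
)


def table_to_triples(text, key_column):
    """Turn each cell of a tab-delimited table into one RDF-style triple."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    triples = []
    for row in reader:
        subject = BASE + row[key_column]       # the row becomes a resource
        for column, value in row.items():
            if column != key_column:           # other columns become properties
                triples.append((subject, BASE + column, value))
    return triples


for triple in table_to_triples(table, "id"):
    print(triple)
```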
53
Example of Data Converted into RDF
54
Motivating Example
  • Genomic analysis of essentiality within protein
    networks.
  • H Yu, D Greenbaum, H Xin Lu, X Zhu, M Gerstein
    (2004) Trends Genet 20: 227-31.
  • Jeong, H., Mason, S., Barabási, A.-L., and
    Oltvai, Z. 2001. Lethality and centrality in
    protein networks. Nature 411: 41-42.
  • Fraser, H., Hirsh, A., Steinmetz, L., Scharfe,
    C., and Feldman, M. 2002. Evolutionary rate in
    the protein interaction network. Science 296:
    750-752.
  • Important but hard

55
Example Integrated Query
56
Query Form
57
RQL Syntax and Query Results
58
Next step: Data Mining
  • Whole yeast genome analysis (Y6K)
  • Subcellular localization of the yeast proteome.
  • A Kumar, S Agarwal, JA Heyman, S Matson, M
    Heidtman, S Piccirillo, L Umansky, A Drawid, R
    Jansen, Y Liu, KH Cheung, P Miller, M Gerstein,
    GS Roeder, M Snyder (2002) Genes Dev 16: 707-19.
  • A Bayesian system integrating expression data
    with sequence patterns for localizing proteins:
    comprehensive application to the yeast genome.
  • A Drawid, M Gerstein (2000) J Mol Biol 301:
    1059-75.
  • Doing systematic data mining to predict the
    remaining 3K localizations
  • Important but hard

59
Once the web has been sufficiently "populated"
with rich metadata, what can we expect? First,
searching on the web will become easier as search
engines have more information available, and thus
searching can be more focused. Doors will also be
opened for automated software agents to roam the
web, looking for information for us or
transacting business on our behalf. The web of
today, the vast unstructured mass of information,
may in the future be transformed into something
more manageable - and thus something far more
useful. (Ora Lassila)
60
Automate humanely!
  • No amount of automation will replace human
    beings, but clumsy and belligerent automation
    will alienate them and suppress their
    creativity.
  • (Tony Kazic)

61
Thanks! Questions?
62
Semantic Graph
Find the most current image of Kei Cheung who is
affiliated with YCMI
[Figure: semantic graph]
  • Kei Cheung → affiliated with → YCMI
  • Kei Cheung → images → Files
  • File 1 → member of → Files
  • File n → member of → Files
  • File 1 → date → Oct 1, 1990
  • File n → date → Feb 1, 2005
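
The question on this slide can be answered by walking the graph. The sketch below encodes the figure's labels as triples; which date belongs to which file is an assumption based on the order of the labels, and the helper names are hypothetical.

```python
from datetime import date

triples = [
    ("Kei Cheung", "affiliated with", "YCMI"),
    ("Kei Cheung", "images", "Files"),
    ("File 1", "member of", "Files"),
    ("File n", "member of", "Files"),
    ("File 1", "date", date(1990, 10, 1)),  # assumed pairing of file and date
    ("File n", "date", date(2005, 2, 1)),   # assumed pairing of file and date
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def most_current_image(person, organization):
    """Follow affiliation, image collection, membership, and date edges."""
    if organization not in objects(person, "affiliated with"):
        return None
    files = [f for coll in objects(person, "images")
             for f in subjects("member of", coll)]
    dated = [(objects(f, "date")[0], f) for f in files if objects(f, "date")]
    return max(dated)[1] if dated else None


print(most_current_image("Kei Cheung", "YCMI"))
```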
63
Research/Technologies Related to Semantic Web
  • Text mining
  • Agent computing
  • Web services
  • Ontological research

64
Knowledge representation
  • A person (Joe) is an uncle iff
  • Joe is male
  • He has a parent (Jill) who has a second child
    (Sue) who is a parent

[Figure: graph]
  • Jill → has_child → Joe
  • Joe → has_parent → Jill
  • Jill → has_child → Sue
  • Sue → has_child → ?
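
The uncle definition on this slide can be written as a small rule over has_child facts. This is a sketch in plain Python rather than an ontology language; the name "unnamed child" stands in for the "?" node in the figure.

```python
# Facts from the figure; "unnamed child" is a placeholder for the "?" node.
has_child = {("Jill", "Joe"), ("Jill", "Sue"), ("Sue", "unnamed child")}
male = {"Joe"}


def parents_of(person):
    return {parent for parent, child in has_child if child == person}


def children_of(person):
    return {child for parent, child in has_child if parent == person}


def is_uncle(person):
    """Male, and some parent has another child who is a parent."""
    if person not in male:
        return False
    for parent in parents_of(person):
        for sibling in children_of(parent) - {person}:
            if children_of(sibling):
                return True
    return False


print(is_uncle("Joe"))
```

Note that has_parent is derived from has_child here instead of being stored separately, so the two relations cannot disagree.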
65
Other things to mention?
  • Taxonomy vs. ontology
  • OWL overview and example(s)