Title: The Web as a Science Accelerator (Het Web als wetenschapsversneller)
1 The Web as a Science Accelerator
- Frank van Harmelen
- Vrije Universiteit Amsterdam
Data wants to be free
2 Accelerator?
3 Outline
- Stuff you all know: the scientist's problem
- The general idea: a Web of Data
- What must be done to realise this?
- How far away is this?
- Why is this relevant for you?
- Next steps, dos and don'ts
4 The Scientist's Problem
- Too much unintegrated data
  - from a variety of incompatible sources
  - with no standard naming convention
  - each with a custom browsing and querying mechanism (no common interface)
  - and poor interaction with other data sources
Esther Jansma
Henk den Heijer
5 What are the Data Sources?
- Flat Files
- URLs
- Proprietary Databases
- Public Databases
- Spreadsheets
- Emails
Ewoud Sanders?
6 In which disciplines?
One dataset per site
a new database each month
- Archeology
- Chemistry
- Genomics, proteomics, ... (bio/life-sciences)
- Communication science
- Social history
- Linguistics
- Bio-diversity
- Environmental sciences (climate studies)
- ....
- Libraries (KB), archives (Beeld & Geluid)
Willem Bouten
historical data
laymen data
international data
8 Impact of the Web
- The Web has changed
  - how we read the news
  - how we shop
  - how we interact with friends and family
  - how we search and find information
  - ... how we do science?
    - accessing literature, yes, but
    - doing science?
9 The Web of Data
- a.k.a. the "Semantic Web" (Tim Berners-Lee)
- recipe: expose databases on the web, use standard data formats, integrate
- meta-data from expressing DB schema semantics in machine-interpretable ways
- enable integration and unexpected re-use
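The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in practice: the two tables, their column names, and every `http://example.org/` URL are hypothetical. Each database row is exposed as (subject, predicate, object) triples, and integration then amounts to a set union, provided both sources use the same identifier for the same thing.

```python
# Minimal sketch: expose two databases as triples and integrate them.
# All table contents and example.org URLs are hypothetical.
BASE = "http://example.org/"

def rows_to_triples(rows, key):
    """Expose each row as (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = BASE + "sample/" + row[key]
        for column, value in row.items():
            if column != key:
                triples.add((subject, BASE + "schema/" + column, value))
    return triples

# Two independently maintained sources describing the same samples.
lab_db = [{"id": "s1", "species": "oak"}, {"id": "s2", "species": "pine"}]
field_db = [{"id": "s1", "site": "Utrecht"}, {"id": "s2", "site": "Dorestad"}]

# Integration is a set union, because both sources share subject URLs.
web_of_data = rows_to_triples(lab_db, "id") | rows_to_triples(field_db, "id")

def describe(subject, triples):
    """Collect everything any source says about one subject."""
    return {p.rsplit("/", 1)[1]: o for s, p, o in triples if s == subject}

print(describe(BASE + "sample/s1", web_of_data))
```

The design choice worth noting is that neither source needs to know about the other: agreeing on identifiers is enough for the facts to line up.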
10 A short history of the WWW
- Web 1.0: network of pictures and text, by people, for people
- Web 2.0: network of communities, by groups of people, for groups of people
- Web 3.0: network of data, by computers, for computers, useful for people
11 The Current Web of text and pictures
[Figure: linked web pages, labelled "a web page in English about Frank", "and another web page about Frank", "This page is about the Vrije Universiteit", "And this page is about LarKC", "And this page is about Stefano"]
Linked web pages, written by people, written for people, used only by people...
Many of these pages already come from data that is usable by computers!
But we can't link the data...
The Future Web of Data
Linked data, usable by computers, useful for people!
13 Machine-accessible meaning (what it's like to be a machine)
META-DATA
14 What is meta-data?
- it's just data
- it's data describing other data
- it's meant for machine consumption
15 Required are
- one or more shared vocabularies
  - so data producers and data consumers all speak the same language
- a standard syntax
  - so meta-data can be recognised as such
- lots of resources with meta-data attached
- mechanisms for attribution and trust
16 Requirement 1: Shared vocabularies
BioMed
- MeSH
  - Medical Subject Headings, National Library of Medicine
  - 22,000 descriptions
- EMTREE
  - Commercial (Elsevier), drugs and diseases
  - 45,000 terms, 190,000 synonyms
- UMLS
  - Integrates 100 different vocabularies
- SNOMED
  - 200,000 concepts, College of American Pathologists
- Gene Ontology
  - 15,000 terms in molecular biology
- NCBI Cancer Ontology
  - 17,000 classes (about 1M definitions)
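What a shared vocabulary buys in practice can be sketched as follows. This is a toy illustration: the concept identifiers and synonym lists are hypothetical and are not real MeSH, EMTREE or SNOMED entries. The point is only that many local terms resolve to one shared concept, so that producers and consumers of data end up speaking the same language.

```python
# Minimal sketch: a shared vocabulary maps many local terms to one concept.
# The concept identifiers and synonyms below are hypothetical, not real
# MeSH or EMTREE entries.
SHARED_VOCABULARY = {
    "http://example.org/concept/C001": {"heart attack", "myocardial infarction"},
    "http://example.org/concept/C002": {"high blood pressure", "hypertension"},
}

# Invert the vocabulary once: synonym -> concept identifier.
SYNONYM_INDEX = {
    synonym: concept
    for concept, synonyms in SHARED_VOCABULARY.items()
    for synonym in synonyms
}

def to_concept(term):
    """Return the shared concept for a local term, or None if unknown."""
    return SYNONYM_INDEX.get(term.strip().lower())

# Two data producers use different words for the same thing.
hospital_record = "Myocardial infarction"
survey_answer = "heart attack"
print(to_concept(hospital_record) == to_concept(survey_answer))
```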
17 Requirement 2: A standard syntax
Semantic Web data model: RDF
things, and relations between things
18 RDF Triples in Life Sciences
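As a rough illustration of the "things and relations between things" model, the sketch below stores a handful of triples and queries them by pattern. The gene, protein and disease names are hypothetical, and short prefixed names stand in for the URLs that real RDF would use.

```python
# Minimal sketch of the RDF data model: things, and relations between things.
# Every name below is hypothetical and only illustrates the shape of the data.
triples = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "interactsWith", "protein:P2"),
    ("gene:G2", "encodes", "protein:P2"),
    ("protein:P2", "involvedIn", "disease:D1"),
]

def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in data
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]

# Which genes encode which proteins?
for subject, _, obj in match((None, "encodes", None), triples):
    print(subject, "encodes", obj)

# Follow two relations in a row: which genes encode a protein
# that is involved in a disease?
linked = [
    (gene, disease)
    for gene, _, protein in match((None, "encodes", None), triples)
    for _, _, disease in match((protein, "involvedIn", None), triples)
]
print(linked)
```

Chaining two patterns, as in the last query, is the simplest form of the kind of cross-source question that a uniform triple model makes possible.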
19 Web of Data: anybody can say anything about anything
- All identifiers are URLs (on the Web)
- Allows total decoupling of
  - data
  - vocabulary
  - meta-data
<x> IsOfType <T>
<prince>
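To make the role of URLs as identifiers concrete, here is a small sketch that writes triples in the style of the N-Triples syntax. The `example.org` and `example.net` URLs and the `label` property are hypothetical; only the `rdf:type` URL is the standard one. The helper handles just two cases (URL objects and plain string literals) and is not a complete serialiser.

```python
# Minimal sketch: identifiers are URLs, so data, vocabulary and meta-data
# can live on different sites. The example.org/example.net URLs are
# hypothetical; the rdf:type URL is the standard RDF one.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_ntriples_line(subject, predicate, obj):
    """Serialise one triple in N-Triples style: <s> <p> <o> ."""
    if obj.startswith("http://") or obj.startswith("https://"):
        obj_text = "<" + obj + ">"          # the object is another thing
    else:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = '"' + escaped + '"'      # the object is a plain literal
    return "<" + subject + "> <" + predicate + "> " + obj_text + " ."

statements = [
    # data published by one party, typed with a vocabulary hosted elsewhere
    ("http://example.org/data/x", RDF_TYPE, "http://example.net/vocab/T"),
    # meta-data: a further statement about the same thing
    ("http://example.org/data/x", "http://example.net/vocab/label", "example"),
]

for statement in statements:
    print(to_ntriples_line(*statement))
```

Because the subject, the type and the property each carry their own URL, the data, the vocabulary and the meta-data can be published and maintained by different parties.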
21 How far away is this?
- Stable data formats
- Lots of shared vocabularies (and ways to convert them)
- Lots of data sources (and ways to convert them)
- Lots of tools
  - convert, construct, edit (data, vocabularies)
  - store, search, query, reason
  - interlink
  - visualise
  - ...
22 How far away is this?
The rapidly growing Linked Open Data cloud: already many billions of facts and rules
- every book sold by Amazon
- any CD ever recorded (almost)
- life-science databases
- hierarchical dictionaries (UK, FR, NL)
- basic facts on every country on the planet
- common-sense rules and facts (100,000s)
- scientific bibliographies
- names of artists and art works (10,000s)
- geographic names (millions)
- encyclopedia
It gets bigger every month
24 Next steps
Can you get famous by sharing data?
- hunt for shared vocabularies
  - try to avoid building them
- wrap legacy data sources
  - your own
  - from others
- link wrapped sources
- publish linked data on the web
- make noise
- reconstruct some old results
- discover new results
- get famous
In-use systems in communication science, KB, Beeld & Geluid, Europeana
Papers in oncology, in communication science
Dedicated conferences in chemistry, earth sciences, life sciences, humanities
Funding opportunities in humanities, social sciences, life sciences
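The "link wrapped sources" step can be sketched as follows. Both sources, their URLs and their contents are hypothetical; `owl:sameAs` is the standard OWL property for stating that two identifiers denote the same thing. The name matching here is deliberately naive and only stands in for whatever matching a real project would need.

```python
# Minimal sketch of the "link wrapped sources" step. The two sources and
# their URLs are hypothetical; owl:sameAs is the standard OWL property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two wrapped legacy sources: identifier -> human-readable name.
library_catalogue = {
    "http://example.org/library/p17": "Rembrandt van Rijn",
    "http://example.org/library/p42": "Johannes Vermeer",
}
archive_index = {
    "http://example.net/archive/a3": "VERMEER, Johannes",
    "http://example.net/archive/a9": "Rijn, Rembrandt van",
}

def name_key(name):
    """Normalise a name so that word order, case and commas do not matter."""
    return frozenset(name.replace(",", " ").lower().split())

def link_sources(source_a, source_b):
    """Emit a sameAs triple for every pair of records with matching names."""
    index_b = {name_key(name): url for url, name in source_b.items()}
    links = []
    for url_a, name in source_a.items():
        url_b = index_b.get(name_key(name))
        if url_b is not None:
            links.append((url_a, OWL_SAME_AS, url_b))
    return links

for link in link_sources(library_catalogue, archive_index):
    print(link)
```

Once such links are published, a query against one source can pull in what the other source knows about the same person without either database being changed.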
25 Example: communication science
- Read digital newspapers
- Annotate (who said what about whom)
  - triples, RDF (a supercomputer instead of students)
- Store annotations in RDF
- Publish on the Web
- Integrate with other datasets
- National studies become possible on much larger datasets
- Internationally comparative studies become possible
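The annotation step above can be sketched as data. This is an illustrative assumption about how "who said what about whom" might be laid out as triples, not the actual schema used in the project: the newspapers, people, quotes and property names are all hypothetical, and short names stand in for URLs.

```python
from collections import Counter

# Minimal sketch: "who said what about whom" stored as triples.
# Newspapers, people, quotes and property names are all hypothetical.
def annotate(annotation_id, source, speaker, target, text):
    """Describe one quoted statement as a small set of triples."""
    node = "http://example.org/annotation/" + annotation_id
    return [
        (node, "foundIn", source),
        (node, "saidBy", speaker),
        (node, "about", target),
        (node, "text", text),
    ]

store = []
store += annotate("1", "paper-A", "politician-X", "politician-Y", "quote one")
store += annotate("2", "paper-B", "politician-X", "politician-Y", "quote two")
store += annotate("3", "paper-B", "politician-Y", "politician-X", "quote three")

def value_of(node, predicate, triples):
    """Look up the single object of (node, predicate, ?)."""
    return next(o for s, p, o in triples if s == node and p == predicate)

# Count, across all newspapers, how often each speaker talks about each target.
nodes = sorted({s for s, _, _ in store})
who_about_whom = Counter(
    (value_of(n, "saidBy", store), value_of(n, "about", store)) for n in nodes
)
print(who_about_whom.most_common())
```

Because annotations from every newspaper land in one store with the same shape, the same count works unchanged on a national or an international collection.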
26 Scenario: science dynamics
- Now: citation patterns, co-author networks
- But the datasets are small and not representative
- Scientists do more than publish and cite
- Harvest datasets from
  - Funding agencies (EU, NSF) (NWO?)
  - Conferences, programme committees
  - Email lists, blogs, Twitter
- Find more current and more accurate patterns
In RDF: integration, semantic analysis
27 So
- There are uniform data models
- There are overarching vocabularies
- There is data-publication technology
- There are tools for
  - Storage
  - Visualisation
  - Querying
28 Questions and discussion
- Frank.van.Harmelen_at_cs.vu.nl
- http://www.cs.vu.nl/frankh/popularising.html