Title: Uncertainty reasoning for Linked Data
1 Uncertainty reasoning for Linked Data
2 Uncertainty reasoning for linked data
- Linked data, a strikingly successful model for exploiting semantic web technology, exhibits uncertainty-related issues: ambiguity, misalignment, reliability
- What approach could we take to address this
- without losing the simplicity which has enabled significant adoption?
3 Linked data
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs, so that they can discover more things
4 Uncertainty in linked data: 1. Misalignment of instance matches
- link datasets by resolving co-references and publishing links
- links published as owl:sameAs (all or nothing)
- match errors
- match uncertainties not accessible
- erroneous assumptions (e.g. clinical trial example)
- can partly address by use of SKOS mapping vocabulary
5 Uncertainty in linked data: 2. Ambiguity from merging datasets
- datasets have different assumptions, definitions, context (esp. time) for different measures
- leads to multiple different values
- E.g.
    <http://dbpedia.org/resource/London>
        dbo:populationMetro 12300000 ;
        dbp:populationMetro "12,300,000 to 13,945,000" ;
        dbo:populationTotal 7556900 ;
        owl:sameAs <http://www.okkam.org/ens/id...> .
    <http://www.okkam.org/ens/id...> population 7421209 .
6 Uncertainty in linked data: 3. Other issues
- Misalignment of models
- e.g. freebase/dbpedia links generated (temporary) problems: Musician owl:equivalentClass Person
- Source reliability
- not unique to linked data, but linked data amplifies it
7 Mitigation approaches? 1. Weighted link vocabulary
- Develop a simple, common vocabulary for expressing uncertain co-reference links
- Clients or intermediaries can choose how to map the link evidence to equivalence assertions

    void:LinkSet

    a ur:WeightedLink ;
        ur:target <> ;
        ur:match <> ;
        ur:weight 0.7

    a ur:UncertainLinkSet ;
        ur:matchAlgorithm alg:JaroStringMatch .
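A minimal sketch of how a client might consume such weighted links, assuming that alg:JaroStringMatch denotes the standard Jaro string similarity and that the client turns weights into equivalence assertions with a simple threshold. The `WeightedLink` class, the `accept` helper, the `ex:` resources and their labels are all hypothetical; the slide only names the vocabulary terms.

```python
from dataclasses import dataclass


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


@dataclass(frozen=True)
class WeightedLink:
    target: str   # plays the role of ur:target
    match: str    # plays the role of ur:match
    weight: float # plays the role of ur:weight


def accept(links, threshold):
    """Turn links at or above the client's threshold into owl:sameAs triples."""
    return [(l.target, "owl:sameAs", l.match) for l in links if l.weight >= threshold]


# Hypothetical resources and labels, used only to produce some weights.
candidates = [("ex:a1", "MARTHA", "ex:b1", "MARHTA"),
              ("ex:a2", "DIXON", "ex:b2", "DICKSONX")]
links = [WeightedLink(a, b, jaro(la, lb)) for a, la, b, lb in candidates]

strict = accept(links, 0.9)   # cautious client
lenient = accept(links, 0.7)  # permissive client
```

The publisher states the evidence once; a cautious client and a permissive client derive different sets of equivalences from the same link set, which is what an all-or-nothing owl:sameAs link cannot express.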
8 Mitigation approaches? 2. Imprecise value vocabulary
- Develop a simple, common vocabulary for expressing imprecise values that can arise from known measurement uncertainty or merge ambiguity

    London population
        a ur:ImpreciseValue ;
        sampleValue [ value 7556900 ; source dbpedia ; context year2009 ] ;
        sampleValue [ value 7421209 ; source okkam ; context year2008 ] ;
        estimatedValue 7500000 .
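One way to picture an imprecise value on the client side is as a small record that keeps every sample with its source and context, and derives a range and an estimate on demand. The class and method names below are hypothetical, and the estimation rule is an assumption: the slide does not say how 7500000 was obtained, although rounding the mean of the two samples to the nearest 100,000 happens to reproduce it.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class SampleValue:
    value: float
    source: str   # e.g. "dbpedia"
    context: str  # e.g. the year the figure refers to


@dataclass
class ImpreciseValue:
    samples: list = field(default_factory=list)

    def bounds(self):
        """Smallest and largest sample, a crude uncertainty interval."""
        vals = [s.value for s in self.samples]
        return min(vals), max(vals)

    def estimate(self, ndigits=-5):
        """Assumed rule: mean of the samples, rounded to the nearest 10**-ndigits."""
        return round(mean(s.value for s in self.samples), ndigits)


london_population = ImpreciseValue([
    SampleValue(7556900, source="dbpedia", context="2009"),
    SampleValue(7421209, source="okkam", context="2008"),
])

print(london_population.bounds())    # (7421209, 7556900)
print(london_population.estimate())  # 7500000.0
```

Keeping the samples rather than only the estimate lets a client apply its own policy, for example preferring the most recent context or a trusted source.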
9 Mitigation approaches? 3. Override graphs
- Allow clients to choose which parts of merged data sources they adopt (trust) and publish that decision
- Allow clients to publish deltas to public datasets correcting merge or other artefacts
- per-link and per-assertion granularity

[Diagram labels: void:DataSet, ur:argGraph, void:DataSet, ur:ComputedDataSet, ur:Combinator, ur:Difference, Union]
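The combinators named in the diagram can be read as set operations over graphs. The sketch below assumes that reading: a graph is a set of triples, a computed dataset is built from argument graphs with union and difference, and the example triples (including the `fb:` and `dbo:` prefixes) are illustrative placeholders rather than real published data.

```python
def union(*graphs):
    """Combine argument graphs into one set of triples."""
    return frozenset().union(*graphs)


def difference(base, *removed):
    """Drop from base every triple found in any of the removed graphs."""
    return frozenset(base).difference(*removed)


# A public, merged dataset containing one link the client does not trust.
public = {
    ("dbpedia:London", "dbo:populationTotal", 7556900),
    ("fb:Musician", "owl:equivalentClass", "dbo:Person"),
}

# The client's published delta: per-assertion retractions and additions.
client_rejects = {("fb:Musician", "owl:equivalentClass", "dbo:Person")}
client_adds = {("fb:Musician", "rdfs:subClassOf", "dbo:Person")}

# Computed dataset = Union(Difference(public, rejects), adds)
view = union(difference(public, client_rejects), client_adds)
```

Because the delta is itself just a pair of small graphs, it can be published alongside the public dataset and reused by other clients who share the same trust decision.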
10 Conclusion
- multiple issues in ambiguity and uncertainty in linked data
- proposed problems and solutions illustrative rather than definitive
- low hanging fruit
- area ripe for contribution