Title: Adding Semantics to Social Websites for Citizen Science
1Adding Semantics to Social Websites for Citizen
Science
- Pranam Kolari
- University of Maryland, Baltimore County
- Joint work with Andriy Parafiynyk, Tim Finin, Cynthia Parr, Joel Sachs, and Lushan Han
- http://ebiquity.umbc.edu/paper/html/id/365
- http://creativecommons.org/licenses/by-nc-sa/2.0/
- This work was partially supported by DARPA contract F30602-97-1-0215 and NSF grants CCR007080 and IIS9875433
2This talk
- Motivation
- Swoogle Semantic Web search engine
- Social Semantic Web
- Conclusions
3SOCIAL MEDIA
- Social media describes the online technologies
and practices that people use to share opinions,
insights, experiences, and perspectives and
engage with each other.
Wikipedia 07
4Social Media for agents
- Today social media supports information sharing among communities of people
  - Enables citizen journalism
- An infrastructure based on pings, feeds, content aggregators, and filters (e.g., pipes) aids scalability
- Social media now accounts for 1/3 of new Web content!
- We need to explore how networks of agents can use the same strategies to share data and knowledge
5This talk
- Motivation
- Swoogle Semantic Web search engine
- Social Semantic Web
- Conclusions
6Google has made us smarter
7But what about our agents?
- Agents still have a very minimal understanding of
text and images.
8But what about our agents?
- A Google for knowledge on the Semantic Web is
needed by software agents and programs
9
- http://swoogle.umbc.edu/
- Running since summer 2004
- 2.2M RDF docs, 434M triples, 10K ontologies, 15K namespaces, 1.5M classes, 185K properties, 49M instances, 800 registered users
10Swoogle Architecture
[Figure: Swoogle architecture diagram. Blocks: Discovery, Index, Analysis, Search Services, Semantic Web metadata. Labels: SwoogleBot, Bounded Web Crawler, Google Crawler, candidate URLs, pings, document cache, archive, SWD classifier, Ranking, IR Indexer, SWD Indexer, Web Server, Web Service, html (human), rdf/xml (machine), the Web, Semantic Web. Legend: information flow.]
[Figure: Swoogle's web interface]
11Applications and use cases
- Supporting Semantic Web developers
  - Ontology designers, vocabulary discovery, who's using my ontologies or data?, use analysis, errors, statistics, etc.
- Searching specialized collections
  - Spire: aggregating observations and data from biologists
  - InferenceWeb: searching over and enhancing proofs
  - SemNews: Text Meaning of news stories
- Supporting SW tools
  - Triple shop: finding data for SPARQL queries
12
- An NSF ITR collaborative project with
- University of Maryland, Baltimore County
- University of Maryland, College Park
- U. of California, Davis
- Rocky Mountain Biological Laboratory
13An invasive species scenario
- Nile Tilapia fish have been found in a California lake.
- Can this invasive species thrive in this environment?
- If so, what will be the likely consequences for the ecology?
- So we need to understand the effects of introducing this fish into the food web of a typical California lake
14Food Webs
- A food web models the trophic (feeding) relationships between organisms in an ecology
- Food web simulators explore consequences of ecological changes, such as species introduction or removal
- Food webs are constructed from studies of a location's species inventory and the known trophic relations.
- Goal: automatically construct a food web for a new species using existing data and knowledge
- ELVIS: Ecosystem Location Visualization and Information System
15East River Valley Trophic Web
http://www.foodwebs.org/
16The problem
- We have data on what species are known to be in the location and can further restrict and fill in with other ecological models
  - Maybe we can mine social media for species observation data?
- But we don't know which of these the Nile Tilapia eats or who might eat it.
- We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
17Food Web Constructor
- Predict food web links using database and
taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
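
The taxonomic reasoning mentioned on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ELVIS algorithm: the toy lineages, the known links, the species other than Nile Tilapia, and the helper names `shared_rank_depth` and `predict_prey` are all hypothetical. The idea is to borrow the known prey of the closest taxonomic relative and keep only the prey that occur at the location.

```python
# Toy taxonomy: species -> lineage from broad to specific (hypothetical data).
LINEAGE = {
    "nile_tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "blue_tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "largemouth_bass": ["Animalia", "Chordata", "Centrarchidae", "Micropterus"],
}

# Known trophic links: predator -> set of prey (hypothetical data).
KNOWN_PREY = {
    "blue_tilapia": {"algae", "zooplankton"},
    "largemouth_bass": {"small_fish", "crayfish"},
}


def shared_rank_depth(a, b):
    """Count how many leading taxonomic ranks two species share."""
    depth = 0
    for rank_a, rank_b in zip(LINEAGE[a], LINEAGE[b]):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


def predict_prey(new_species, local_inventory):
    """Guess prey for new_species from its closest relative with known links."""
    relatives = [s for s in KNOWN_PREY if s != new_species]
    closest = max(relatives, key=lambda s: shared_rank_depth(new_species, s))
    # Keep only prey that actually occur at the location.
    return closest, KNOWN_PREY[closest] & set(local_inventory)


lake_inventory = ["algae", "ostracods", "crayfish", "zooplankton"]
relative, predicted = predict_prey("nile_tilapia", lake_inventory)
print(relative, sorted(predicted))
```

A real system would also weigh natural history data such as size, mass, and habitat, which this sketch leaves out.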
18Status
- ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
- Background ontologies
  - SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS related tasks, inputs and outputs
  - ETHAN (Evolutionary Trees and Natural History): concepts and properties for natural history information on species derived from data in the Animal Diversity Web and other taxonomic sources. 250K classes on plants and animals
19This talk
- Motivation
- Swoogle Semantic Web search engine
- Social Semantic Web
- Conclusions
20
- Social media sites have become the biggest source of new content on the Web
  - Blogs, Wikis, Photo sites, forums, etc.
  - Accounting for 1/3 of new Web content
21
- Social media sites embrace new ways of letting users add semantic information
  - Shows users the potential of semantics
- This graph shows the uptake of tags in blogs
22Social Media and the Semantic Web
- Many are exploring how Semantic Web technology can work with social media
- Social media like blogs are typically temporally organized
  - Valued for their timely and dynamic information!
  - If static pages form the Web's long term memory, then the Blogosphere is its stream of consciousness
- Maybe we can (1) help people publish data in RDF on their blogs, (2) mine social media sites for useful information, (3) exploit new infrastructure ideas for sharing Semantic Web data.
23A BioBlitz involves going out to an area and
recording every organism you see
The OWL icon links to the data in RDF
24Here's the post's RDF data
25A good Semantic Web opportunity
- We want to make it easy for scientists to enter and collect information from social media
  - Professionals, students and amateurs!
- Some early examples
  - SPOTter: a tool to add Semantic Web data to blogs
  - Splickr: a system to mine Flickr for images of organisms
  - RDF123: an application and Web service to render spreadsheets as RDF data
26SPOTter: SPire Observation Tool
- We've developed some simple components to help people add RDF data to blogs and ping Swoogle to get it indexed.
- SPOTter is an initial prototype that uses the ETHAN ontology and is being used in some BioBlitz activities with students.
- We're working toward a version that uses Twitter so that people can make the blog entries from their cell phones via SMS
  - The SPOTter agent will get the entries (via RSS) and index the data
27SPOTter button
Once entered, the data is embedded into the blog post and Swoogle is pinged to index it
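
To give a feel for the kind of data a tool like SPOTter might embed in a post, here is a minimal sketch that serializes one observation as N-Triples. Everything specific in it is an assumption made for illustration: the `http://example.org/` namespace, the property names, the blog URL, and the `observation_triples` helper are hypothetical and are not the ETHAN or SPOTter vocabulary.

```python
# Hypothetical namespace; the real SPOTter/ETHAN vocabularies may differ.
OBS = "http://example.org/observation#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Format a plain N-Triples literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def observation_triples(post_url, species, lat, lon, date):
    """Return N-Triples lines describing one observation in a blog post."""
    subject = f"<{post_url}#observation>"
    facts = [
        (RDF_TYPE, f"<{OBS}Observation>"),
        (OBS + "species", literal(species)),
        (OBS + "latitude", literal(lat)),
        (OBS + "longitude", literal(lon)),
        (OBS + "date", literal(date)),
    ]
    return [f"{subject} <{pred}> {obj} ." for pred, obj in facts]


lines = observation_triples(
    "http://example.org/blog/2007/05/bioblitz",
    "Oreochromis niloticus", 39.25, -76.71, "2007-05-12",
)
print("\n".join(lines))
```

Plain string literals keep the sketch short; a production tool would more likely use typed literals and an RDF library.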
28
- We can draw a bounding box on the map and find observations
- An RSS feed is provided for each query
Prototype SPOTter search engine
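
A bounding-box search of this kind can be sketched in a few lines. This is an illustrative assumption about how such a filter might work, not the prototype's code: the `Observation` type, the sample coordinates, and the function names are hypothetical, and the box is assumed not to cross the antimeridian.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    species: str
    lat: float
    lon: float


def in_bounding_box(obs, south, west, north, east):
    """True if the observation lies inside the box (no antimeridian crossing)."""
    return south <= obs.lat <= north and west <= obs.lon <= east


def find_observations(observations, south, west, north, east):
    """Return the observations whose coordinates fall inside the box."""
    return [o for o in observations if in_bounding_box(o, south, west, north, east)]


# Hypothetical sample data.
observations = [
    Observation("Oreochromis niloticus", 38.58, -121.49),
    Observation("Micropterus salmoides", 39.25, -76.71),
]
hits = find_observations(observations, south=32.5, west=-124.5, north=42.0, east=-114.0)
print([o.species for o in hits])
```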
29Flickr
- The Flickr photo sharing site has millions of photographs
  - Many of plants and animals
  - Most of them have descriptions, timestamps, tags and even geo-tags
- Flickr has even introduced machine tags that can be mapped into RDF
- Any Flickr users (humans or bots) can add comments and annotations
- There's a good API
- It could be a good source of ecological information
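
Flickr machine tags have the form `namespace:predicate=value`, which is why they map naturally onto RDF triples. The sketch below shows one possible mapping. The namespace-to-URI table, the photo URL, and the function name are assumptions made for illustration, and the code works on a plain list of tag strings rather than calling the Flickr API.

```python
import re

# Flickr machine tags have the form namespace:predicate=value.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

# Hypothetical mapping from tag namespaces to RDF namespace URIs.
NAMESPACES = {
    "taxonomy": "http://example.org/taxonomy#",
    "geo": "http://example.org/geo#",
}


def machine_tags_to_triples(photo_url, tags):
    """Map machine tags on one photo to (subject, predicate, object) triples."""
    triples = []
    for tag in tags:
        match = MACHINE_TAG.match(tag)
        if not match:
            continue  # ordinary free-text tag, skip it
        namespace, predicate, value = match.groups()
        base = NAMESPACES.get(namespace)
        if base is None:
            continue  # namespace we have no mapping for
        triples.append((photo_url, base + predicate, value))
    return triples


tags = ["tilapia", "taxonomy:binomial=Oreochromis niloticus", "geo:lat=38.58"]
for triple in machine_tags_to_triples("http://example.org/photo/1", tags):
    print(triple)
```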
32Results for people and machines
33RDF123
- An application and web service to generate RDF
data from spreadsheets
- Graphically create and edit a spreadsheet-to-RDF map
- Map and spreadsheet -> RDF data
- Spreadsheet: CSV or Google doc
- Some metadata can be embedded in the spreadsheet
- See http://ebiquity.umbc.edu/project/html/id/82/
34RDF123
- The BioBlitz project needed a way to collect and share observational data from students
  - Spreadsheets selected as a common data format and templates developed
- RDF123 application and web service developed to ease exporting the data as RDF for a Maryland BioBlitz group
  - Supports a web service to generate RDF given URLs for the sheet and map
  - Works on CSV files and also Google spreadsheets
35A map provides a template for an RDF subgraph for
each row
36The map is also represented in RDF
37Here's the RDF that's produced from the spreadsheet
38Metadata, including the URI of a map, can be
embedded in the spreadsheet
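
The idea of a map acting as a template that is instantiated once per spreadsheet row can be sketched as follows. This is not RDF123's map syntax or implementation: the `{column}` placeholder notation, the `http://example.org/` URIs, and the function `rows_to_ntriples` are hypothetical, and cell values are not escaped.

```python
import csv
import io

# Hypothetical map: one triple pattern per entry, with {column} placeholders.
# This is NOT the RDF123 map syntax, only an illustration of the idea.
ROW_MAP = [
    ("<http://example.org/obs/{id}>", "<http://example.org/ns#species>", '"{species}"'),
    ("<http://example.org/obs/{id}>", "<http://example.org/ns#observer>", '"{observer}"'),
]


def rows_to_ntriples(csv_text, row_map):
    """Instantiate the map once per spreadsheet row and return N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for subject, predicate, obj in row_map:
            # Fill the placeholders with this row's cell values (no escaping).
            lines.append(f"{subject.format(**row)} {predicate} {obj.format(**row)} .")
    return lines


sheet = "id,species,observer\n1,Bufo americanus,alice\n2,Rana clamitans,bob\n"
print("\n".join(rows_to_ntriples(sheet, ROW_MAP)))
```

Each row yields one small subgraph, so a sheet with two rows and a two-pattern map produces four triples.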
39Ping and Feed Design Pattern
- The Web uses a ping and feed design pattern that is a variant of publish and subscribe
  - It accounts for the scalable, smooth function of the Blogosphere and related social media systems
  - Pings push and feeds pull
- We can use the same approach to managing volumes of Semantic Web data
40Pings and Feeds in the Blogosphere
- Content providers send pings to ping servers when they have a new item
- Ping servers aggregate pings and stream them to aggregators and indexers, like Google
- Indexing sites retrieve new items from the content provider's feed
[Diagram: content providers C1, C2 and C3, a ping server, and a search engine]
41Pings and Feeds in the Semantic Web
- Content providers send pings to ping-the-semantic-web (PTSW) when they have new RDF data
- PTSW aggregates pings and streams them to SW aggregators and indexers, like Swoogle
- Indexing sites retrieve new RDF data from the content provider's feed
[Diagram: content providers C1, C2 and C3, PTSW, and Swoogle]
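
The ping-and-feed pattern can be simulated in memory to show the division of labor: the ping is a small pushed notification, and the indexer then pulls the provider's feed. This is a toy sketch, not the protocol used by any real ping server, PTSW, or Swoogle; all class and method names are hypothetical.

```python
class ContentProvider:
    """Publishes items in a feed and pings a server when a new item appears."""

    def __init__(self, name, ping_server):
        self.name = name
        self.ping_server = ping_server
        self.feed = []  # newest item last

    def publish(self, item):
        self.feed.append(item)
        self.ping_server.ping(self)  # push: a tiny "I have something new" notice


class PingServer:
    """Aggregates pings and streams them to subscribed indexers."""

    def __init__(self):
        self.subscribers = []

    def ping(self, provider):
        for indexer in self.subscribers:
            indexer.notify(provider)


class Indexer:
    """Pulls the provider's feed after a ping and indexes unseen items."""

    def __init__(self):
        self.index = []
        self.seen = {}  # provider name -> number of feed items already indexed

    def notify(self, provider):
        start = self.seen.get(provider.name, 0)
        for item in provider.feed[start:]:  # pull: fetch only the new items
            self.index.append((provider.name, item))
        self.seen[provider.name] = len(provider.feed)


server = PingServer()
indexer = Indexer()
server.subscribers.append(indexer)
c1 = ContentProvider("C1", server)
c2 = ContentProvider("C2", server)
c1.publish("observation-1.rdf")
c2.publish("observation-2.rdf")
c1.publish("observation-3.rdf")
print(indexer.index)
```

Because the ping carries no content, the ping server stays cheap to run, and indexers fetch data only from providers that report a change.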
42Semantic Web Feeds drive Mashups
- As in the regular web, sites and query engines use feeds to capture queries
  - Accessing a feed runs the query and produces a list of the first N results (usually with N between 10 and 20)
  - Such query feeds can drive mashups
- Systems like Yahoo Pipes make it easy to compose feeds
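
A query feed can be thought of as a stored query that is re-run on every access and returns only the first N results. The sketch below illustrates that idea with an in-memory list; the document structure, the `make_query_feed` helper, and the sample data are hypothetical, and no real feed format or search API is involved.

```python
def make_query_feed(documents, predicate, n=10):
    """Return a zero-argument feed; each access re-runs the query."""
    def feed():
        # Evaluate the stored query now and keep only the first n matches.
        return [doc for doc in documents if predicate(doc)][:n]
    return feed


# Hypothetical index of observations.
documents = [
    {"species": "Oreochromis niloticus", "state": "CA"},
    {"species": "Micropterus salmoides", "state": "MD"},
]
california_feed = make_query_feed(documents, lambda d: d["state"] == "CA", n=10)

print(len(california_feed()))
documents.append({"species": "Bufo boreas", "state": "CA"})
print(len(california_feed()))
```

The second access sees the newly added document because the query runs at access time, which is what lets a mashup built on the feed stay current.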
43This talk
- Motivation
- Swoogle Semantic Web search engine
- Social Semantic Web
- Conclusions
44Conclusion
- The web will contain the world's knowledge in forms accessible to people and computers
  - We need better ways to discover, index, search and reason over SW knowledge
- SW search engines address different tasks than html search engines
  - So they require different techniques and APIs
- Swoogle-like systems can help create consensus ontologies and foster best practices
- Social media provide new challenges and opportunities for the Semantic Web
45For more information
http://ebiquity.umbc.edu/
Annotated in OWL