Title: Research Problems in Semantic Web Search
1. Research Problems in Semantic Web Search
____________________________
2. Agenda
____________________________
- Introduction
- Swoogle
- Swoogle's Competition
- Sindice
- Semantic Web Search Engine (SWSE)
- Watson
- Falcon
- Research Problems and Issues with Swoogle
- References
3. Introduction
____________________________
[Figure: Web, Your Agent, Dr. Finin's FOAF Profile]
This is possible because the data is in a machine-understandable form such as RDF or OWL. But how will an agent find all this data? Search engines?
4. Introduction
____________________________
[Figure: Traditional Search Engine Results vs. Semantic Web Search Engine Results]
5. Swoogle
____________________________
- Swoogle is a crawler-based indexing and retrieval system for the Semantic Web
- Swoogle crawls and discovers documents written in RDF and OWL
- Swoogle classifies each Semantic Web Document (SWD) as either
  - a Semantic Web Ontology (SWO), which defines new terms, or
  - a Semantic Web Database (SWDB), which makes assertions about individuals
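To make the SWO/SWDB distinction concrete, here is a minimal Python sketch of one possible heuristic, not Swoogle's actual classifier: a document counts as an ontology when most of its typed subjects are declared classes or properties. The function name `classify_swd`, the (subject, predicate, object) string triples and the 0.8 threshold are all assumptions made for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Types whose instances are vocabulary terms (classes and properties).
TERM_DEFINING_TYPES = {
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://www.w3.org/2002/07/owl#Class",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
    "http://www.w3.org/2002/07/owl#ObjectProperty",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
}


def classify_swd(triples, threshold=0.8):
    """Return 'SWO' or 'SWDB' for a list of (subject, predicate, object) triples.

    threshold is a hypothetical cut-off on the fraction of typed subjects
    that are declared as classes or properties.
    """
    # Collect the rdf:type values asserted for each subject.
    typed = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            typed.setdefault(s, set()).add(o)
    if not typed:
        return "SWDB"  # nothing is typed, so no new terms are defined
    defined_terms = sum(1 for types in typed.values() if types & TERM_DEFINING_TYPES)
    return "SWO" if defined_terms / len(typed) >= threshold else "SWDB"
```

A FOAF profile, which mostly types individuals such as `foaf:Person`, would come out as an SWDB under this heuristic, while a vocabulary file declaring `owl:Class` terms would come out as an SWO.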
6. Swoogle
____________________________
7. Swoogle Architecture
____________________________
8. Swoogle Architecture
____________________________
- SWD Discovery Component
  - Google crawler using the Google web service
    - Searches for files with the extensions .rdf, .owl and .n3
    - Google returns at most 1000 results per query
  - A focused crawler
    - Crawls documents within a given website
    - Applies extension and focus constraints
  - A Swoogle crawler
    - Jena-based crawler
    - Explores semantic links between SWDs
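The extension and focus constraints of the focused crawler can be sketched as a breadth-first crawl that stays on the seed's website and collects links ending in .rdf, .owl or .n3. This is a hypothetical illustration: `focused_crawl` and the `get_links` callback (standing in for real HTTP fetching and HTML link extraction) are not part of Swoogle.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

SWD_EXTENSIONS = (".rdf", ".owl", ".n3")


def focused_crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl restricted to the seed's website.

    get_links(url) -> iterable of (possibly relative) hrefs; a hypothetical
    callback. Returns the set of candidate SWD URLs found.
    """
    host = urlparse(seed).netloc
    queue, seen, swds = deque([seed]), {seed}, set()
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        for href in get_links(url):
            link = urljoin(url, href)
            if urlparse(link).netloc != host:  # focus constraint: stay on site
                continue
            if link in seen:
                continue
            seen.add(link)
            if urlparse(link).path.lower().endswith(SWD_EXTENSIONS):
                swds.add(link)  # extension constraint: candidate SWD
            else:
                queue.append(link)  # ordinary page: keep crawling through it
    return swds
```

Passing `get_links` in as a function keeps the crawl logic testable without network access.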
9. Swoogle Architecture
____________________________
- Metadata Creation
  - Basic Metadata
    - Encoding: RDF/XML, N-Triples, N3
    - Language: RDF, RDFS, OWL, DAML+OIL
    - OWL Species: OWL Lite, OWL DL, OWL Full
  - Relations among SWDs
    - Reference relationships among SWDs
    - Inter-ontology relationships
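Basic metadata such as the encoding and the language can be approximated with simple text heuristics. The sketch below is an assumption-laden illustration, not Swoogle's method; a real system would parse the document (for example with Jena) instead of sniffing strings. `guess_encoding` and `detect_languages` are hypothetical names; the namespace URIs are the standard ones for each language.

```python
# Namespaces whose presence suggests which Semantic Web language a document uses.
LANGUAGE_NAMESPACES = [
    ("OWL", "http://www.w3.org/2002/07/owl#"),
    ("DAML+OIL", "http://www.daml.org/2001/03/daml+oil#"),
    ("RDFS", "http://www.w3.org/2000/01/rdf-schema#"),
    ("RDF", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
]


def guess_encoding(text):
    """Rough guess of the serialization: RDF/XML, N3, N-Triples or unknown."""
    stripped = text.lstrip()
    if stripped.startswith("<?xml") or "<rdf:RDF" in stripped[:2000]:
        return "RDF/XML"
    if "@prefix" in stripped or "@base" in stripped:
        return "N3"  # Turtle documents would also land here
    # N-Triples: every non-comment line is a triple starting with a URI or
    # blank node and ending with a full stop.
    lines = [ln.strip() for ln in stripped.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]
    if lines and all(ln.startswith(("<", "_:")) and ln.endswith(".") for ln in lines):
        return "N-Triples"
    return "unknown"


def detect_languages(text):
    """List every language whose namespace URI appears in the document."""
    return [name for name, ns in LANGUAGE_NAMESPACES if ns in text]
```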
10. Swoogle Architecture
____________________________
- Data Analysis Component
  - Classification of each SWD as an SWO or an SWDB
  - Computation of the rank of each SWD
- Web-based Interface
  - Human user interface: http://swoogle.umbc.edu
  - Web services using a REST interface
  - Agent service
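Ranking SWDs by the references between them can be illustrated with plain PageRank over the SWD link graph. This is a simplification: Swoogle's own rank computation is not reproduced here, and `rank_swds` with its parameters (damping factor 0.85, 50 iterations) is a hypothetical, illustrative choice.

```python
def rank_swds(links, damping=0.85, iterations=50):
    """PageRank-style ranking of SWDs.

    links maps each SWD URI to the SWD URIs it references (imports, uses
    terms from, ...). Returns a dict of scores that sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {d: 1.0 / n for d in nodes}
    for _ in range(iterations):
        new = {d: (1 - damping) / n for d in nodes}  # random-jump share
        for d in nodes:
            targets = links.get(d, [])
            if targets:
                share = damping * rank[d] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling SWD: spread its rank evenly over all SWDs
                for t in nodes:
                    new[t] += damping * rank[d] / n
        rank = new
    return rank
```

An ontology referenced by many instance documents accumulates rank from all of them, which matches the intuition that widely used ontologies should be ranked highly.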
11. Sindice
____________________________
- Created at the Digital Enterprise Research Institute (DERI)
- Key features of Sindice include:
  - Sindice collects SWDs and indexes them on resource URIs, Inverse Functional Properties (IFPs) and keywords
  - Sindice uses the Hadoop parallel architecture
12. Sindice
____________________________
- Inverse Functional Property (IFP): an OWL global cardinality constraint on a property; each value of the property identifies at most one subject (e.g. foaf:mbox)
- Sindice uses three indexes:
  - URI index
  - IFP index
  - Keyword index
- Benefit: faster retrieval of data
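A toy in-memory version of the three indexes shows why an IFP index helps retrieval: documents that describe the same person under different URIs can still be found through a shared `foaf:mbox` value. The class `LookupIndex`, the `is_uri` test and the plain-string triples are hypothetical; this is not Sindice's implementation.

```python
import re
from collections import defaultdict


def is_uri(term):
    """Crude test separating URIs from literals in this toy triple format."""
    return term.startswith(("http://", "https://", "mailto:"))


class LookupIndex:
    """Toy in-memory URI, IFP and keyword indexes over a set of SWDs."""

    def __init__(self, ifp_predicates):
        self.ifps = set(ifp_predicates)        # IFPs to index, supplied up front
        self.uri_index = defaultdict(set)      # URI -> SWDs mentioning it
        self.ifp_index = defaultdict(set)      # (IFP, value) -> SWDs
        self.keyword_index = defaultdict(set)  # lowercase word -> SWDs

    def add_document(self, doc_url, triples):
        for s, p, o in triples:
            for term in (s, p, o):
                if is_uri(term):
                    self.uri_index[term].add(doc_url)
            if p in self.ifps:
                self.ifp_index[(p, o)].add(doc_url)
            if not is_uri(o):  # literal object: index its words
                for word in re.findall(r"\w+", o.lower()):
                    self.keyword_index[word].add(doc_url)

    def lookup_uri(self, uri):
        return set(self.uri_index.get(uri, ()))

    def lookup_ifp(self, prop, value):
        return set(self.ifp_index.get((prop, value), ()))

    def lookup_keyword(self, word):
        return set(self.keyword_index.get(word.lower(), ()))
```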
13. Sindice
____________________________
- The Hadoop architecture is used in the following manner:
  - Sindice employs Hadoop/Nutch to distribute the crawling job across multiple machines
  - Collected data is stored in HBase, a distributed column-oriented store
  - Large datasets are handled efficiently across the cluster using a MapReduce implementation
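The MapReduce pattern can be simulated in a single Python process to build a keyword index: a map step emits (keyword, document) pairs, a shuffle groups them by keyword, and a reduce step produces each posting list. The function names are hypothetical and this is not the Hadoop API; on a Hadoop cluster the map and reduce calls would run in parallel on different machines.

```python
import re
from collections import defaultdict
from itertools import chain


def map_doc(doc_url, text):
    """Map: emit (keyword, doc_url) once for every distinct word in a document."""
    for word in set(re.findall(r"\w+", text.lower())):
        yield word, doc_url


def shuffle(pairs):
    """Shuffle: group intermediate values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(keyword, doc_urls):
    """Reduce: build the sorted posting list for one keyword."""
    return keyword, sorted(set(doc_urls))


def build_keyword_index(docs):
    """docs maps document URL -> text; returns keyword -> sorted document URLs."""
    mapped = chain.from_iterable(map_doc(url, text) for url, text in docs.items())
    return dict(reduce_postings(k, v) for k, v in shuffle(mapped).items())
```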
14. Sindice
____________________________
15. SWSE
____________________________
- Semantic Web Search Engine (SWSE) is another Semantic Web search engine created at the Digital Enterprise Research Institute (DERI)
- SWSE uses MultiCrawler, a pipelined architecture for crawling
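In a pipelined crawler, stages such as fetching, filtering, parsing and indexing run concurrently, each stage handing its output to the next through a queue. The sketch below shows the general pattern with one thread per stage; it is an assumption about how such a pipeline could be wired, not MultiCrawler's code, and `run_pipeline` and the example stages are hypothetical. It has no error handling, so a stage that raises would stall the pipeline.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _stage(func, inbox, outbox):
    """Run one pipeline stage in its own thread: read, transform, pass on."""
    def worker():
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown to the next stage
                break
            result = func(item)
            if result is not None:  # a stage may drop an item by returning None
                outbox.put(result)
    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pipeline(items, funcs):
    """Push items through the stage functions in order; return the final outputs."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [_stage(f, queues[i], queues[i + 1]) for i, f in enumerate(funcs)]
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

While one stage waits on a slow fetch, later stages keep working on documents already fetched, which is the main benefit of pipelining.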
16. Watson
____________________________
- Created at the Knowledge Media Institute at the UK Open University
- Major Design Principles:
  - Considers explicit and implicit relations between ontologies
  - Ranks ontologies with a focus on quality over popularity
17. Watson
____________________________
18. Falcon
____________________________
- Falcon is a Semantic Web search engine created at the Institute of Web Science in China
- Falcon allows keyword-based queries on:
  - Objects
  - Concepts
  - Documents
- Falcon performs class subsumption reasoning
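Class subsumption reasoning lets a query for a class also return instances of its subclasses. The sketch computes the transitive superclasses of each asserted type; the functions `all_superclasses` and `instances_of` and the plain-string class names are hypothetical and do not reflect Falcon's internals.

```python
def all_superclasses(cls, subclass_of):
    """Return cls plus every class it is (transitively) a subclass of.

    subclass_of maps a class to its direct superclasses (rdfs:subClassOf).
    """
    seen, stack = {cls}, [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:  # guards against cycles in the hierarchy
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(target, types, subclass_of):
    """Objects whose asserted type is target or any subclass of target."""
    return {obj for obj, classes in types.items()
            if any(target in all_superclasses(c, subclass_of) for c in classes)}
```

For example, with `Professor` a subclass of `Faculty` and `Faculty` a subclass of `Person`, an object typed only as `Professor` is still returned by a search restricted to `Person`.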
19. Falcon
____________________________
20. Summary
____________________________
- Sindice
  - Indexes on URIs, IFPs and keywords
  - Use of the Hadoop architecture
- SWSE
  - Pipelined architecture for crawling
- Watson
  - Implicit relations between SWDs
- Falcon
  - Class subsumption reasoning
  - Keyword-based search
  - Searches ontologies and instance data
21. Issues
____________________________
- Crawling
  - Swoogle's crawler runs as a single thread on one machine
  - This limits the number of SWDs discovered and revisited
- Possible Solutions
  - Use of the Hadoop architecture
  - Use of Grub
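Even before moving to a cluster, a single machine can fetch many SWDs concurrently instead of one at a time. The sketch below uses a thread pool from the standard library; `crawl_parallel` and the `fetch` callback (which would wrap a real HTTP client) are hypothetical, and distributing the work across machines, as Hadoop or Grub would, is a further step beyond this.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_parallel(urls, fetch, workers=8):
    """Fetch many SWDs concurrently; return a dict of URL -> content.

    fetch(url) -> document text is a hypothetical callback. A failed fetch is
    recorded as None so one unreachable URL does not stop the whole crawl.
    """
    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(safe_fetch, urls))
```

Because fetching is dominated by network latency, threads give a real speed-up here despite Python's global interpreter lock.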
22. Other Issues
____________________________
- Crawling large structured datasets such as DBpedia
- More reasoning
- More services
23. References
____________________________
- Li Ding et al., "Swoogle: A Search and Metadata Engine for the Semantic Web", Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management, November 2004.
- P. Mika, G. Tummarello, "Web Semantics in the Clouds", IEEE Intelligent Systems, Volume 23, Issue 5, September 2008.
- E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, G. Tummarello, "Sindice.com: A Document-Oriented Lookup Index for Open Linked Data", International Journal of Metadata, Semantics and Ontologies, 3(1), 2008.
- Mathieu d'Aquin et al., "Watson: A Gateway for the Semantic Web", Poster Session of the European Semantic Web Conference, ESWC 2007.
- Gong Cheng, Weiyi Ge, Honghan Wu, Yuzhong Qu, "Searching Semantic Web Objects Based on Class Hierarchies", WWW 2008 Workshop on Linked Data on the Web, 2008.
24. Questions?
____________________________