Title: SWSE: Semantic Web Search Engine
SWSE is a web search application that provides
answers before links, allowing casual users to find
the exact information they desire with minimal
effort. Current search engines do not exploit
available structured data, and mainly index flat
text documents. They do not allow complex
queries to be posed, such as "give me all
co-authors of Tim Berners-Lee" or "show me all
the pictures of people I know". SWSE exploits
existing and emerging structured data to index
and provide search and browsing over a large
corpus. Currently the corpus consists of data
retrieved and converted to RDF from large
repositories (such as DBLP, CiteSeer, DMOZ, IMDb,
SwissProt, Wikipedia etc.) and also from the web
(HTML; XML such as RSS and podcast feeds; and
RDF such as FOAF, DOAP and SIOC). Currently we
index over 90 GB of raw data, amounting to about
700M quads (statements). Data is cleansed,
converted and merged through use of object
consolidation. SWSE utilises YARS2 (Yet Another
RDF Store) to index the data retrieved. YARS2 is
a scalable, distributed store for RDF, and offers
keyword query and complex graph based query
functionality through an HTTP interface. The SWSE
user interface boasts a compact and intuitive
user interaction model (modelled to exploit
users' experience of current web search
interfaces) to allow casual users to find the exact
information they require with minimal effort.
Users begin by specifying a keyword query, and
from there can incrementally build complex
queries using guided exploration provided by the
nodebrowser compass, which ensures results at
each step. Results are initially serialised in a
ranked results listing. Each result can then be
clicked to retrieve its details. Alternatively,
browsing of the data graph is possible to explore
social networks and, more generally, to traverse the
information through available relationships
between resources.
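To make the contrast with keyword search concrete, the co-author query mentioned above can be read as a join over statements. The following is a minimal sketch, assuming a toy in-memory list of quads and hypothetical predicate names (`name`, `creator`); the identifiers and data are invented for illustration, and the code does not reflect the YARS2 query engine or the vocabularies SWSE actually indexes.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # source the statement was retrieved from

# Toy data: identifiers, predicates and contexts are invented for illustration.
QUADS = [
    Quad("person:tbl", "name", "Tim Berners-Lee", "src:a"),
    Quad("person:a", "name", "Author A", "src:a"),
    Quad("person:b", "name", "Author B", "src:b"),
    Quad("person:c", "name", "Author C", "src:b"),
    Quad("paper:1", "creator", "person:tbl", "src:a"),
    Quad("paper:1", "creator", "person:a", "src:a"),
    Quad("paper:1", "creator", "person:b", "src:a"),
    Quad("paper:2", "creator", "person:c", "src:b"),
]

def match(quads, subject=None, predicate=None, obj=None):
    """Return quads matching a pattern; None acts as a wildcard."""
    return [q for q in quads
            if (subject is None or q.subject == subject)
            and (predicate is None or q.predicate == predicate)
            and (obj is None or q.obj == obj)]

def coauthors(quads, name):
    """Names of everyone who shares at least one paper with `name`."""
    # Step 1: resolve the name to resource identifiers.
    people = {q.subject for q in match(quads, predicate="name", obj=name)}
    # Step 2: find the papers those resources created.
    papers = {q.subject for p in people
              for q in match(quads, predicate="creator", obj=p)}
    # Step 3: collect the other creators of those papers.
    others = {q.obj for paper in papers
              for q in match(quads, subject=paper, predicate="creator")} - people
    # Step 4: map identifiers back to display names.
    return sorted(q.obj for o in others
                  for q in match(quads, subject=o, predicate="name"))
```

A keyword index alone can answer step 1; the remaining steps need the relationships between resources, which is what the structured data provides.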
- Data retrieval via download of large repositories
or crawling the web
- Data converted to RDF using XSLT or specialised
conversion code, with ontologies created to
represent new schema
- Object consolidation performed on dataset to
merge data from different sources on equivalent
instances
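Object consolidation can be pictured as grouping identifiers that denote the same instance. The sketch below is one possible approach and not necessarily the method used in SWSE: it assumes that two subjects are equivalent when they share a value for an identifying property (the `mbox` and `homepage` predicates in the usage note are hypothetical), and it merges them with a union-find structure.

```python
def consolidate(statements, identifying_predicates):
    """Group subjects that share a value for an identifying predicate.

    statements: iterable of (subject, predicate, object) tuples.
    Returns a dict mapping every subject to a canonical identifier.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest identifier as canonical.
            lo, hi = sorted((ra, rb))
            parent[hi] = lo

    seen = {}  # (predicate, value) -> first subject seen with that value
    for s, p, o in statements:
        find(s)  # register every subject, even if it is never merged
        if p in identifying_predicates:
            key = (p, o)
            if key in seen:
                union(seen[key], s)
            else:
                seen[key] = s
    return {s: find(s) for s in list(parent)}
```

For example, given statements in which `a:1` and `b:7` share an `mbox` value and `a:1` and `c:3` share a `homepage` value, calling `consolidate(statements, {"mbox", "homepage"})` maps all three subjects to `a:1`, so their data can be merged under one identifier.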
- Data stored in a distributed YARS2 installation
on 16 servers
- YARS2 stores RDF in quads, an augmentation of
N-Triples adding context to make sources of data
traceable.
- YARS2 uses Lucene to support keyword queries and
stores data on-disk in sorted, compressed,
blocked files to support complex queries.
- YARS2 uses an in-memory sparse referencing index
to accelerate data access from on-disk files.
- Queries can be posed via an HTTP interface. YARS2
supports a subset of SPARQL querying.
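The idea of a sparse index over sorted, blocked data can be illustrated in a few lines. This is a simplified sketch under stated assumptions, not the YARS2 implementation: Python lists stand in for on-disk blocks, compression and I/O are omitted, and the class and method names (`SparseIndex`, `prefix_lookup`) are hypothetical. Only the first entry of each block is held in the index, so a lookup needs one binary search in memory followed by a scan that starts in a single block.

```python
from bisect import bisect_left

class SparseIndex:
    """Sorted quads split into fixed-size blocks; only the first entry of
    each block is kept in the in-memory index."""

    def __init__(self, sorted_entries, block_size=4):
        # Each slice stands in for one block of an on-disk file.
        self.blocks = [sorted_entries[i:i + block_size]
                       for i in range(0, len(sorted_entries), block_size)]
        self.first_keys = [block[0] for block in self.blocks]

    def prefix_lookup(self, prefix):
        """Return all entries (tuples) that start with `prefix`."""
        prefix = tuple(prefix)
        n = len(prefix)
        # Last block whose first entry sorts strictly before the prefix;
        # every earlier block holds only smaller entries.
        start = max(bisect_left(self.first_keys, prefix) - 1, 0)
        results = []
        for block in self.blocks[start:]:
            for entry in block:
                head = entry[:n]
                if head == prefix:
                    results.append(entry)
                elif head > prefix:
                    return results  # sorted order: no later entry can match
        return results
```

With quads sorted as (subject, predicate, object, context), `prefix_lookup(("b",))` returns every statement about subject `b`, and `prefix_lookup(("b", "p"))` narrows that to one predicate. Other access patterns would need the data sorted in a different order.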
- The SWSE user interface takes user queries and
converts them to queries answerable by YARS2.
- The UI then retrieves the results and serialises
them to RDF/XML.
- An XSLT stylesheet is provided to the client to
transform the RDF/XML results to an HTML
serialisation.
- Initially a user is offered a keyword query to
begin.
- A user can click on any result to get a details
view, giving all the available info for a result.
- One can also restrict the results by the type of
result one is looking for, e.g. Person, Movie,
Document, Publication etc.
- Once results have been restricted to a particular
type, or if details view has been selected, a
list of available inlinking and outlinking
relationships is offered for navigation of the
results graph.
Search and explore today's Semantic Web at
http://swse.deri.org/
DERI Galway Andreas Harth Aidan Hogan Jürgen
Umbrich
P 353 91 495006 andreas.harth_at_deri.org
aidan.hogan_at_deri.org