Title: Semantic Search Engines
1 Semantic Search Engines: On the Way to Web 3.0
Ariel Frank, Department of Computer Science, Bar-Ilan University
ariel_at_cs.biu.ac.il
2 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
3 What is Web 2.0?!
Source: Open Gardens blog, Ajit Jaokar, http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html
4 The good, the bad and the
5 Web 1.0, Web 2.0, Web 3.0, Web X.0
6 Semantic Search
- Syntactic search can match the query against:
  - an index of the textual content of the resources
  - URIs (URLs, URNs) in the system
  - literals in the RDF metadata
  - or a combination of these, possibly using exact, prefix or substring match, stemming, or minimal edit distance.
- Semantic search, in addition to syntactic search, can use:
  - an index of the meaning of sentences in each resource
  - semantic information and analysis
  - the graph structure of the RDF metadata
  - or a combination of these, possibly using query expansion, classification/categorization, tagging, graph traversal, microformats, and RDF/OWL inferencing and reasoning.
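The syntactic matching modes listed above (exact, prefix and substring match, stemming, minimal edit distance) can be sketched in a few lines. This is an illustrative sketch, not the code of any particular engine: the function names are hypothetical, and `crude_stem` is a deliberately naive suffix stripper standing in for a real stemmer such as Porter's.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = curr
    return prev[-1]


def crude_stem(word: str) -> str:
    """Toy suffix stripper (an assumption, NOT the Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def syntactic_match(query: str, term: str, mode: str, max_dist: int = 1) -> bool:
    q, t = query.lower(), term.lower()
    if mode == "exact":
        return q == t
    if mode == "prefix":
        return t.startswith(q)
    if mode == "substring":
        return q in t
    if mode == "stem":
        return crude_stem(q) == crude_stem(t)
    if mode == "edit":
        return levenshtein(q, t) <= max_dist
    raise ValueError(f"unknown mode: {mode}")


index = ["search", "semantic", "searching"]
print([t for t in index if syntactic_match("serch", t, "edit")])  # ['search']
```

Edit-distance matching tolerates typos such as "serch", while stemming conflates inflections such as "searches" and "searching"; a real engine would apply these against an inverted index rather than a Python list.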
7 Can Semantic Search Engines (SEs) answer this?
8 Types/Examples of Semantic SEs
- General Search
  - MetaWeb Freebase, Yahoo! Microsearch, ...
- "Natural Language" Search
  - Powerset, Hakia, AskMeNow AskWiki, ...
- Vertical Search
  - Kango, AdaptiveBlue, ReportLinker, ...
- "Social Networking" Search
  - SemantiNet, Delver, Google Social Graph API, ...
- Personalized Search
  - Twine, MavinIT PSS, ...
9 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
10 MetaWeb Technologies - Freebase
- Based in San Francisco, MetaWeb Technologies was spun out of Applied Minds in July 2005.
- Goal: build a better infrastructure for Web application developers and publishers.
11 Freebase Rationale
- An open, shared database of the world's knowledge that collects data from the Web to build a massive, collaboratively edited database of cross-linked data.
- It is built by the community, for the community.
- Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites.
- Focuses on organizing and managing complex data structures by means of Semantic Web technologies.
- Enables extraction of ordered knowledge out of the information chaos that is the current Web.
12 Freebase
13 Freebase Repository
- Covers millions of topics in hundreds of categories.
- Draws from large open repositories like Wikipedia, MusicBrainz, and the SEC archives.
- Contains structured information on many popular topics, like movies, music, people and locations, all reconciled and freely available via an open API.
- Freebase information is supplemented by the efforts of a passionate global community of users, who are working together to add structured information on everything relevant.
14 Domains and Types
15 Google Company
16 Freebase Help Center
17 Freebase Semantics
- Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple databases.
- For example, Arnold Schwarzenegger would appear in a movie database as an actor, in a political database as a governor and in a bodybuilding database as Mr. Universe.
- In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together.
- The unified topic acts as an information hub, making it easy to find and contribute information about him.
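One way to picture the "one topic, many facets" rule is a store keyed by entity, where each source database adds a facet to the same topic instead of creating a new one. This is a minimal sketch under simplifying assumptions: `Topic`, `TopicStore` and the facet names are hypothetical, and reconciliation is done by exact name match, whereas real reconciliation must also handle name variants and ambiguous entities.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A single node per real-world entity; each source contributes a facet."""
    name: str
    facets: dict[str, dict[str, str]] = field(default_factory=dict)


class TopicStore:
    """Hypothetical store that enforces 'each topic exists only once'."""

    def __init__(self) -> None:
        self._topics: dict[str, Topic] = {}

    def add_facet(self, name: str, facet: str, props: dict[str, str]) -> Topic:
        # Reconcile by exact name (a simplifying assumption): reuse the
        # existing topic rather than creating a duplicate entry.
        topic = self._topics.setdefault(name, Topic(name))
        topic.facets.setdefault(facet, {}).update(props)
        return topic

    def __len__(self) -> int:
        return len(self._topics)


store = TopicStore()
store.add_facet("Arnold Schwarzenegger", "actor", {"source": "movie database"})
store.add_facet("Arnold Schwarzenegger", "governor", {"source": "political database"})
arnold = store.add_facet("Arnold Schwarzenegger", "bodybuilder", {"title": "Mr. Universe"})
print(len(store), sorted(arnold.facets))  # 1 ['actor', 'bodybuilder', 'governor']
```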
18 Arnold Schwarzenegger (1)
19 Arnold Schwarzenegger (2)
20 Freebase Dynamics
- For developers, or even mildly technical users, Freebase offers tools that make it easy to query the data and integrate it into Web applications, blogs, wikis, user pages or anything else that would benefit from an injection of structured information.
- In addition to reconciling many facets of one topic, the underlying structure of Freebase lets users run more complex queries.
- For example, asking Freebase for films starring both Jennifer Connelly and actors who have appeared in Steven Spielberg movies returns a list of 8 movies.
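The complex query above is essentially a two-step join over the film graph. The sketch below assumes the query means films in which Jennifer Connelly co-stars with at least one actor who has appeared in a Steven Spielberg film. The data is hypothetical toy data with placeholder names ("Star S" stands in for the actress, "Director D" for the director), not actual Freebase content, and the code does not use Freebase's own query API.

```python
# Hypothetical toy data: film title -> (director, cast). Placeholder names only.
FILMS: dict[str, tuple[str, set[str]]] = {
    "Film A": ("Director X", {"Star S", "Actor 1"}),
    "Film B": ("Director D", {"Actor 1", "Actor 2"}),
    "Film C": ("Director Y", {"Star S", "Actor 3"}),
    "Film D": ("Director D", {"Actor 2"}),
}


def films_with_star_and_alumni(star: str, director: str) -> list[str]:
    # Step 1: collect every actor who has appeared in a film by `director`.
    alumni: set[str] = set()
    for film_director, cast in FILMS.values():
        if film_director == director:
            alumni |= cast
    # Step 2: keep films starring `star` whose other cast members include
    # at least one of those actors (the star herself does not count).
    return sorted(
        title
        for title, (_, cast) in FILMS.items()
        if star in cast and (cast - {star}) & alumni
    )


print(films_with_star_and_alumni("Star S", "Director D"))  # ['Film A']
```

Because facts are stored as linked records rather than prose, the join needs no text parsing; this is the kind of query an article-based source cannot answer directly.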
21 Films starring Jennifer Connelly
22 Freebase vs. Wikipedia
- The difference lies in the way they store information.
- Wikipedia arranges information in the form of articles.
- Freebase lists facts and statistics. Its list form suits not only people who like to glance at facts, but also people who want to use the data to build other Web sites and software. (Information in article form can't be reused in the same way.)
- Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.
23 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
24 Powerset
- Powerset is a Silicon Valley company.
- Goal: build a transformative consumer search engine based on Natural Language Processing (NLP).
25 Powerset Rationale
- Unlike conventional search engines that use keywords, Powerset reads and understands every sentence on a Web page and allows asking questions in plain English.
- Its innovations in search are rooted in breakthrough technologies that take advantage of the structure and nuances of natural language.
- Using these advanced techniques, Powerset is building a large-scale search engine that breaks the confines of keyword search.
- By making search more natural and intuitive, Powerset aims to change fundamentally how we search the Web, delivering higher quality results.
26 Who proved Fermat's last theorem?
27 What did Steve Jobs say about the iPod?
28 What did Bush say about Gore?
29 Powerlabs
- Powerlabs is a community where users can:
  - interact with demonstrations of Powerset's technology before the search engine launches in 2008
  - give feedback to help improve the "Natural Language" indexing
  - suggest ideas for the ideal search engine.
- Involving users on such a scale and at such an early stage of development reflects Powerset's recognition that the wisdom of crowds can help guide it.
30 Powerlabs Sign In
31 Wiki Search Sneak Peek
- Provides access to the first open search box covering Wikipedia.
- Powerset uses linguistic analysis of both the query and Wikipedia to find the best matches.
- The Miniviewer lets users view highlighted matches in the context of a Wikipedia article without ever leaving the results page.
- By incorporating semantic information from Powerset's indexing process into republished Wiki pages, internal page search enables a whole new kind of search: semantic search within the page.
32 Explore Wikipedia
33 Google acquire something
34 Google acquire company
35 Search Wikipedia
36 Companies acquired in 2001
37 Powerset PowerMouse
- PowerMouse is an application that provides a view into Powerset's technology, letting users examine how structured information is extracted from open text.
- It is not intended as a search application per se, but lets users search for and navigate through facts encoded in Powerset's Wikipedia index.
- It shows dramatically how compactly large amounts of data can be organized and displayed based on a few semantic relationships.
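The PowerMouse example queries that follow ("Google acquire something", "something eats carrot", "person won nobel") have a subject-relation-object shape with open slots. A minimal sketch of such pattern matching over already-extracted facts, assuming: `something` is a wildcard, a slot such as `person` is a type constraint, verbs are reduced to lemmas by a tiny hypothetical table, and the facts are invented placeholders. Powerset's actual extraction and indexing are far more involved and are not reproduced here.

```python
# Hypothetical (subject, relation, object) facts, as if extracted from sentences.
# Relations are stored as lemmas ("acquired" -> "acquire").
FACTS = [
    ("Google", "acquire", "Company P"),
    ("Google", "acquire", "Company Q"),
    ("Rabbit", "eat", "carrot"),
    ("Scientist Z", "win", "Nobel Prize"),
]
# Hypothetical entity types, used for typed slots such as "person".
TYPES = {"Scientist Z": "person", "Rabbit": "animal"}
# Tiny hypothetical lemma table for the query verbs.
LEMMAS = {"acquires": "acquire", "acquired": "acquire", "eats": "eat", "won": "win"}
WILDCARD = "something"


def slot_matches(pattern: str, value: str) -> bool:
    if pattern == WILDCARD:
        return True
    if pattern in TYPES.values():  # typed slot, e.g. "person"
        return TYPES.get(value) == pattern
    return pattern.lower() in value.lower()  # loose text match, e.g. "nobel"


def query(subj: str, verb: str, obj: str) -> list[tuple[str, str, str]]:
    rel = LEMMAS.get(verb, verb)
    return [
        (s, r, o)
        for s, r, o in FACTS
        if r == rel and slot_matches(subj, s) and slot_matches(obj, o)
    ]


for q in [("Google", "acquire", "something"),
          ("something", "eats", "carrot"),
          ("person", "won", "nobel")]:
    print(q, "->", query(*q))
```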
38 PowerMouse Examples
39 Google acquire something
40 something eats carrot
41 person won nobel
42 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
43 Kango
- A vertical semantic search engine for personalized travel information.
- Goal: be the first step in deciding where to go, where to stay or what to do, finding the trip that is right for you.
44 Kango Rationale
- Kango indexes the collective wisdom on travel from the entire Web.
- Recommendations are based on a gestalt of voices heard in over 20 million reviews, ratings, blogs, journals, and articles collected from over a thousand sources such as Web sites, books and magazines.
- Organizes and presents the most relevant opinions and product details in a "federated" search display based on what's known about the user's travel preferences.
45 Kango Repository
- Kango has scoured the Web to collect all kinds of places to go, things to do and places to stay.
- It then analyzed and organized millions of travelers' opinions to enable search based on exact travel requirements and preferences.
- Kango brings together:
  - more than a thousand sites
  - 400,000 lodging, activity and destination options
  - 20 million reviews, ratings and blogs.
46 How Kango Works
47 Kango Semantics
- It provides many options for specifying a trip.
- Kango thinks about those options in terms of the Long Tail concept, to help make trips distinct and memorable.
- It "understands" travel lingo, so it helps users make informed decisions about what best fits their specific travel preferences.
- Kango is creating an ontology of global travel content that includes ranking of superlatives within review sites.
48 Lodging
49 Things to Do
50 Kango Dynamics
- Enables new ways of filtering through its collection to get the recommendations most relevant to the user's preferences and priorities.
- Based on who the user is traveling with, the kind of destination sought, and what the user is likely to do, it sifts through its information to deliver the right getaway.
- For example, it returns:
  - one set of hotel and activity recommendations when traveling to Monterey for a romantic getaway
  - a different set when going to Monterey with the family to visit the aquarium and hang out on the beaches.
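The Monterey example can be sketched as mapping travel lingo to ontology concepts and ranking options by how many of those concepts their reviews support. Everything here is a hypothetical illustration: the `LINGO` table, the concept tags and the options are assumptions, not Kango's data or algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    tags: frozenset[str]  # concepts distilled from reviews (hypothetical)


# Hypothetical mapping from travel lingo to ontology concepts.
LINGO: dict[str, set[str]] = {
    "romantic getaway": {"couples", "quiet"},
    "family trip": {"kids", "outdoors"},
}


def recommend(options: list[Option], trip: str, min_overlap: int = 1) -> list[str]:
    wanted = LINGO[trip]
    scored = [(len(option.tags & wanted), option.name) for option in options]
    # Most matching concepts first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score >= min_overlap]


# Hypothetical Monterey options, not real listings.
MONTEREY = [
    Option("Inn A", frozenset({"couples", "quiet"})),
    Option("Motel B", frozenset({"kids"})),
    Option("Aquarium visit", frozenset({"kids"})),
    Option("Beach day", frozenset({"kids", "outdoors", "couples"})),
]
print(recommend(MONTEREY, "romantic getaway"))  # ['Inn A', 'Beach day']
print(recommend(MONTEREY, "family trip"))  # ['Beach day', 'Aquarium visit', 'Motel B']
```

The same collection yields different recommendations for the two trip types, because the lingo is translated into concepts before matching rather than matched as keywords.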
51 Old Monterey Inn
52 Campgrounds in Hawaii
53 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
54 SemantiNet
- SemantiNet is a Tel Aviv-based startup that is creating new, revolutionary technology based on Semantic Web concepts.
- Goal: leverage Web information in a meaningful way to improve how users experience the Internet.
A. Frank
55 SemantiNet Rationale
- SemantiNet makes life easier by letting users take advantage of the variety and richness of information and services on the Internet in a way that is simple, smart and intuitive.
- SemantiNet leverages Semantic Web concepts to seamlessly integrate information and services, enabling users to achieve more while working less!
56 SemantiNet Repository
- SemantiNet collects relevant information from common social networks and established Web sites in order to provide users with a customized, efficient, personalized and contextual browsing experience.
- Relevant personal information can be:
  - entered on the SemantiNet Web site
  - provided by users through their use of SemantiNet
  - or extracted from "traffic data" generated by browser use.
57 SemantiNet Semantics
- Develops a semantic framework that allows rapid deployment of Web mashups, applications and services in a way that enhances how people use the Internet.
- Rather than simply aggregating information, SemantiNet's technology integrates information and mashes it up as needed.
- The idea is to bring the relevant online content to the user, rather than the user to the content.
58 SemantiNet Demo
59 SemantiNet Demo
60 SemantiNet Demo
61 SemantiNet Demo
62 Example of Social Graph
63 Delver
- Delver (formerly Semingo) is headquartered in Herzeliya and will officially open U.S. offices in Silicon Valley in spring 2008.
- Goal: provide a semantic search engine that allows users to search for information created and referenced by their own social graph.
64 Delver Rationale
- Delver provides a connected search engine that allows users to find content, media and people within their network via a simple search interface.
- Delver organizes and ranks content from the user's network, because social connections are critical for discovering more personally relevant information.
- It indexes the social Web (social networks, blogs, social applications, etc.) and cross-connects the data with the user's social graph.
- It improves the relevance of Web search results by prioritizing them based on the specific searcher's social network.
65 Delver Repository
- Delver begins by crawling the Web in order to map users' social connections.
- It specifically indexes people's social connections on Flickr, MySpace, LinkedIn, YouTube, hi5, Facebook and Blogger, and more sites are being added all the time.
- Instead of just looking at a Web site's popularity, Delver looks at information such as whether your friends have tagged the site or whether it appears on their social network profiles, bookmarking sites, photo and video sharing sites, or blogs.
- The results are more relevant because they account for who a person is and what that person finds valuable.
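The ranking idea described above, boosting results that the searcher's own connections have tagged, bookmarked or posted, can be sketched as a simple additive score. The scoring formula, the `weight` parameter and the sample data are assumptions for illustration; Delver's actual ranking function is not described in these slides.

```python
def social_rank(
    results: dict[str, float],
    friends: set[str],
    signals: dict[str, set[str]],
    weight: float = 2.0,
) -> list[str]:
    """Re-rank URLs: base popularity plus a bonus per endorsing connection.

    results: url -> base (popularity) score from an ordinary search
    friends: ids of people in the searcher's social graph
    signals: url -> ids of people who tagged, bookmarked or posted the url
    """
    def score(url: str) -> float:
        endorsers = signals.get(url, set()) & friends
        return results[url] + weight * len(endorsers)

    # Highest score first; ties broken by URL for a stable order.
    return sorted(results, key=lambda url: (-score(url), url))


results = {"a.example": 5.0, "b.example": 3.0, "c.example": 2.0}
signals = {"b.example": {"alice", "bob"}, "c.example": {"carol"}}
print(social_rank(results, {"alice", "bob"}, signals))  # ['b.example', 'a.example', 'c.example']
print(social_rank(results, {"carol"}, signals))  # ['a.example', 'c.example', 'b.example']
```

The same query and the same base results produce different orderings for two searchers with different connections, which is the "fingerprint" effect described under Delver Semantics.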
66 Liad Agmon
67 Venture Funding
68 Delver Semantics
- Delver knows who users are and who their friends are, even if they did not import their address book or add their "Social Networking" profiles.
- Instead, Delver leverages the social graph to map out a user's social connections.
- Since everyone's social graph is unique, like a fingerprint, the same Delver query yields significantly different results for each user, as reflected through the collective experiences of each person's contacts.
- The results are more personal and meaningful to users than a generic search using a "normal" search engine.
69 Delver Dynamics
- When a user performs a query, results from all over their social Web are displayed.
- Even if the user and others are not directly related as "friends" on a social network, the plus sign beneath their picture can still be clicked to add them as a connection.
- This way, a user can view the relevant bookmarks, links, blog posts, photos, and videos of people like them, even without knowing them personally... and those people don't have to confirm the connection on their end.
- Alternatively, a user can choose to exclude certain connections from their search results.
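Taken together, the Semantics and Dynamics slides suggest a search circle built from connections discovered in the social graph, plus connections the user adds without confirmation, minus connections the user excludes. A minimal sketch, assuming the crawled graph is available as an adjacency dictionary and that discovery is a breadth-first search limited to `max_hops` (both assumptions; the slides do not say how Delver traverses the graph). The function names and graph data are hypothetical.

```python
from collections import deque


def discover_connections(graph: dict[str, list[str]], user: str, max_hops: int = 2) -> set[str]:
    """Breadth-first search over a crawled social graph, up to max_hops away."""
    seen = {user}
    found: set[str] = set()
    frontier = deque([(user, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


def search_circle(graph, user, added=(), excluded=(), max_hops=2) -> set[str]:
    # Manual additions are one-sided (no confirmation needed); exclusions win.
    return (discover_connections(graph, user, max_hops) | set(added)) - set(excluded)


# Hypothetical crawled graph (adjacency lists).
GRAPH = {"me": ["ann", "ben"], "ann": ["cat"], "cat": ["dan"]}
print(sorted(discover_connections(GRAPH, "me")))  # ['ann', 'ben', 'cat']
print(sorted(search_circle(GRAPH, "me", added=["eve"], excluded=["ben"])))  # ['ann', 'cat', 'eve']
```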
70 Roi Carthy
71 Visit New York
72 Contents
- Web 3.0 Semantic Search
- General Search
- "Natural Language" Search
- Vertical Search
- "Social Networking" Search
- Personalized Search
73 Radar Networks Twine
- Radar Networks, a pioneer of Semantic Web technology, introduced Twine.
- Goal: enable individuals and groups to organize, share and discover information and knowledge around their interests.
74 Twine Rationale
- Twine is a "knowledge networking" tool billed as a revolutionary Semantic Web application.
- It is a new service that helps users organize, share and discover information about their interests with networks of like-minded people.
- Twine can be used alone, with friends, groups and communities, or even within a company.
- It has aspects of social networking, wikis, blogging and knowledge management systems, but its defining feature is that it is built with Semantic Web technologies.
- It aims to bring a usable and scalable interface to the long-promised dream of the Semantic Web.
75 Twine Repository
- Using Twine, a user can:
  - add content via wiki functionality (with many post types)
  - email content into the system
  - "collect" something (as an object, e.g., a book object).
- Twine ties it all together.
- As information is added to Twine, it is automatically tagged so that it can be easily found.
- Users can connect with individuals and groups, gather and share content, and engage in discussions around their interests.
- Twine connects users with new people, content and products that match their interests, and also helps them discover other people and their contributions.
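Automatic tagging, as described above, can be approximated by matching content against a controlled vocabulary and indexing each item under the concepts it mentions. This is a hypothetical sketch: `VOCAB`, `auto_tag` and `TagIndex` are illustrative names, and Twine's actual semantic analysis is not described in these slides.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: concept tag -> surface forms that signal it.
VOCAB = {
    "Semantic Web": ["semantic web", "rdf", "owl", "sparql"],
    "Green Tech": ["solar", "wind power", "clean energy"],
}


def auto_tag(text: str) -> set[str]:
    """Return every concept whose surface forms occur as whole words in text."""
    lowered = text.lower()
    return {
        tag
        for tag, forms in VOCAB.items()
        if any(re.search(rf"\b{re.escape(form)}\b", lowered) for form in forms)
    }


class TagIndex:
    """Index items under their automatic tags so they can be found later."""

    def __init__(self) -> None:
        self._items: defaultdict[str, set[str]] = defaultdict(set)  # tag -> item ids

    def add(self, item_id: str, text: str) -> set[str]:
        tags = auto_tag(text)
        for tag in tags:
            self._items[tag].add(item_id)
        return tags

    def find(self, tag: str) -> set[str]:
        return set(self._items.get(tag, ()))


idx = TagIndex()
print(idx.add("post1", "Notes on RDF and SPARQL endpoints"))  # {'Semantic Web'}
idx.add("post2", "Solar panels and clean energy funds")
print(idx.find("Green Tech"))  # {'post2'}
```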
76 Twine Semantics
- Twine is powered by semantic understanding.
- At first glance it is very much like Wikipedia, but there are a lot more smarts in the system.
- It is not based around socializing; rather, it aims to share information and organize it automatically, learn about user interests, and make varied connections and recommendations.
- The more it is used, the better it understands the user's interests and the more useful it becomes.
- It is a "Semantic Graph", which maps relationships to both people and topics.
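A "Semantic Graph" that links people and topics can be sketched as an undirected graph with typed nodes, from which related people fall out as those sharing topics. The schema and the ranking by shared-topic count are assumptions for illustration, not Twine's data model, and the names are hypothetical.

```python
from collections import defaultdict


class SemanticGraph:
    """Typed nodes ("person", "topic") joined by undirected edges."""

    def __init__(self) -> None:
        self.edges: defaultdict[tuple[str, str], set[tuple[str, str]]] = defaultdict(set)

    def link(self, person: str, topic: str) -> None:
        self.edges[("person", person)].add(("topic", topic))
        self.edges[("topic", topic)].add(("person", person))

    def topics_of(self, person: str) -> set[str]:
        neighbors = self.edges.get(("person", person), set())
        return {name for kind, name in neighbors if kind == "topic"}

    def related_people(self, person: str) -> list[str]:
        """People sharing at least one topic, most shared topics first."""
        counts: defaultdict[str, int] = defaultdict(int)
        for topic in self.topics_of(person):
            for kind, other in self.edges[("topic", topic)]:
                if kind == "person" and other != person:
                    counts[other] += 1
        return sorted(counts, key=lambda p: (-counts[p], p))


g = SemanticGraph()
for who, what in [("Steve", "Green Tech"), ("Steve", "Investing"),
                  ("Dana", "Green Tech"), ("Dana", "Investing"),
                  ("Lee", "Green Tech"), ("Kim", "Music")]:
    g.link(who, what)
print(g.related_people("Steve"))  # ['Dana', 'Lee']
```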
77 Twine Sign In
78 Twine Dynamics
- Enables users to comment on items and view related things.
- Allows sharing of tags.
- Enables import and export of users' own data.
- Provides RSS feeds to track all kinds of things (topics, events, searches, etc.).
- Uses Semantic Web technologies: RDF, OWL, SPARQL, XSL, GRDDL.
- An open platform: SPARQL and REST APIs are planned.
79 Welcome Steve to Twine
80 Explore Green Business and Investing
81 Steve Smith's Twine
82 Explore Green Tech
83 Semantically up
84 Where does the MetaWeb fit?!