Title: Open Search
1Open Search
2Overview
- Proliferation of Digital Libraries
- Metasearch and Fixed Lists of Sources
- Open Search Architecture
- PublishMe for P2P knowledge Sharing
- Webtop Metasearch Clients
3Contributors
- Michael Kepe
- Igor Ranitovic
- Iman Sadreddin
- Senior Team 03
- Ken Chong
- Rudd Stevens
- Colin Bean
- Tim Chan
- Julian Chan
- Pooja Garg
4Information Source Explosion
- Google, Amazon APIs
- Internet Archive
- Technorati: The World Live Web
- Domain Specific
- ACM Digital Library for CS
- Lexis-Nexis for law
- MLA for literature
5End-User Created Digital Libraries
- Personal Web (shared Google desktop)
- Personal Web Neighborhood
- Topic-Specific Personal Crawlers
- Ordinary people creating search engines as easily
as web pages
6Subsets of the Web
7Motivation for Small, Independent Subsets of the
Web
- Avoid information being channeled through a
single portal
- Googleopoly
- Google does no evil, but
- Censorship in China
- Creeping level of commercialization
- Unregulated manipulation of secret ranking
algorithms (see PageKing case)
- Other media are lost; this is the last frontier
8Little support for using multiple search engines
9Overview
- Proliferation of Digital Libraries
- Metasearch and Fixed Lists of Sources
- Open Search Architecture
- PublishMe for P2P knowledge Sharing
- Webtop Metasearch Clients
10Metasearch
- Help users discover and use digital libraries
- Send queries to multiple, selected search engines
- filter, process, and unify results
- A9.com: Amazon's metasearch
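The "filter, process, and unify" step can be sketched as follows. This is only an illustration under assumptions: each engine is taken to return a ranked list of dicts with `url` and `title` keys (a made-up shape), lists are interleaved round-robin, and duplicates are detected with a crude URL normalization that a real client would want to refine.

```python
from itertools import zip_longest


def unify(result_lists):
    """Interleave ranked result lists round-robin and drop duplicate URLs."""
    seen = set()
    merged = []
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None:  # this engine returned fewer results
                continue
            # Crude normalization (an assumption): ignore case and trailing slash.
            key = hit["url"].rstrip("/").lower()
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


# Made-up results from two hypothetical engines.
engine_a = [{"url": "http://example.org/a", "title": "A"},
            {"url": "http://example.org/b", "title": "B"}]
engine_b = [{"url": "http://example.org/B/", "title": "B again"},
            {"url": "http://example.org/c", "title": "C"}]

titles = [hit["title"] for hit in unify([engine_a, engine_b])]
print(titles)  # ['A', 'B again', 'C']
```

Round-robin interleaving is one simple choice; it avoids comparing scores that different engines compute on different scales.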
11Web Services Basis
[Figure: Web Page Model (server delivering html) vs. Web Service Model (server delivering xml to software)]
12How does metasearch evolve?
New Digital library
13How does metasearch evolve?
New Digital library
Metasearch clients discover it
14How does metasearch evolve?
New Digital library
Metasearch clients discover it
Metasearch Programmers write adaptor/scraper
15How does metasearch evolve?
New Digital library
Metasearch clients discover it
Metasearch Programmers write adaptor/scraper
User can access within metasearch
SLOWLY
16Overview
- Proliferation of Digital Libraries
- Metasearch and Fixed Lists of Sources
- Open Search Architecture
- PublishMe for P2P knowledge Sharing
- Webtop Metasearch Clients
17Goal: Automate the Process
- Metasearch engines should provide users with
up-to-date lists of existing digital libraries
- Digital libraries should be able to register and
be made immediately available to all Metasearch
clients.
- Metasearch and Library development is independent.
18What is Necessary?
- Standard Search API
- So Metasearch clients can use polymorphism to
access sources:
    for each source s in sourceList
        searchEngine.endPointUrl = s.endPointUrl
        resultList = searchEngine.keywordSearch(keywords)
- Search API Registry
- Metasearch clients can get dynamic list
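A minimal runnable sketch of the two requirements above, under stated assumptions: `SearchRegistry`, `Source`, `SearchEngine`, and `fake_transport` are hypothetical names, the registry is an in-memory stand-in, and the network call is faked. The point is only that one client loop works for every source, and that a newly registered source shows up without any client change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Source:
    name: str
    end_point_url: str


class SearchRegistry:
    """In-memory stand-in for the Search API registry (hypothetical)."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, source: Source) -> None:
        self._sources[source.end_point_url] = source

    def source_list(self) -> List[Source]:
        return list(self._sources.values())


class SearchEngine:
    """One proxy serves every source because all expose the same API."""

    def __init__(self, transport: Callable[[str, str], List[str]]) -> None:
        self.transport = transport
        self.end_point_url = ""

    def keyword_search(self, keywords: str) -> List[str]:
        return self.transport(self.end_point_url, keywords)


def fake_transport(end_point_url: str, keywords: str) -> List[str]:
    # Stands in for the network call; a real client would send a request here.
    return [f"{end_point_url}/hit?q={keywords}"]


def metasearch(registry: SearchRegistry, engine: SearchEngine, keywords: str) -> List[str]:
    result_list: List[str] = []
    for s in registry.source_list():          # dynamic list from the registry
        engine.end_point_url = s.end_point_url
        result_list.extend(engine.keyword_search(keywords))
    return result_list


registry = SearchRegistry()
registry.register(Source("library-a", "http://a.example/osp"))
registry.register(Source("library-b", "http://b.example/osp"))
engine = SearchEngine(fake_transport)
print(metasearch(registry, engine, "metasearch"))

# A new library registers; the client code above is unchanged.
registry.register(Source("library-c", "http://c.example/osp"))
print(len(metasearch(registry, engine, "metasearch")))  # 3
```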
19Web Service Standards
- WSDL Web Service Description Language
- SOAP Simple Object Access Protocol
- UDDI Universal Description, Discovery, and
Integration
20Standards on top of Web Services
- WSDL, SOAP, and UDDI are the basis for standards in many
domains.
- e.g., MS initiated one for securities information
providers
- Businesses agree on a standard, then client
applications can use polymorphism and new
businesses can register services.
- In this case, we want a cross-domain standard.
21Open Search Architecture
- Open Search Protocol (OSP)
- Cross-Domain Search-related services
- Not just keyword search, but citations, authorOf,
etc.
- Open Search Registry
- Based on UDDI
- Can add customization, e.g., parsing to find out
which search operations are implemented.
- Web and web service access
22Open Search Architecture
[Figure: OSP-Conforming Libraries register their service with the OS Registry; OSP metasearch clients get the source list from the OS Registry]
23 User Can Choose Sources
24Open Search Protocol
- Keyword search
- Citations (inward links, outward links)
- AuthorOf and other associative operations
- Metadata object results based on Dublin Core
- Restriction object for advanced search stuff
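To make the operation list concrete, here is one way the protocol could look as an interface. This is a sketch, not the OSP WSDL: the class and method names, the fields chosen for `Restriction`, and the sample records are assumptions. The `Metadata` fields are a subset of the Dublin Core elements (identifier, title, creator, date, description).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Metadata:
    """Result object using a subset of Dublin Core element names."""
    identifier: str
    title: str
    creator: List[str] = field(default_factory=list)
    date: Optional[str] = None
    description: Optional[str] = None


@dataclass
class Restriction:
    """Advanced-search options; these two fields are hypothetical."""
    max_results: int = 10
    creator: Optional[str] = None


class OpenSearchLibrary(ABC):
    @abstractmethod
    def keyword_search(self, keywords: str,
                       restriction: Optional[Restriction] = None) -> List[Metadata]:
        ...

    # Optional operations: a library may leave these unimplemented.
    def inward_links(self, identifier: str) -> List[Metadata]:
        raise NotImplementedError

    def outward_links(self, identifier: str) -> List[Metadata]:
        raise NotImplementedError

    def author_of(self, creator: str) -> List[Metadata]:
        raise NotImplementedError


class TinyLibrary(OpenSearchLibrary):
    def __init__(self, records: List[Metadata]) -> None:
        self.records = records

    def keyword_search(self, keywords, restriction=None):
        restriction = restriction or Restriction()
        hits = [r for r in self.records if keywords.lower() in r.title.lower()]
        if restriction.creator is not None:
            hits = [r for r in hits if restriction.creator in r.creator]
        return hits[:restriction.max_results]

    def author_of(self, creator):
        return [r for r in self.records if creator in r.creator]


library = TinyLibrary([
    Metadata("urn:example:1", "As We May Think", ["Vannevar Bush"], "1945"),
    Metadata("urn:example:2", "Placeholder notes on metasearch", ["A. Author"]),
])
print([m.title for m in library.keyword_search("think")])
print([m.identifier for m in library.author_of("A. Author")])
```

Leaving the citation operations unimplemented by default mirrors the idea that a registry may need to find out which search operations a given library supports.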
25Publishing a Library
- Access OSP WSDL Specification from
webtop.cs.usfca.edu
- Generate code in language of choice
- Implement the search operations for the digital
library
- Deploy the service
- Register with Open Search registry
26Deploying an Open Search Lib.
[Figure: deployment steps among the programmer, wsdl2java, the Library server, and the Open Search information Registry]
1. OS wsdl
2. wsdl
3. skeleton code
4. deployed service
5. registration info
27Wrapping a Library
[Figure: an Open Search Wrapper, located on a 3rd party server, sits between the Metasearch Client and a custom search API, e.g., Google API]
1. OSP Query
2. Custom query
3. Custom Result
4. OSP Result
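The four numbered steps on this slide amount to an adapter. The sketch below assumes a made-up vendor API (`custom_engine_search`, with `q`/`num` parameters and `link`/`headline` result keys) purely for illustration; it does not reflect the Google API or any real service.

```python
from typing import Callable, Dict, List


def custom_engine_search(q: str, num: int) -> List[Dict[str, str]]:
    """Stand-in for a vendor-specific search API (hypothetical shape)."""
    corpus = [
        {"link": "http://example.org/1", "headline": "Open search basics"},
        {"link": "http://example.org/2", "headline": "Metasearch clients"},
        {"link": "http://example.org/3", "headline": "Search registries"},
    ]
    return [doc for doc in corpus if q.lower() in doc["headline"].lower()][:num]


class OpenSearchWrapper:
    """Accepts a protocol-style query and answers in protocol-style results."""

    def __init__(self, backend: Callable[..., List[Dict[str, str]]]) -> None:
        self.backend = backend

    def keyword_search(self, keywords: str, max_results: int = 10) -> List[Dict[str, str]]:
        # Step 1 arrives as this call; step 2: translate to the custom query.
        custom_query = {"q": keywords, "num": max_results}
        # Step 3: receive the custom result.
        custom_result = self.backend(**custom_query)
        # Step 4: map vendor fields onto the common result fields.
        return [{"identifier": r["link"], "title": r["headline"]}
                for r in custom_result]


wrapper = OpenSearchWrapper(custom_engine_search)
results = wrapper.keyword_search("search", max_results=2)
print(results)
```

Because the wrapper owns the translation in both directions, the metasearch client never sees the vendor's field names.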
28Wrappers Developed at USF
- Google
- Amazon (sort of)
- Internet Archive
- Technorati
- Feedster
29Overview
- Proliferation of Digital Libraries
- Metasearch and Fixed Lists of Sources
- Open Search Architecture
- PublishMe for P2P knowledge Sharing
- Webtop Metasearch Clients
30PublishMe
- Like Google Desktop, but shared.
- Periodically updates inverted index and linkbase
on PC
- Deploys Web Service on User's PC
- Auto-Registers with Open Search Registry
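For readers unfamiliar with the index mentioned above, here is a minimal sketch of building an inverted index over a few local documents and answering a keyword query from it. The file names and contents are invented, and the real PublishMe indexer is not described in these slides; periodic updating would amount to rerunning `build_index`.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for term in tokenize(text):
            index[term].add(path)
    return index


def keyword_search(index, query):
    """Return the paths containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Made-up personal files.
documents = {
    "notes/osp.txt": "Open Search Protocol notes",
    "bookmarks/bush.html": "Vannevar Bush and associative trails",
    "notes/p2p.txt": "Open search over P2P",
}
index = build_index(documents)
print(keyword_search(index, "open search"))  # ['notes/osp.txt', 'notes/p2p.txt']
```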
31 Metasearch with P2P Knowledge Sharing
WEBTOP
32 Integrating Global and Personal Libraries
33Motivation for Sharing Personal Webs
People create knowledge every day when they
bookmark, annotate, link, organize, and
synthesize. Communication is a separate step
which often doesn't happen.
34Motivation for Sharing Personal Webs
Collaborative Work
Experts
35Computers are designed using our brains as a
model
- Knowledge creation and dissemination separate
- Explicit effort required to communicate
- Just as we model our word processors on paper.
36Additions to OSP for P2P
- GetFile
- OnLine(ip)
- Handles user starting up
- Dynamic IPs
- OffLine
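One way to picture the presence operations: the registry keeps a table of peers that are currently up, and each `OnLine` call overwrites the peer's last known address, which is what handles dynamic IPs. The sketch below is an assumption-laden illustration; `PeerRegistry`, the `peer_id` argument, and `locate` are hypothetical, and the addresses come from the reserved documentation range.

```python
from typing import Dict, Optional


class PeerRegistry:
    """Tracks which personal libraries are reachable right now (hypothetical)."""

    def __init__(self) -> None:
        self._online: Dict[str, str] = {}

    def on_line(self, peer_id: str, ip: str) -> None:
        # Called when the user's machine starts up; a changed IP simply
        # replaces the previous entry.
        self._online[peer_id] = ip

    def off_line(self, peer_id: str) -> None:
        self._online.pop(peer_id, None)

    def locate(self, peer_id: str) -> Optional[str]:
        # A GetFile request would need this address to reach the peer.
        return self._online.get(peer_id)


peers = PeerRegistry()
peers.on_line("alice", "192.0.2.10")
peers.on_line("alice", "192.0.2.77")   # same peer, new dynamic IP
print(peers.locate("alice"))           # 192.0.2.77
peers.off_line("alice")
print(peers.locate("alice"))           # None
```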
37But What About PRIVACY?
- The Big Question
- How much of the information hidden
within your personal web is hidden due
to privacy concerns?
38I Want you to be a Search Engine!
39Overview
- Proliferation of Digital Libraries
- Metasearch and Fixed Lists of Sources
- Open Search Architecture
- PublishMe for P2P knowledge Sharing
- Metasearch Clients
40Goal: Implement Vannevar Bush's Association Trails
- View a document/thing in context
- History of an idea
41Thinkmap-like Interface
42Association Types
- Outward links
- Inward links
- Similar-Content links
- People Links
- author, people referenced in paper
- Domain-Specific links
- law citations
- movie-actor
- Associations specified by Annotators
43Webtop Tree View webtop.cs.usfca.edu
44Expanding a Tree
- Bird's Eye View
- Local/Web files integrated
- Follow different Associative Trails
- Ins of Outs of Ins, etc.
- Siblings
- Weird though, as ins and outs both expand right
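A trail such as "ins of outs of ins" can be read as a sequence of link-following steps applied to a set of nodes. The sketch below illustrates that reading on a tiny made-up link graph; the function names are hypothetical and this is not the Webtop implementation.

```python
# Made-up link graph: each key links outward to the listed nodes.
out_links = {"A": ["B", "C"], "B": ["C"], "D": ["C"], "C": []}


def outs(node):
    """Nodes that `node` links to."""
    return sorted(out_links.get(node, []))


def ins(node):
    """Nodes that link to `node`."""
    return sorted(src for src, targets in out_links.items() if node in targets)


def follow(start, trail):
    """Apply each step ('in' or 'out') in order, starting from one node."""
    frontier = {start}
    for step in trail:
        if step not in ("in", "out"):
            raise ValueError(f"unknown step: {step}")
        expand = ins if step == "in" else outs
        frontier = {n for node in frontier for n in expand(node)}
    return sorted(frontier)


# Ins of outs of ins of C: the trail is applied left to right.
print(follow("C", ["in", "out", "in"]))  # ['A', 'B', 'D']
# Siblings of B: everything linked from B's parents (B itself included).
print(follow("B", ["in", "out"]))        # ['B', 'C']
```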
45Webtop Side Panel View
46Project Status
Too many bugs, Dad
47Future Work
- Open Search Protocol
- In-depth study of existing search APIs
- Provide REST alternative to SOAP
- Metasearch development
- Complete and refine existing clients
- Dream up new ones
- Thinkmap Graph
- Automated Source Selection and Reputation System
- Page Ranking
- Initiate grass-roots involvement
48Future Work: Documents and Things
resource, associations, annotations
document
person
creative work
html
word
pdf
film
book
49Stop talking about Webtop, daddy!
webtop.cs.usfca.edu