Title: Autocompletion for Mashups
1. Autocompletion for Mashups
- Ohad Greenshpan, Tova Milo, Neoklis Polyzotis
  Tel-Aviv University & UCSC
2. Talk Roadmap
- Introduction to Mashups and Autocompletion
- Problem Definition
- The Algorithm
- Implementation & Experiments
- Conclusions & Related Work
3. Introduction: What is a mashup?
- A mashup is a technology for integrating data, services, and applications available on the web into a single application.
4. Application Integration
[Figure: integration layers: GUI, Logic, and multiple Data sources]
5. Mashup development is difficult...
6. [Figure: diagram labeled "knowledge", "?", "knowledge"]
7. Introduction: Mashup Autocompletion
8. The Mashup Model
9. Inheritance
[Figure: inheritance between mashlets A and B]
10. Mashup Autocompletion: Problem Definition
- Given a database of mashlets and glue patterns (GPs) and a set of mashlets selected by the user, identify and rank GPs that link a subset of the selected mashlets.
- Ranking is based on popularity and relevance to the user query.
- What would be the ideal GP?
  - The most popular one that connects only the user mashlets and nothing else.
- Relaxations:
  - Less popular
  - Connects variants of the user mashlets
  - Connects a subset of the user mashlets
  - Connects additional mashlets
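The ideal GP and its four relaxations can be made concrete with a small classifier. This is a hypothetical sketch, not the authors' data model: the classes `Mashlet` and `GluePattern`, the single-parent `parent` link used to model inheritance (variants), the popularity scale, and the `relaxations` helper are all assumptions for illustration. Given a GP and the user's mashlets, it reports which relaxations the GP needs relative to the ideal one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mashlet:
    name: str
    parent: Optional[Mashlet] = None  # hypothetical single-inheritance link

    def is_variant_of(self, other: Mashlet) -> bool:
        """True if self is `other` or (transitively) inherits from it."""
        m: Optional[Mashlet] = self
        while m is not None:
            if m == other:
                return True
            m = m.parent
        return False


def related(a: Mashlet, b: Mashlet) -> bool:
    """Same mashlet, or one inherits from the other."""
    return a.is_variant_of(b) or b.is_variant_of(a)


@dataclass(frozen=True)
class GluePattern:
    name: str
    mashlets: frozenset  # the mashlets this GP links
    popularity: float    # assumed scale: larger means more popular


def relaxations(gp: GluePattern, user: set, max_popularity: float) -> set:
    """Relaxations `gp` needs compared with the ideal GP for the user's mashlets."""
    needed = set()
    if gp.popularity < max_popularity:
        needed.add("less popular")
    covered = set()
    for u in user:
        if u in gp.mashlets:
            covered.add(u)
        elif any(related(m, u) for m in gp.mashlets):
            covered.add(u)  # reached only through a variant
            needed.add("variants")
    if covered != user:
        needed.add("subset")
    # Any linked mashlet unrelated to every user mashlet is an extra one.
    if any(not any(related(m, u) for u in user) for m in gp.mashlets):
        needed.add("additional mashlets")
    return needed


A = Mashlet("A")
B = Mashlet("B", parent=A)  # B is a variant of A
C, D = Mashlet("C"), Mashlet("D")
user = {A, C}
print(sorted(relaxations(GluePattern("g2", frozenset({B, D}), 0.5), user, 0.9)))
# ['additional mashlets', 'less popular', 'subset', 'variants']
```

A GP linking exactly `{A, C}` at the maximum popularity needs no relaxation, which is the ideal GP from the slide.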
11. Inheritance
12. Problem Abstraction
- Each glue pattern is represented as a point in a multidimensional space:
  - One dimension represents the GP popularity.
  - The remaining dimensions represent all mashlets:
    1) User mashlets
    2) Other mashlets
- The algorithm's goal is to find the top-k GPs that link the given user mashlets (the ones closest to the optimal GP).
[Figure: GPs plotted along the axes GP Popularity, m1, m2]
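One way to read this abstraction: encode each GP as a vector with a popularity coordinate and one 0/1 coordinate per mashlet, place the ideal GP at maximal popularity with 1s exactly on the user mashlets, and rank GPs by their distance to it. The Euclidean distance, the 0/1 encoding, popularity normalized so that 1.0 is the maximum, and the helper names `to_point` and `top_k_nearest` are assumptions for illustration; the slides do not fix a distance function.

```python
import heapq
import math


def to_point(mashlets, popularity, universe):
    """Coordinate 0 is popularity; then one 0/1 coordinate per mashlet in `universe`."""
    return [popularity] + [1.0 if m in mashlets else 0.0 for m in universe]


def top_k_nearest(gps, user, universe, k):
    """Brute force: rank every GP by Euclidean distance to the ideal GP."""
    # Ideal GP: maximal popularity (assumed 1.0), links exactly the user mashlets.
    ideal = to_point(user, 1.0, universe)
    scored = [
        (math.dist(to_point(ms, pop, universe), ideal), name)
        for name, (ms, pop) in gps.items()
    ]
    return [name for _, name in heapq.nsmallest(k, scored)]


universe = ["m1", "m2", "m3", "m4"]
gps = {
    "gA": ({"m1", "m2"}, 0.9),              # exact match, slightly less popular
    "gB": ({"m1"}, 1.0),                    # covers only a subset
    "gC": ({"m1", "m2", "m3", "m4"}, 1.0),  # adds extra mashlets
}
print(top_k_nearest(gps, {"m1", "m2"}, universe, 2))  # ['gA', 'gB']
```

A brute-force scan like this touches every GP and every mashlet coordinate, which is exactly the cost a list-based top-k algorithm tries to avoid.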
13. Data Structure & Basic Top-k Algorithm
Sorted lists of <gp, score> pairs over glue patterns g1-g7 (dimensions: GP popularity and mashlets):
L0          L1          L2          L3
<g1, 0.1>   <g7, 0.1>   <g4, 0.1>   <g1, 0.1>
<g2, 0.2>   <g4, 0.2>   <g3, 0.2>   <g2, 0.6>
<g3, 0.4>   <g6, 0.2>   <g1, 0.5>   <g7, 0.6>
<g4, 0.4>   <g1, 0.3>   <g2, 0.5>   <g6, 0.7>
<g5, 0.4>   <g5, 0.4>   <g7, 0.5>   <g4, 0.8>
<g6, 0.4>   <g2, 0.5>   <g5, 0.8>   <g5, 0.8>
<g7, 0.4>   <g3, 0.7>   <g6, 0.8>   <g3, 0.9>
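The slide does not spell out the basic algorithm; a natural reading, assumed here, is a Fagin-style Threshold Algorithm (TA): read the lists round-robin, fetch each newly seen GP's scores from every list by random access, and stop once the k-th best total is no worse than the threshold formed by the scores at the current depth. Treating scores as distances (lower is better, matching the ascending order of the lists), using a plain sum as the monotone aggregate, and reading L0 as the popularity list are assumptions; `threshold_top_k` is a hypothetical name.

```python
# Lists from the slide: L0 (assumed GP popularity) and L1-L3 (assumed mashlet lists),
# each sorted by ascending score.
LISTS = [
    [("g1", 0.1), ("g2", 0.2), ("g3", 0.4), ("g4", 0.4), ("g5", 0.4), ("g6", 0.4), ("g7", 0.4)],
    [("g7", 0.1), ("g4", 0.2), ("g6", 0.2), ("g1", 0.3), ("g5", 0.4), ("g2", 0.5), ("g3", 0.7)],
    [("g4", 0.1), ("g3", 0.2), ("g1", 0.5), ("g2", 0.5), ("g7", 0.5), ("g5", 0.8), ("g6", 0.8)],
    [("g1", 0.1), ("g2", 0.6), ("g7", 0.6), ("g6", 0.7), ("g4", 0.8), ("g5", 0.8), ("g3", 0.9)],
]


def threshold_top_k(lists, k):
    """Fagin-style Threshold Algorithm with sum aggregation; lower totals are better.

    Assumes every GP appears in every list and all lists have equal length.
    Returns (top-k [(gp, total)], number of sorted-access rounds).
    """
    lookup = [dict(lst) for lst in lists]  # random access: gp -> score
    totals = {}
    for depth in range(len(lists[0])):
        frontier = 0.0
        for lst in lists:  # round-robin sorted access, one entry per list
            gp, score = lst[depth]
            frontier += score
            if gp not in totals:
                totals[gp] = sum(d[gp] for d in lookup)
        best = sorted(totals.items(), key=lambda kv: kv[1])[:k]
        # No unseen GP can total less than the sum of the current frontier scores.
        if len(best) == k and best[-1][1] <= frontier:
            return best, depth + 1
    return sorted(totals.items(), key=lambda kv: kv[1])[:k], len(lists[0])


top, rounds = threshold_top_k(LISTS, 2)
print([gp for gp, _ in top], rounds)  # ['g1', 'g4'] 3
```

On these lists TA stops after 3 of 7 rounds: the second-best total (g4, 1.5) is already below the frontier sum 0.4 + 0.2 + 0.5 + 0.6 = 1.7. Note that it still reads every list on every round, including lists for mashlets the user never selected.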
14. Problems with the algorithm
- The number of lists the algorithm accesses is very large.
- Most of the mashlet lists are unrelated to the user selection (query).
15. Data Structure
[Figure: labels Mashlets, GP Popularity, User mashlets, Glue Patterns]
16. Algorithm
17. Correctness of AC: Lemma
- Theorem 4.1: Algorithm AC returns a correct solution.
- The proof is based on a lemma showing that any candidate AC has not encountered has a total score lower than the threshold.
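The lemma is the standard threshold argument. Written out under the assumptions that scores are combined by a monotone function $f$, that higher scores are better, and that each list is read in descending order (the slide states only the conclusion):

```latex
Let $L_1,\dots,L_m$ be the lists, $x_i(g)$ the score of GP $g$ in $L_i$,
and $s_i$ the last score AC read from $L_i$ by sorted access. The threshold is
\[
  \tau = f(s_1, \dots, s_m).
\]
If $g$ has not been encountered, it lies below the current position in every
list, so $x_i(g) \le s_i$ for all $i$. Since $f$ is monotone,
\[
  f\bigl(x_1(g), \dots, x_m(g)\bigr) \le f(s_1, \dots, s_m) = \tau .
\]
Hence once the $k$-th best total seen so far is at least $\tau$,
no unseen GP can enter the top-$k$.
```

For distance-style scores read in ascending order, as in the lists of slide 13, every inequality reverses.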
Optimality of AC
- Competing algorithms
  - $C$: the class of deterministic algorithms that operate under the same access model as AC.
  - Algorithms receive as input the lists, the monotonic function, and $k$.
  - Algorithms can use any access order (i.e., not necessarily round-robin) and any thresholding scheme, and can rely on the elements they have accessed.
- Instance optimality
  - AC is instance optimal within class $C$ if there are constants $c$ and $c'$ such that for every input instance $I$, $\mathrm{cost}(AC, I) \le c \cdot \mathrm{cost}(A, I) + c'$ for any $A \in C$.
18. Calculating Popularity: Glue Pattern and Mashlet Rank
- PageRank-style algorithm
- Takes into account the popularity of mashlets and GPs, as well as the relationships between them.
[Figure: graph of GP nodes and mashlet (M) nodes linked to one another]
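The slide gives no formulas for this ranking. As a hedged illustration of a PageRank-style scheme that combines both signals, the sketch below runs personalized PageRank on a bipartite graph connecting each GP to the mashlets it links, using raw usage counts as the teleport (personalization) distribution. The graph shape, the `usage` input, the damping factor 0.85, the fixed iteration count, and the name `popularity_rank` are all assumptions, not the authors' formulation.

```python
def popularity_rank(edges, usage, damping=0.85, iters=100):
    """Personalized PageRank over a GP-mashlet graph.

    edges: (gp, mashlet) pairs, one per mashlet a GP links (hypothetical input).
    usage: node -> raw usage count, used as the teleport distribution.
    """
    nbrs = {}
    for gp, m in edges:  # undirected bipartite graph
        nbrs.setdefault(gp, []).append(m)
        nbrs.setdefault(m, []).append(gp)
    total = sum(usage.get(n, 0) for n in nbrs)
    teleport = {n: usage.get(n, 0) / total for n in nbrs}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1 - damping) * teleport[n] for n in nbrs}
        for n, adj in nbrs.items():
            share = damping * rank[n] / len(adj)  # spread rank evenly to neighbors
            for nb in adj:
                new[nb] += share
        rank = new
    return rank


edges = [("gA", "m1"), ("gA", "m2"), ("gB", "m2"), ("gB", "m3")]
usage = {"gA": 10, "gB": 1, "m1": 1, "m2": 5, "m3": 1}
rank = popularity_rank(edges, usage)
print(rank["gA"] > rank["gB"])  # True
```

The two GPs here are structurally symmetric (each links one private mashlet and the shared m2), so the difference in their final rank comes only from usage, while a mashlet shared by many GPs accumulates rank from all of them.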
19. IBM Mashup Center Implementation
[Figure: architecture with WebSphere Application Server, Knowledge base, and MatchUp Algorithm]
20. Experiments (synthetic dataset)
- Synthetic dataset for large-scale experiments
  - Generated a DB of 40k mashlets & GPs (ProgrammableWeb has 4k)
  - Based on ProgrammableWeb characteristics
- Experiments on the synthetic dataset
  - Varying number of total mashlets and GPs
  - Varying k
  - Varying number of user mashlets
  - Varying GP complexity
21. Results (synthetic dataset)
[Figure: GP complexity 5, varying k]
22. Results (synthetic dataset)
[Figure: GP complexity 10, varying k]
23. Results (synthetic dataset)
[Figure: varying number of user mashlets]
24. Experiments (real dataset)
- Real dataset
  - Used real-life mashlets from ProgrammableWeb and IBM Mashup Center
  - Scenario: development of a travel-related mashup
- Experiments for quality assessment
  - IBM Mashup Center as the mashup platform
  - Users placed mashlets
  - MatchUp offered the top-10 GPs for their mashlets
  - Users searched for alternatives
- Results
  - User satisfaction was high
  - High correlation between suggestions and users' lists
  - Browsing for additional results was generally unsuccessful
  - The gluing process was significantly expedited
25. Related Work
- Autocompletion in many other domains
  - Phrase prediction (Nandi & Jagadish, VLDB 2007)
  - File locations (Myers, CHI 2000)
- Web service composition
  - Model for WS composition (Berardi et al., VLDB 2005)
  - Optimized and customized algorithm (McIlraith & Son, KR 2002)
- Mashup assembly tools
  - MashMaker (Ennals & Garofalakis, SIGMOD 2007): data -> widgets
  - MashupAdvisor (Elmeleegy et al., ICWS 2008): mashup -> output recommendation -> assembly to achieve this output