Title: On the Evaluation of Semantic Web Service Matchmaking Systems
1 On the Evaluation of Semantic Web Service Matchmaking Systems
- Vassileios Tsetsos, Christos Anagnostopoulos and Stathes Hadjiefthymiades
- Pervasive Computing Research Group
- Communication Networks Laboratory
- Department of Informatics and Telecommunications
- University of Athens, Greece
- ECOWS '06 @ Zurich
2 Outline
- Introduction
- Problem Statement
- A Generalized Fuzzy Evaluation Scheme for Service Retrieval
- Experimental Results
- A Pragmatic View
- Conclusions
3 SWS Matchmaking
- Matching service requests and advertisements, based on their semantic annotations (expressed through ontologies)
- Numerous matchmaking approaches
  - Logic-, similarity-, structure-based (graph matching)
- Various matched entities
  - Functional service parameters (e.g., IOPE attributes)
  - Non-functional parameters (e.g., QoS attributes)
- Ultimate goal: more effective service discovery, based on the semantics and not just the syntax of service descriptions
4 Degree of Match
- A value that expresses how similar two entities are, with respect to some similarity metric(s)
- Important feature of almost all SWS matchmaking approaches
- Allows for ranking of discovered services
- Example DoM set: {exact, plugin, subsumes, subsumed-by, fail}
5 Evaluation Basics
- Most works evaluate the performance of SWS discovery (i.e., response times, scalability)
- Limited contributions to the evaluation of retrieval effectiveness (i.e., the ability to discover relevant services)
Q: set of possible service requests
S: set of advertisements of published services
e: Q × S → W (DoM, analogous to the Retrieval Status Value in IR)
r: Q × S → W (expert mappings)
Evaluation is the determination of how closely vector e approximates vector r
6 Evaluation Schemes
- W is the set of values denoting DoM (for e) or degree of relevance (for r)
- W defines different evaluation schemes (EVS)

Evaluation Scheme   RSVs e(R, Si)   Expert Mappings r(R, Si)
EVS1                Boolean         Boolean
EVS2                Multi-valued    Multi-valued
7 Boolean Evaluation (EVS1)
- W = {0, 1}
- Information Retrieval (IR) measures can be used
  - Precision (PB) and Recall (RB)
RT: set of retrieved advertisements
RL: set of relevant advertisements
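
The standard IR definitions behind PB and RB are PB = |RT ∩ RL| / |RT| and RB = |RT ∩ RL| / |RL|. A minimal sketch follows; the advertisement IDs are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something stated on the slide.

```python
def boolean_precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of advertisement IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # |RT ∩ RL|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical advertisement IDs, for illustration only.
RT = {"s1", "s2", "s3", "s4"}  # retrieved
RL = {"s2", "s3", "s5"}        # relevant
p_b, r_b = boolean_precision_recall(RT, RL)
print(f"PB = {p_b:.2f}, RB = {r_b:.2f}")
```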
8 Problem Statement (1/2)
- Since SWS matchmaking systems produce multi-valued vectors e, applying Boolean evaluation requires the introduction of a relevance threshold
- Problem 1: This Booleanization process filters out any service semantics captured through the DoM
- Problem 2: An optimal threshold value is hard to find
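
Booleanization with a relevance threshold can be sketched as below. The ranking of DoM values is an assumption (the order in which the example DoM set is listed, strongest first), and `booleanize` and the service IDs are hypothetical names used only for illustration.

```python
# Assumed ranking: the order of the example DoM set, strongest match first.
DOM_RANK = {"exact": 4, "plugin": 3, "subsumes": 2, "subsumed-by": 1, "fail": 0}


def booleanize(dom_by_service, threshold):
    """Map each DoM to 1 if it ranks strictly above the threshold, else 0."""
    cut = DOM_RANK[threshold]
    return {s: int(DOM_RANK[d] > cut) for s, d in dom_by_service.items()}


# Hypothetical matchmaker output for one request.
e = {"s1": "exact", "s2": "subsumes", "s3": "subsumed-by", "s4": "fail"}

low = booleanize(e, "fail")       # s1, s2, s3 count as retrieved
high = booleanize(e, "subsumes")  # only s1 counts as retrieved
print(low)
print(high)
```

With the low threshold, an exact match and a subsumed-by match become indistinguishable (Problem 1), and moving the threshold changes the retrieved set substantially (Problem 2).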
9 Problem Statement (2/2)
- Problem 3: Boolean expert mappings are too coarse-grained and do not always reflect the intention of the domain expert
- Experiment
  - Manually defined multi-valued mappings between 6 requests and 135 advertisements of TC2, with W = {0, 0.25, 0.5, 0.75, 1}
  - Calculation of deviation from existing Boolean mappings
- Only 33% of the Boolean mappings agree with the multi-valued ones
- 40% of the Boolean mappings are not even close to the multi-valued ones (deviation > 0.25)
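
One way such a comparison could be computed is sketched below. The slide does not define the deviation measure, so the absolute difference between the Boolean and the multi-valued mapping is an assumption; the mappings themselves are hypothetical and are not the TC2 data.

```python
def agreement_stats(boolean_r, multi_r, tolerance=0.25):
    """Return (share of exact agreements, share deviating by more than tolerance)."""
    # Assumed deviation measure: absolute difference per (request, advertisement).
    deviations = [abs(boolean_r[k] - multi_r[k]) for k in boolean_r]
    n = len(deviations)
    agree = sum(d == 0 for d in deviations) / n
    far = sum(d > tolerance for d in deviations) / n
    return agree, far


# Hypothetical expert mappings keyed by (request, advertisement).
boolean_r = {("q1", "s1"): 1, ("q1", "s2"): 1, ("q1", "s3"): 0, ("q1", "s4"): 0}
multi_r = {("q1", "s1"): 1.0, ("q1", "s2"): 0.75, ("q1", "s3"): 0.5, ("q1", "s4"): 0.0}

agree, far = agreement_stats(boolean_r, multi_r)
print(f"agree: {agree:.0%}, deviation > 0.25: {far:.0%}")
```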
10 A Generalized Fuzzy Evaluation Scheme
- Such a scheme (EVS2) can provide solutions to the aforementioned problems
- Main design decisions
  - Expert mappings are fuzzy linguistic terms
  - DoM are fuzzy sets
  - Boolean measures are substituted by generalized ones
- Why fuzzy modeling?
  - Relevance is an amorphic concept (L. Zadeh), i.e., its complexity prevents its mathematical definition
  - Numeric values have vague semantics
  - Fuzzy linguistic variables assume values from a linguistic term set, with each term being a fuzzy variable set
- Warning: fuzziness does not refer to the matchmaking process per se
11 Fuzzification of e and r
fr: Q × S → [0, 1]
fe: Q × S → [0, 1]
If there is no one-to-one correspondence between the fuzzy variables in each set, fuzzy modifiers could be used (e.g., dilations, concentrations)
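
As an illustration of fuzzy modifiers, the sketch below uses Zadeh's classic hedges: concentration squares a membership degree and dilation takes its square root. The slide does not say which modifier definitions were used, so these are assumptions, and the membership values are hypothetical.

```python
def concentrate(mu):
    """Concentration: sharpens a fuzzy set by squaring the membership degree."""
    return mu ** 2


def dilate(mu):
    """Dilation: relaxes a fuzzy set by taking the square root of the degree."""
    return mu ** 0.5


# Hypothetical membership degrees of three services in one fuzzy set.
relevant = {"s1": 1.0, "s2": 0.64, "s3": 0.25}

sharpened = {s: concentrate(mu) for s, mu in relevant.items()}
relaxed = {s: dilate(mu) for s, mu in relevant.items()}
print(sharpened)
print(relaxed)
```

Degrees of 0 and 1 are unchanged by both modifiers; only the intermediate degrees move, which is what allows a term set with fewer terms to be stretched toward one with more.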
12 Generalized Evaluation Measures
- Based on Buell and Kraft, "Performance measurement in a fuzzy retrieval system" (1981), generalized recall (RG) and generalized precision (PG) are defined
- The cardinalities of the sets RT and RL are transformed to fuzzy set cardinalities, since these sets are fuzzy
- Note: the evaluation measures take into account all services Si
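
The formulas themselves did not survive on this slide. A minimal sketch of one common formulation is given below, assuming the fuzzy intersection is the minimum of the two membership degrees and the fuzzy cardinality is the sum of degrees (sigma-count); the authors' exact definitions may differ. The function name and the membership values are hypothetical.

```python
def generalized_precision_recall(fe, fr):
    """Return (PG, RG) from fuzzy membership degrees per service.

    fe: degree to which each service is retrieved (fuzzified DoM)
    fr: degree to which each service is relevant (fuzzified expert mapping)
    """
    services = fe.keys() | fr.keys()
    # Assumed fuzzy intersection: min of the two degrees, summed over all Si.
    overlap = sum(min(fe.get(s, 0.0), fr.get(s, 0.0)) for s in services)
    card_rt = sum(fe.values())  # fuzzy cardinality of RT
    card_rl = sum(fr.values())  # fuzzy cardinality of RL
    p_g = overlap / card_rt if card_rt else 0.0
    r_g = overlap / card_rl if card_rl else 0.0
    return p_g, r_g


# Hypothetical degrees for three services and one request.
fe = {"s1": 1.0, "s2": 0.5, "s3": 0.0}
fr = {"s1": 0.75, "s2": 0.5, "s3": 0.5}
p_g, r_g = generalized_precision_recall(fe, fr)
print(f"PG = {p_g:.2f}, RG = {r_g:.2f}")
```

When all degrees are 0 or 1, this formulation reduces to the Boolean precision and recall of EVS1.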
13 Experimental Results (1/3)
- Manual assessment of fuzzy relevance in the Education subset of TC v2
- Matchmaking engine: OWLS-MX Matcher
  - Used only logic-based matching algorithms
  - Threshold: FAIL

Query ID   EVS1 RB (%)   EVS1 PB (%)   EVS2 RG (%)   EVS2 PG (%)
Q15        77            77            77            77
Q16        60            92            87            96
Q17        57            92            77            89
Q18        73            92            90            88
Q19        100           65            100           71
Q20        80            71            95            72

The difference between RG and RB is due to considerable deviation between Boolean and fuzzy expert mappings
14 Experimental Results (2/3)
- Sensitivity of the proposed scheme

Actual case                               Hypothetical case
S1: somewhat relevant / FAIL (RG = 87)    S1: very relevant / FAIL (RG = 84, all others unchanged)
S2: irrelevant / SUBSUMES (PG = 96)       S2: irrelevant / EXACT (PG = 93, all others unchanged)

- Only the generalized measures are affected by stronger false negatives/positives
15 Experimental Results (3/3)
- Similar overall behavior, but better accuracy/sensitivity, as already shown
16 A Pragmatic View
- A reasonable assumption
  - Experts are not willing to provide more than Boolean mappings
- Automatic fuzzification of Boolean expert mappings would be valuable
17 A First Approach
- Services are represented as concepts and form a service profile ontology
- Then an inference matrix is used for adjusting the Boolean r values

Logic relation                         Eq   DSup  DSub  Sib  No
Inferred fuzzy value (Boolean r = 1)   V    R     R     R    SW
Inferred fuzzy value (Boolean r = 0)   SW   S     S     I    I
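
The inference matrix can be read as a simple lookup from (Boolean value, logic relation) to a fuzzy term. The sketch below encodes the matrix exactly as it appears on the slide, keeping the abbreviations unexpanded; the function name `infer_fuzzy_value` and the example inputs are hypothetical, and how the logic relation is derived from the service profile ontology is outside this sketch.

```python
# Inference matrix from the slide: Boolean r value -> logic relation -> fuzzy term.
INFERENCE_MATRIX = {
    1: {"Eq": "V", "DSup": "R", "DSub": "R", "Sib": "R", "No": "SW"},
    0: {"Eq": "SW", "DSup": "S", "DSub": "S", "Sib": "I", "No": "I"},
}


def infer_fuzzy_value(boolean_value, relation):
    """Adjust a Boolean expert mapping using the logic relation between two services."""
    return INFERENCE_MATRIX[boolean_value][relation]


# Hypothetical cases: a Boolean-relevant sibling and a Boolean-irrelevant equivalent.
print(infer_fuzzy_value(1, "Sib"))
print(infer_fuzzy_value(0, "Eq"))
```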
18 Experimental Results
- The new scheme (EVS2 with automatically fuzzified Boolean expert mappings) approximates EVS2 with manual fuzzy mappings better than EVS1 does
- Under the assumption that EVS2 is more accurate, the new scheme seems promising
[Figure: chart with legend entries EVS1, EVS2, EVS1 (average), EVS2 (average), and a second EVS2 entry, presumably the new scheme]
19 Conclusions
- Service retrieval evaluation should be semantics-aware
- A generalization of the current evaluation measures is deemed necessary
- Fuzzy Set Theory may assist in this direction
- However, many practical issues remain open
20 Thank You!
- Questions?
- http://p-comp.di.uoa.gr