Provided by: gloriab1

Title: Nessun titolo diapositiva (No slide title)


4
Limitations of current IRSs (and search engines!)
  • They behave as a black box: for the same query,
    the same answer
  • They are mainly based on static models (systems
    do not adapt, or adapt only minimally, their
    behaviour by learning users' real information
    needs)
  • They do not account for the vagueness intrinsic
    in the process of verifying whether information
    items are informative, i.e. relevant to specific
    needs. They account for (probabilistic)
    uncertainty only to a limited extent
  • Query languages are usually based on selection
    criteria specified by terms (keywords): there is
    no possibility of being vague or uncertain
  • Simple visualization techniques for retrieval
    results

CONSEQUENCE: subjectivity is modeled only to a
shallow extent, while IRSs should adapt to users'
needs!
9
The weights specify soft constraints on the
weighted document representations. The RSV
(Retrieval Status Value) of a document expresses
the degree of constraint satisfaction.
11
  • Given a weighted query, the Retrieval Status
    Value computed for a given document for that
    query expresses the degree of constraint
    satisfaction.

12
Query weights semantics
  • 1) THRESHOLD
  • (Radecki, 1979; Buell & Kraft, 1981)
  • 2) RELATIVE IMPORTANCE
  • (Bookstein, 1981; Yager, 1987; Sanchez, 1989)
  • 3) SIMILARITY (IDEAL IMPORTANCE VALUES)
  • (Cater & Kraft, 1989; Bordogna, Carrara & Pasi,
    1991)

15
Linguistic query weights
Boolean expressions on pairs <t, lw>, with
lw ∈ {very important, important, not very
important, ...}
  • Each value lw has an associated function μ_lw,
    which evaluates the compatibility between the
    constraint lw and the numeric values
    F(d,t) ∈ [0,1]
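
As a rough illustration of how a linguistic weight could be evaluated, the sketch below attaches a compatibility function to each linguistic value and applies it to F(d,t). The shapes of the functions (identity, square, complement of the square), the sample index F, and the use of min for AND are assumptions made for this example only; the slides do not give the actual definitions.

```python
# Hypothetical compatibility functions, one per linguistic value.
# Each maps a significance degree F(d,t) in [0,1] to a degree in [0,1].
COMPATIBILITY = {
    "important": lambda f: f,                    # assumed: grows linearly
    "very important": lambda f: f ** 2,          # assumed: stricter than "important"
    "not very important": lambda f: 1 - f ** 2,  # assumed: complement of "very important"
}

# Hypothetical index: F[d][t] is the significance of term t in document d.
F = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.5, "retrieval": 0.8},
}


def evaluate_pair(doc_id, term, lw):
    """Degree to which document doc_id satisfies the pair <term, lw>."""
    significance = F[doc_id].get(term, 0.0)
    return COMPATIBILITY[lw](significance)


def rsv(doc_id):
    """RSV for the query <fuzzy, very important> AND <retrieval, important>.

    AND is interpreted as min, a common fuzzy choice (an assumption here).
    """
    return min(
        evaluate_pair(doc_id, "fuzzy", "very important"),
        evaluate_pair(doc_id, "retrieval", "important"),
    )


ranking = sorted(F, key=rsv, reverse=True)
```

With these sample values, d1 is ranked above d2, because d2 satisfies the "very important" constraint on the first term only weakly.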

27
Personalized indexing of semi-structured documents
  • Limitations of the usual indexing procedure:
  • The weighted representation of documents does not
    take into account that a term can play a
    different role within a text, according to the
    location and distribution of its occurrences.
  • Usual indexing procedures produce the same
    document representation for all users.

Need for personalized indexing procedures
28
A model for personalized indexing of structured
documents
(Bordogna & Pasi, International Journal of
Approximate Reasoning, 1995) (Bordogna & Pasi,
Information Retrieval, 2005)
29
Hierarchical document structure
30
A model for personalized indexing of structured
documents
  • The model is composed of:
  • a static component that extracts the index terms
    and computes, for each of them and each document,
    the significance degrees in the document sections
  • an adaptive component, activated by a user query,
    that computes the overall significance degree
    F(d,t) for each query term t and document d.
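
The two components can be sketched roughly as follows. This is a minimal illustration, not the published model: the per-section significance is assumed to be a normalized term frequency, and the adaptive step is assumed to be a weighted average driven by user-supplied section importances. All function names and sample data are hypothetical.

```python
def section_significance(section_text, term):
    """Assumed measure: relative frequency of the term within one section."""
    words = section_text.lower().split()
    return words.count(term) / len(words) if words else 0.0


def build_static_index(document):
    """Static component: significance of every term in every section.

    `document` maps section names to their text.
    """
    terms = {w for text in document.values() for w in text.lower().split()}
    return {
        t: {name: section_significance(text, t) for name, text in document.items()}
        for t in terms
    }


def overall_significance(index, term, importance):
    """Adaptive component: combine section degrees into F(d,t).

    `importance` maps section names to user-specific weights; a weighted
    average is an assumption made for this sketch.
    """
    per_section = index.get(term, {})
    total = sum(importance.get(name, 0.0) for name in per_section)
    if total == 0:
        return 0.0
    return sum(importance.get(name, 0.0) * f for name, f in per_section.items()) / total


doc = {
    "title": "fuzzy retrieval",
    "body": "retrieval of news by fuzzy clustering of news",
}
index = build_static_index(doc)

# Two users who trust different sections obtain different representations.
title_oriented = overall_significance(index, "fuzzy", {"title": 0.8, "body": 0.2})
body_oriented = overall_significance(index, "fuzzy", {"title": 0.2, "body": 0.8})
```

The static index is built once per document, while the last two lines show the same term receiving a different F(d,t) for each user profile.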

34
A model for personalized indexing of structured
documents
The structured indexing model can take into
account hierarchical structures. The index term
weight in a section is computed by aggregating
the index term weights at the lowest level.
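
A bottom-up aggregation over a hierarchical structure could look like the sketch below. The tree encoding and the importance-weighted mean used as aggregation operator are assumptions for illustration; the slides do not state which operator the model uses.

```python
def aggregate(node):
    """Return the index term weight of a section.

    A leaf is {"weight": w}. An inner section is
    {"children": [(importance, child_node), ...]}, and its weight is the
    importance-weighted mean of its children's weights (assumed operator).
    """
    if "weight" in node:
        return node["weight"]
    children = node["children"]
    total = sum(importance for importance, _ in children)
    return sum(importance * aggregate(child) for importance, child in children) / total


# Hypothetical document: an abstract plus a body made of two subsections.
document = {
    "children": [
        (1.0, {"weight": 0.6}),               # abstract
        (3.0, {"children": [                   # body
            (1.0, {"weight": 0.2}),           # subsection 1
            (1.0, {"weight": 0.4}),           # subsection 2
        ]}),
    ]
}

term_weight = aggregate(document)
```

The recursion first computes the weight of the body from its subsections and only then combines it with the abstract.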
37
XPath
  • XPath is the standard language for writing tree
    traversal expressions that extract XML fragments
  • XPath selection is also used within the
    fully-fledged XML query language (XQuery)
  • The main features of XPath are:
  • a rich set of built-in expressions
  • elementary data types: Boolean, Number, String,
    Node-set
  • specification of selection conditions to be
    satisfied by the relevant nodes
  • Path-based selection: the user knows enough of
    the target schema to be able to formulate a
    search path to be matched against the structure
    of the target XML documents
  • root_node/.../parent_node/target_node
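
Path-based selection can be tried with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The XML document and element names below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose schema the user is assumed to know.
XML = """
<library>
  <shelf>
    <book><title>Fuzzy Sets</title></book>
    <book><title>XML Retrieval</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(XML)  # root is the <library> element

# Full search path, relative to the root element.
titles = [e.text for e in root.findall("./shelf/book/title")]

# Selection condition: keep only the books whose title child has this text.
matching_books = root.findall("./shelf/book[title='Fuzzy Sets']")
```

Both expressions are crisp: a node either matches the path and the condition or it does not, and the results carry no ranking.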

38
Flexibility in XPath
  • Some flexibility can already be achieved in XPath
    (by means of wildcards)

39
Vague predicates
  • Traditional query languages (in the database
    context) allow for data selection based on binary
    (crisp) predicates.
  • Relevance w.r.t. a query is therefore modeled as
    a binary concept: an information item is either
    relevant or not.
  • On the other hand, a vague predicate, represented
    by a fuzzy subset, expresses a soft condition,
    whose evaluation produces a numeric value in
    [0,1], with the consequence that the results can
    be ranked.
  • The membership functions are defined in
    accordance with the semantics of the linguistic
    labels employed for the vague predicates
    (expensive, recent, ...).
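
To make the contrast with crisp predicates concrete, the sketch below defines two vague predicates and ranks a few items by their degree of satisfaction. The linear membership functions, their thresholds, the sample items, and the use of min for conjunction are all assumptions chosen for illustration.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def expensive(price):
    # Assumed semantics: not expensive below 50, fully expensive from 200.
    return ramp_up(price, 50.0, 200.0)


def recent(year):
    # Assumed semantics: not recent before 2000, fully recent from 2010.
    return ramp_up(year, 2000, 2010)


items = [
    {"title": "A", "price": 250.0, "year": 2004},
    {"title": "B", "price": 125.0, "year": 2010},
    {"title": "C", "price": 40.0, "year": 2009},
]


def score(item):
    """Degree to which the item is 'expensive AND recent' (AND as min)."""
    return min(expensive(item["price"]), recent(item["year"]))


ranked = sorted(items, key=score, reverse=True)
ranked_titles = [item["title"] for item in ranked]
```

A crisp query such as "price > 200 and year > 2005" would return none of these items, whereas the vague predicates rank B above A and give C a degree of 0.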

40
Flexible selection
  • Since atomic information items are clustered in a
    hierarchical structure, we can state that the
    nearer two items are, the more likely they are
    to be semantically related.
  • XPath provides two crisp constructs that can
    express topology constraints, as follows:
  • /articles/*/article. The * wildcard matches any
    tag in a specified position, disregarding its
    name (the position is known, the name is unknown).
  • /articles//article. The // axis matches any path
    that descends the containment hierarchy, and all
    the article elements are matched, independently
    of their distance from the articles element
    they are contained in (the name is known, the
    position is unknown).

New proposal: a new XPath axis, NEAR, can be
defined: /articles/NEAR/article. The result set
is ranked by increasing number of steps to be
descended.
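
The behaviour of the proposed NEAR axis can be approximated outside XPath with a breadth-first traversal, as in the sketch below. The function `near`, the sample document, and the choice to report the raw number of steps as the ranking key are assumptions for this example; NEAR is not part of standard XPath, and the two crisp constructs are shown with xml.etree.ElementTree for comparison.

```python
import xml.etree.ElementTree as ET
from collections import deque

XML = """
<articles>
  <section>
    <group><article id="deep"/></group>
    <article id="mid"/>
  </section>
  <article id="top"/>
</articles>
"""

root = ET.fromstring(XML)  # the <articles> element


def near(context, tag):
    """Return (steps, element) pairs for descendants named `tag`.

    Breadth-first traversal visits nodes level by level, so the result is
    already ordered by increasing number of steps from `context`.
    """
    results = []
    queue = deque((child, 1) for child in context)
    while queue:
        node, steps = queue.popleft()
        if node.tag == tag:
            results.append((steps, node))
        queue.extend((child, steps + 1) for child in node)
    return results


# Crisp constructs: position known (*) versus name known (//).
one_level_below = [e.get("id") for e in root.findall("./*/article")]
any_depth = [e.get("id") for e in root.findall(".//article")]

# Flexible selection: same elements as //, but ranked by distance.
ranked = [(steps, e.get("id")) for steps, e in near(root, "article")]
```

Here the // construct returns the three articles in document order, while the NEAR sketch returns them nearest first.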
41
The PENG project
  • The PENG project was a STREP project aimed at
    defining a flexible, personalised system for the
    gathering, filtering, retrieval and presentation
    of multimedia news for news professionals (e.g.
    journalists and editors), with a view to making
    the system also available to general users.

42
Characteristics of the Clustering algorithm
  • Categories of news are not known a priori
  • Unsupervised clustering
  • Summarization of cluster contents
  • Need to deal with categorization ambiguity
  • Probabilistic or fuzzy clustering
  • Need to identify categories of news with
    distinct granularity, organized into general
    topics and specific topics
  • Hierarchical fuzzy clustering

43
Characteristics of the Clustering Algorithm
  • The algorithm generates a hierarchy of fuzzy
    clusters by recursively applying the Fuzzy
    C-Means (FCM) algorithm
  • Extensions of FCM:
  • Cosine similarity measure
  • The algorithm automatically identifies the number
    of clusters to generate
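
A single level of such a clustering could be sketched as a Fuzzy C-Means loop in which cosine dissimilarity (1 minus cosine similarity) replaces the squared Euclidean distance. This is an illustrative approximation, not the PENG algorithm: the number of clusters is passed in rather than identified automatically, the hierarchy is not built, and the deterministic farthest-first initialisation is an assumption made to keep the example reproducible.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine_dissimilarity(a, b):
    """1 - cosine similarity for unit vectors, clamped away from zero."""
    return max(1.0 - sum(x * y for x, y in zip(a, b)), 1e-12)


def fuzzy_c_means(docs, c, m=2.0, iters=30):
    """Return (memberships, centers) for c fuzzy clusters with fuzzifier m."""
    docs = [normalize(d) for d in docs]
    # Farthest-first initialisation: start from the first document, then
    # repeatedly add the document farthest from the centers chosen so far.
    centers = [docs[0]]
    while len(centers) < c:
        centers.append(max(
            docs,
            key=lambda d: min(cosine_dissimilarity(d, ctr) for ctr in centers),
        ))
    memberships = []
    for _ in range(iters):
        # Membership update: the dissimilarity plays the role of the
        # squared distance in the standard FCM formula.
        memberships = []
        for d in docs:
            dist = [cosine_dissimilarity(d, ctr) for ctr in centers]
            memberships.append([
                1.0 / sum((dist[i] / dist[j]) ** (1.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ])
        # Center update: membership-weighted mean, projected to unit length.
        for i in range(c):
            w = [row[i] ** m for row in memberships]
            mean = [
                sum(w[k] * docs[k][dim] for k in range(len(docs))) / sum(w)
                for dim in range(len(docs[0]))
            ]
            centers[i] = normalize(mean)
    return memberships, centers


# Toy term-weight vectors: three documents about one topic, three about another.
vectors = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.0], [0.1, 0.9]]
memberships, centers = fuzzy_c_means(vectors, c=2)
```

Each row of `memberships` sums to 1, so a news item can belong to several categories to different degrees, which is how fuzzy clustering handles categorization ambiguity.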

45
Do search engines (SE) and commercial systems
apply Fuzzy Set Theory?
  • Try the following query: <search engine fuzzy>
  • Lots of answers!
  • The answer to the previous question: some do
  • Mainly to deal with the problem of misspelling →
    Levenshtein distance.
  • Others talk about fuzzy matching mechanisms
  • HAKIA, VERITY, ...
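
For reference, the Levenshtein (edit) distance mentioned above counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a textbook dynamic-programming version; how any particular search engine uses it is not stated in the slides.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def closest(word, vocabulary):
    """Suggest the vocabulary entry with the smallest edit distance."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))
```

For example, `closest("retreival", ["retrieval", "revival", "retail"])` suggests "retrieval", since the misspelling is only two substitutions away from it.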

46
Thank you for your attention!