Untitled slide - PowerPoint PPT Presentation


About This Presentation

Title:

Untitled slide

Description:

(Bookstein, 1981; Yager 1987, Sanchez 1989) 3) SIMILARITY (IDEAL IMPORTANCE VALUES) ... (R.R. Yager, IEEE Trans. on SMC, 1988) a1, ...., an: numeric values to be ... – PowerPoint PPT presentation


Slides: 47

Provided by: gloriab1


Transcript and Presenter's Notes

Title: Untitled slide

4
Limitations of current IRSs (and search engines!)

They behave as black boxes: the same query always yields the same answer.

They are mainly based on static models: systems do not adapt their behaviour, or adapt it only minimally, by learning users' real information needs.

They do not account for the vagueness intrinsic in the process of verifying whether information items are informative, i.e. relevant to specific needs. They can account for (probabilistic) uncertainty only to a limited extent.

Query languages are usually based on selection criteria specified by terms (keywords), with no possibility of expressing vagueness or uncertainty.

Retrieval results are presented with simple visualization techniques.

CONSEQUENCE: subjectivity is modeled only superficially, while IRSs should adapt to users' needs!
9
The weights specify soft constraints on the weighted document representations. The RSV (Retrieval Status Value) of a document expresses the degree to which these constraints are satisfied.
11

Given a weighted query, the Retrieval Status Value computed for a document with respect to that query expresses the degree of constraint satisfaction.
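The evaluation of a weighted query can be sketched as follows. This is a minimal sketch, assuming the usual fuzzy interpretation of the Boolean connectives (AND as min, OR as max, NOT as complement) and a pluggable function `g(F, w)` that measures how well the index weight F(d,t) satisfies the query weight w. The tuple-based query encoding, the names `rsv` and `g_threshold`, and the crisp threshold form of `g` are hypothetical choices for illustration, not the method of the cited works.

```python
from typing import Any, Callable, Dict

# Hypothetical query encoding (not from the slides):
#   ("t", term, weight)      weighted term <t, w>
#   ("and", [q1, q2, ...])   conjunction
#   ("or",  [q1, q2, ...])   disjunction
#   ("not", q)               negation


def g_threshold(f: float, w: float) -> float:
    """Hypothetical evaluator: keep F(d, t) if it reaches the weight w, else 0."""
    return f if f >= w else 0.0


def rsv(query: Any, index: Dict[str, float],
        g: Callable[[float, float], float] = g_threshold) -> float:
    """Retrieval Status Value of one document for a weighted Boolean query.

    index maps each term t to its index weight F(d, t) in [0, 1];
    terms that do not occur in the document get F(d, t) = 0.
    """
    op = query[0]
    if op == "t":
        _, term, weight = query
        return g(index.get(term, 0.0), weight)
    if op == "and":
        return min(rsv(q, index, g) for q in query[1])
    if op == "or":
        return max(rsv(q, index, g) for q in query[1])
    if op == "not":
        return 1.0 - rsv(query[1], index, g)
    raise ValueError(f"unknown operator: {op!r}")


doc = {"fuzzy": 0.8, "retrieval": 0.6, "xml": 0.1}
query = ("and", [("t", "fuzzy", 0.5),
                 ("or", [("t", "retrieval", 0.7), ("t", "xml", 0.0)])])
print(rsv(query, doc))
```

Here "retrieval" misses its 0.7 threshold, so the disjunction falls back to the weak "xml" match and the conjunction yields an RSV of 0.1: the result is a degree of satisfaction rather than a yes/no answer.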

12
Query weights semantics

1) THRESHOLD
(Radecki, 1979; Buell & Kraft, 1981)
2) RELATIVE IMPORTANCE
(Bookstein, 1981; Yager, 1987; Sanchez, 1989)
3) SIMILARITY (IDEAL IMPORTANCE VALUES)
(Cater & Kraft, 1989; Bordogna, Carrara & Pasi, 1991)
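The three semantics listed above can be contrasted with simplified evaluation functions g(F, w), where F = F(d,t) and w is the query weight. This is a sketch under stated assumptions: the crisp threshold, the max(1 - w, F) and min(w, F) importance forms for conjunctions and disjunctions, and the exponential "ideal value" form with a hypothetical steepness parameter `k` are illustrative choices; the cited papers define their own, more elaborate functions.

```python
import math

# Illustrative evaluation functions g(F, w); F = F(d, t) and w are in [0, 1].
# Simplified forms for illustration, not the exact definitions in the papers.


def g_threshold(f: float, w: float) -> float:
    # Threshold: w is the minimum acceptable index weight (crisp version).
    return f if f >= w else 0.0


def g_importance_and(f: float, w: float) -> float:
    # Relative importance inside a conjunction: an unimportant term
    # (w near 0) barely constrains the result, since max(1 - w, F) stays near 1.
    return max(1.0 - w, f)


def g_importance_or(f: float, w: float) -> float:
    # Relative importance inside a disjunction: an unimportant term
    # contributes little, since min(w, F) stays near 0.
    return min(w, f)


def g_ideal(f: float, w: float, k: float = 0.1) -> float:
    # Similarity to an ideal value: 1 when F == w, decaying as F moves away.
    # k in (0, 1) is a hypothetical steepness parameter.
    return math.exp(math.log(k) * (f - w) ** 2)


for name, g in [("threshold", g_threshold), ("importance/AND", g_importance_and),
                ("importance/OR", g_importance_or), ("ideal", g_ideal)]:
    print(name, [round(g(f, 0.6), 3) for f in (0.2, 0.6, 1.0)])
```

The printout shows the key difference: under the threshold and importance readings a larger F(d,t) never hurts, whereas under the ideal-value reading a document with F = 1.0 scores lower than one with exactly F = 0.6.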

15
Linguistic query weights
Boolean expressions on pairs $\langle t, lw \rangle$, with $lw \in \{\text{very important}, \text{important}, \text{not very important}, \ldots\}$

Each value $lw$ has an associated function $\mu_{lw}$, which evaluates the compatibility between the constraint $lw$ and the numeric values $F(d,t) \in [0,1]$.
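The pairing of each linguistic weight with a compatibility function can be sketched as follows. This is a minimal sketch under assumptions: `mu_important` is a hypothetical linear ramp with illustrative breakpoints, and "very" and "not" are modelled with Zadeh's classic linguistic hedges (concentration as squaring, negation as complement); the slides do not specify these shapes.

```python
# Hypothetical compatibility functions mu_lw: [0, 1] -> [0, 1].
# "important" is an assumed linear ramp; "very" and "not" use Zadeh's
# classic hedges (concentration = squaring, negation = complement).


def mu_important(f: float, low: float = 0.3, high: float = 0.8) -> float:
    """Compatibility of the index weight f with the label 'important'."""
    if f <= low:
        return 0.0
    if f >= high:
        return 1.0
    return (f - low) / (high - low)


def very(mu):
    """Concentration hedge: squares the membership degree."""
    return lambda f: mu(f) ** 2


def not_(mu):
    """Negation: complements the membership degree."""
    return lambda f: 1.0 - mu(f)


LINGUISTIC_WEIGHTS = {
    "important": mu_important,
    "very important": very(mu_important),
    "not very important": not_(very(mu_important)),
}


def evaluate_pair(f_dt: float, lw: str) -> float:
    """Evaluate the pair <t, lw> for a document whose index weight is F(d, t)."""
    return LINGUISTIC_WEIGHTS[lw](f_dt)


for lw in LINGUISTIC_WEIGHTS:
    print(lw, round(evaluate_pair(0.55, lw), 3))
```

With F(d,t) = 0.55 the same document is moderately "important" (0.5), weakly "very important" (0.25) and fairly "not very important" (0.75), so the label chosen by the user changes the ranking without changing the index.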

27
Personalized indexing of semi-structured documents

Limitations of the usual indexing procedure

The weighted representation of documents does not take into account that a term can play a different role within a text, depending on the location and distribution of its occurrences.
Usual indexing procedures produce the same document representation for all users.

Need for personalized indexing procedures
28
A model for personalized indexing of structured
documents
(Bordogna & Pasi, International Journal of Approximate Reasoning, 1995; Bordogna & Pasi, Information Retrieval, 2005)
29
Hierarchical document structure
30
A model for personalized indexing of structured
documents

The model is composed of:
a static component that extracts the index terms and computes, for each term and each document, the significance degrees of the term in the document sections;

an adaptive component, activated by a user query, that computes the overall significance degree F(d,t) for each query term t and document d.
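The split between the two components can be sketched as follows. This is a minimal sketch under assumptions: the static component here uses section term frequency normalized by the most frequent term in the section, and the adaptive component combines section degrees with user-supplied section importance weights through a plain weighted average. Both choices, and the names `static_index` and `overall_significance`, are hypothetical stand-ins for the aggregation actually defined in the cited papers.

```python
from typing import Dict, List


# Hypothetical static component: for each term, its significance degree in
# every section, taken as term frequency normalized by the section's maximum.
def static_index(sections: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    index: Dict[str, Dict[str, float]] = {}
    for name, tokens in sections.items():
        counts: Dict[str, int] = {}
        for tok in tokens:
            counts[tok] = counts.get(tok, 0) + 1
        if not counts:
            continue
        top = max(counts.values())
        for term, cnt in counts.items():
            index.setdefault(term, {})[name] = cnt / top
    return index


# Hypothetical adaptive component: at query time, combine the per-section
# degrees with the user's section importance weights (plain weighted average).
def overall_significance(index: Dict[str, Dict[str, float]], term: str,
                         importance: Dict[str, float]) -> float:
    total = sum(importance.values())
    if total == 0:
        return 0.0
    degrees = index.get(term, {})
    return sum(w * degrees.get(sec, 0.0) for sec, w in importance.items()) / total


doc_sections = {
    "title": ["fuzzy", "retrieval"],
    "abstract": ["fuzzy", "xml", "xml"],
    "body": ["xml", "retrieval", "xml", "fuzzy"],
}
idx = static_index(doc_sections)          # computed once, user independent
user_a = {"title": 1.0, "abstract": 0.5, "body": 0.0}
user_b = {"body": 1.0}
print(overall_significance(idx, "fuzzy", user_a))
print(overall_significance(idx, "fuzzy", user_b))
```

The static index is shared, but the two users obtain different values of F(d,"fuzzy") (about 0.83 and 0.5) because they weight the sections differently, which is the point of personalized indexing.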

34
A model for personalized indexing of structured
documents
The structured indexing model can also take hierarchical structures into account: the index term weight in a section is computed by aggregating the index term weights computed at the lowest level.
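The bottom-up aggregation over a section hierarchy can be sketched recursively. This is a minimal sketch, assuming that only leaf sections carry term weights and that the aggregation operator is a parameter (`max` by default, or an arithmetic mean); the `Section` class and the choice of operators are hypothetical, since the slide does not name the operator used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Section:
    name: str
    # Term weights, meaningful only for leaf (lowest-level) sections.
    weights: Dict[str, float] = field(default_factory=dict)
    children: List["Section"] = field(default_factory=list)


def section_weight(sec: Section, term: str,
                   agg: Callable[[List[float]], float] = max) -> float:
    """Weight of term in sec: a leaf returns its own weight; an inner
    section aggregates the weights of its children, recursively."""
    if not sec.children:
        return sec.weights.get(term, 0.0)
    return agg([section_weight(c, term, agg) for c in sec.children])


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


doc = Section("article", children=[
    Section("abstract", {"fuzzy": 0.9}),
    Section("body", children=[
        Section("sec1", {"fuzzy": 0.4, "xml": 0.8}),
        Section("sec2", {"xml": 0.2}),
    ]),
])
print(section_weight(doc, "fuzzy"), section_weight(doc, "fuzzy", mean))
```

With `max` a single strong occurrence dominates (0.9), while the mean dilutes it across sections (0.55); which behaviour is preferable depends on how the aggregation is meant to reflect the role of the term in the text.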
37
XPath

XPath is the standard language for writing tree-traversal expressions that extract XML fragments. XPath selection is also used within XQuery, the fully-fledged XML query language.

The main features of XPath are:
a rich set of built-in expressions;
elementary data types: Boolean, Number, String, Node-set;
the specification of selection conditions to be satisfied by the relevant nodes.

Path-based selection: the user knows enough of the target schema to formulate a search path to be matched against the structure of the target XML documents:
root_node/ /parent_node/target_node
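Path-based selection can be tried with Python's standard `xml.etree.ElementTree`, which implements a limited subset of XPath with paths evaluated relative to the element on which `findall` is called. The sample collection and its element names are hypothetical and serve only to illustrate a fully specified path and a selection condition on the matched nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection, used only for illustration.
XML = """
<articles>
  <journal>
    <issue>
      <article id="a3"><title>NEAR axis</title></article>
    </issue>
    <article id="a2"><title>XPath</title></article>
  </journal>
  <article id="a1"><title>Fuzzy IR</title></article>
</articles>
"""

root = ET.fromstring(XML.strip())  # the <articles> element

# Path-based selection: the full path below the root is spelled out.
by_path = [a.get("id") for a in root.findall("journal/article")]
titles = [t.text for t in root.findall("journal/issue/article/title")]

# A selection condition on the matched nodes (attribute predicate).
by_condition = [a.get("id") for a in root.findall(".//article[@id='a1']")]

print(by_path, titles, by_condition)
```

Note that each exact path finds only the articles at that precise position; articles placed elsewhere in the hierarchy are silently missed unless the user knows the schema well.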

38
Flexibility in XPath

Some flexibility can already be achieved in XPath by means of wildcards.

39
Vague predicates

Traditional query languages (in the database context) allow for data selection based on binary (crisp) predicates. Relevance with respect to a query is therefore modeled as a binary concept: an information item is either relevant or not.

A vague predicate, on the other hand, is represented by a fuzzy subset and expresses a soft condition whose evaluation produces a numeric value in [0,1]; as a consequence, the results can be ranked.
The membership functions are defined according to the semantics of the linguistic labels used for the vague predicates (expensive, recent, ...).
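The contrast between a crisp and a vague predicate can be sketched with the label "recent". This is a minimal sketch under assumptions: the membership function `mu_recent` (fully recent up to 1 year old, not recent from 10 years on, linear in between), the crisp cut-off of 5 years, and the sample data are all hypothetical.

```python
# Hypothetical membership function for the vague predicate "recent":
# degree 1 up to `full` years old, 0 from `zero` years on, linear in between.
def mu_recent(year: int, ref_year: int, full: int = 1, zero: int = 10) -> float:
    age = ref_year - year
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)


def crisp_recent(items, ref_year, max_age=5):
    """Binary predicate: an item either satisfies the condition or not."""
    return [title for title, year in items if ref_year - year <= max_age]


def fuzzy_recent(items, ref_year):
    """Vague predicate: items with a nonzero degree, ranked by that degree."""
    scored = [(title, mu_recent(year, ref_year)) for title, year in items]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


papers = [("A", 2023), ("B", 2018), ("C", 2010), ("D", 2021)]  # hypothetical data
print(crisp_recent(papers, 2024))
print(fuzzy_recent(papers, 2024))
```

The crisp predicate returns an unordered set {A, D} and discards B entirely, while the vague predicate returns A, D, B ranked by degree, keeping B with a low score.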

40
Flexible selection

Since atomic information items are organized in a hierarchical structure, we can assume that the nearer two items are, the more likely they are to be semantically related.

XPath provides two crisp constructs that can express topology constraints:
/articles/*/article: the * wildcard matches any element in a specified position, regardless of its name (the position is known, the name is unknown).
/articles//article: the // axis matches any path that descends the containment hierarchy, so all article elements are matched regardless of their distance from the articles element that contains them (the name is known, the position is unknown).
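Both crisp constructs can be reproduced with `xml.etree.ElementTree`, whose XPath subset supports `*` and `//`. The sample collection is hypothetical; paths are written relative to the `<articles>` root element, so `*/article` corresponds to `/articles/*/article` and `.//article` to `/articles//article`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection, used only for illustration.
XML = """
<articles>
  <journal>
    <issue>
      <article id="a3"><title>NEAR axis</title></article>
    </issue>
    <article id="a2"><title>XPath</title></article>
  </journal>
  <article id="a1"><title>Fuzzy IR</title></article>
</articles>
"""

root = ET.fromstring(XML.strip())  # the <articles> element

# /articles/*/article: exactly one step, of any name, between the two elements.
star = [a.get("id") for a in root.findall("*/article")]

# /articles//article: article elements at any depth below articles,
# returned in document order with no notion of distance.
desc = [a.get("id") for a in root.findall(".//article")]

print(star, desc)
```

The `*` query finds only a2, while the `//` query finds all three articles but in plain document order (a3, a2, a1), so the directly contained a1 is not preferred over the deeply nested a3.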

New proposal: a new XPath axis, NEAR, can be defined, as in /articles/NEAR/article. The result set is ranked by the number of steps descended: the fewer the steps, the higher the rank.
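Since NEAR is a proposed axis and not part of XPath, its ranking behaviour can only be emulated. This is a minimal sketch, assuming a hypothetical `near` helper that walks the descendants of a context element, records the number of steps descended to each match, and scores it as 1 / steps; the score function and the sample collection are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical sample collection, used only for illustration.
XML = """
<articles>
  <journal>
    <issue>
      <article id="a3"><title>NEAR axis</title></article>
    </issue>
    <article id="a2"><title>XPath</title></article>
  </journal>
  <article id="a1"><title>Fuzzy IR</title></article>
</articles>
"""


def near(context: ET.Element, tag: str) -> List[Tuple[ET.Element, float]]:
    """Emulate the proposed NEAR axis: every descendant of context named tag,
    ranked by the number of steps descended (fewer steps rank higher).
    The score 1 / steps is a hypothetical choice."""
    found: List[Tuple[ET.Element, int]] = []

    def walk(node: ET.Element, steps: int) -> None:
        for child in node:
            if child.tag == tag:
                found.append((child, steps))
            walk(child, steps + 1)

    walk(context, 1)
    found.sort(key=lambda pair: pair[1])  # stable: document order breaks ties
    return [(el, 1.0 / steps) for el, steps in found]


root = ET.fromstring(XML.strip())
ranked = near(root, "article")
print([(el.get("id"), round(score, 3)) for el, score in ranked])
```

Unlike `//`, which returns a3, a2, a1 in document order, the emulated NEAR returns a1, a2, a3 with decreasing scores, so the result set can be ranked like the output of a vague predicate.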
41
The PENG project

The PENG project was a STREP project aimed at defining a flexible, personalized system for gathering, filtering, retrieving and presenting multimedia news for news professionals (e.g. journalists and editors), with a view to making the system available to general users as well.

42
Characteristics of the Clustering Algorithm

Categories of news are not known a priori:
  unsupervised clustering
  summarization of cluster contents
Need to deal with categorization ambiguity:
  probabilistic or fuzzy clustering
Need to identify news categories at distinct levels of granularity, organized into general topics and specific topics:
  hierarchical fuzzy clustering
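The way fuzzy clustering handles categorization ambiguity can be illustrated with standard fuzzy c-means (Bezdek), in which every item receives a membership degree in every cluster. This is only an illustration of fuzzy clustering in general: it is flat rather than hierarchical and is not the PENG algorithm. The 2-D feature vectors, the fuzzifier `m = 2` and the random initialization with a fixed seed are hypothetical choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def fuzzy_c_means(points: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  max_iter: int = 100, eps: float = 1e-6, seed: int = 0
                  ) -> Tuple[List[Tuple[float, ...]], List[List[float]]]:
    """Standard fuzzy c-means: returns cluster centers and an n x c matrix of
    membership degrees whose rows sum to 1. m > 1 is the fuzzifier."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    centers: List[Tuple[float, ...]] = []
    for _ in range(max_iter):
        # Centers: means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append(tuple(sum(w[i] * points[i][k] for i in range(n)) / tot
                                 for k in range(dim)))

        # Memberships from relative distances to the centers
        # (distances floored to avoid division by zero).
        new_u = []
        for p in points:
            d = [max(math.dist(p, cj), 1e-12) for cj in centers]
            new_u.append([1.0 / sum((d[j] / d[q]) ** (2.0 / (m - 1.0)) for q in range(c))
                          for j in range(c)])

        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if delta < eps:
            break
    return centers, u


# Hypothetical 2-D feature vectors for six news items.
news = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(news, c=2)
for item, row in zip(news, memberships):
    print(item, [round(x, 3) for x in row])
```

An ambiguous news item lying between two topics would receive intermediate degrees in both clusters instead of being forced into one; a hierarchical variant would additionally organize such clusters into general and specific topics.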

43
Characteristics of the Clustering Algorithm