An Overview of Bayesian Network-based Retrieval Models

1
An Overview of Bayesian Network-based Retrieval Models
  • Juan Manuel Fernández Luna
  • Departamento de Informática
  • Universidad de Jaén
  • jmfluna_at_ujaen.es

Department of Computing Science, University of Glasgow
October 21st, 2002
2
Layout
  • Introduction
  • Introduction to Belief Networks
  • Bayesian Network-based IR Models
  • Inference Network Model
  • Belief Network Model
  • Bayesian Network Retrieval Model
  • Relevance Feedback
  • Other applications
  • Bibliography

3
Introduction
Information Retrieval → Uncertain process
  • Query and document characterizations are
    incomplete.
  • The query is a vague description of the user's
    information need.
  • Computing relevance degree 1 and 2
  • A) a concept may have different representations;
    B) concepts are not independent of one another.

4
Introduction
Probabilistic models tried to overcome these
problems.
Researchers turned to belief networks for IR
because they perform well on real-world problems
characterised by uncertainty.
5
Introduction to Belief Networks
  • Graphical models able to represent and
    efficiently manipulate n-dimensional probability
    distributions.
  • The knowledge obtained from a problem is encoded
    in a belief network by means of its quantitative
    and qualitative components.

6
Introduction to Belief Networks
  • Qualitative part: a Directed Acyclic Graph,
  • G = (V, E)
  • V (Nodes) → Random variables, and
  • E (Arcs) → (In)dependence relationships.

7
Introduction to Belief Networks
  • Quantitative part: a set of conditional
    distributions
  • drawn from the graph structure,
  • representing the strength of the relationships,
  • stored in each node.

Belief Network → Bayesian Network (conditional
probability distributions)
8
Introduction to Belief Networks
9
Introduction to Belief Networks
Taking into account these (in)dependences, the
joint probability distribution can be recovered
from the network:
$$p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid Pa(X_i))$$
Pa(Xi) being the set of parents of the variable
Xi. This expression implies an important saving
in storage space.
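As a rough illustration of the storage saving, the sketch below counts the free parameters needed for binary variables with and without the factorization. The four-node graph is a hypothetical example, not one taken from the slides.

```python
# Count the free parameters needed to store a distribution over binary variables.
# The graph is a hypothetical example: each node maps to the list of its parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}


def full_joint_size(n_vars: int) -> int:
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n_vars - 1


def factored_size(parents: dict) -> int:
    """One free parameter per parent configuration of each binary node."""
    return sum(2 ** len(pa) for pa in parents.values())


print(full_joint_size(len(parents)))  # 15
print(factored_size(parents))         # 1 + 2 + 2 + 4 = 9
```

The gap widens quickly: the unrestricted table doubles with every added variable, while the factored form grows only with the number of parents per node.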
10
Introduction to Belief Networks
  • Construction
  • Manual, using an expert's knowledge.
  • Automatic, by means of a learning algorithm.
  • Inference
  • Given a set of evidence, E, obtain the
    probability with which a variable takes a
    certain value.
  • p(S=T | W=T) = 0.430, p(R=T | W=T) = 0.708
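The two values quoted above (0.430 and 0.708) match the widely used four-node sprinkler network (Cloudy, Sprinkler, Rain, WetGrass). The figure from the slides is not in this transcript, so the structure and the conditional probability tables below are an assumption; with them, brute-force enumeration reproduces both numbers.

```python
from itertools import product

# Assumed CPTs (textbook sprinkler example); each value is p(variable = True | parents).
P_C = 0.5
P_S = {False: 0.5, True: 0.1}                     # keyed by Cloudy
P_R = {False: 0.2, True: 0.8}                     # keyed by Cloudy
P_W = {(False, False): 0.0, (True, False): 0.9,
       (False, True): 0.9, (True, True): 0.99}    # keyed by (Sprinkler, Rain)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(c, s, r, w):
    """Chain-rule factorization: the product of p(Xi | Pa(Xi))."""
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)


def posterior(query, evidence):
    """p(query = True | evidence), summing the joint over all assignments."""
    names = ("C", "S", "R", "W")
    num = den = 0.0
    for values in product([False, True], repeat=4):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[query]:
            num += p
    return num / den


print(round(posterior("S", {"W": True}), 3))  # 0.43
print(round(posterior("R", {"W": True}), 3))  # 0.708
```

Enumeration is exponential in the number of variables, which is why dedicated propagation algorithms matter for networks of realistic size.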

11
Bayesian Network-based IR Models
  • Inference Network Model
  • Belief Network Model
  • Peter Bruza's Index Expression Belief Network
  • Maria Indrawan et al.'s Model
  • Bayesian Network Retrieval Model

12
Inference Network Model
inn
Link Matrices
Inference: instantiate each document, dj, and
compute p(inn | dj).
13
Belief Network Model
Q
2^M assignments → unfeasible. Probabilities are
defined in such a way that only one configuration
is evaluated.
14
Bayesian Network Retrieval Model
  • Guidelines to build the BNR Model
  • There are strong relationships between a document
    and the terms that index it.
  • Document relationships are only present by means
    of the terms that index them.
  • Documents are conditionally independent given the
    terms by which they were indexed.

15
Bayesian Network Retrieval Model
Ti ∈ {¬ti, ti}
Dj ∈ {¬dj, dj}
16
Bayesian Network Retrieval Model
  • All the terms are mutually independent
  • Simple Bayesian Network Retrieval Model

17
Bayesian Network Retrieval Model
  • Probability Distributions
  • Term nodes: p(tj) = 1/M, p(¬tj) = 1 - p(tj)
  • Document nodes: p(Dj | Pa(Dj)), ∀Dj

But... if a document has been indexed by 30
terms, we need to estimate and store 2^30
probabilities.
Problem!!!!
18
Bayesian Network Retrieval Model
  • Solution

Probability functions
pa(Dj) being a configuration of the parents of
Dj.
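The formula itself did not survive in this transcript. As an illustration only, a probability function of this kind can be a weighted sum over the parent terms that are set to relevant in the configuration. The weights, the term names and the constraint that the weights sum to at most 1 are assumptions made for this sketch.

```python
# Hypothetical weights w_ij for one document D_j indexed by three terms.
# Assumption for this sketch: weights are non-negative and sum to at most 1.
weights = {"bayesian": 0.5, "network": 0.3, "retrieval": 0.2}


def p_doc_given_config(weights, relevant_terms):
    """p(d_j | pa(D_j)): add the weights of the parent terms set to 'relevant'."""
    return sum(w for term, w in weights.items() if term in relevant_terms)


# Any of the 2**k parent configurations can be evaluated on demand,
# so only k weights are stored instead of a table with 2**k rows.
print(p_doc_given_config(weights, {"bayesian", "retrieval"}))
print(len(weights), 2 ** len(weights))  # 3 8
```

For a document indexed by 30 terms, this replaces a table of 2^30 entries with 30 stored weights.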
19
Bayesian Network Retrieval Model
  • Retrieval
  • Instantiate each TQ ∈ Q to Relevant.
  • Run a propagation algorithm in the network.
  • Rank the documents according to p(dj | Q), ∀Dj

Problem
The large number of nodes and the cycles in the
graph
→
General-purpose propagation algorithms can't be
applied due to efficiency considerations.
20
Bayesian Network Retrieval Model
  • Solution
  • Taking advantage of
  • The kind of probability function used, and
  • The topology.
  • Propagation is substituted by

Evaluation of the probability function in each
document node
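To see why evaluation can stand in for propagation, the sketch below assumes the simple model (independent terms) and a weighted-sum probability function with hypothetical weights. It compares a brute-force sum over every parent configuration with a direct evaluation of the sum of w_ij · p(t_i | Q). The term posteriors are made-up numbers as well.

```python
from itertools import product

# Hypothetical weights w_ij of one document and posteriors p(t_i | Q) of its terms.
weights = [0.5, 0.3, 0.2]
p_term = [1.0, 1 / 8, 1 / 8]   # the first term is in the query, so it is instantiated


def brute_force(weights, p_term):
    """Sum p(d_j | pa) * p(pa | Q) over all 2**k parent configurations."""
    total = 0.0
    for config in product([0, 1], repeat=len(weights)):
        p_config = 1.0
        for on, p in zip(config, p_term):
            p_config *= p if on else 1.0 - p      # terms assumed independent
        p_doc = sum(w for on, w in zip(config, weights) if on)
        total += p_doc * p_config
    return total


def evaluate(weights, p_term):
    """Closed form: one multiplication per indexing term."""
    return sum(w * p for w, p in zip(weights, p_term))


print(brute_force(weights, p_term))
print(evaluate(weights, p_term))
```

In this sketch the two results agree because the assumed function is linear in the term indicators, so the exponential sum collapses to one term per parent.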
21
Bayesian Network Retrieval Model
  • Result: an efficient and exact propagation.

Including Query term frequencies
22
Bayesian Network Retrieval Model
  • Removing the term independence restriction
  • We are interested in representing the main
    relationships among terms in the collection.

Term subnetwork → Polytree
Why? There is a set of efficient learning and
propagation algorithms available for this
topology.
23
Bayesian Network Retrieval Model
24
Bayesian Network Retrieval Model
  • Probability distributions
  • Marginal Distributions (root term nodes): p(ti) = 1/M, p(¬ti) = 1 - p(ti)

(M being the number of terms in the collection)
25
Bayesian Network Retrieval Model
Conditional Distributions (term nodes with
parents), based on Jaccard's coefficient
  • Conditional Distributions (document nodes)
  • Probability functions
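The estimator used on the slide is not reproduced in this transcript. For reference, Jaccard's coefficient between two terms can be computed from the sets of documents in which each term occurs; the toy postings below are hypothetical.

```python
# Hypothetical postings: term -> set of ids of the documents that contain it.
postings = {
    "bayesian": {1, 2, 3, 5},
    "network": {2, 3, 4, 5},
    "image": {6},
}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard(postings["bayesian"], postings["network"]))  # 0.6
print(jaccard(postings["bayesian"], postings["image"]))    # 0.0
```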

26
Bayesian Network Retrieval Model
Retrieval: TQ ∈ Q → Relevant; p(dj | Q)?
But... due to the complexity of the whole network
we cannot run an exact propagation algorithm.
Solution: PROPAGATION + EVALUATION
27
Bayesian Network Retrieval Model
  • Propagation
  • Running Pearl's exact propagation
    algorithm in the polytree (term subnetwork),
    p(ti | Q), ∀Ti, are computed.
  • Evaluation
  • Evaluation of a probability function in the
    Document Subnetwork, computing p(dj | Q), ∀Dj,
    incorporating p(ti | Q).
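The evaluation stage can be sketched as follows. The posteriors p(ti | Q) are taken as given (in the model they would come from Pearl's algorithm, which is not implemented here), and the weighted-sum probability function and every number are assumptions made for illustration.

```python
# Assumed output of the propagation stage: p(t_i | Q) for each term node.
p_term_given_q = {"bayesian": 1.0, "network": 0.6, "retrieval": 0.25, "image": 0.05}

# Hypothetical document subnetwork: document -> {indexing term: weight w_ij}.
doc_weights = {
    "D1": {"bayesian": 0.5, "network": 0.5},
    "D2": {"retrieval": 0.75, "image": 0.25},
    "D3": {"bayesian": 0.25, "image": 0.75},
}


def evaluate_documents(doc_weights, p_term_given_q):
    """p(d_j | Q) as the weighted sum of the posteriors of the document's terms."""
    return {
        doc: sum(w * p_term_given_q[t] for t, w in terms.items())
        for doc, terms in doc_weights.items()
    }


scores = evaluate_documents(doc_weights, p_term_given_q)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['D1', 'D3', 'D2']
```

Each document costs one pass over its own indexing terms, so the evaluation stage scales with the size of the index rather than with the number of parent configurations.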

28
Bayesian Network Retrieval Model
Adding document relationships
  • Given a document, Dj
  • Compute p(dj | di), ∀Di.
  • Select those documents with the greatest probability
    of relevance with respect to Dj.
  • Link Dj with all these documents.
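A small sketch of the selection step, assuming the probabilities p(dj | di) have already been computed and that "greatest" means a fixed number k of top-scoring documents. Both k and the scores are hypothetical; the slide does not say how many documents are kept.

```python
import heapq

# Hypothetical values of p(d_j | d_i) for one document D_j against the others.
p_dj_given_di = {"D2": 0.42, "D3": 0.07, "D4": 0.31, "D5": 0.12}


def most_related(scores, k):
    """Return the k documents with the highest p(d_j | d_i)."""
    return heapq.nlargest(k, scores, key=scores.get)


links = most_related(p_dj_given_di, k=2)
print(links)  # ['D2', 'D4']
```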

29
Bayesian Network Retrieval Model
  • But... Instead of linking the documents in the
    document subnetwork...

30
Bayesian Network Retrieval Model
Advantages of this topology
  • We don't have to re-estimate the probability
    distributions in the document nodes.
  • Propagation: evaluation of a probability function
    in the second document layer → Efficiency.

31
Bayesian Network Retrieval Model
Retrieval?
  • Compute p(dj | Q), ∀Dj
  • (1st document layer)
  • Compute p(dj | Q), ∀Dj
  • (2nd document layer)

32
Bayesian Network Retrieval Model
  • Reducing the propagation time in the Term
    Subnetwork
  • Representing only the best relationships among
    terms.
  • Modifying Pearl's propagation algorithm.
  • Changing the Term subnetwork topology.

33
Bayesian Network Retrieval Model
  • 1. Representing only the best term relationships
  • Problems
  • Automatically learning the relationships among
    terms could capture some relationships that are
    not strong enough.
  • → Retrieval effectiveness could be damaged.
  • If the number of terms is very high, the learning
    stage could be time-consuming.

34
Bayesian Network Retrieval Model
Solution
Selection of best terms
Collection
35
Bayesian Network Retrieval Model
  • Advantages
  • Reduction of learning time
  • Representation of the best relationships among
    terms
  • Faster propagation.

36
Bayesian Network Retrieval Model
  • Classification algorithm: K-means, with Euclidean
    distance
  • Objects: terms
  • Attributes: term discrimination value (tdv) and
    inverse document frequency (idf)
  • Classes: good terms (higher tdv, and medium-high
    idf) and the rest of the terms.
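A minimal sketch of this term selection, using plain Lloyd iterations for K-means with two clusters. The (tdv, idf) values, the deterministic starting centroids, and the rule that labels the cluster started from the highest-tdv term as "good" are all assumptions made for illustration.

```python
import math

# Hypothetical (tdv, idf) pairs, already scaled to [0, 1].
terms = {
    "bayesian": (0.9, 0.6),
    "polytree": (0.8, 0.7),
    "retrieval": (0.7, 0.5),
    "the": (0.1, 0.0),
    "paper": (0.2, 0.2),
    "of": (0.0, 0.1),
}


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations with Euclidean distance."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


points = list(terms.values())
# Deterministic start: the points with the lowest and the highest tdv.
start = [min(points), max(points)]
low, high = kmeans(points, start)
good_terms = sorted(t for t, p in terms.items() if math.dist(p, high) < math.dist(p, low))
print(good_terms)  # ['bayesian', 'polytree', 'retrieval']
```

Only the terms in the "good" class would then be passed on to the polytree learning stage.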

37
Bayesian Network Retrieval Model
  • 2. Modifying Pearl's algorithm.
  • In large polytrees, the beliefs of a great number
    of terms, those furthest from the query terms, will
    not be updated after propagating.
  • So... why is the propagation algorithm still
    running?

38
Bayesian Network Retrieval Model
Radial Propagation
r = 2
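One way to read this figure label, assumed here because the figure itself is missing, is that propagation stops at a fixed radius r around the query terms. The sketch below only finds which term nodes lie within that radius in a hypothetical polytree; it does not perform Pearl's message passing.

```python
from collections import deque

# Hypothetical polytree over term nodes, stored as undirected adjacency lists.
edges = [("t1", "t2"), ("t2", "t3"), ("t3", "t4"), ("t4", "t5"), ("t2", "t6")]
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, []).append(b)
    neighbours.setdefault(b, []).append(a)


def nodes_within_radius(neighbours, query_terms, r):
    """Breadth-first search from the query terms, stopping at distance r."""
    distance = {t: 0 for t in query_terms}
    queue = deque(query_terms)
    while queue:
        node = queue.popleft()
        if distance[node] == r:
            continue  # do not expand beyond the radius
        for nxt in neighbours.get(node, []):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


print(sorted(nodes_within_radius(neighbours, ["t1"], r=2)))  # ['t1', 't2', 't3', 't6']
```

Nodes outside the returned set (t4 and t5 in this example) would keep their prior beliefs under this reading.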
39
Bayesian Network Retrieval Model
Linear Propagation
40
Bayesian Network Retrieval Model
  • 3. Changing the Term Subnetwork topology.
  • In certain cases, the polytree topology of the
    Term subnetwork, even using the term selection
    approach, may not be very appropriate.

An alternative topology
Two term layers
  • Preserving accuracy of term relationships
    represented in the graph.
  • Providing an efficient inference mechanism.

41
Bayesian Network Retrieval Model
42
Bayesian Network Retrieval Model
  • Relationships are captured using the co-occurrences
    among terms.
  • The probability of relevance in the second term
    layer is computed by means of

43
Relevance Feedback in B.N. Models
  • Inference and Belief Network Models
  • Modifying link matrices and adding new links (and
    also new document nodes in the second).
  • Bayesian Network Model
  • Inclusion of new evidence from the inspection of
    the document ranking, using partial evidence.
  • (Advantage: neither graph structure modification
    nor probability matrix re-estimation is required.)

44
Other applications
  • Indexing
  • Hypertext
  • User profiling
  • WWW
  • Structured documents
  • Image retrieval
  • Document classification
  • Filtering

45
Bibliography
  • Bruza, P., van der Gaag, L.C. (1996). Index
    Expression Belief Network for Information
    Disclosure. International Journal of Expert
    Systems. 7(2), 107-138.
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2000). Building Bayesian network-based
    information retrieval systems. Proc. of the 2nd
    LUMIS Workshop. 543-550.
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2001). Relevance Feedback in the Bayesian
    Network Retrieval Model: An Approach Based on
    Term Instantiation. Lecture Notes in Computer
    Science. 2189, 13-23.
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2001). Document Instantiation for relevance
    feedback in the Bayesian Network Retrieval model.
    Proceedings of the SIGIR'01 Workshop on
    Mathematical and Formal Models in Information
    Retrieval. 10-18.
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2002). A layered Bayesian Network Model for
    Document Retrieval. Proceedings of the ECIR 2002
    Colloquium. Lecture Notes in Computer Science.
    2291, 169-182.

46
Bibliography
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2002). Reducing term to term relationships
    in an extended Bayesian network retrieval model.
    Proceedings of the Ninth International IPMU
    Conference (Information Processing and Management
    of Uncertainty in Knowledge-based Systems),
    Vol. 2, 1195-1202 (ISBN Vol. 2 2-9516453-2-5).
    ESIA Université de Savoie (Editor).
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. Two terms layer: An alternative topology
    for representing term relationships in the
    Bayesian Network Retrieval Model. Electronic
    Proceedings of the Seventh Online World Conference
    on Soft Computing in Industrial Applications
    (wsc7.ugr.es).
  • de Campos, L.M., Fernández-Luna, J.M., Huete,
    J.F. (2002). Reducing Propagation Effort in Large
    Polytrees: An application to Information
    Retrieval. To appear in Proceedings of the
    Workshop on Probabilistic and Graphical Models.
    Cuenca (Spain).
  • Crestani, F., Lalmas, M., van Rijsbergen, C.J.,
    Campbell, L. (1998). Is this Document Relevant?
    Probably: A Survey of Probabilistic Models in
    Information Retrieval. ACM Computing Surveys.
    30(4), 528-552.

47
Bibliography
  • Fernández-Luna, J.M. (2001). Modelos de
    Recuperación de Información basados en Redes de
    Creencia. Ph.D. Thesis (in Spanish). University
    of Granada.
  • Frisse, M., Cousins, S.B. (1989). Information
    Retrieval from Hypertext: Update on the Dynamic
    Medical Handbook Project. Proceedings of the
    Hypertext'89 Conference. 199-212.
  • Ghazfan, D., Indrawan, M., Srinivasan, B.
    (1996). Towards meaningful Bayesian networks.
    IPMU'96 Conference. 841-846.
  • Haines, D., Croft, W.B. (1993). Relevance
    Feedback and Inference Networks. 16th ACM-SIGIR
    Conference. 119-128.
  • Pearl, J. (1988). Probabilistic Reasoning in
    Intelligent Systems: Networks of Plausible
    Inference. Morgan Kaufmann. San Mateo,
    California.
  • Reis, I. (2000). Bayesian Networks for
    Information Retrieval. Ph.D. Thesis. Universidad
    Federal de Minas Gerais.
  • van Rijsbergen, C.J. (1979). Information
    Retrieval. 2nd Edition. Butterworths.
  • van Rijsbergen, C.J., Harper, D.J., Porter,
    M.F. (1981). The selection of good search terms.
    Information Processing & Management. 17, 77-91.

48
Bibliography
  • Sahami, M. (1998). Using Machine Learning to
    Improve Information Access. Ph.D. Thesis.
    Stanford University.
  • Savoy, J., Desbois, D. (1991). Information
    Retrieval in Hypertext Systems: An Approach using
    Bayesian Networks. Electronic Publishing. 4(2),
    87-108.
  • Turtle, H.R., Croft, W.B. (1991). Evaluation of
    an Inference Network-based Retrieval Model.
    ACM Transactions on Information Systems. 9(3),
    189-224.
  • Turtle, H.R., Croft, W.B. (1997). Uncertainty
    in Information Systems. In Uncertainty Management
    in Information Systems: From Needs to Solutions.
    Kluwer Academic. 189-224.
  • Tzeras, K., Hartman, S. (1993). Automatic
    Indexing Based on Bayesian Inference Networks.
    16th ACM-SIGIR Conference. 22-35.

49
The end...
  • Thank you very much