1
Human Information Access with Fuzzy Searching
Chris Demwell
Communication Networks Laboratory
School of Engineering Science
Simon Fraser University
2
Introduction
  • Language is a process of negotiated meaning
  • Ambiguity can lead to search engine error
  • Fuzzy logic has been shown to act like approximate
    reasoning in control systems
  • Why not apply fuzzy logic to searching?
  • The Jasmine fuzzy searching framework could be
    used to explore searching with fuzzy logic
  • Based on RUBIN98, "FuzzyBase: an
    information-intelligent retrieval system"

3
Road Map
  • Searching as Human Information Access
  • Introduction to Fuzzy Logic
  • Fuzzy Logic and Searching
  • Existing Search Engine Implementations
  • Recent Research
  • The Jasmine Search Framework
  • References and Questions

4
Searching as Human Information Access (1)
  • Find documents similar to a known key
  • First, characterize the key and the documents
  • Stop words removed (the, a, and, ...)
  • Count word frequency in document
  • Complex data mining / computational linguistics
    techniques
  • Second, compare the key's characterization
    against the documents' characterizations
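
The two steps above can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the stop-word list, the function names `characterize` and `overlap_score`, and the sample texts are all assumptions.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real engines use longer ones.
STOP_WORDS = {"the", "a", "and", "of", "to", "in"}

def characterize(text: str) -> Counter:
    """Step 1: lower-case, tokenize, drop stop words, count word frequency."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def overlap_score(key: Counter, doc: Counter) -> int:
    """Step 2: compare characterizations by summing the document's counts
    for every word that appears in the key."""
    return sum(doc[w] for w in key)

search_key = characterize("the cream")
docs = {
    "dessert": characterize("Ice cream and frozen yogurt, the finest cream"),
    "club": characterize("A night club in Liverpool"),
}
ranked = sorted(docs, key=lambda name: overlap_score(search_key, docs[name]),
                reverse=True)
print(ranked)
```

The scoring here is deliberately crude; it only shows where characterization ends and comparison begins.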

5
Searching as Human Information Access (2)
  • We want high precision - all documents returned
    should be relevant
  • We also want high recall (a.k.a. coverage) - we
    should find all relevant documents
  • Problem 1: Searcher is not looking for an exact
    match
  • Problem 2: Searcher does not know exactly what is
    wanted
  • Problem 3: Search key is very small and ambiguous
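
Precision and recall as defined above can be computed directly from the set of returned documents and the set of relevant ones. A minimal sketch, with hypothetical document IDs:

```python
def precision_recall(returned: set, relevant: set) -> tuple:
    """Precision: fraction of returned documents that are relevant.
    Recall: fraction of relevant documents that were returned."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: four documents returned, three are actually relevant.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).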

6
Road Map
  • Searching as Human Information Access
  • Introduction to Fuzzy Logic
  • Fuzzy Logic and Searching
  • Existing Search Engine Implementations
  • Recent Research
  • The Jasmine Search Framework
  • References and Questions

7
Introduction to Fuzzy Logic (1)
  • Fuzzy set membership is not true or false
  • Sets characterized by fuzzy membership functions
  • Fuzzy sets better represent human reasoning
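
As an illustration of a fuzzy membership function, the sketch below uses a triangular shape, one common choice. The set "medium-length document" and its breakpoints (200, 1000, 3000 words) are assumptions made up for this example.

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Membership degree in [0, 1]: 0 outside (left, right), 1 at the peak,
    linear in between. Requires left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy set "medium-length document", measured in words.
for words in (100, 600, 1000, 2000):
    print(words, triangular(words, 200, 1000, 3000))
```

A 600-word document is "medium-length" to degree 0.5 rather than simply in or out of the set.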

8
Introduction to Fuzzy Logic (2)
  • Fuzzy logic operations typically have three
    phases
  • Fuzzification
  • Fuzzy set processing
  • Defuzzification
  • This maps well onto characterization, matching,
    and retrieval
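
The three phases can be lined up against a toy retrieval decision. This is a sketch under stated assumptions: the saturation point of 5 occurrences, the use of the minimum as fuzzy AND, and the 0.5 threshold are illustrative choices, not values from the talk.

```python
def fuzzify(count: int, saturation: int = 5) -> float:
    """Phase 1 (characterization): map a crisp term count to a degree in [0, 1]."""
    return min(count / saturation, 1.0)

def fuzzy_and(*degrees: float) -> float:
    """Phase 2 (matching): fuzzy conjunction, taken here as the minimum."""
    return min(degrees)

def defuzzify(degree: float, threshold: float = 0.5) -> bool:
    """Phase 3 (retrieval): turn the fuzzy result into a crisp decision."""
    return degree >= threshold

# Hypothetical document: "fuzzy" occurs 4 times, "search" 3 times.
counts = {"fuzzy": 4, "search": 3}
degrees = [fuzzify(c) for c in counts.values()]
match = fuzzy_and(*degrees)
print(degrees, match, defuzzify(match))
```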

9
Fuzzy Logic and Searching (1)
  • Fuzzy Logic can apply to Searching in each phase
  • Query subtask: Submit a key with weighted parts
  • Characterization subtask: Each document described
    by membership in fuzzy sets
  • Pattern-matching subtask: Fuzzily match the key
    to the documents

Query: cast(.8) away(.1) tires(.9)
10
Fuzzy Logic and Searching (1)
  • Fuzzy Logic can apply to Searching in each phase
  • Query subtask: Submit a key with weighted parts
  • Characterization subtask: Each document described
    by membership in fuzzy sets
  • Pattern-matching subtask: Fuzzily match the key
    to the documents

Document 1: 1, .2, .5, .3, .8, .1
Document 2: .2, 1, .6, .4, .7, .2
11
Fuzzy Logic and Searching (1)
  • Fuzzy Logic can apply to Searching in each phase
  • Query subtask: Submit a key with weighted parts
  • Characterization subtask: Each document described
    by membership in fuzzy sets
  • Pattern-matching subtask: Fuzzily match the key
    to the documents

µ1(key) = 0.3, where µ1 defines "like document 1"
µ2(key) = 0.8, where µ2 defines "like document 2"
µ is the symbol commonly used to represent a
fuzzy membership function
12
Fuzzy Logic and Searching (1)
  • Fuzzy Logic can apply to Searching in each phase
  • Query subtask: Submit a key with weighted parts
  • Characterization subtask: Each document described
    by membership in fuzzy sets
  • Pattern-matching subtask: Fuzzily match the key
    to the documents
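
One way the three subtasks could fit together is sketched below. The query weights are the ones shown on the slide, cast(.8) away(.1) tires(.9). The per-term document memberships and the matching rule (a weighted fuzzy AND, where a term of weight w cannot pull the score below 1 - w) are assumptions for illustration. The slides do not say how the µ values were computed, so the scores here will not reproduce them.

```python
# Query subtask: a key with weighted parts (weights from the slide).
query = {"cast": 0.8, "away": 0.1, "tires": 0.9}

# Characterization subtask: hypothetical membership of each document
# in the fuzzy set "is about <term>".
documents = {
    "doc1": {"cast": 0.2, "away": 0.9, "tires": 0.3},
    "doc2": {"cast": 0.9, "away": 0.1, "tires": 0.7},
}

def weighted_match(key: dict, doc: dict) -> float:
    """Pattern-matching subtask: weighted fuzzy AND over the key's terms.
    Low-weight terms are nearly ignored; high-weight terms must be present."""
    return min(max(1.0 - w, doc.get(term, 0.0)) for term, w in key.items())

scores = {name: weighted_match(query, doc) for name, doc in documents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

With these made-up memberships, doc2 ranks first because it scores well on the two heavily weighted terms, "cast" and "tires".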

13
Fuzzy Logic and Searching (2)
  • Only the characterization and pattern-matching
    subtasks were considered for building
    Jasmine
  • Forcing user involvement when it is not required
    must be avoided
  • The software should do as much interpretation as
    possible
  • Search through fuzzy data characterizations
  • Fuzzily match keys to documents
  • Possibly both

14
Road Map
  • Searching as Human Information Access
  • Introduction to Fuzzy Logic
  • Fuzzy Logic and Searching
  • Existing Search Engine Implementations
  • Recent Research
  • The Jasmine Search Framework
  • References and Questions

15
Existing Search Engine Implementations (1)
  • Four of the main search methods used for
    searching the World Wide Web
  • Term-based search engines
  • Popularity-based search engines
  • Semantics-based search engines
  • Clustering-based search engines
  • Note: this ignores many subtleties that are
    beyond the scope of this talk

16
Existing Search Engine Implementations (2)
  • Term-based engines use term existence to find
    similar documents
  • A document is considered more similar the more
    often a word from the key appears in it
  • More complex methods exist, e.g. cosine distance
    measure, boolean logic
  • Ambiguity simply ignored
  • Results dependent on query construction
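
The cosine measure mentioned above can be sketched over term-frequency vectors as follows. The sample texts are assumptions. The function returns cosine similarity; the corresponding distance is one minus this value.

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

search_key = Counter("cream".split())
dessert = Counter("ice cream ice cream shop".split())
club = Counter("night club cream".split())
print(cosine_similarity(search_key, dessert), cosine_similarity(search_key, club))
```

Both documents match the key "cream" to some degree. Nothing in the measure knows which sense of the word the searcher meant, which is the ambiguity problem noted above.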

Excerpt from Altavista.com search results
17
Existing Search Engine Implementations (3)
  • Popularity-based searching considers how popular a
    document is
  • More commonly searched-for documents more likely
    to be appropriate
  • Implicit endorsement through hyperlinks
  • Still does not address ambiguity - finds
    authoritative sites, but on wrong topic!
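
Implicit endorsement through hyperlinks can be illustrated by simply counting inbound links. This is the crudest possible popularity signal and is not the algorithm any particular engine uses. The link graph below is hypothetical.

```python
from collections import Counter

def inlink_counts(links) -> Counter:
    """Count distinct pages linking to each target, ignoring self-links."""
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)

# Hypothetical (source, target) hyperlinks.
links = [
    ("fan-page", "cream.co.uk"),
    ("music-portal", "cream.co.uk"),
    ("city-guide", "cream.co.uk"),
    ("food-blog", "benjerry.com"),
    ("cream.co.uk", "cream.co.uk"),  # self-link, not an endorsement
]
popularity = inlink_counts(links)
print(popularity.most_common())
```

The most linked-to page wins regardless of which meaning of "cream" the searcher had in mind.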

Excerpt from google.com's search results:
  • www.cream.co.uk (www.cream.co.uk/). Description:
    Liverpool night club. Category: Regional > Europe
    > ... > Arts and Entertainment > Music > Clubs and ...
  • Ben & Jerry's Ice Cream (www.benjerry.com/).
    Description: Vermont's Finest All Natural Ice
    Cream, Frozen Yogurt and Sorbet. Overnight
    delivery. Category: Shopping > Food >
    Confectionery > Frozen
  • TV Cream (tv.cream.org/). Category: Arts >
    Television > Theme Songs
18
Existing Search Engine Implementations (4)
  • Semantic engines attempt to find the meaning of
    the query
  • Often means matching query words against an
    ontology to find context
  • When ambiguity found, engine asks user to clarify
  • Intuitively a better document model
  • Difficult to automate ontology generation
  • Does not solve searching problem - just
    disambiguates!

Excerpt from Simpli.com's search results. Note:
Simpli.com no longer appears to provide this
service (as of 08/30/2001)
19
Existing Search Engine Implementations (5)
  • Clustering-based engines cluster results
    statistically
  • Leverage existing data mining techniques
  • Hope that statistical groups match semantic
    groups
  • Helps user ignore irrelevancies instead of
    discarding them automatically
  • No ontology needed
  • Must still do full search!
  • Clusters may be useless

Excerpt from vivisimo.com's search results
20
Road Map
  • Searching as Human Information Access
  • Introduction to Fuzzy Logic
  • Fuzzy Logic and Searching
  • Existing Search Engine Implementations
  • Recent Research
  • The Jasmine Search Framework
  • References and Questions

21
Recent Research - Key Phrases and Meaning
  • It is difficult to process plain text - must
    extract keyphrases
  • FRANK99 describes an automatic keyphrase
    extraction technique
  • Split document into phrases, then use Bayesian
    methods to classify phrases as key or not
  • Accuracy increases with domain knowledge
  • Saw that we could merge with an ontology to infer
    meaning of keyphrases
  • Now, we can match against key concepts in the
    document
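
To make "use Bayesian methods to classify phrases as key or not" concrete, here is a minimal naive Bayes sketch. It is not the FRANK99 implementation: the two boolean features (appears early in the document, unusually frequent) and the training examples are hypothetical stand-ins.

```python
from collections import defaultdict

def train(examples):
    """examples: list of (features, is_key), where features is a tuple of bools."""
    class_counts = {True: 0, False: 0}
    feature_counts = {True: defaultdict(int), False: defaultdict(int)}
    for features, is_key in examples:
        class_counts[is_key] += 1
        for i, value in enumerate(features):
            feature_counts[is_key][(i, value)] += 1
    return class_counts, feature_counts

def prob_key(model, features) -> float:
    """Posterior probability that a phrase with these features is a keyphrase."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        p = class_counts[label] / total
        for i, value in enumerate(features):
            # Laplace smoothing over the two possible boolean values.
            p *= (feature_counts[label][(i, value)] + 1) / (class_counts[label] + 2)
        scores[label] = p
    return scores[True] / (scores[True] + scores[False])

# Hypothetical labelled phrases: (appears_early, unusually_frequent) -> is_key
examples = [
    ((True, True), True), ((True, True), True), ((False, True), True),
    ((False, False), False), ((False, False), False),
    ((True, False), False), ((False, False), False),
]
model = train(examples)
print(prob_key(model, (True, True)), prob_key(model, (False, False)))
```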

22
Recent Research - Ontology Construction
  • To construct an ontology, we must:
  • disambiguate any ambiguous words
  • find their place within the tree
  • Unreasonable to do by hand
  • KARKALETSIS99:
  • decision tree containing about 1000 nodes
  • precision about 90%, recall of 60% in
    disambiguation task
  • 60% recall a training artifact? Study used many
    negative examples
  • Iteratively comparing word usage could allow this
    to help build an entire ontology (KROHN01)

23
Recent Research - Flat and Hyper Texts
  • Many text databases are not hypertexts, nor do
    they contain much metadata
  • Hypertexts are useful for determining authority
    CHAKRABARTI98 CHAKRABARTI99
  • KIM99 and FRANK99 demonstrate a method to
    automatically construct a hypertext using
    existing thesauri
  • Once hypertext is constructed, can also be used
    to find mostly distinct communities of documents

24
Recent Research - Agents and Clustering
  • ALLOWAY97 describes a successful project to
    build a set of ontologies for the University of
    Michigan Digital Library
  • Used a system of distributed intelligent agents
  • A technique described in VELING98
    automatically disambiguates queries based on
    statistical clustering
  • Words considered similar (linked) if they often
    appear close together in the database
  • Finds some non-intuitive clusters
  • Fast! 10,000 documents/min on a 200 MHz x86
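
The idea that words are linked if they often appear close together can be sketched with a sliding window. This is not the VELING98 algorithm itself: the window size, the minimum count, and the sample texts are assumptions.

```python
from collections import Counter

def cooccurrence_links(docs, window: int = 1, min_count: int = 2) -> set:
    """Link two words if they occur within `window` positions of each other
    at least `min_count` times across the whole collection."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical mini-database.
docs = ["ice cream cone", "ice cream shop", "night club cream"]
print(cooccurrence_links(docs))
```

Only ("cream", "ice") is seen together often enough to be linked, so a query for "cream" could be grouped with "ice" rather than with "club".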

25
Recent Research - Fuzzy Queries
  • There is a dearth of work regarding the
    application of fuzzy logic to the searching task!
  • Wolski and Bouaziz proposed in BOUAZIZ98 a
    method to replace crisp database triggers with
    fuzzy ones
  • Mostly beyond scope
  • Bulk of the work seems similar

26
Road Map
  • Searching as Human Information Access
  • Introduction to Fuzzy Logic
  • Fuzzy Logic and Searching
  • Existing Search Engine Implementations
  • Recent Research
  • The Jasmine Search Framework
  • References and Questions

27
The Jasmine Search Framework (1)
  • Jasmine designed to accommodate research into
    the key components of a fuzzy logic based search
    engine
  • Assumption: fuzzy logic is used for fuzzy
    characterizations, fuzzy pattern matching, or
    both
  • Modular design permits specification of the
    Jasmine framework without specifying fuzzy
    components
  • Fuzzy components may be swapped out to
    comparatively test algorithms without changing
    engines
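
The modular design described above might look something like the sketch below, where the engine is written against two interfaces and the fuzzy components are plugged in. None of these names come from Jasmine. `Characterizer`, `Matcher`, `SearchEngine` and the sample components are hypothetical.

```python
from typing import Protocol

Characterization = dict[str, float]  # term -> fuzzy membership degree

class Characterizer(Protocol):
    def characterize(self, text: str) -> Characterization: ...

class Matcher(Protocol):
    def match(self, key: Characterization, doc: Characterization) -> float: ...

class PresenceCharacterizer:
    """Simplest possible component: membership 1.0 for every word present."""
    def characterize(self, text: str) -> Characterization:
        return {word: 1.0 for word in text.lower().split()}

class MinMatcher:
    """All key terms must be present (fuzzy AND as minimum)."""
    def match(self, key: Characterization, doc: Characterization) -> float:
        return min((doc.get(t, 0.0) for t in key), default=0.0)

class MeanMatcher:
    """Partial matches earn partial credit (average membership)."""
    def match(self, key: Characterization, doc: Characterization) -> float:
        return sum(doc.get(t, 0.0) for t in key) / len(key) if key else 0.0

class SearchEngine:
    """Framework code: unchanged whichever components are plugged in."""
    def __init__(self, characterizer: Characterizer, matcher: Matcher):
        self.characterizer, self.matcher = characterizer, matcher
        self.index = {}

    def add(self, doc_id: str, text: str) -> None:
        self.index[doc_id] = self.characterizer.characterize(text)

    def search(self, key_text: str) -> list:
        key = self.characterizer.characterize(key_text)
        scored = [(d, self.matcher.match(key, c)) for d, c in self.index.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

for matcher in (MinMatcher(), MeanMatcher()):
    engine = SearchEngine(PresenceCharacterizer(), matcher)
    engine.add("d1", "ice cream shop")
    engine.add("d2", "night club")
    print(type(matcher).__name__, engine.search("ice club"))
```

Swapping `MinMatcher` for `MeanMatcher` changes the ranking behaviour without touching `SearchEngine`, which is the kind of comparative testing the bullets describe.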

28
The Jasmine Search Framework (2)
29
The Jasmine Search Framework (3)
30
The Jasmine Search Framework (4)
  • Future extensions
  • Key phrase extraction and ontology creation
  • Hypertext induction on plain text databases
  • Hypertext clustering for authority and community
  • Distributed, intelligent architecture
  • Complex Metadata
  • MPEG 7 WEB3
  • Dublin Core WEB4
  • Use model data mining
  • COOLEY00, GREENBERG97: Mining hypertext
    usage patterns can yield useful information about
    relevance
  • HOFMANN99: Aggregated user feedback is
    powerful

31
References (1)
RUBIN98 S. Rubin, M. H. Smith, and Lj. Trajkovic. FuzzyBase: an information-intelligent retrieval system. Proc. 1998 IEEE Int. Conf. on Systems, Man, and Cybernetics, San Diego, CA, Oct. 1998, TA11, pp. 2797-2802.
FRANK99 E. Frank, G. Paynter, I. Witten, C. Gutwin, and C. Nevill-Manning. Domain-Specific Keyphrase Extraction. In Proc. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 668-673, Stockholm, Sweden, 1999.
KARKALETSIS99 Vangelis Karkaletsis, Georgios Paliouras, and Constantine D. Spyropoulos. Learning Rules for Large Vocabulary Word Sense Disambiguation. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 674-679, Stockholm, Sweden, 1999.
KROHN01 Fred Krohn, conversation at ASI exchange, 2000.
CHAKRABARTI98 S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext classification using hyper-links. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pp. 307-318, Seattle, Washington, June 1998.
CHAKRABARTI99 S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the web's link structure. COMPUTER, 32:60-67, 1999.
32
References (2)
KIM99 Munseok Kim, Sejin Nam, and Dongwook Shin. Hypertext Construction using statistical and semantic similarity. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 57-63, Stockholm, Sweden, 1999.
ALLOWAY97 Gene Alloway and Peter Weinstein. Seed Ontologies: growing digital libraries as distributed, intelligent systems. Proceedings of the second ACM International Conference on Digital Libraries, pp. 83-91, Philadelphia, USA, 1997.
VELING98 Anne Veling and Peter van der Weerd. Conceptual grouping in word co-occurrence networks. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 694-699, Stockholm, Sweden, 1999.
BOUAZIZ98 Tarik Bouaziz and Anton Wolski. Fuzzy Triggers: Incorporating Imprecise Reasoning into Active Databases. Proc. IEEE 14th International Conference on Data Engineering, 1998.
WEB3 http://www.darmstadt.gmd.de/mobile/MPEG7/, The MPEG 7 web page. MPEG 7 is a proposed standard for metadata description of multimedia information of varying kinds.
33
References (3)
WEB4 http://dublincore.org/documents/, recommendations of the Dublin Core Metadata Initiative, an open forum concerned with "development of interoperable online metadata standards that support a broad range of purposes and business models".
COOLEY00 R. Cooley, M. Deshpande, J. Srivastava, and P. N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1:12-23, 2000.
GREENBERG97 L. Tauscher and S. Greenberg. How people revisit web pages: Empirical findings and implications for the design of history systems. International Journal of Human Computer Studies, Special issue on World Wide Web Usability, 47:97-138, 1997.
HOFMANN99 Thomas Hofmann and Jan Puzicha. Latent Class Models for Collaborative Filtering. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 688-693, Stockholm, Sweden, 1999.