Human Information Access with Fuzzy Searching

1
Human Information Access with Fuzzy Searching
Chris Demwell
Communication Networks Laboratory
School of Engineering Science
Simon Fraser University
2
Introduction

Language is a process of negotiated meaning
Ambiguity can lead to search engine error
Fuzzy logic has been shown to act like approximate reasoning in control systems
Why not apply fuzzy logic to searching?
The Jasmine fuzzy searching framework could be used to explore searching with fuzzy logic
Based on [RUBIN98], "FuzzyBase: an information-intelligent retrieval system"

3
Road Map

Searching as Human Information Access
Introduction to Fuzzy Logic
Fuzzy Logic and Searching
Existing Search Engine Implementations
Recent Research
The Jasmine Search Framework
References and Questions

4
Searching as Human Information Access (1)

Find documents similar to a known key
First, characterize the key and the documents
Stop words removed (the, a, and, etc.)
Count word frequency in each document
Complex data mining / computational linguistics techniques
Second, compare the key's characterization against the documents' characterizations
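
The two steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the stop-word list is limited to the three words named on the slide, the function names are hypothetical, and the comparison is a deliberately crude shared-term count rather than any method from the presentation.

```python
from collections import Counter

# Assumed, illustrative stop-word list; the slide names only these three.
STOP_WORDS = {"the", "a", "and"}


def characterize(text: str) -> Counter:
    """Step one: drop stop words, then count how often each remaining word occurs."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOP_WORDS)


def shared_term_score(key: Counter, doc: Counter) -> int:
    """Step two (crude): add up the document's counts for every word in the key."""
    return sum(doc[w] for w in key)


key = characterize("the cream")
doc_a = characterize("Ice cream and frozen yogurt: the finest cream.")
doc_b = characterize("A night club and a music venue.")

print(shared_term_score(key, doc_a), shared_term_score(key, doc_b))
```

A document that mentions the key's words more often scores higher; a document that shares no words with the key scores zero.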

5
Searching as Human Information Access (2)

We want high precision - all documents returned should be relevant
We also want high recall (a.k.a. coverage) - we should find all relevant documents
Problem 1: The searcher is not looking for an exact match
Problem 2: The searcher does not know exactly what is wanted
Problem 3: The search key is very small and ambiguous
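
Precision and recall as defined above can be computed directly from the set of returned documents and the set of relevant documents. The sketch below uses made-up document identifiers; the function name is hypothetical.

```python
def precision_recall(returned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision: share of returned documents that are relevant.
    Recall: share of relevant documents that were returned."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set: four documents returned, three actually relevant.
returned = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}
p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant, and two of the three relevant documents were found.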

7
Introduction to Fuzzy Logic (1)

Fuzzy set membership is a matter of degree, not simply true or false
Sets are characterized by fuzzy membership functions
Fuzzy sets better represent human reasoning
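
As an illustration of a membership function, the sketch below uses a triangular shape, one common choice. The shape, the function name, and the numbers are assumptions for the example and are not taken from the presentation.

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Degree of membership in [0, 1]: 0 at or outside the feet, 1 at the peak.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set "medium term frequency", peaking at 10 occurrences.
for count in (5, 10, 15, 25):
    print(count, triangular(count, 0, 10, 20))
```

A crisp set would answer only yes or no; here a count of 5 or 15 belongs to the set to degree 0.5.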

8
Introduction to Fuzzy Logic (2)

Fuzzy logic operations typically have three phases:
Fuzzification
Fuzzy set processing
Defuzzification
This maps well onto characterization, matching, and retrieval
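
The three phases can be sketched end to end on a single number. Everything here is an assumption made for illustration: the fuzzy sets "frequent" and "rare", the two rules, the saturation point, and the weighted-average (singleton) defuzzification are not taken from the presentation.

```python
def fuzzify(term_count: int, saturation: int = 10) -> dict[str, float]:
    """Phase 1: turn a crisp count into degrees of membership."""
    frequent = min(term_count / saturation, 1.0)
    return {"frequent": frequent, "rare": 1.0 - frequent}


def apply_rules(memberships: dict[str, float]) -> dict[str, float]:
    """Phase 2: IF frequent THEN relevance is high; IF rare THEN relevance is low."""
    return {"high": memberships["frequent"], "low": memberships["rare"]}


def defuzzify(strengths: dict[str, float],
              high_score: float = 0.9, low_score: float = 0.1) -> float:
    """Phase 3: weighted average of singleton outputs gives one crisp score."""
    total = strengths["high"] + strengths["low"]
    if total == 0:
        return 0.0
    return (strengths["high"] * high_score + strengths["low"] * low_score) / total


relevance = defuzzify(apply_rules(fuzzify(4)))
print(round(relevance, 2))
```

In searching terms, fuzzification plays the role of characterization, rule processing the role of matching, and defuzzification produces the crisp score used to decide what to retrieve.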

9
Fuzzy Logic and Searching (1)

Fuzzy logic can apply to searching in each phase
Query subtask: submit a key with weighted parts
Characterization subtask: each document is described by its membership in fuzzy sets
Pattern-matching subtask: fuzzily match the key to the documents

Query: cast(.8) away(.1) tires(.9)
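
A weighted key written in the term(weight) notation of the example query could be read into a term-to-weight mapping as follows. The parser and its name are hypothetical; only the notation comes from the slide.

```python
import re


def parse_weighted_query(query: str) -> dict[str, float]:
    """Read 'term(weight)' tokens such as 'cast(.8)' into {term: weight}."""
    pairs = re.findall(r"(\w+)\(([0-9]*\.?[0-9]+)\)", query)
    return {term: float(weight) for term, weight in pairs}


weights = parse_weighted_query("cast(.8) away(.1) tires(.9)")
print(weights)
```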
10
Fuzzy Logic and Searching (1)

Characterization subtask: each document is described by its membership in fuzzy sets

Document 1: 1 .2 .5 .3 .8 .1
Document 2: .2 1 .6 .4 .7 .2
11
Fuzzy Logic and Searching (1)

Pattern-matching subtask: fuzzily match the key to the documents

µ1(key) = 0.3, where µ1 defines "like document 1"
µ2(key) = 0.8, where µ2 defines "like document 2"
µ is the symbol commonly used to represent a fuzzy membership function
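
One way to obtain such a degree of match is to compare membership vectors directly. The sketch below uses a fuzzy Jaccard measure (sum of element-wise minima over sum of maxima), which is a swapped-in technique chosen for brevity and not necessarily the one intended in the presentation. The two document vectors are the ones shown on the characterization slide; the key vector is hypothetical, so the results do not reproduce the 0.3 and 0.8 above.

```python
def fuzzy_jaccard(a: list[float], b: list[float]) -> float:
    """Similarity of two fuzzy sets: sum of minima divided by sum of maxima."""
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return intersection / union if union else 0.0


document_1 = [1.0, 0.2, 0.5, 0.3, 0.8, 0.1]
document_2 = [0.2, 1.0, 0.6, 0.4, 0.7, 0.2]
key = [0.1, 0.9, 0.6, 0.4, 0.6, 0.2]  # hypothetical key characterization

mu_1 = fuzzy_jaccard(key, document_1)  # degree to which the key is "like document 1"
mu_2 = fuzzy_jaccard(key, document_2)  # degree to which the key is "like document 2"
print(round(mu_1, 2), round(mu_2, 2))
```

With this key, document 2 is the closer match, and the result is a degree between 0 and 1 rather than a yes/no answer.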
13
Fuzzy Logic and Searching (2)

Only the characterization and pattern-matching subtasks were considered in building Jasmine
Forcing user involvement when it is not required must be avoided
The software should do as much interpretation as possible
Search through fuzzy data characterizations
Fuzzily match keys to documents
Possibly both

15
Existing Search Engine Implementations (1)

Four of the main methods used for searching the World Wide Web:
Term-based search engines
Popularity-based search engines
Semantics-based search engines
Clustering-based search engines
Note: this ignores many subtleties that are beyond scope

16
Existing Search Engine Implementations (2)

Term-based engines use term existence to find similar documents
A document is considered more similar the more often a word from the key appears in it
More complex methods exist, e.g. cosine distance measure, boolean logic
Ambiguity is simply ignored
Results depend on query construction

Excerpt from Altavista.com search results
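
The cosine measure mentioned above can be sketched on raw term-frequency vectors. The sample texts are made up, and real engines typically weight terms further; the corresponding cosine distance is 1 minus the similarity computed here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0 = no shared terms)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


key = Counter("cream".split())
doc_ice = Counter("ice cream frozen yogurt ice cream".split())
doc_club = Counter("liverpool night club".split())

print(round(cosine_similarity(key, doc_ice), 3), cosine_similarity(key, doc_club))
```

The measure rewards shared terms but has no notion of which sense of "cream" the searcher meant, which is the ambiguity problem noted above.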
17
Existing Search Engine Implementations (3)

Popularity-based engines consider how popular a document is
More commonly searched-for documents are more likely to be appropriate
Implicit endorsement through hyperlinks
Still does not address ambiguity - finds authoritative sites, but on the wrong topic!

www.cream.co.uk
You really need to get a browser with javascript (or turn it on if you already have) Skip Intro.
Description: Liverpool night club.
Category: Regional > Europe > ... > Arts and Entertainment > Music > Clubs and
www.cream.co.uk/ - 4k - Cached - Similar pages

Ben & Jerry's Ice Cream
United States United Kingdom The Netherlands France Japan Company Info Page Blank ...
Description: Vermont's Finest All Natural Ice Cream, Frozen Yogurt and Sorbet. Overnight delivery.
Category: Shopping > Food > Confectionery > Frozen
www.benjerry.com/ - 3k - Cached - Similar pages

TV Cream
Category: Arts > Television > Theme Songs
tv.cream.org/ - 1k - Cached - Similar pages

Excerpt from google.com's search results
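
The idea of implicit endorsement through hyperlinks can be illustrated by counting inbound links. This is a strong simplification and not a description of how any particular engine ranks pages; the link graph and the function name are hypothetical.

```python
from collections import Counter


def inbound_link_counts(links: dict[str, list[str]]) -> Counter:
    """Count, for each page, how many distinct other pages link to it."""
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):  # each linking page counts once
            if target != source:  # a page cannot endorse itself
                counts[target] += 1
    return counts


# Hypothetical link graph: page -> pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "d.example": ["c.example", "c.example", "d.example"],
}
counts = inbound_link_counts(links)
ranking = counts.most_common()
print(ranking)
```

The most linked-to page comes first regardless of topic, which is why popularity alone cannot resolve an ambiguous key.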
18
Existing Search Engine Implementations (4)

Semantic engines attempt to find the meaning of the query
Often means matching query words against an ontology to find context
When ambiguity is found, the engine asks the user to clarify
Intuitively a better document model
Difficult to automate ontology generation
Does not solve the searching problem - just disambiguates!

Excerpt from Simpli.com's search results. Note: Simpli.com no longer appears to provide this service (as of 08/30/2001)
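
Matching query words against an ontology to detect ambiguity might look like the sketch below. The toy ontology, its category labels, and the function name are all hypothetical; a real engine would also use the other query words as context before asking the user.

```python
# Hypothetical toy ontology: word -> the senses (category paths) it can take.
ONTOLOGY = {
    "cream": ["Food > Dairy", "Music > Clubs", "Television > Theme Songs"],
    "yogurt": ["Food > Dairy"],
}


def ambiguous_terms(query: str) -> dict[str, list[str]]:
    """Return the query words that have more than one sense in the ontology."""
    found = {}
    for word in query.lower().split():
        senses = ONTOLOGY.get(word, [])
        if len(senses) > 1:
            found[word] = senses
    return found


for word, senses in ambiguous_terms("cream yogurt").items():
    print(f"Which sense of '{word}' did you mean? {senses}")
```

Note that this step only identifies the choice to put to the user; the search itself still has to be carried out afterwards.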
19
Existing Search Engine Implementations (5)

Clustering-based engines cluster results statistically
Leverage existing data mining techniques
Hope that statistical groups match semantic groups
Helps the user ignore irrelevant results instead of discarding them automatically
No ontology needed
Must still do a full search!
Clusters may be useless

Excerpt from vivisimo.com's search results
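
Statistical grouping of results can be illustrated with a greedy single-pass clustering on word overlap. This is a stand-in chosen for brevity, not the algorithm of any engine named here; the snippets, the threshold, and the function names are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two snippets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(snippets: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Put each snippet in the first cluster whose seed snippet it resembles."""
    clusters: list[tuple[set[str], list[str]]] = []
    for snippet in snippets:
        words = set(snippet.lower().split())
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(snippet)
                break
        else:  # no existing cluster was similar enough
            clusters.append((words, [snippet]))
    return [members for _, members in clusters]


snippets = [
    "ice cream frozen yogurt",
    "night club music",
    "frozen yogurt sorbet delivery",
    "music club tickets",
]
groups = cluster_results(snippets)
print(groups)
```

The groups emerge from word statistics alone, so nothing guarantees that they line up with the meanings a searcher has in mind.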
21
Recent Research - Key Phrases and Meaning

It is difficult to process plain text - we must extract keyphrases
[FRANK99] describes an automatic keyphrase extraction technique
Split the document into phrases; use Bayesian methods to classify phrases as key or not
Accuracy increases with domain knowledge
We could merge this with an ontology to infer the meaning of keyphrases
Now, we can match against key concepts in the document

22
Recent Research - Ontology Construction

To construct an ontology, we must:
disambiguate any ambiguous words
find their place within the tree
Unreasonable to do by hand
[KARKALETSIS99]:
decision tree containing about 1000 nodes
precision about 90%, recall of 60% in the disambiguation task
Is the 60% recall a training artifact? The study used many negative examples
Iteratively comparing word usage could allow this to help build an entire ontology [KROHN01]

23
Recent Research - Flat and Hyper Texts

Many text databases are not hypertexts, nor do they contain much metadata
Hypertexts are useful for determining authority [CHAKRABARTI98], [CHAKRABARTI99]
[KIM99] and [FRANK99] demonstrate a method to automatically construct a hypertext using existing thesauri
Once a hypertext is constructed, it can also be used to find mostly distinct communities of documents

24
Recent Research - Agents and Clustering

[ALLOWAY97] describes a successful project to build a set of ontologies for the University of Michigan Digital Library
Used a system of distributed intelligent agents
A technique described in [VELING98] automatically disambiguates queries based on statistical clustering
Words are considered similar (linked) if they often appear close together in the database
Finds some non-intuitive clusters
Fast! 10,000 documents/min on a 200 MHz x86
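
The notion of linking words that often appear close together can be sketched with a sliding window over each document. This does not reproduce the method of [VELING98]; the window size, the count threshold, the sample texts, and the function name are assumptions.

```python
from collections import Counter


def cooccurrence_links(documents: list[str], window: int = 2,
                       min_count: int = 2) -> set[frozenset[str]]:
    """Link two words if they occur within `window` words of each other
    at least `min_count` times across the documents."""
    counts: Counter = Counter()
    for text in documents:
        words = text.lower().split()
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    counts[frozenset((word, other))] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


documents = ["ice cream shop", "ice cream cone", "night club"]
links = cooccurrence_links(documents)
print(links)
```

Only "ice" and "cream" co-occur often enough to be linked in this tiny sample; clusters of linked words would then suggest the distinct senses of a query term.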

25
Recent Research - Fuzzy Queries

There is a dearth of work regarding the application of fuzzy logic to the searching task!
Wolski and Bouaziz proposed in [BOUAZIZ98] a method to replace crisp database triggers with fuzzy ones
Mostly beyond scope
Bulk of the work seems similar

27
The Jasmine Search Framework (1)

Jasmine is designed to accommodate research into the key components of a fuzzy-logic-based search engine
Assumption: fuzzy logic is used for fuzzy characterizations, fuzzy pattern matching, or both
Modular design permits specification of the Jasmine framework without specifying the fuzzy components
Fuzzy components may be swapped out to comparatively test algorithms without changing engines
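
The modular idea described above, a fixed engine with pluggable fuzzy components, could be sketched as follows. This is not Jasmine's actual design: the class name, the two component interfaces, and the sample components are hypothetical.

```python
from typing import Callable

Characterization = dict[str, float]
Characterizer = Callable[[str], Characterization]
Matcher = Callable[[Characterization, Characterization], float]


class SearchFramework:
    """Fixed engine; the characterizer and matcher are supplied from outside."""

    def __init__(self, characterizer: Characterizer, matcher: Matcher) -> None:
        self.characterizer = characterizer
        self.matcher = matcher
        self.index: dict[str, Characterization] = {}

    def add(self, name: str, text: str) -> None:
        self.index[name] = self.characterizer(text)

    def search(self, key: str) -> list[tuple[str, float]]:
        key_c = self.characterizer(key)
        scored = [(name, self.matcher(key_c, doc)) for name, doc in self.index.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)


def presence_characterizer(text: str) -> Characterization:
    """Sample component: every word present gets membership 1.0."""
    return {word: 1.0 for word in text.lower().split()}


def overlap_matcher(key: Characterization, doc: Characterization) -> float:
    """Sample component: share of the key's total weight that the document covers."""
    total = sum(key.values())
    if total == 0:
        return 0.0
    return sum(min(w, doc.get(term, 0.0)) for term, w in key.items()) / total


engine = SearchFramework(presence_characterizer, overlap_matcher)
engine.add("ice", "ice cream sorbet")
engine.add("club", "night club")
results = engine.search("cream")
print(results)
```

To compare algorithms, a different characterizer or matcher is passed to the constructor while the engine code stays unchanged.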

28
The Jasmine Search Framework (2)
29
The Jasmine Search Framework (3)
30
The Jasmine Search Framework (4)

Future extensions
Key phrase extraction and ontology creation
Hypertext induction on plain text databases
Hypertext clustering for authority and community
Distributed, intelligent architecture
Complex metadata
MPEG-7 [WEB3]
Dublin Core [WEB4]
Use model data mining
[COOLEY00], [GREENBERG97]: mining hypertext usage patterns can yield useful information about relevance
[HOFMANN99]: aggregated user feedback is powerful

31
References (1)
[RUBIN98] S. Rubin, M. H. Smith, and Lj. Trajkovic, "FuzzyBase: an information-intelligent retrieval system," Proc. 1998 IEEE Int. Conf. on Systems, Man, and Cybernetics, San Diego, CA, Oct. 1998, TA11, pp. 2797-2802.
[FRANK99] E. Frank, G. Paynter, I. Witten, C. Gutwin, and C. Nevill-Manning. Domain-Specific Keyphrase Extraction. In Proc. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 668-673, Stockholm, Sweden, 1999.
[KARKALETSIS99] Vangelis Karkaletsis, Georgios Paliouras, and Constantine D. Spyropoulos. Learning Rules for Large Vocabulary Word Sense Disambiguation. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 674-679, Stockholm, Sweden, 1999.
[KROHN01] Fred Krohn, conversation at ASI exchange, 2000.
[CHAKRABARTI98] S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext classification using hyper-links. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), pages 307-318, Seattle, Washington, June 1998.
[CHAKRABARTI99] S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the web's link structure. Computer, 32:60-67, 1999.
32
References (2)
[KIM99] Munseok Kim, Sejin Nam, and Dongwook Shin. Hypertext Construction using statistical and semantic similarity. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 57-63, Stockholm, Sweden, 1999.
[ALLOWAY97] Gene Alloway and Peter Weinstein. Seed Ontologies: growing digital libraries as distributed, intelligent systems. Proceedings of the second ACM International Conference on Digital Libraries, pp. 83-91, Philadelphia, USA, 1997.
[VELING98] Anne Veling and Peter van der Weerd. Conceptual grouping in word co-occurrence networks. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 694-699, Stockholm, Sweden, 1999.
[BOUAZIZ98] Tarik Bouaziz and Anton Wolski. Fuzzy Triggers: Incorporating Imprecise Reasoning into Active Databases. Proc. IEEE 14th International Conference on Data Engineering, 1998.
[WEB3] http://www.darmstadt.gmd.de/mobile/MPEG7/, the MPEG-7 web page. MPEG-7 is a proposed standard for metadata description of multimedia information of varying kinds.
33
References (3)
[WEB4] http://dublincore.org/documents/, recommendations of the Dublin Core Metadata Initiative, an open forum concerned with "development of interoperable online metadata standards that support a broad range of purposes and business models".
[COOLEY00] R. Cooley, M. Deshpande, J. Srivastava, and P. N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1:12-23, 2000.
[GREENBERG97] L. Tauscher and S. Greenberg. How people revisit web pages: Empirical findings and implications for the design of history systems. International Journal of Human Computer Studies, Special issue on World Wide Web Usability, 47:97-138, 1997.
[HOFMANN99] Thomas Hofmann and Jan Puzicha. Latent Class Models for Collaborative Filtering. 16th Joint Int. Conf. on Artificial Intelligence (IJCAI'99), pp. 688-693, Stockholm, Sweden, 1999.
