Knowledge Management Challenges in Knowledge Discovery Systems

About This Presentation

Title:

Knowledge Management Challenges in Knowledge Discovery Systems

Description:

TAKMA'05, Copenhagen, Denmark, August 22-26, 2005.

Slides: 24

Provided by: mykolapec

Transcript and Presenter's Notes

1
Knowledge Management Challenges in Knowledge
Discovery Systems
TAKMA'05, Copenhagen, Denmark, August 22-26, 2005

Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science, University of Jyväskylä, Finland
Alexey Tsymbal
Department of Computer Science, Trinity College Dublin, Ireland

2
Outline

Introduction
KDD
Selection of DM strategy for a problem at hand
Meta-learning
Our goal
To propose a knowledge-driven approach to enhance
the selection of DM strategies in KDSs.
Need for KM
What are the challenges
KM processes with respect to the problem of DM strategy selection
Further research
Discussion

3
Knowledge discovery as a process
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.,
Uthurusamy, R., Advances in Knowledge Discovery
and Data Mining, AAAI/MIT Press, 1997.
4
CRISP-DM
http://www.crisp-dm.org/
5
KDD Process: Vertical Solutions
Reinartz, T. 1999, Focusing Solutions for Data
Mining. LNAI 1623, Berlin Heidelberg.
6
The Search for Scientific Methods and
Meta-Learning

Adequate scientific methods make induction easier
with a smaller number of examples.
The choice of methods needs to be based on a
higher level induction or on meta-learning in the
context of machine learning.
"Knowledge concerning the most appropriate method
for a given goal can be obtained by induction on
the database of history of science: a collection
of problems of different methods, different goals
and different degrees of success" (Laudan).
Meta-learning can produce rules concerning the
use of the alternative strategies, methodological
knowledge, or correct predictions concerning the
best rank of strategies for a new task.

7
Dynamic Selection of DM Methods

Dynamic selection of DM methods in KDSs has been
under active study.
Two contexts of dynamic selection:
multi-classifier systems that apply different
ensemble techniques (Dietterich, 1997).
Their general idea is usually to select one
classifier on a dynamic basis, taking into
account its local performance (e.g.
generalisation accuracy) in the instance space.
multistrategy learning (Michalski), which
applies a strategy selection approach that takes
into account the classification problem-related
characteristics (meta-data).
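
The first context, selecting one classifier by its local performance, can be sketched as follows. This is a minimal illustration and not the method of any cited work: the two threshold classifiers, the validation data, the Euclidean neighbourhood and the value of k are all assumptions made for the example.

```python
import math


def local_accuracy(classifier, neighbours):
    """Fraction of neighbouring validation instances labelled correctly."""
    return sum(classifier(x) == y for x, y in neighbours) / len(neighbours)


def select_classifier(classifiers, validation, query, k=3):
    """Pick the classifier with the best accuracy on the k validation
    instances nearest to the query (Euclidean distance)."""
    neighbours = sorted(validation, key=lambda xy: math.dist(xy[0], query))[:k]
    return max(classifiers, key=lambda clf: local_accuracy(clf, neighbours))


# Two hypothetical base classifiers over one-dimensional inputs.
def threshold_low(x):
    return int(x[0] > 2.0)


def threshold_high(x):
    return int(x[0] > 8.0)


classifiers = [threshold_low, threshold_high]

# Hypothetical validation set of (instance, true label) pairs.
validation = [
    ((1.0,), 0), ((1.5,), 0), ((3.0,), 1), ((4.0,), 1),
    ((6.0,), 0), ((7.0,), 0), ((9.0,), 1), ((10.0,), 1),
]

chosen_left = select_classifier(classifiers, validation, (3.5,))
chosen_right = select_classifier(classifiers, validation, (6.5,))
```

Here neither classifier is best everywhere: threshold_low is selected near 3.5 and threshold_high near 6.5, which is the point of selecting on a dynamic basis.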

8
Selection of the most appropriate DM technique

Motivation
No Free Lunch theorem
many empirical studies show
one learning strategy can perform significantly
better than another strategy on a group of
problems that are characterised by some
properties (Kiang, 2003).
Problem
Selection is usually not straightforward.
Some knowledge is required to decide on the
selection of appropriate techniques and on DM
strategy construction for a problem at hand.
We distinguish two levels of knowledge:
the knowledge extracted from data that represents
the problem to be mined by means of applying a DM
technique
the higher-level knowledge (from the KDS
perspective) required for managing technique
selection, combination and application ->
meta-knowledge.

9
Meta-learning

or learning to learn: the effort to
automatically induce dependencies between
learning tasks and learning strategies.
based on the assumptions that it is possible
to evaluate and compare learning strategies,
to measure the benefits of early learning on
subsequent learning,
to use such evaluations to reason about learning
strategies
select useful ones and disregard the useless or
misleading strategies (Schmidhuber et al., 1996).
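
One simple way to induce such a dependency is to recommend, for a new task, the strategy that worked best on the most similar past task. The sketch below assumes a nearest-neighbour meta-learner; the meta-features (number of instances, number of features, class entropy), the stored meta-data and the strategy names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical meta-data: (n_instances, n_features, class_entropy) of past
# tasks, paired with the strategy that performed best on each task.
META_DATA = [
    ((150, 4, 1.58), "nearest neighbour"),
    ((20000, 10, 1.00), "decision tree"),
    ((300, 2000, 0.90), "naive Bayes"),
]


def to_vector(n_instances, n_features, class_entropy):
    """Put counts on a log scale so that no single meta-feature dominates."""
    return (math.log10(n_instances), math.log10(n_features), class_entropy)


def recommend_strategy(task, meta_data=META_DATA):
    """Return the best strategy of the past task closest to the new task."""
    query = to_vector(*task)
    _, strategy = min(
        meta_data,
        key=lambda item: math.dist(to_vector(*item[0]), query),
    )
    return strategy


small_task = recommend_strategy((200, 5, 1.5))
wide_task = recommend_strategy((500, 1500, 1.0))
```

The quality of such a recommendation depends entirely on how representative the stored meta-data is.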

10
in Meta-learning

in the context of classifier ensembles, where
only the data itself is used to make decisions
about method selection,
rather good practical results are shown in
experiments supported by theoretical studies as
well
in dynamic integration of DM strategies for a
data set at hand
a multistrategy approach based on the ideas of
constructive induction and conceptual clustering
(Michalski, 1997)
several studies on automatic classifier selection
via meta-learning (Kalousis, 2002)
No practical success!

11
Meta-Learning
12
Problems with Meta-Learning for DM SS

Representativeness of meta-data samples
Meta-learning space is large
Computationally expensive to produce meta-data
samples
Curse of dimensionality
Many possible irrelevant features wrt
collected/produced meta-data
Complexity of statistical measures
Why do we need to spend time to characterize the
dataset if we can use this time to try different
DM approaches and select the best one?

13
Our goal and focus: KM perspective

to propose a knowledge-driven approach to enhance
the dynamic integration of DM strategies in
knowledge discovery systems
focus on KM aimed at organising a systematic
process of knowledge capture and refinement over
time.
We consider the basic knowledge management
processes of
knowledge creation and identification,
representation, collection and organization,
sharing and integration,
adaptation and application
with respect to the introduced concept of
meta-knowledge.

14
Introducing KM to DM SS

Generally, the problem of knowledge capture,
storage, and dissemination is similar to data and
information management in ISs, and therefore some
executives prefer to view KM as a natural
extension to IS functions (Alavi and Leidner,
1999).
Zack (1999): the most practical way to define KM
is to show on the existing IT infrastructure the
involvement of
(1) knowledge repositories,
(2) best-practices and lessons-learned systems,
(3) expert networks (these are DM experts), and
(4) communities of practice (these are end-users).

15
Transformations of data and knowledge concepts
(adapted from Spiegler, 2000)
Knowledge is justified belief that increases an
entity's capacity for effective action (Nonaka,
1994). There is a long history of epistemological
debates, and a discussion of knowledge from
different perspectives in Polanyi (1962).
16
Different types of knowing
17
Knowledge distribution and knowledge integration

Four potential sources of knowledge that have to
be integrated in the repository of a KDS:
(1) knowledge from an expert in data mining,
knowledge discovery, statistics and related
fields,
(2) knowledge from a data-mining practitioner,
(3) knowledge from laboratory experiments on
synthetic data sets and, finally,
(4) knowledge from field experiments on
real-world problems.
Besides this, research and business communities,
and similar KDSs themselves, can organize
different trusted networks, where participants are
motivated to share their knowledge.

18
Knowledge Repository Lifecycle (1 of 2)

Once the repository is created, it tends to grow,
and at some point it naturally begins to collapse
under its own weight, requiring major
reorganization.
The repository needs to be updated continuously:
some content needs to be deleted (if misleading),
deactivated or archived (if it is potentially
useful).
If similar contributions are combined,
generalized and restructured, the content may
become less fragmented and redundant.
The process of filtering knowledge claims into
accepted or suppressed is important:
when plenty of claims are produced
automatically, they need to be filtered
automatically.
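
Automatic filtering of knowledge claims could look roughly like the sketch below. The KnowledgeClaim fields, the evidence threshold and the three statuses are assumptions made for this example; the slides do not prescribe a concrete filtering rule.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeClaim:
    text: str
    confirmations: int  # experiments that supported the claim
    refutations: int    # experiments that contradicted the claim


def filter_claim(claim, min_evidence=5):
    """Return 'deleted', 'archived' or 'accepted' for a single claim."""
    if claim.refutations > claim.confirmations:
        return "deleted"   # misleading content is removed
    if claim.confirmations + claim.refutations < min_evidence:
        return "archived"  # potentially useful, but not yet trusted
    return "accepted"


def filter_repository(claims, min_evidence=5):
    """Group the claims by the status assigned by filter_claim."""
    result = {"accepted": [], "archived": [], "deleted": []}
    for claim in claims:
        result[filter_claim(claim, min_evidence)].append(claim)
    return result


# Hypothetical claims about DM strategy selection.
claims = [
    KnowledgeClaim("Strategy A suits small noisy data sets", 8, 1),
    KnowledgeClaim("Strategy B suits high-dimensional data", 2, 0),
    KnowledgeClaim("Strategy C always outperforms strategy A", 1, 6),
]

statuses = filter_repository(claims)
```

Archived claims stay available, so they can be reactivated once more evidence is collected.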

19
Knowledge Repository Lifecycle (2 of 2)

"knowing when" and "knowing where" contexts
when the environment changes, all of the general
rules without specifying the context could become
invalid.
some knowledge should exist that would guide an
organization to change the repository when the
environment calls for it.
Some knowledge claims are naturally in constant
competition with the other claims.
Disagreements within the knowledge repository
need to be resolved by means of generalization of
some parts and contextualization of the others.
In order to increase the quality and validity of
knowledge, it needs to be continually tested,
improved or removed.
Some basic principles of triggers can be
introduced

20
Knowledge validity and knowledge quality

The contexts "knowing when" and "knowing where"
can be discovered before they appear in a real
situation.
Active learning
Zooming in and zooming out procedures
Search for a balance between generality,
compactness, interpretability, and
understandability on the one hand, and sensitivity
to the context, exactness, precision, and adequacy
of (meta-)knowledge on the other.
context conditions can be important for knowledge
quality estimation
The quality of knowledge can be estimated by its
ability to help a KDS produce solutions faster
and more effectively.
Knowledge claims have both a degree of utility
and a degree of satisfaction.
To determine the relative quality of a validated
knowledge claim, evaluation criteria should be
defined
complexity, usefulness, and predictive power are
well formalised and easy to estimate
understandability, reliability of source,
explanatory power are rather subjective and
therefore inaccurate.
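
One possible way to combine such criteria into a single quality estimate is a weighted mean in which the subjective criteria count less. The sketch below is only an assumption about how this could be done: the criterion names, the [0, 1] scale, the use of "simplicity" in place of complexity (so that higher is better) and the weight of 0.5 are all hypothetical.

```python
# Criteria that are well formalised and easy to estimate.
FORMAL = ("simplicity", "usefulness", "predictive_power")
# Criteria that are rather subjective.
SUBJECTIVE = ("understandability", "source_reliability", "explanatory_power")


def claim_quality(scores, subjective_weight=0.5):
    """Weighted mean of criterion scores in [0, 1].

    Formal criteria have weight 1.0; subjective criteria are
    down-weighted by subjective_weight.
    """
    weights = {name: 1.0 for name in FORMAL}
    weights.update({name: subjective_weight for name in SUBJECTIVE})
    total = sum(weights[name] * scores[name] for name in weights)
    return total / sum(weights.values())


# Hypothetical claim: strong on formal criteria, unrated on subjective ones.
scores = {
    "simplicity": 1.0, "usefulness": 1.0, "predictive_power": 1.0,
    "understandability": 0.0, "source_reliability": 0.0,
    "explanatory_power": 0.0,
}

quality = claim_quality(scores)
```

Setting subjective_weight to 0 ignores the subjective criteria entirely, while 1.0 treats all six criteria equally.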

21
Limitations

The goal of KM here is to make more effective and
efficient use of available DM techniques.
The most important issues in knowledge
management:
(1) executive/strategic management,
(2) operational management:
the identification of available knowledge,
seeking ways to capture it in a KM process,
and analysing the ability to design a KM
(sub)system including its tools and applications,
(3) costs, benefits, and risks management, and
(4) standards in the KM technology and
communication.

22
Further Research

Implementation of the presented knowledge-driven
framework for a KDS that contains a limited
number of DM techniques of a certain type
Feature extraction techniques and classification
techniques
Evaluation of the framework in practice for
real-world problems in a distributed environment

23
Thank You!