The Data Mining Visual Environment - PowerPoint PPT Presentation

About This Presentation

Title:

The Data Mining Visual Environment

Description:

The Data Mining Visual Environment Motivation Major problems with existing DM systems They are based on non-extensible frameworks. They provide a non-uniform mining ... – PowerPoint PPT presentation

Number of Views:128

Avg rating:3.0/5.0

Slides: 13

Provided by: StephenK164

Category:

more less

Transcript and Presenter's Notes

Title: The Data Mining Visual Environment

1
The Data Mining Visual Environment
Motivation Major problems with existing DM
systems They are based on non-extensible
frameworks. They provide a non-uniform mining
environment - the user is presented with totally
different interface(s) across implementations of
different DM techniques. Major needs An overall
framework that can support the entire Knowledge
Discovery (KD) process (accommodate and integrate
all KD phases seamlessly). Placing the user at
the center of the entire KD process/in the
framework. In fact the corresponding system
should provide a consistent, uniform and flexible
visual interaction environment that supports the
user throughout the entire discovery process.
2
The Data Mining Visual Environment

Primary layers
User layer
Engine layer
Data layer
Main features
Open
Modular with
well defined
modification/extension points
Possible integration of
different tasks
eg output reuse by another task
User flexibility and enablement to
process data and knowledge,
drive and guide the entire KD
process

System Architecture
At present, there is a partial prototype,
complete implementation is underway.
3
The Data Mining Visual Environment

Primary layers
User Interface/GUI Container
(interacts with specific DM
visual environments)
Developed using Java
Abstract DM Engine/Wrapper of
DM algorithms (interacts with
specific DM algorithms)
Developed using Java
Note
The DM algorithms may be
implemented by third parties
in possibly any language.
DM methods (but not limited to)
MQs, ARs, and clustering

System Architecture The Prototype
Stephen
Stefano
4
The Data Mining Visual Environment
Visual Environment A consistent, uniform,
flexible and intuitive GUI, with support
throughout the whole DM process. The principal
focus is to support the user in Visual
construction of the task relevant dataset The
user directly interacts with data. For this task,
there are two intuitive interaction
spaces. Visual construction of the mining query
The user directly interacts with data and other
parameters (e.g. threshold values) in making
queries e.g., in the Metaquery Environment, the
user can suggest patterns by linking attributes,
while the Association Rule Environment offers
visual baskets. Visual output presentation and
interaction Exploiting relevant effective
visualizations and where necessary, we have
designed novel visualizations. Planning E.g.,
advertising relevant prior knowledge. Handling
the non-static nature of users quest E.g.,
enabling user to adjust.
5
The Data Mining Visual Environment
Visual Environment Overall
6
The Data Mining Visual Environment
Visual Environment Tree View (Progress
Companion)
Before user settings
After user settings
After DM results
7
The Data Mining Visual Environment
Visual Environment Clustering - Input
8
The Data Mining Visual Environment
Visual Environment Clustering - Output
... for more on the prototype, demo
9
The Data Mining Visual Environment
Usability Usability heuristics Done, but
regular reference to the same will go
on. Mock-up tests Done with DM experts. The
experts gave an encouraging feedback and even
suggestions on how to improve the
interface. (These tests were done at the end of
2001.) Questionnaire experiments The
experiments involved the application simulation,
a case study, data schema and user tasks
corresponding to the case study, and a
questionnaire. Positive interface features
consistency, layout/organization, visual
exploration. Negative interface features size of
some visual elements small/big (These tests were
done in July 2002.) Formal usability tests In
the pipeline.
10
The Data Mining Visual Environment

The Clustering Engine
Clustering method Generalizations of three
techniques homogeneity, separation, density.
Clustering based on homogeneity/separation
Homogeneity (separation) is a global measure of
the similarity between points belonging to the
same cluster (to different clusters)
Clustering based on density Clusters are regions
of the object space where objects are located
most frequently
Clustering based The system selects the best
clustering according to a cost function
For homogeneity/separation-based clustering the
cost function is computed by evaluating
pointwise, clusterwise, and partitionwise
similarity/dissimilarity
For density-based clustering, the cost function
is derived from an estimated density function

11
The Data Mining Visual Environment

Formal Semantics of the Input Environment
Visual language abstract syntax semantics
Abstract syntax defined in terms of multi-graphs
Visual components are vertices of the multi-graph
Spatial relations between visual components are
edges of the multi-graph
Semantics
Clustering defined by a mapping between
multi-graphs and cost functions and predicates
expressing optimality
Metaqueries/association rules defined by a
mapping between multi-graphs and rules

12
The Data Mining Visual Environment

Operational Specification
Concrete, high-level syntax of the tasks proposed
in the usability tests
Describes legal click-streams allowed to occur
during operation
Standard grammar notation
Communication protocol between the abstract
clustering engine and the data mining engines
XML DTDs based on PMML 2.0
Extension of PMML 2.0 to
Specification of input
Broader spectrum of clustering methods
Concrete semantics of the clustering task by
mapping on symbols in the tasks grammar to
structures of the communication protocol
Interpretation function recursively defined on
the grammar rules of the high-level syntax of
tasks
The interpretation of a legal click-stream is an
XML document satisfying the DTD of the input
specification