GridMiner: Vision, Design and Underlying Grid Technology


1
GridMiner: Vision, Design and Underlying Grid Technology
  • Ivan Janciak, Peter Brezany and A Min Tjoa
  • Vienna University of Technology
  • Institute of Scientific Computing
  • email: janciak_at_par.univie.ac.at

2
Motivation
[Diagram: CRISP-DM phases (Business understanding, Data understanding, Data Preparation, Modeling, Evaluation, Deployment) shown with Data and GridMiner, annotated with Service Provider and Data provider roles. Labelled: CRISP-DM, SPSS]
3
Outline
  • Motivation
  • Data Mining and The Grid
  • GridMiner Architecture
  • Workflow Engine
  • Knowledge Base
  • Data Mining Services
  • Decision Trees
  • OLAP
  • GridMiner: The Movie

4
Data Mining
  • Data Mining process
  • Data understanding
  • Statistics
  • Metadata exploration
  • Data preparation
  • Integration, Selection, Transformation
  • Data cleaning
  • Modeling / Evaluation
  • Methods / Algorithms selection
  • Tuning data mining task parameters
  • Reporting
  • Visualization

5
The Grid
  • Large Datasets
  • Distributed data
  • Access to data, data transfer
  • Heterogeneity
  • Semantic issues
  • Distributed Computing Resources
  • Security
  • Different policies
  • Access rights

6
Requirements of a data mining system on the Grid
  • Analyze huge and distributed datasets
  • Sophisticated data access system
    • Data Mediator
  • Interactive workflow management
    • Workflow language and engine
  • Data Mining services
    • Parallel or distributed versions
  • Knowledge Base
    • Documents (Metadata, Models, Workflows, Rules, ...)
  • Graphical user interface
  • Secure system

7
Basic Components
  • Grid Layer
  • Grid Services
  • Grid Data services OGSA-DAI
  • Data Mining services OGSA/OGSI
  • Web Layer
  • Web applications JSP/JavaBeans
  • Service Configuration
  • Data mining task configuration
  • User <> Service Interaction
  • Documents
  • XML/XSLT/XSD documents
  • Workflow description, Metadata, Models (PMML),
    Perform Document, Mediation Schema

8
Architecture Overview
Graphical User Interface
Web
Knowledge Base
Service Task configuration
DSCE Client
Visualization
Data Exploration
Grid
Dynamic service control engine
Grid Data Service
Data mining service
9
Workflows
User
DSCL
  • Dynamic Service Control Engine (DSCE)
  • processes the workflow according to DSCL
  • Dynamic Service Control Language (DSCL)
  • based on XML
  • easy to use

DSCE Client
DSCE
Service A
Service B
Service D
Service C
10
Workflow language (DSCL)
User's view: act1, followed by act2.1 and act2.2 in parallel
Conversion to XML:
  dscl
    variables
    composition
      sequence
        createService activityID="act1"
        parallel
          invoke activityID="act2.1"
          invoke activityID="act2.2"
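As a rough illustration of how an engine might walk such a composition, the sketch below models sequence and parallel blocks in Python. The class names (Invoke, Sequence, Parallel) and the execute function are hypothetical and are not part of DSCL or the DSCE; a real activity would call a Grid service instead of appending its ID to a log.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from threading import Lock
from typing import List


@dataclass
class Invoke:
    """Hypothetical leaf activity; stands in for a service call."""
    activity_id: str

    def run(self, log, lock):
        with lock:  # the log is shared between worker threads
            log.append(self.activity_id)


@dataclass
class Sequence:
    """Runs its children one after another."""
    children: List = field(default_factory=list)

    def run(self, log, lock):
        for child in self.children:
            child.run(log, lock)


@dataclass
class Parallel:
    """Runs its children concurrently and waits for all of them."""
    children: List = field(default_factory=list)

    def run(self, log, lock):
        workers = max(1, len(self.children))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(c.run, log, lock) for c in self.children]
            for future in futures:
                future.result()  # re-raises any exception from a child


def execute(workflow):
    """Run a composition and return the order in which activities finished."""
    log, lock = [], Lock()
    workflow.run(log, lock)
    return log


# Same shape as the slide: act1, then act2.1 and act2.2 in parallel.
workflow = Sequence([
    Invoke("act1"),
    Parallel([Invoke("act2.1"), Invoke("act2.2")]),
])
order = execute(workflow)
```

The only ordering guarantee in this sketch is that act1 completes before either parallel branch starts; the relative order of act2.1 and act2.2 is not fixed.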

11
DSCL and OGSA-DAI Perform document
<dscl>
  <variables>
    <variable name="PERFORM_DOCUMENT">
      <value>
        <gds:gridDataServicePerform>
          <gds:sqlQueryStatement name="myQuery">
            <gds:expression>select * from test</gds:expression>
            <gds:webRowSetStream name="myQueryOutput"/>
          </gds:sqlQueryStatement>
        </gds:gridDataServicePerform>
      </value>
    </variable>
    <variable name="PERFORM_RESULTS"/>
  </variables>
  <composition>
    <sequence>
      <createService activityID="START"
          factory-gsh="http://localhost:89/ogsa/services/ogsadai/GDSF"/>
      <invoke activityID="DAI001" operation="perform">
        <parameter variable="PERFORM_DOCUMENT"/>
        <result variable="PERFORM_RESULTS"/>
      </invoke>
    </sequence>
  </composition>
</dscl>
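To make the structure of the DSCL document on this slide easier to follow, here is a minimal sketch that reads a simplified copy of it with Python's standard xml.etree.ElementTree module and lists the declared variables and the activities in document order. The summarize function is hypothetical, and the OGSA-DAI Perform payload is replaced by a placeholder because the slide does not show the gds namespace declaration a parser would need.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the slide's document. The gds:... payload is omitted
# (assumption: the real document declares the gds namespace somewhere).
DSCL = """<dscl>
  <variables>
    <variable name="PERFORM_DOCUMENT"><value>(perform document)</value></variable>
    <variable name="PERFORM_RESULTS"/>
  </variables>
  <composition>
    <sequence>
      <createService activityID="START"/>
      <invoke activityID="DAI001" operation="perform">
        <parameter variable="PERFORM_DOCUMENT"/>
        <result variable="PERFORM_RESULTS"/>
      </invoke>
    </sequence>
  </composition>
</dscl>"""


def summarize(dscl_text):
    """Return (variable names, activity steps) from a DSCL-like document."""
    root = ET.fromstring(dscl_text)
    variables = [v.get("name") for v in root.findall("./variables/variable")]
    steps = []
    # iter() walks the composition subtree in document order.
    for node in root.find("./composition").iter():
        if node.tag in ("createService", "invoke"):
            param = node.find("parameter")
            result = node.find("result")
            steps.append({
                "kind": node.tag,
                "activity": node.get("activityID"),
                "input": param.get("variable") if param is not None else None,
                "output": result.get("variable") if result is not None else None,
            })
    return variables, steps


variables, steps = summarize(DSCL)
```

Here the invoke step reads its input from PERFORM_DOCUMENT and names PERFORM_RESULTS as the variable that receives the result, which mirrors the parameter and result elements on the slide.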
12
Workflow client (DSCE client)
  • Implemented as Web application
  • Finalize DSCL Document
  • Receives notifications from DSCE
  • Delivers results to KB
  • Workflow optimization
  • caching
  • Interaction with DSCE engine
  • Start
  • Stop
  • Pause
  • Resume
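
The four control commands suggest a small state machine on the client side. The sketch below is one possible reading: the state names (idle, running, paused, stopped) and the allowed transitions are assumptions, since the slide names only the commands, and the WorkflowControl class is hypothetical.

```python
# Assumed transition table: (current state, command) -> next state.
ALLOWED = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
}


class WorkflowControl:
    """Tracks the state of one workflow run as commands arrive."""

    def __init__(self):
        self.state = "idle"

    def send(self, command):
        key = (self.state, command)
        if key not in ALLOWED:
            raise ValueError(f"cannot {command} while {self.state}")
        self.state = ALLOWED[key]
        return self.state


control = WorkflowControl()
history = [control.send(c) for c in ("start", "pause", "resume", "stop")]
```

Rejecting invalid commands (for example, resume on a workflow that was never paused) keeps the client and the engine from drifting into inconsistent states.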

13
Knowledge Base
  • XML Database (Xindice)
  • Store and share documents
  • DSCL, PMML, Mapping schemas, XSLT, Perform documents
  • Rules / Facts: SWRL
  • Models: PMML MiningModel
  • Ontologies: OWL/OWL-S
  • Metadata: PMML DataDictionary
14
Components and Documents interactions
Web Applications
Knowledge Base
Services
Visualization
PMML
XSLT
Data mining service
Service/Task Configuration
DSCL
DSCE Engine
DSCE Client
Perform Document / Mapping Schema
Grid Data Service
Data Exploration
15
Data Mining Service provided by GridMiner
  • Kernel Services
    • Decision Trees (distributed version)
      • SPRINT algorithm
      • Pruning
      • Tree evaluation
    • OLAP (parallel version)
    • Sequences (sequential version)
      • SPADE
    • Clustering (sequential version)
      • SimpleKMeans
  • Ongoing work
    • Text Mining - classification
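
SPRINT chooses splits by scanning sorted attribute values and scoring each candidate split point with the Gini index. The sketch below shows that idea for a single numeric attribute on one machine. It is a simplified illustration, not GridMiner's distributed implementation, and the function names are hypothetical.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (weighted Gini, threshold) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    The threshold is None when every value is identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    below = Counter()                               # classes left of the split
    above = Counter(label for _, label in pairs)    # classes right of the split
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = pairs[i]
        below[label] += 1
        above[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        score = (n_below / n) * gini(below, n_below) \
            + ((n - n_below) / n) * gini(above, n - n_below)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


score, threshold = best_split([1, 2, 3, 10, 11, 12],
                              ["a", "a", "a", "b", "b", "b"])
```

Because the two class histograms are updated incrementally during one pass over the sorted values, every candidate split is scored without rescanning the data.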

16
Decision Tree Service - DT
Master
Data
DT
XML
Model
Slave 1
Slave 2
DT
DT
17
Decision Tree Service cont.
18
Decision Tree Service cont.
19
Decision Tree Service - Test
Test dataset: XML file (webRowSet), 6 attributes
[Chart: execution time (ms) versus dataset size (records) for 1, 2 and 4 nodes; sizes 250 000 and 500 000; labelled values 50k and 130k]
20
OLAP Service
Master
query
Data
Virtual Cube
XML
answer
Slave 1
Index Service
Indexes
Sub Cube
Slave 3
Slave 2
Sub Cube
Sub Cube
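One way to read this diagram is that each slave aggregates its own sub cube and the master merges the partial results into the answer. The sketch below illustrates that pattern with plain Python dictionaries. The row layout, the function names and the choice of sum, count and average are assumptions, and the index service shown on the slide is not modelled.

```python
from collections import defaultdict


def aggregate_sub_cube(rows, group_by, measure):
    """Slave side: per-group [sum, count] over one sub cube."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        partial[key][0] += row[measure]
        partial[key][1] += 1
    return partial


def merge_partials(partials):
    """Master side: combine partial [sum, count] pairs into final cells."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {
        key: {"sum": total, "count": count, "avg": total / count}
        for key, (total, count) in merged.items()
    }


# Hypothetical sub cubes held by three slaves.
sub_cubes = [
    [{"region": "EU", "sales": 10.0}, {"region": "US", "sales": 5.0}],
    [{"region": "EU", "sales": 20.0}],
    [{"region": "US", "sales": 15.0}, {"region": "US", "sales": 10.0}],
]
partials = [aggregate_sub_cube(rows, ["region"], "sales") for rows in sub_cubes]
answer = merge_partials(partials)
```

Sums and counts can be merged in any order, so the slaves can work independently; the average is derived only at the end because averages of averages would be wrong for groups of unequal size.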
21
Graphical User Interface
  • Service Configuration
  • Services selection
  • Services Configuration
  • Data Mining Task Configuration
  • Setting parameters of methods / algorithms
  • Data Processing
  • Data Access
  • Data Integration
  • Data Selection
  • Data Statistics, Histograms
  • Workflow Execution
  • Interaction with DSCE Client
  • Results Visualization

22
Visualization
PMML Document transformed to SVG
23
Summary
  • GridMiner: the first project addressing all facets
    of knowledge discovery on the Grid
  • Running prototype available
  • Ongoing work on
  • Semantic data integration
  • Knowledge management
  • Service performance optimization
  • Grid intelligence

24
GridMiner Group Members