Title: Technology for Pooling Knowledge
1 Technology for Pooling Knowledge
2 Overview
- What is Knowledge Engineering
- Why Pool Knowledge
- Knowledge Pooling Process
- What is Data Mining
- Paradigms for Data Mining
- Which Paradigm to Use?
- Issues
- Conclusions
3 What is Knowledge Engineering?
- Techniques employed to build intelligent systems
  - knowledge acquisition and discovery
  - representation and integration
  - reasoning methodologies
  - explanation
- Decision Support: tumour identification
- Process Control: depth of anaesthesia
- Teaching: intelligent tutoring systems
4 Data, Information and Knowledge
- Data: a collection of facts which have no meaning on their own
  - Hot, -6
- Information: data becomes information when it is interpreted in context
  - Engine hot, -6 °C
- Knowledge: information becomes knowledge when it is usefully applied
  - The engine is hot, therefore it must have been used recently
  - The temperature is -6 °C, so I had better wear gloves
5 Why Pool Knowledge
- To develop intelligent systems we must pool knowledge from data
- See the big picture: what does all the data have to say?
- Improve our decision-making processes
  - Improved diagnosis
  - More effective treatment
  - Higher quality management
- Enhance understanding of disease progression
6 How Do We Pool Knowledge
- There is too much data for humans to trawl through, so automated techniques have been developed
- Knowledge Discovery
  - Data Warehousing: consistency, integrity
  - Data Mining
- Expert may not be aware of knowledge found
- "Either I'm missing something, or nothing's going on!"
7 Knowledge Pooling in Practice
[Diagram. Labels: Patient History, Clinical Data, Unstructured data, Data Warehouse, Knowledge Discovery Toolset, Background Knowledge, Knowledge Pool, Knowledge Based Decision Support System, Patient Details, Advice, Expert(s) / Decision Maker(s)]
8 Knowledge Pooling Process
[Diagram, build 1 of 4. Labels: Legacy Databases, Integration and Cleaning, Data Warehouse]
9 Knowledge Pooling Process
[Diagram, build 2 of 4. Labels: Legacy Databases, Integration and Cleaning, Data Warehouse, Selection]
10 Knowledge Pooling Process
[Diagram, build 3 of 4. Labels: Legacy Databases, Integration and Cleaning, Data Warehouse, Selection, Data Mining, Knowledge Pool]
11 Knowledge Pooling Process
[Diagram, build 4 of 4. Labels: Legacy Databases, Integration and Cleaning, Data Warehouse, Selection, Data Mining, Knowledge Pool, Background knowledge, Feedback, Expert(s) / Decision Maker(s)]
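The early stages named in the process diagram (Integration and Cleaning, Data Warehouse, Selection) can be sketched in a few lines. This is only an illustrative sketch: the field names, the two toy "legacy databases" and the helper functions `integrate_and_clean` and `select` are all hypothetical and do not come from the slides.

```python
def integrate_and_clean(legacy_sources):
    """Map every legacy record onto one schema, keyed by patient_id."""
    warehouse = {}
    for field_map, rows in legacy_sources:
        for row in rows:
            # Rename source fields to the shared (hypothetical) schema.
            record = {target: row.get(source) for source, target in field_map.items()}
            if record.get("patient_id") is None:
                continue  # cannot cross-reference a row without an identifier
            # Merge into the warehouse, ignoring missing values.
            warehouse.setdefault(record["patient_id"], {}).update(
                {k: v for k, v in record.items() if v is not None}
            )
    return warehouse


def select(warehouse, attributes):
    """Keep only records that have every attribute needed for mining."""
    return [
        {a: rec[a] for a in attributes}
        for rec in warehouse.values()
        if all(a in rec for a in attributes)
    ]


# Two toy legacy databases with different field names (made-up data).
clinic_db = ({"id": "patient_id", "age": "age"},
             [{"id": 1, "age": 52}, {"id": 2, "age": None}])
lab_db = ({"pid": "patient_id", "glucose": "glucose"},
          [{"pid": 1, "glucose": 7.1}, {"pid": 3, "glucose": 5.4}, {"glucose": 6.0}])

warehouse = integrate_and_clean([clinic_db, lab_db])
mining_view = select(warehouse, ["age", "glucose"])
print(mining_view)
```

Only patient 1 has both attributes, so only that record reaches the data mining stage. A real warehouse would need far more careful cleaning and patient matching than this.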
12 What is Data Mining
- The nontrivial extraction of implicit, previously unknown, and potentially useful information from data
- The term is a misnomer: "knowledge mining" describes it better
- Uses machine learning, statistical and visualisation techniques to discover and present knowledge in a form which is easily comprehensible to humans
13 Approaches for Data Mining
- Classification
- Prediction
- Association Rule Discovery
- Sequence Rule Discovery
- Clustering/Segmentation
14 Classification Tasks
- Process of examining the features of a record and assigning it to one of a predefined set of classes
- Discovers, from the data, the model that can classify new records
- Example application: classifying skin disease
- Technologies used
  - Decision Trees
  - Memory Based Reasoning
  - Rule Induction
15 Classification
[Diagram: a classifier model is built from training data, then used on test data to output a classification, e.g. psoriasis.]
Classes: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, pityriasis rubra pilaris
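The build-model / use-model split can be illustrated with memory based reasoning, one of the technologies listed for classification. The sketch below is a plain k-nearest-neighbour vote. The two numeric features and all training records are invented for illustration and have no clinical meaning.

```python
from collections import Counter
from math import dist


def classify(training_data, features, k=3):
    """Return the majority class among the k nearest training records."""
    nearest = sorted(training_data, key=lambda rec: dist(rec[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training records: (feature_1, feature_2) -> diagnosis.
training_data = [
    ((3, 3), "psoriasis"),
    ((3, 2), "psoriasis"),
    ((2, 3), "psoriasis"),
    ((1, 1), "lichen planus"),
    ((0, 1), "lichen planus"),
    ((1, 0), "lichen planus"),
]

# "Use model": classify a new, unseen record.
print(classify(training_data, (3, 3)))
```

Here the "model" is simply the stored training data. A decision tree or rule induction method would instead summarise the data into explicit rules.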
16 Predictive Tasks
- A predictive model is similar in nature to the classification model, except that the value being predicted is numeric
- Example: predicting the life expectancy of a cancer patient from characteristics of the tumour
- Technologies used
  - Decision Trees
  - Memory Based Reasoning
  - Rule Induction
17 Prediction
[Diagram: a predictive model is built from training data, then used on test data to output a prediction, e.g. 45 months.]
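The same memory based approach gives a numeric prediction if the nearest neighbours' target values are averaged rather than put to a vote. This is a minimal sketch under that assumption. The features (tumour size, grade) and survival figures are made up and are not clinical data.

```python
from math import dist


def predict(training_data, features, k=2):
    """Average the numeric target of the k nearest training records."""
    nearest = sorted(training_data, key=lambda rec: dist(rec[0], features))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical records: (tumour size in cm, grade) -> survival in months.
training_data = [
    ((1.0, 1), 60.0),
    ((1.5, 1), 54.0),
    ((3.0, 2), 40.0),
    ((3.5, 3), 30.0),
]

months = predict(training_data, (1.2, 1))
print(f"{months:.0f} months")
```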
18 Association Rule Discovery
- Rules that define relations between attribute values
- If Headache and Temperature then Sore Throat, support 35% and confidence 75%
- 75% of records in which the patient has a Headache and a Temperature also have a Sore Throat, and these patients constitute 35% of all patients presented during the period analysed
- Technology used
  - Set Oriented Methods
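Support and confidence for a single rule can be computed directly from their definitions. In the sketch below the 60 toy patient records are fabricated purely so that the numbers match the rule on the slide. The function name `support_confidence` is likewise just illustrative.

```python
def support_confidence(records, antecedent, consequent):
    """Support and confidence of the rule: antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)          # share of all records matching the whole rule
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# 60 made-up patient records, each a set of observed symptoms.
records = (
    [{"Headache", "Temperature", "Sore Throat"}] * 21
    + [{"Headache", "Temperature"}] * 7
    + [{"Headache"}] * 12
    + [{"Sore Throat"}] * 10
    + [set()] * 10
)

support, confidence = support_confidence(
    records, {"Headache", "Temperature"}, {"Sore Throat"}
)
print(f"support={support:.0%} confidence={confidence:.0%}")
```

Rule discovery proper searches over many candidate rules and keeps those above chosen support and confidence thresholds. This sketch only scores one given rule.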
19 Sequence Rule Discovery
- Generalisations of association rules
- Discovered rules take into account the temporal nature of data
- If a diabetic patient presents with early stage retinopathy at an age under 18, then within 5 years renal failure will occur, support 30% and confidence 65%
- Technology used
  - Set Oriented Methods
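A sequence rule adds a time window to the same support and confidence calculation. The sketch below scores one rule of the form "first event, then a second event within a window". The patient histories are invented, and the age condition from the slide's example is left out to keep the code short.

```python
def sequence_rule_stats(histories, first, then, window):
    """Support and confidence of: `first`, then `then` within `window` years."""
    n_first = n_both = 0
    for events in histories:
        first_times = [t for t, e in events if e == first]
        if not first_times:
            continue
        n_first += 1
        start = min(first_times)  # earliest occurrence of the first event
        if any(e == then and start < t <= start + window for t, e in events):
            n_both += 1
    support = n_both / len(histories)
    confidence = n_both / n_first if n_first else 0.0
    return support, confidence


# Made-up histories: lists of (year, event) per patient.
histories = [
    [(2000, "retinopathy"), (2003, "renal failure")],  # within 5 years
    [(2001, "retinopathy"), (2009, "renal failure")],  # too late
    [(2002, "retinopathy")],                           # never follows
    [(2004, "renal failure")],                         # no first event
]

support, confidence = sequence_rule_stats(
    histories, "retinopathy", "renal failure", window=5
)
print(f"support={support:.0%} confidence={confidence:.0%}")
```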
20 Clustering (Segmentation)
- The aim of cluster detection is to discover regularities in data based on similarity
- These algorithms discover sub-groups (clusters) of data whose members are more similar to each other (small intra-cluster distance) than to data that belong to other clusters (large inter-cluster distance)
- Technologies used
  - Bayesian Techniques
  - Statistical Techniques
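The idea of grouping by distance can be shown with k-means. Note that k-means is swapped in here as a simple, well-known distance-based method; the slides do not name it. The points are arbitrary toy data, and taking the first k points as initial centroids is a simplification that keeps the run deterministic.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Very small k-means: returns (clusters, centroids)."""
    centroids = list(points[:k])  # naive, deterministic initialisation
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return clusters, centroids


# Toy 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
clusters, centroids = k_means(points, k=2)
for cluster in clusters:
    print(cluster)
```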
21 Clustering Example
- Cluster 1: Males Over 40
- Cluster 2: Females Over 45
- Cluster 3: Males Under 40
- Cluster 4: Females Under 45
Clustering of patients with a particular illness reveals 4 patient clusters. It is seen that each group responds best to a different drug.
22 Which Paradigm to Use?
- Mining Task: given the data mining task at hand, the choice of paradigm is restricted to those that can solve the task
- Transparency: some paradigms generate knowledge in more understandable representations than others
  - Decision trees and memory based reasoning generate knowledge in more intuitive representations than paradigms like neural networks
23 Which Paradigm to Use?
- Data: a number of aspects of data can affect the effectiveness of the paradigm being used
  - Data Types: some paradigms, such as set-oriented methods, can only handle categorical data
  - Dimensionality: some paradigms are more adept at handling a large number of attributes in the data
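One common workaround for the data type restriction, not mentioned on the slide, is to bin numeric values into categories before applying a categorical-only method. The sketch below assumes hypothetical cut points for body temperature. They are illustrative only and are not clinical thresholds.

```python
from bisect import bisect_right

# Hypothetical cut points (°C) and the category for each resulting interval.
TEMP_CUTS = [36.0, 37.5]
TEMP_LABELS = ["low", "normal", "raised"]


def discretise(value, cut_points=TEMP_CUTS, labels=TEMP_LABELS):
    """Map a numeric value to the label of the interval it falls in."""
    return labels[bisect_right(cut_points, value)]


for temperature in (35.0, 36.8, 38.2):
    print(temperature, "->", discretise(temperature))
```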
24 Issues
- Noisy data: wrong values. How does this affect the confidence levels of results?
- Incomplete data: missing values
- Temporal data: is data added recently of more value than that added a year ago?
- Non-textual data: image, video
- Privacy issues: what if you predict future illnesses? (slimming pills example)
26 Issues
- Heterogeneity: integration of distributed, heterogeneous data
- Access to data
  - Who owns it?
  - Who is responsible for maintenance of links?
- Cross-referencing patients between databases
- Consistency
- Up-to-date data
27 Conclusions
- It is desirable to integrate data from all possible sources
- Technology is available to do this
- Technology is available to pool knowledge from the data
- Better decisions/care/training/understanding
- Requires vision, determination and money