Technology for Pooling Knowledge - PowerPoint PPT Presentation

Description:

What is Knowledge Engineering? Techniques employed to build intelligent systems ... Process Control Depth of Anaesthesia. Teaching intelligent tutoring systems ... – PowerPoint PPT presentation

Slides: 27
Provided by: davep171

Transcript and Presenter's Notes

1
Technology for Pooling Knowledge
2
Overview
  • What is Knowledge Engineering
  • Why Pool Knowledge
  • Knowledge Pooling Process
  • What is Data Mining
  • Paradigms for Data Mining
  • Which Paradigm to Use?
  • Issues
  • Conclusions

3
What is Knowledge Engineering?
  • Techniques employed to build intelligent systems
  • knowledge acquisition and discovery
  • representation and integration
  • reasoning methodologies
  • explanation
  • Decision Support: Tumour identification
  • Process Control: Depth of Anaesthesia
  • Teaching: intelligent tutoring systems

4
Data Information and Knowledge
  • Data: Collection of facts which have no meaning
    on their own
  • Hot, -6
  • Information: data becomes information when it is
    interpreted in context
  • Engine hot, -6°C
  • Knowledge: Information becomes knowledge when it
    is usefully applied
  • The engine is hot, therefore it must have been
    used recently
  • The temperature is -6°C, I had better wear gloves

5
Why Pool Knowledge
  • To develop intelligent systems we must pool
    knowledge from data
  • See the big picture: what does all the data have
    to say?
  • Improve our decision making processes
  • Improved diagnosis
  • More effective treatment
  • Higher quality management
  • Enhance understanding of disease progression

6
How Do We Pool Knowledge
  • There is too much data for a human to trawl
    through, so automated techniques have been developed
  • Knowledge Discovery
  • Data Warehousing: consistency, integrity
  • Data Mining
  • The expert may not be aware of the knowledge found
  • Either I'm missing something, or nothing's going
    on!

7
Knowledge Pooling in Practice
[Diagram components: Patient History; Clinical Data; Unstructured data;
Data Warehouse; Knowledge Discovery Toolset; Background Knowledge;
Knowledge Pool; Knowledge Based Decision Support System; Patient Details;
Advice; Expert(s) / Decision Maker(s)]
8
Knowledge Pooling Process

[Diagram stages: Legacy Databases; Integration and Cleaning; Data Warehouse]
9
Knowledge Pooling Process

[Diagram stages: Legacy Databases; Integration and Cleaning; Data Warehouse;
Selection]
10
Knowledge Pooling Process
[Diagram stages: Legacy Databases; Integration and Cleaning; Data Warehouse;
Selection; Data Mining; Knowledge Pool]
11
Knowledge Pooling Process
[Diagram stages: Legacy Databases; Integration and Cleaning; Data Warehouse;
Selection; Data Mining; Knowledge Pool. Also shown: Background knowledge;
Feedback; Expert(s) / Decision Maker(s)]
12
What is Data Mining
  • The nontrivial extraction of implicit,
    previously unknown, and potentially useful
    information from data
  • Term is a misnomer: knowledge mining
  • Uses machine learning, statistical and
    visualisation techniques to discover and present
    knowledge in a form which is easily
    comprehensible to humans

13
Approaches For Data Mining
  • Classification
  • Prediction
  • Association Rule Discovery
  • Sequence Rule Discovery
  • Clustering/Segmentation

14
Classification Tasks
  • Process of examining the features of a record and
    assigning it to one of a predefined set of
    classes
  • Discovers, from the data, a model that can
    classify new records
  • Example application: classifying skin disease
  • Technologies used:
  • Decision Trees
  • Memory Based Reasoning
  • Rule Induction
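
As a rough illustration of the build-model / use-model idea above, the following is a minimal sketch of memory based reasoning in its simplest form (a 1-nearest-neighbour classifier). The function name, the two numeric features and all the training values are invented for illustration; they are not from the presentation and are not clinical data.

```python
import math

def nearest_neighbour_classify(training, record):
    """Return the class label of the training record closest to `record`."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(features, record)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical training data: (feature vector, class label).
# The two features are made-up scores, not real measurements.
TRAINING = [
    ((3.0, 0.0), "psoriasis"),
    ((2.5, 0.5), "psoriasis"),
    ((0.5, 3.0), "lichen planus"),
    ((0.0, 2.5), "lichen planus"),
]

# "Use model": classify a new, unseen record.
predicted = nearest_neighbour_classify(TRAINING, (2.8, 0.2))
print(predicted)
```

Here the "model" is simply the stored training records; decision trees and rule induction would instead build an explicit structure from the same kind of labelled data.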

15
Classification
Classes: psoriasis, seborrhoeic dermatitis, lichen planus,
pityriasis rosea, chronic dermatitis, pityriasis rubra pilaris
Build Model: Training Data, Classifier Model
Use Model: Test Data, Classification (psoriasis)
16
Predictive Tasks
  • A predictive model is similar in nature to the
    classification model except that the value being
    predicted is numeric
  • Example: predicting the life expectancy of a cancer
    patient from characteristics of the tumour
  • Technologies used:
  • Decision Trees
  • Memory Based Reasoning
  • Rule Induction
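
To show how a numeric prediction differs from a classification, here is a small sketch that predicts a value as the mean of the k most similar stored records (a memory based approach). The function name, the feature choice and every number below are assumptions made up for this example, not data from the presentation.

```python
import math

def knn_predict(training, record, k=3):
    """Predict a numeric value as the mean target of the k nearest records."""
    by_distance = sorted(training, key=lambda row: math.dist(row[0], record))
    nearest = by_distance[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical data: (tumour size, grade) -> survival in months. All invented.
TRAINING = [
    ((1.0, 1.0), 60.0),
    ((1.5, 1.0), 50.0),
    ((2.0, 2.0), 40.0),
    ((4.0, 3.0), 15.0),
    ((5.0, 3.0), 10.0),
]

estimate = knn_predict(TRAINING, (1.6, 1.0), k=3)
print(f"{estimate:.0f} months")
```

The only change from a classifier is the last step: the neighbours' numeric values are averaged instead of returning a class label.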

17
Prediction
Build Model: Training Data, Predictive Model
Use Model: Test Data, Prediction (45 Months)
18
Association Rule Discovery
  • Rules that define relations between attribute
    values
  • If Headache and Temperature then Sore Throat,
    support 35% and confidence 75%
  • 75% of records in which the patient has a
    Headache and a Temperature also have a Sore
    Throat, and patients with all three symptoms
    constitute 35% of all patients presented during
    the period analysed.
  • Technology used:
  • Set Oriented Methods
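
The support and confidence figures quoted for a rule can be computed directly from the records. The sketch below assumes each record is a set of symptoms; the function name and the eight sample records are invented for illustration and do not reproduce the 35% / 75% figures from the slide.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all records containing antecedent and consequent
    confidence = fraction of records containing the antecedent that also
                 contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [r for r in records if antecedent <= r]
    with_both = [r for r in with_antecedent if consequent <= r]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Invented records: each is the set of symptoms recorded for one patient.
RECORDS = [
    {"headache", "temperature", "sore throat"},
    {"headache", "temperature", "sore throat"},
    {"headache", "temperature", "sore throat"},
    {"headache", "temperature"},
    {"headache"},
    {"sore throat"},
    {"temperature"},
    {"cough"},
]

support, confidence = support_and_confidence(
    RECORDS, {"headache", "temperature"}, {"sore throat"}
)
print(f"support {support:.1%}, confidence {confidence:.1%}")
```

A real rule discovery tool would search over many candidate rules and keep those above chosen support and confidence thresholds; this sketch only scores one rule that is already given.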

19
Sequence Rule Discovery
  • Generalisations of association rules
  • Discovered rules take into account the temporal
    nature of data
  • If a diabetic patient presents with early-stage
    retinopathy at an age under 18, then within 5
    years renal failure will occur, support 30% and
    confidence 65%.
  • Technology used:
  • Set Oriented Methods
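
A sequence rule can be scored in the same way as an association rule, with an added check on the time between events. The sketch below is one possible reading, assuming each patient history maps an event name to the year it was first recorded; the function name, the data layout and the five sample histories are all hypothetical.

```python
def sequence_rule_stats(histories, first, then, within_years):
    """Support and confidence for: `first` is followed by `then` within
    `within_years` years.

    histories: list of dicts mapping event name -> year first recorded.
    """
    with_first = [h for h in histories if first in h]
    followed = [
        h for h in with_first
        if then in h and 0 < h[then] - h[first] <= within_years
    ]
    support = len(followed) / len(histories)
    confidence = len(followed) / len(with_first) if with_first else 0.0
    return support, confidence

# Invented patient histories (event -> years since first visit).
HISTORIES = [
    {"retinopathy": 0, "renal failure": 4},
    {"retinopathy": 1, "renal failure": 3},
    {"retinopathy": 0, "renal failure": 9},  # outside the 5-year window
    {"retinopathy": 2},
    {"renal failure": 5},
]

support, confidence = sequence_rule_stats(
    HISTORIES, "retinopathy", "renal failure", within_years=5
)
print(f"support {support:.0%}, confidence {confidence:.0%}")
```

The ordering and time-window test is what distinguishes this from the plain association case, where only co-occurrence in a record matters.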

20
Clustering (Segmentation)
  • The aim of cluster detection is to discover
    regularities in data based on similarity
  • These algorithms discover sub-groups (clusters)
    of data whose members are more similar to each
    other (small intra-cluster distance) than to
    data that belong to other clusters (large
    inter-cluster distance).
  • Technologies used:
  • Bayesian Techniques
  • Statistical Techniques
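
The intra-cluster and inter-cluster distances mentioned above can be made concrete with a short calculation. This sketch does not perform clustering and does not implement the Bayesian or statistical techniques named on the slide; it only measures, for a grouping that is already given, how far apart points are within and between clusters. The function names and the sample points are invented, and Euclidean distance is an assumed similarity measure.

```python
import math
from itertools import combinations

def mean_pairwise_distance(pairs):
    """Mean Euclidean distance over a list of (point, point) pairs."""
    distances = [math.dist(a, b) for a, b in pairs]
    return sum(distances) / len(distances)

def intra_and_inter(clusters):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    # Pairs of points that share a cluster.
    intra_pairs = [p for c in clusters for p in combinations(c, 2)]
    # Pairs of points drawn from two different clusters.
    inter_pairs = [
        (a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return mean_pairwise_distance(intra_pairs), mean_pairwise_distance(inter_pairs)

# Invented 2-D points, already grouped into two clusters.
CLUSTERS = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]

intra, inter = intra_and_inter(CLUSTERS)
print(f"intra-cluster {intra:.2f}, inter-cluster {inter:.2f}")
```

A good clustering is one where the first number is small relative to the second.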

21
Clustering Example
Cluster 1: Males Over 40
Cluster 2: Females Over 45
Cluster 3: Males Under 40
Cluster 4: Females Under 45
Clustering of patients with a particular illness
reveals 4 patient clusters. It is seen that each
group responds best to a different drug.
22
Which Paradigm to Use?
  • Mining Task: Given the data mining task at hand,
    the choice of paradigm is restricted to those
    that can solve the task
  • Transparency: Some paradigms generate knowledge in
    more understandable representations than others
  • Decision trees and memory based reasoning
    generate knowledge in more intuitive
    representations than paradigms like neural
    networks

23
Which Paradigm to Use?
  • Data: A number of aspects of data can affect the
    effectiveness of the paradigm being used
  • Data Types: Some paradigms, such as set-oriented
    methods, can only handle categorical data
  • Dimensionality: Some paradigms are more adept at
    handling a large number of attributes in the data

24
Issues
  • Noisy data: wrong values. How does this affect
    the confidence levels of results?
  • Incomplete data: missing values
  • Temporal data: is data added recently of more value
    than that added a year ago?
  • Non-textual data: image, video
  • Privacy issues: what if you predict future
    illnesses? (slimming pills example)

25
Issues
  • Misuse - ??

26
Issues
  • Heterogeneity: integration of distributed,
    heterogeneous data
  • Access to data
  • Who owns it?
  • Who is responsible for maintenance of links?
  • Cross referencing patients between databases
  • Consistency
  • Up to date data

27
Conclusions
  • It is desirable to integrate data from all
    possible sources
  • Technology is available to do this
  • Technology is available to pool knowledge from
    the data
  • Better decisions/care/training/understanding
  • Requires vision, determination and money