Data Interpretation and Domain Knowledge

Provided by: Somay

1
Data Interpretation and Domain Knowledge
  • Yaji Sripada

2
In this lecture you will learn about
  • The need for, and role of, domain knowledge in
      • data analysis and
      • data interpretation
  • Issues involved in using domain knowledge

3
Introduction
  • So far we have studied data analysis methods and
    visualization techniques that are domain
    independent
  • They are useful for extracting information from
    input data
  • However, the information extracted by these
    methods is factual
  • On the other hand, users want interpreted
    information
  • Domain knowledge is required to transform factual
    information into interpreted information

4
Example
  • We learnt how to use the bottom-up segmentation
    method to analyse scuba dive profile data
  • Consider the dive profile shown above and its
    segmented representation shown below
  • The user is interested in safety-related
    information extracted from the dive profile

5
Fact vs Interpretation
  • The segmented representation presents factual
    information such as
      • the diver ascended rapidly for a while, and
      • the diver stopped twice on the ascent
  • What is missing is the interpretation that
    domain knowledge provides:
      • the rate of ascent is more than the recommended
        rate, and
      • one of the stops is unnecessary
  • Without such domain knowledge it is impossible to
    communicate useful and interesting information
    (interpreted information) to the user
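The step from fact to interpretation can be sketched in Python. This is a minimal sketch, not ScubaText code. It assumes each segment is a tuple (start_min, end_min, start_depth_m, end_depth_m) with depth measured positive downwards, and it uses an assumed recommended maximum ascent rate of 10 m/min purely for illustration. The segments hold the facts. The threshold is the domain knowledge that turns one of those facts into an interpreted statement.

```python
# Hypothetical domain threshold: maximum recommended ascent rate in m/min.
# 10 m/min is an assumed value for illustration only; real dive tables and
# dive computers apply their own limits.
MAX_ASCENT_RATE = 10.0


def ascent_rate(segment):
    """Ascent rate in m/min for a segment (start_min, end_min, start_depth_m, end_depth_m).

    Depth is positive downwards, so the rate is positive when the diver moves up.
    """
    t0, t1, d0, d1 = segment
    return (d0 - d1) / (t1 - t0)


def interpret_ascents(segments, max_rate=MAX_ASCENT_RATE):
    """Turn factual segments into interpreted statements about ascent speed."""
    messages = []
    for seg in segments:
        rate = ascent_rate(seg)
        if rate > max_rate:  # domain knowledge: the ascent was too fast
            messages.append(
                f"Ascent from {seg[2]:.0f} m to {seg[3]:.0f} m at "
                f"{rate:.1f} m/min exceeds the recommended {max_rate:.0f} m/min"
            )
    return messages
```

Take the segments [(0, 2, 0, 20), (2, 14, 20, 20), (14, 15, 20, 5), (15, 18, 5, 5), (18, 19, 5, 0)] as an example. Only the third one, an ascent at 15 m/min, is reported. The descent and the final slow ascent produce no message.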

6
More Uses of Domain Knowledge
  • Domain knowledge is also required for controlling
    data analysis so that
      • the results of data analysis contain the
        required level of detail, and
      • the data analysis process is sensitive to data
        items or patterns that are significant (in
        some way) in the domain

7
Example
  • Consider the dive profile shown in the earlier
    example
  • Bottom-up segmentation can produce many different
    segmented representations of this dive profile,
    depending on how far merging is allowed to go

8
Multiple Segmented representations
  • Three possibilities are shown here, although many
    more are possible

9
Controlling Segmentation
  • Which one of the segmented representations is an
    interesting and useful one?
  • Which one of them has the right level of detail
    for the user?
  • Because bottom-up segmentation works by
    iteratively merging adjacent segments, achieving
    the right level of detail means controlling the
    merging process
  • You studied the merging process of bottom-up
    segmentation in Practical 2
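The merging loop can be sketched as follows. This sketch uses one common variant of bottom-up segmentation, which may differ from the practical's version. The cost of a merge is the maximum vertical distance between the samples and the straight line joining the end points of the merged segment. `max_error` is the stopping threshold, and its name is assumed here. Raising `max_error` lets merging continue for longer, which yields coarser representations. Points are (time, depth) pairs with strictly increasing times.

```python
def segment_error(points, i, j):
    """Max absolute distance between points[i..j] and the line joining points[i] and points[j]."""
    (t0, d0), (t1, d1) = points[i], points[j]
    err = 0.0
    for t, d in points[i:j + 1]:
        predicted = d0 + (d1 - d0) * (t - t0) / (t1 - t0)  # linear interpolation
        err = max(err, abs(d - predicted))
    return err


def bottom_up_segment(points, max_error):
    """Merge adjacent segments, cheapest first, until every merge would exceed max_error.

    Returns the indices of the breakpoints that remain.
    """
    breaks = list(range(len(points)))  # finest segmentation: one segment per pair of samples
    while len(breaks) > 2:
        # Merging segments k and k+1 produces the segment breaks[k]..breaks[k+2].
        costs = [segment_error(points, breaks[k], breaks[k + 2])
                 for k in range(len(breaks) - 2)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # every remaining merge would be too inaccurate
        del breaks[best + 1]  # dropping the shared breakpoint merges the two segments
    return breaks
```

Consider the profile [(0, 0), (1, 5), (2, 10), (3, 10), (4, 10), (5, 5), (6, 0)]. With `max_error=1.0` the breakpoints [0, 2, 4, 6] remain, giving descent, bottom and ascent. With `max_error=100.0` everything merges into [0, 6]. For clarity the sketch recomputes every cost on each pass. A practical implementation would update only the costs next to the merged pair.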

10
Domain significant data values and patterns
  • Continuing with our dive profile example
  • If this dive is deeper than the recommended depth
    for safe dives, it is already unsafe, and the
    segmentation process generates no extra
    information about its safety
  • In that case the two-segment representation on
    the right is just as good as the detailed
    segmented representation on the left
  • Similarly, if a dive profile is reversed (shallow
    first and deeper later), segmentation details
    again do not matter
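A domain check of this kind can run before segmentation and skip it when extra detail would add nothing. The sketch rests on three assumptions, none of which are ScubaText's rules. Depths are in metres at regular sampling intervals. The recommended maximum depth is an illustrative 40 m. A profile counts as reversed when the deepest sample falls in the second half of the dive, which is only a crude test.

```python
MAX_RECOMMENDED_DEPTH = 40.0  # assumed limit in metres, for illustration only


def is_reverse_profile(depths):
    """Crude, assumed test for 'shallow first, deeper later':
    the deepest sample lies in the second half of the dive."""
    deepest = depths.index(max(depths))
    return deepest > len(depths) // 2


def needs_detailed_segmentation(depths, max_depth=MAX_RECOMMENDED_DEPTH):
    """Domain check run before segmentation.

    Returns (needed, reason). When the dive is already unsafe because it is
    too deep or reversed, segmentation detail adds no safety information.
    """
    if max(depths) > max_depth:
        return False, "dive exceeded the recommended maximum depth"
    if is_reverse_profile(depths):
        return False, "reverse dive profile"
    return True, None
```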

11
Domain significant data values and patterns (2)
  • We have observed similar significant values or
    patterns affecting data analysis in many other
    domains
  • Meteorology: variations in wind speed above or
    below certain ranges are of no use to oil rig
    staff, because wind speeds in those ranges either
    prohibit operations completely or have no
    influence on operations at all
  • Medical domain: changes in blood pressure become
    unimportant once it crosses certain threshold
    values
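One simple way to make analysis insensitive to variation outside the significant band is to clip the data before segmenting it. Changes beyond the thresholds then cannot create new segments. The technique is named plainly here as clipping (saturation), and the slides do not prescribe it. The band limits of 10 and 40 knots are hypothetical placeholders for values that would come from domain experts.

```python
# Hypothetical band (in knots) within which wind speed changes matter to the
# user; the real limits would come from oil rig domain experts.
LOW_KNOTS, HIGH_KNOTS = 10.0, 40.0


def clip_to_significant_range(values, low=LOW_KNOTS, high=HIGH_KNOTS):
    """Replace values outside [low, high] by the nearest bound, so that
    variation outside the band cannot produce new segments."""
    return [min(max(v, low), high) for v in values]
```

After clipping, a series such as [5, 12, 38, 55, 60] becomes [10.0, 12, 38, 40.0, 40.0]. The jump from 55 to 60 knots therefore disappears, as the slide suggests it should for users whose operations are already prohibited.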

12
Domain Knowledge
  • Domain knowledge is required for
      • data interpretation and
      • controlling data analysis
  • In fact, exploiting domain knowledge in computer
    systems has been a general goal in Computing
    Science
  • Database theory, which models data using
    domain-independent relational theory, uses
    business rules to bring in domain knowledge
  • The AI and expert systems communities have always
    worked with domain experts

13
Strategies to include domain knowledge
  • Domain knowledge can be included
      • directly, to control the data analysis process
          • because data analysis procedures
            essentially involve searching for patterns
            and rules, domain knowledge reduces the
            amount of search
      • in a post-processing stage, where the results
        of domain-independent data analysis are
        inspected to eliminate uninteresting
        information
      • or in both ways

14
Domain Knowledge for Controlling Data Analysis
  • A simple way of controlling data analysis methods
    is to introduce domain thresholds as parameters
    of the data analysis algorithm
  • For example, the bottom-up segmentation algorithm
    uses an error parameter (ETV) to control its
    runtime behaviour, and a domain threshold value
    can be used to determine the value of this error
    threshold
  • However, it is not always possible to find a
    simple mapping between domain thresholds and the
    parameters of data analysis algorithms
  • Studies in the SumTime project, in which wind
    speed threshold values controlled the
    segmentation of wind speed data, have not been
    very encouraging

15
Domain Knowledge for interpretation
  • Interpretation involves inferring interesting
    information from factual information
  • Depending on the amount of domain knowledge
    available, interpretation can proceed to several
    levels of inference
  • Therefore the user's tasks must be known to
    determine the desired level of inference and the
    domain knowledge it requires
  • In other words, to carry out interpretation we
    need to know the level of inference required by
    the user's task
  • We also need the domain knowledge that performs
    the inference, i.e. the mapping from factual
    information to interpreted information
  • In general, user models and context models are
    important for performing data interpretation
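The idea that the user's task selects the level of inference can be sketched as follows. The three levels, the function name and its parameters are illustrative assumptions. The no-stop limit is passed in as an argument because it depends on depth and on the dive table or algorithm used.

```python
def interpret(max_depth_m, bottom_time_min, no_stop_limit_min, level):
    """Return statements up to the inference level requested by the user task.

    Level 0: facts only; level 1: facts compared with a domain threshold;
    level 2: an overall judgement about the dive.
    """
    statements = [f"Bottom time was {bottom_time_min:.1f} min at {max_depth_m:.0f} m"]
    excess = bottom_time_min - no_stop_limit_min
    if level >= 1:  # requires the domain threshold (no-stop limit)
        if excess > 0:
            statements.append(f"Bottom time exceeded the no-stop limit by {excess:.1f} min")
        else:
            statements.append("Bottom time was within the no-stop limit")
    if level >= 2:  # requires knowledge of what makes a dive risky
        statements.append("The dive was risky" if excess > 0 else "The dive was within limits")
    return statements
```

The `level` argument is where knowledge of the user's task enters. Each higher level needs more domain knowledge than the one below it.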

16
Expert Systems
  • Expert systems are software systems that exploit
    domain and problem-solving knowledge acquired
    from experts
  • Expert systems offer one solution to the problem
    of data interpretation
  • However, they raise the following challenges
      • it is difficult to acquire all the required
        knowledge from any one source
      • it is difficult to merge (consolidate)
        knowledge from multiple sources

17
Resume
  • Resume is a knowledge-based expert system for
    deriving temporal abstractions from medical time
    series
  • It focuses on interpreting time series based on
    contextual knowledge
  • Input: time-stamped clinical data and relevant
    events
  • Output: abstracted temporal intervals
  • It identifies past and present trends and states
  • It supports decisions based on temporal patterns,
    such as
      • modify therapy if the patient has a second
        episode of Grade II bone-marrow toxicity
        lasting more than 3 weeks
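A much simplified sketch of this kind of temporal abstraction is shown below. It is not how Resume works internally. It only illustrates turning time-stamped values into intervals (episodes) and then applying a rule over those intervals. The sketch makes these assumptions:
  • samples are (day, grade) pairs sorted by day
  • an episode is a maximal run of samples at grade II or above
  • an episode's duration is measured between its first and last qualifying samples, which underestimates the true length

```python
def episodes(samples, min_grade):
    """Group (day, grade) samples, sorted by day, into maximal intervals
    where grade >= min_grade. Each interval is (first_day, last_day)."""
    result, start, last = [], None, None
    for day, grade in samples:
        if grade >= min_grade:
            if start is None:
                start = day  # a new episode begins
            last = day
        elif start is not None:
            result.append((start, last))  # the current episode ends
            start = None
    if start is not None:
        result.append((start, last))  # episode still open at the last sample
    return result


def should_modify_therapy(samples, min_grade=2, min_days=21):
    """Hypothetical rule in the spirit of the slide: the second episode of
    toxicity at grade >= min_grade lasts more than min_days days."""
    eps = episodes(samples, min_grade)
    return len(eps) >= 2 and (eps[1][1] - eps[1][0]) > min_days
```

With weekly samples [(0, 0), (7, 2), (14, 2), (21, 0), (28, 2), (42, 2), (56, 2), (63, 1)], `episodes` yields [(7, 14), (28, 56)]. The second episode spans 28 days, so the rule fires.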

18
Cost of expert systems
  • Resume uses a tool called Protégé with which
    experts specify knowledge
  • Protégé allows experts to define an ontology and
    then specify domain knowledge using that ontology
  • Resume-like tools are expensive to build, and
    therefore data interpretation is an expensive
    step in building our data interpretation and
    communication technology
  • As a cheaper alternative, simple levels of
    interpretation can be achieved by using domain
    knowledge in the form of thresholds on values
  • For example, we can infer that a blood pressure
    value of 150 is high if we know the threshold for
    normal blood pressure
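Threshold-based interpretation amounts to a lookup in a sorted list of band boundaries. The systolic cut-offs and labels below are assumptions for illustration only. Real values would come from a clinical guideline or a domain expert.

```python
import bisect

# Assumed systolic band boundaries (mmHg) and labels, for illustration only.
SYSTOLIC_BOUNDS = [90, 120, 140]
LABELS = ["low", "normal", "borderline", "high"]


def classify_systolic(value):
    """Map a systolic reading to a label; a value equal to a boundary
    falls into the band above it."""
    return LABELS[bisect.bisect_right(SYSTOLIC_BOUNDS, value)]
```

Under these assumed thresholds `classify_systolic(150)` returns "high", which matches the inference on the slide. The threshold table holds all of the domain knowledge, so changing the guideline only means changing the table.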

19
ScubaText
  • ScubaText is a research project in our department
  • Task: producing reports that interpret dive data
  • Input: dive computer data similar to the data you
    work with in the practicals
  • Output: reports containing textual presentations
    combined with annotated graphical presentations
    of dive data
      • the text presents an interpretation of the
        data/graph
      • annotations on the graph provide
        interpretations of some of the phrases in the
        text

20
ScubaText Prototype
  • The prototype architecture is a four-stage
    pipeline
      • Data Analysis: analysing raw data for required
        features/patterns
      • Data Interpretation:
          • mapping the data features/patterns to the
            actual dive features, e.g. a saw tooth
            pattern (data pattern) mapped to poor
            buoyancy control
          • rating the dive based on the dive
            features, e.g. long bottom times receive a
            low rating

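The flow through the pipeline can be sketched as a chain of functions. Everything in this sketch is hypothetical. The bottom zone is taken as the stretch from the first to the last sample at 80% or more of maximum depth. A saw tooth pattern is approximated by counting direction reversals in that zone. The `generate` function is an assumed stand-in for the stages that produce text, which the slide does not describe.

```python
def analyse(depths, interval_min):
    """Data analysis: extract bottom time and depth-direction reversals."""
    deepest = max(depths)
    # Hypothetical bottom zone: first to last sample at >= 80% of max depth.
    in_bottom = [i for i, d in enumerate(depths) if d >= 0.8 * deepest]
    zone = depths[in_bottom[0]:in_bottom[-1] + 1]
    steps = [b - a for a, b in zip(zone, zone[1:]) if b != a]  # ignore flat steps
    reversals = sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))
    bottom_time = (in_bottom[-1] - in_bottom[0]) * interval_min
    return {"bottom_time": bottom_time, "reversals": reversals}


def interpret(features, no_stop_limit, min_reversals=4):
    """Data interpretation: map data patterns to dive features and rate the dive."""
    poor_buoyancy = features["reversals"] >= min_reversals  # saw tooth -> poor buoyancy
    rating = "risky" if features["bottom_time"] > no_stop_limit else "safe"
    return {"poor_buoyancy": poor_buoyancy, "rating": rating}


def generate(dive):
    """Assumed text-generation stage turning interpretations into sentences."""
    text = f"This dive was {dive['rating']}."
    if dive["poor_buoyancy"]:
        text += " Your buoyancy control in the bottom zone was poor."
    return text


def run_pipeline(depths, interval_min, no_stop_limit):
    """Chain the stages: raw data -> features -> interpretation -> text."""
    return generate(interpret(analyse(depths, interval_min), no_stop_limit))
```

Keeping each stage as a separate function mirrors the pipeline idea. The interpretation rules can change without touching the analysis code.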
21
Text and Annotated Graphics
Risky dive with some minor problems. Because your
bottom time of 12.0min exceeds no-stop limit by
4.0min this dive is risky. But you performed the
ascent well. Your buoyancy control in the bottom
zone was poor as indicated by saw tooth
patterns marked A on the depth-time profile.
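The figures in this text are consistent. A bottom time of 12.0 min that exceeds the no-stop limit by 4.0 min implies a no-stop limit of 8.0 min. A sentence like this can come from a template that computes the excess rather than storing it. The function below is a hypothetical illustration and is not ScubaText's generator.

```python
def bottom_time_sentence(bottom_time_min, no_stop_limit_min):
    """Hypothetical template for the bottom-time part of a dive report."""
    excess = bottom_time_min - no_stop_limit_min  # derived, so text and data cannot disagree
    if excess <= 0:
        return "Your bottom time was within the no-stop limit."
    return (f"Because your bottom time of {bottom_time_min:.1f}min exceeds "
            f"no-stop limit by {excess:.1f}min this dive is risky.")
```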
22
Dive Data Interpretation
  • Features and patterns (abstractions) extracted
    using data analysis methods cannot be
    communicated to the user directly
      • there is no use in saying "you have a saw tooth
        pattern on your dive profile" if the user is
        ignorant of the implications of that pattern
  • Abstractions need to be mapped to domain features
      • it is useful to say "Your buoyancy control in
        the bottom zone was poor as indicated by saw
        tooth patterns"
  • A given abstraction may have more than one
    interpretation
      • a saw tooth pattern could be due either to poor
        buoyancy control or to the diver moving up and
        down while inspecting a shipwreck
  • Knowledge of the dive context and user modelling
    can help in disambiguating data interpretation
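Context-based disambiguation can be sketched as a rule that picks one of two interpretations of the same data pattern, depending on what is known about the dive. The `dive_purpose` context field is an assumption made for illustration, as are the reversal-count test for a saw tooth pattern and the threshold of four reversals.

```python
def count_reversals(depths):
    """Number of times the depth changes direction (flat steps ignored)."""
    steps = [b - a for a, b in zip(depths, depths[1:]) if b != a]
    return sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))


def interpret_saw_tooth(bottom_depths, dive_purpose, min_reversals=4):
    """Choose an interpretation of a saw tooth pattern using dive context.

    dive_purpose is a hypothetical context field; returns None when no
    saw tooth pattern is present.
    """
    if count_reversals(bottom_depths) < min_reversals:
        return None  # nothing to interpret
    if dive_purpose == "wreck inspection":
        return "Depth changes in the bottom zone reflect moving up and down to inspect the wreck."
    return "Your buoyancy control in the bottom zone was poor as indicated by saw tooth patterns."
```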

23
Summary
  • Domain knowledge is required for
      • controlling data analysis and
      • performing data interpretation
  • User modelling and contextual knowledge play a
    major role in data interpretation
  • However, the cost of incorporating domain
    knowledge needs to be justified by the human
    effort needed to interpret factual information
      • if humans can interpret factual information
        easily, our technology should not attempt to
        build expensive interpretation modules
  • At the same time, because users vary in their
    interpretation capacity, for some classes of
    users interpretation might always be required