Title: Data Interpretation and Domain Knowledge
In this lecture you learn
- The need for, and role of, domain knowledge in
  - data analysis and
  - data interpretation
- Issues involved in using domain knowledge
Introduction
- So far we have studied data analysis methods and visualization techniques that are domain independent
- They are useful for extracting information from input data
- However, the information extracted by these methods is factual
- Users, on the other hand, want interpreted information
- Domain knowledge is required to transform factual information into interpreted information
Example
- We learnt how to use the bottom-up segmentation method to analyse scuba dive profile data
- Consider the dive profile shown above and its segmented representation shown below
- The user is interested in safety-related information extracted from the dive profile
Factual vs Interpreted Information
- The segmented representation presents factual information, such as:
  - the diver ascended rapidly for a while, and
  - the diver stopped twice on the ascent
- What is missing is the domain knowledge that
  - the rate of ascent is more than the recommended rate, and
  - one of the stops is unnecessary
- Without such domain knowledge it is impossible to communicate useful and interesting information (interpreted information) to the user
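As a minimal sketch of the step from fact to interpretation, the Python below compares the rate of each ascent segment with a recommended maximum ascent rate. The segment tuple layout, the function names, and the 10 m/min limit are assumptions for illustration, not values taken from the lecture or from any dive table.

```python
from typing import List, Tuple

# A segment: (start time [min], start depth [m], end time [min], end depth [m]).
Segment = Tuple[float, float, float, float]

# Hypothetical recommended maximum ascent rate (m/min); placeholder value only.
MAX_ASCENT_RATE = 10.0


def ascent_rate(seg: Segment) -> float:
    """Rate in m/min, positive when the diver moves shallower. Assumes end time > start time."""
    t0, d0, t1, d1 = seg
    return (d0 - d1) / (t1 - t0)


def interpret_ascents(segments: List[Segment]) -> List[str]:
    """Map factual ascent segments to interpreted statements using the threshold."""
    messages = []
    for seg in segments:
        rate = ascent_rate(seg)
        if rate <= 0:
            continue  # descent or level segment, not an ascent
        if rate > MAX_ASCENT_RATE:
            messages.append(
                f"Ascent from {seg[1]:.0f} m to {seg[3]:.0f} m at {rate:.1f} m/min "
                f"exceeds the recommended {MAX_ASCENT_RATE:.0f} m/min"
            )
    return messages


if __name__ == "__main__":
    dive = [(0, 0, 2, 20), (2, 20, 20, 20), (20, 20, 21, 6), (21, 6, 24, 5), (24, 5, 28, 0)]
    for msg in interpret_ascents(dive):
        print(msg)
```

The segments themselves are the factual output of segmentation; the single threshold is the domain knowledge that turns one of them into a statement the user cares about.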
More Uses of Domain Knowledge
- Domain knowledge is also required for controlling data analysis so that
  - the results of data analysis contain the required level of detail, and
  - the data analysis process is sensitive to data items or patterns that are significant (in some way) in the domain
Example
- Consider the dive profile shown in the earlier example
- Bottom-up segmentation can produce many different segmented representations of this dive profile
Multiple Segmented Representations
- Three possibilities are shown here, although many
more are possible
Controlling Segmentation
- Which one of the segmented representations is interesting and useful?
- Which one of them has the right level of detail for the user?
- Because bottom-up segmentation works by iteratively merging adjacent segments, achieving the right level of detail means controlling the merging process
- You studied the merging process in bottom-up segmentation in Practical 2
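The merging process can be sketched in Python as follows, assuming the segment error is the maximum vertical distance between the data points and the straight line joining the segment's end points (one common choice; the version used in the practical may measure error differently). The function names and the `etv` parameter name are hypothetical. Segmentation starts with every data point as a breakpoint and repeatedly removes the breakpoint whose removal (a merge of two adjacent segments) costs least, until the cheapest merge would exceed the error threshold.

```python
from typing import List, Sequence


def segment_error(times: Sequence[float], depths: Sequence[float], i: int, j: int) -> float:
    """Max vertical deviation of points i..j from the line joining points i and j.
    Assumes times are strictly increasing."""
    t0, d0, t1, d1 = times[i], depths[i], times[j], depths[j]
    slope = (d1 - d0) / (t1 - t0)
    return max(abs(depths[k] - (d0 + slope * (times[k] - t0))) for k in range(i, j + 1))


def bottom_up_segment(times: Sequence[float], depths: Sequence[float], etv: float) -> List[int]:
    """Bottom-up segmentation: return the indices of the remaining breakpoints."""
    breaks = list(range(len(times)))  # finest segmentation: every point is a breakpoint
    while len(breaks) > 2:
        # Cost of merging the two segments on either side of each interior breakpoint.
        costs = [segment_error(times, depths, breaks[k - 1], breaks[k + 1])
                 for k in range(1, len(breaks) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > etv:
            break  # the cheapest merge is too inaccurate: stop merging
        del breaks[best + 1]  # drop the interior breakpoint, merging two segments
    return breaks


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
    depths = [0, 10, 20, 20, 19, 20, 10, 5, 0]
    print(bottom_up_segment(times, depths, etv=0.75))  # [0, 2, 4, 5, 6, 8]
    print(bottom_up_segment(times, depths, etv=5.0))   # [0, 2, 5, 8]
```

Raising `etv` lets more merges through, so the same profile yields fewer, coarser segments; in the example the small dip at index 4 survives with a threshold of 0.75 m but is merged away at 5 m.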
Domain-Significant Data Values and Patterns
- Continuing with our dive profile example:
- If this dive is deeper than the recommended depth for safe dives, detailed segmentation generates no extra information about the safety of the dive, because the depth alone already makes it unsafe
- This means that in this case the two-segment representation on the right is just as good as the detailed segmented representation on the left
- Similarly, if a dive profile is reversed (shallow first and deeper later), segmentation details once again do not matter
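A sketch of how such domain-significant values could short-circuit detailed analysis. The 30 m safe-depth limit and the crude test for a reverse profile (comparing the deepest point in each half of the dive) are illustrative assumptions, not definitions from the lecture.

```python
from typing import Sequence

# Hypothetical domain constant: deepest depth (m) treated as safe; placeholder only.
MAX_SAFE_DEPTH_M = 30.0


def is_reverse_profile(depths: Sequence[float]) -> bool:
    """Crude reverse-profile test (an assumption, not a standard definition):
    the deepest point of the second half is deeper than that of the first half.
    Expects at least two depth samples."""
    half = len(depths) // 2
    return max(depths[half:]) > max(depths[:half])


def needs_detailed_segmentation(depths: Sequence[float]) -> bool:
    """Use domain-significant values and patterns to decide whether fine detail matters."""
    if max(depths) > MAX_SAFE_DEPTH_M:
        return False  # too deep: extra segments add no information about safety
    if is_reverse_profile(depths):
        return False  # reverse profile: segmentation details do not matter
    return True


print(needs_detailed_segmentation([0, 35, 35, 10, 0]))     # False (too deep)
print(needs_detailed_segmentation([0, 10, 10, 25, 0, 0]))  # False (reverse)
print(needs_detailed_segmentation([0, 20, 18, 5, 0]))      # True
```

When the function returns False, a caller could, for instance, run segmentation with a large error threshold to obtain the coarse two-segment view.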
Domain-Significant Data Values and Patterns (2)
- In many other domains we have observed similar significant values or patterns that affect data analysis
- Meteorology: wind speeds above or below certain ranges are of no use to oil-rig staff, because such wind speeds either completely prohibit operations or have no influence on their operations at all
- Medicine: changes in blood pressure become unimportant once it crosses certain threshold values
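One simple way to make analysis insensitive to irrelevant ranges is to clip values to the band that matters before segmenting, so that variation outside the band cannot produce extra segments. The 10 to 40 knot band below is a hypothetical placeholder, not a SumTime or offshore-industry figure.

```python
from typing import List, Sequence

# Hypothetical band of wind speeds (knots) relevant to oil-rig operations:
# below LOW_KNOTS operations are unaffected, above HIGH_KNOTS they are prohibited anyway.
LOW_KNOTS = 10.0
HIGH_KNOTS = 40.0


def clip_to_relevant_range(values: Sequence[float], low: float, high: float) -> List[float]:
    """Flatten variation outside [low, high] so that later analysis ignores it."""
    return [min(max(v, low), high) for v in values]


wind = [5.0, 8.0, 12.0, 25.0, 45.0, 60.0, 38.0]
print(clip_to_relevant_range(wind, LOW_KNOTS, HIGH_KNOTS))
# [10.0, 10.0, 12.0, 25.0, 40.0, 40.0, 38.0]
```

The same idea would apply to blood pressure readings beyond the clinically significant thresholds.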
Domain Knowledge
- Domain knowledge is required for
  - data interpretation and
  - controlling data analysis
- In fact, exploiting domain knowledge in computer systems has been a general goal in Computing Science
- Database theory, which models data using domain-independent relational theory, uses business rules to bring in domain knowledge
- The AI and expert system communities have always worked with domain experts
Strategies for Including Domain Knowledge
- Domain knowledge can be included
  - directly, to control the data analysis process: because data analysis procedures essentially involve searching for patterns and rules, domain knowledge reduces the amount of search, or
  - in a post-processing stage, where the results of domain-independent data analysis are inspected to eliminate uninteresting information, or
  - at both stages
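The post-processing strategy can be sketched as a filter over the output of a domain-independent segmenter. The `Segment` record and the rule that short, nearly flat segments are uninteresting noise are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One segment from a domain-independent segmenter (hypothetical record layout)."""
    start_min: float
    end_min: float
    start_depth_m: float
    end_depth_m: float


def is_interesting(seg: Segment, min_duration_min: float, min_change_m: float) -> bool:
    """Hypothetical domain rule: a segment that is both short and nearly flat is noise."""
    short = (seg.end_min - seg.start_min) < min_duration_min
    flat = abs(seg.end_depth_m - seg.start_depth_m) < min_change_m
    return not (short and flat)


def post_filter(segments: List[Segment], min_duration_min: float = 0.5,
                min_change_m: float = 1.0) -> List[Segment]:
    """Post-processing: keep only the segments the domain rule marks as interesting."""
    return [s for s in segments if is_interesting(s, min_duration_min, min_change_m)]


segments = [
    Segment(0.0, 2.0, 0.0, 20.0),    # descent
    Segment(2.0, 2.2, 20.0, 20.3),   # tiny wobble: short and flat
    Segment(2.2, 20.0, 20.3, 20.0),  # bottom phase
]
print(len(post_filter(segments)))  # 2
```

Controlling the analysis directly would instead push such knowledge into the segmenter's own parameters, so the wobble is never produced in the first place.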
Domain Knowledge for Controlling Data Analysis
- A simple way of controlling data analysis methods is to introduce domain thresholds as parameters of the data analysis algorithm
- E.g. the bottom-up segmentation algorithm uses an error threshold parameter (ETV) to control its runtime behaviour, and we can use some domain threshold value to determine the value of the error threshold
- However, it is not always possible to find a simple mapping between domain thresholds and the parameters of data analysis algorithms
- Studies in the SumTime project, in which wind speed threshold values controlled the segmentation of wind speed data, have not been very encouraging
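To see why the mapping is not always simple, consider a domain tolerance of δ metres on depth. Assume the segmenter measures the error of a segment s of n points in one of two common ways, where d_i is the observed depth and d̂_i the value on the fitted line (these error measures are assumptions; the algorithm in the practical may use a different one):

```latex
E_{\max}(s) = \max_{i \in s} \lvert d_i - \hat{d}_i \rvert, \qquad
E_{\max}(s) \le \delta \iff \lvert d_i - \hat{d}_i \rvert \le \delta \quad \forall i \in s

E_{\mathrm{SSE}}(s) = \sum_{i \in s} \bigl(d_i - \hat{d}_i\bigr)^2, \qquad
\lvert d_i - \hat{d}_i \rvert \le \delta \quad \forall i \in s \implies E_{\mathrm{SSE}}(s) \le n\,\delta^2
```

With the maximum-deviation error the domain tolerance maps directly onto ETV = δ. With the summed squared error, a per-point tolerance δ only bounds the error by nδ², which depends on the segment length n, and the converse does not hold, so no single ETV expresses the domain threshold.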
Domain Knowledge for Interpretation
- Interpretation involves inferring interesting information from factual information
- Depending on the amount of domain knowledge available, interpretation can involve several levels of inference
- Therefore the user's tasks must be known in order to determine the desired level of inference and the required domain knowledge
- In other words, to carry out interpretation we need to know
  - the level of inference required, as determined by the user's task, and
  - the domain knowledge needed to perform the inference, i.e. the mapping of factual information to interpreted information
- In general, user models and context models are important for performing data interpretation
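A toy sketch of inference levels selected by the user's task. The three statements, the task names, and the task-to-level table are all hypothetical; they only illustrate that the same fact can be carried to different depths of interpretation, with a simple user model choosing how far to go.

```python
from typing import Dict, List

# Hypothetical chain of inferences about one ascent, from factual (level 0)
# to increasingly interpreted (levels 1 and 2).
INFERENCE_LEVELS: List[str] = [
    "The diver ascended from 20 m to 6 m in one minute.",
    "The ascent rate exceeded the recommended maximum.",
    "A rapid ascent raises the risk of decompression sickness.",
]

# Hypothetical user model: the inference level each user task requires.
LEVEL_FOR_TASK: Dict[str, int] = {
    "inspect_log": 0,
    "check_safety_rules": 1,
    "learn_safe_diving": 2,
}


def interpret_for_task(task: str) -> List[str]:
    """Return the statements up to the inference level the task requires."""
    return INFERENCE_LEVELS[: LEVEL_FOR_TASK[task] + 1]


for statement in interpret_for_task("check_safety_rules"):
    print(statement)
```

Each step down the list needs more domain knowledge: level 1 needs a threshold, level 2 needs knowledge of consequences.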
Expert Systems
- Expert systems are software systems that exploit domain and problem-solving knowledge acquired from experts
- Expert systems offer one solution to the problem of data interpretation
- However, we face the following challenges:
  - it is difficult to acquire all the required knowledge from any one source
  - it is difficult to merge (consolidate) knowledge from multiple sources
Resume
- Resume is a knowledge-based expert system for deriving temporal abstractions from medical time series
- It focuses on interpreting time series based on contextual knowledge
- Input: time-stamped clinical data and relevant events
- Output: abstracted temporal intervals
- Identifies past and present trends and states
- Supports decisions based on temporal patterns, such as
  - modify therapy if the patient has a second episode of Grade II bone-marrow toxicity lasting more than 3 weeks
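The kind of rule on this slide can be sketched as two steps: abstract time-stamped observations into state intervals, then test a temporal pattern over those intervals. This is a simplified illustration, not how Resume itself is implemented; the observation format, the assumption that an episode runs from its first to its last consecutive Grade II observation, and the function names are all hypothetical.

```python
from typing import List, Tuple

Observation = Tuple[int, str]    # (day, toxicity grade such as "I" or "II"), sorted by day
Interval = Tuple[int, int, str]  # (start day, end day, grade)


def state_intervals(obs: List[Observation]) -> List[Interval]:
    """State abstraction: join consecutive observations with the same grade into intervals."""
    intervals: List[Interval] = []
    for day, grade in obs:
        if intervals and intervals[-1][2] == grade:
            start, _, g = intervals[-1]
            intervals[-1] = (start, day, g)  # extend the current interval
        else:
            intervals.append((day, day, grade))  # grade changed: open a new interval
    return intervals


def should_modify_therapy(obs: List[Observation]) -> bool:
    """Rule from the slide: a second Grade II episode lasting more than 3 weeks (21 days)."""
    episodes = [iv for iv in state_intervals(obs) if iv[2] == "II"]
    if len(episodes) < 2:
        return False
    start, end, _ = episodes[1]
    return end - start > 21


obs = [(0, "I"), (7, "II"), (14, "II"), (21, "I"), (28, "II"), (42, "II"), (56, "II"), (63, "I")]
print(state_intervals(obs))
print(should_modify_therapy(obs))  # True: the second Grade II episode spans days 28 to 56
```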
Cost of Expert Systems
- Resume uses a tool called Protégé that lets experts specify knowledge
- Protégé allows experts to define an ontology and then specify domain knowledge using that ontology
- It is expensive to build Resume-like tools, and therefore data interpretation is an expensive step in building our data interpretation and communication technology
- As a cheaper alternative, simple levels of interpretation can be achieved by using domain knowledge in the form of thresholds on values
- E.g. we can infer that a blood pressure value of 150 is high by knowing the threshold for normal blood pressure
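Threshold-based interpretation needs very little machinery, which is what makes it cheap. In this sketch, treating the value as systolic pressure in mmHg and using cut-offs of 90 and 140 are assumptions for illustration, not clinical guidance; only the conclusion that 150 counts as high comes from the slide.

```python
def interpret_systolic(value_mmhg: float, low: float = 90.0, high: float = 140.0) -> str:
    """Label a systolic blood pressure reading using two thresholds.

    The default cut-offs are assumed illustrative values, not clinical guidance.
    """
    if value_mmhg < low:
        return "low"
    if value_mmhg < high:
        return "normal"
    return "high"


print(interpret_systolic(150))  # high
```

All of the domain knowledge here is two numbers, in contrast with an ontology built in a tool such as Protégé.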
ScubaText
- ScubaText is a research project in our department
- Task: produce reports that interpret dive data
- Input: dive computer data similar to the data you work with in the practicals
- Output: reports containing textual presentations combined with annotated graphical presentations of dive data
  - The text presents an interpretation of the data/graph
  - Annotations on the graph provide interpretations of some of the phrases in the text
ScubaText Prototype
- The prototype architecture is a four-stage pipeline:
  - Data Analysis: analysing raw data for required features/patterns
  - Data Interpretation: mapping the data features/patterns to actual dive features, e.g. a saw-tooth pattern (data pattern) is mapped to poor buoyancy control
  - Rating the dive based on the dive features, e.g. long bottom times receive a low rating
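A minimal sketch of the three stages described above. The saw-tooth test (at least three reversals of direction between depth changes of 1 m or more), the pattern-to-feature table, and the 1 to 5 scoring scheme with its 10-minute "long bottom time" cut-off are all hypothetical; ScubaText's actual rules are not given here.

```python
from typing import Dict, List, Sequence


def analyse(depths: Sequence[float], min_step_m: float = 1.0) -> List[str]:
    """Stage 1, data analysis: detect data patterns in the raw depth series.

    Crude saw-tooth test (an assumption): ignore depth changes smaller than
    min_step_m and count how often the remaining changes switch direction.
    A plain descend-then-ascend dive switches once; three or more switches
    are reported as a saw-tooth pattern.
    """
    steps = [b - a for a, b in zip(depths, depths[1:]) if abs(b - a) >= min_step_m]
    reversals = sum(1 for s, t in zip(steps, steps[1:]) if (s > 0) != (t > 0))
    return ["saw_tooth"] if reversals >= 3 else []


# Stage 2, data interpretation: map data patterns to dive features (hypothetical table).
PATTERN_TO_FEATURE: Dict[str, str] = {"saw_tooth": "poor_buoyancy_control"}


def interpret(patterns: List[str]) -> List[str]:
    """Replace each known data pattern by the dive feature it indicates."""
    return [PATTERN_TO_FEATURE[p] for p in patterns if p in PATTERN_TO_FEATURE]


def rate(features: List[str], bottom_time_min: float, long_bottom_min: float = 10.0) -> int:
    """Stage 3, rating on a 1-5 scale (hypothetical scheme): a long bottom
    time and each problem feature lower the rating."""
    score = 5
    if bottom_time_min > long_bottom_min:
        score -= 2
    score -= len(features)
    return max(score, 1)


saw_tooth_dive = [0, 5, 10, 15, 18, 16, 18, 16, 18, 16, 10, 5, 0]
features = interpret(analyse(saw_tooth_dive))
print(features, rate(features, bottom_time_min=12.0))  # ['poor_buoyancy_control'] 2
```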
Text and Annotated Graphics
Risky dive with some minor problems. Because your
bottom time of 12.0min exceeds no-stop limit by
4.0min this dive is risky. But you performed the
ascent well. Your buoyancy control in the bottom
zone was poor as indicated by saw tooth
patterns marked A on the depth-time profile.
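The bottom-time sentence in this sample can be produced from two numbers by template filling. The 8.0 min no-stop limit is inferred from the sample (12.0 min minus 4.0 min); the function name and the wording of the "within limit" alternative are hypothetical, and this is not necessarily how ScubaText generates its text.

```python
def bottom_time_message(bottom_time_min: float, no_stop_limit_min: float) -> str:
    """Turn the fact 'bottom time vs no-stop limit' into an interpreted sentence."""
    excess = bottom_time_min - no_stop_limit_min
    if excess > 0:
        return (f"Because your bottom time of {bottom_time_min:.1f}min exceeds "
                f"no-stop limit by {excess:.1f}min this dive is risky.")
    # Hypothetical wording for the case the sample does not show.
    return f"Your bottom time of {bottom_time_min:.1f}min was within the no-stop limit."


print(bottom_time_message(12.0, 8.0))
# Because your bottom time of 12.0min exceeds no-stop limit by 4.0min this dive is risky.
```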
Dive Data Interpretation
- Features and patterns (abstractions) extracted using data analysis methods cannot be communicated to the user directly
  - There is no use in saying "you have a saw-tooth pattern on your dive profile" if the user is unaware of the implications of that pattern
- Abstractions need to be mapped to domain features
  - It is useful to say "Your buoyancy control in the bottom zone was poor, as indicated by saw-tooth patterns"
- A given abstraction may have more than one interpretation
  - A saw-tooth pattern could be due either to poor buoyancy control or to the diver moving up and down while inspecting a shipwreck
- Knowledge of the dive context and user modelling can help in disambiguating data interpretation
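A sketch of context-sensitive interpretation: look up the pattern together with the known dive context, and fall back to a default reading when no context-specific one applies. The table entries and the context labels are hypothetical; an unknown pattern raises `KeyError`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical interpretation table: (data pattern, dive context) -> domain feature.
# A context of None holds the default reading used when no context-specific entry exists.
INTERPRETATIONS: Dict[Tuple[str, Optional[str]], str] = {
    ("saw_tooth", None): "poor buoyancy control",
    ("saw_tooth", "wreck_dive"): "moving up and down while inspecting a wreck",
}


def interpret_pattern(pattern: str, context: Optional[str] = None) -> str:
    """Prefer the context-specific interpretation, else fall back to the default."""
    if (pattern, context) in INTERPRETATIONS:
        return INTERPRETATIONS[(pattern, context)]
    return INTERPRETATIONS[(pattern, None)]


print(interpret_pattern("saw_tooth"))                # poor buoyancy control
print(interpret_pattern("saw_tooth", "wreck_dive"))  # moving up and down while inspecting a wreck
```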
Summary
- Domain knowledge is required for
  - controlling data analysis and
  - performing data interpretation
- User modelling and contextual knowledge play a major role in data interpretation
- However, the cost of incorporating domain knowledge needs to be justified by the human effort it saves in interpreting factual information
- If humans can interpret factual information easily, our technology should not attempt to build expensive interpretation modules
- However, because users vary in their capacity to interpret data, some classes of users might always require interpretation