Title: Computational Discovery of Communicable Knowledge
1. Experimental Studies of Integrated Cognitive Systems
Pat Langley, Computational Learning Laboratory, Center for the Study of Language and Information, Stanford University, Stanford, California
Elena Messina, Intelligent Systems Division, National Institute of Standards and Technology, Gaithersburg, Maryland
Thanks to David Aha, Michael Genesereth, and Barney Pell. This work was funded in part by DARPA IPTO, which is not responsible for the points made herein.
2. Experimentation in Artificial Intelligence
- Controlled experiments are the primary evaluation tool in modern AI, including the subfields of
  - supervised learning and reinforcement learning
  - generative planning and scheduling
  - computational linguistics and text processing
- but not for work on integrated cognitive systems.
- Extending experimental methods to the latter is crucial, since it deals with the ultimate goals of artificial intelligence.
3. Challenges for Experimentation
- The reasons that experiments with integrated cognitive systems have lagged behind are clear from the phrase itself:
  - systems are harder to evaluate than component algorithms
  - cognitive methods involve complex, multi-step reasoning
  - integrated software relies on interactions among components.
- Together, these factors have slowed the development and wide acceptance of an experimental framework.
- In this talk, we propose the key elements of an experimental method for the study of integrated cognitive systems.
4. Dependent Variables: Basic Measures
- Dependent variables in an experiment measure system behavior.
- Some basic measures of integrated cognitive systems include
  - success or failure on a given problem
  - speed or efficiency of the system's response
  - desirability or quality of the system's response.
- Such metrics provide the building blocks for more sophisticated and informative measures of behavior.
5. Dependent Variables: Combined Measures
- Statistics tells us we should not draw conclusions from one case.
- Collecting multiple samples supports combined measures like
  - average behavior of the system
  - cumulative behavior of the system
  - variance of the system's behavior.
- Combined measures also partly cancel variation due to unknown or uncontrolled factors.
- However, this requires some population from which samples are drawn, which one should always specify clearly.
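As a minimal sketch of the combined measures listed above, the following Python computes the average, cumulative, and variance summaries over a set of trials. The function name `combined_measures` and the success data are hypothetical and serve only as illustration; they do not come from the talk.

```python
from itertools import accumulate
from statistics import mean, variance

def combined_measures(scores):
    """Summarize a basic measure collected over several sampled problems."""
    return {
        "mean": mean(scores),                    # average behavior
        "cumulative": list(accumulate(scores)),  # running total across trials
        "variance": variance(scores),            # sample variance (n - 1 denominator)
    }

# Hypothetical success indicators (1 = solved, 0 = failed) on eight problems
# drawn from one stated population of tasks.
trials = [1, 0, 1, 1, 0, 1, 1, 1]
summary = combined_measures(trials)
print(summary["mean"])        # 0.75
print(summary["cumulative"])  # [1, 1, 2, 3, 3, 4, 5, 6]
```

The sample variance is used here because the trials are treated as a sample from a larger population of problems, which is the situation the slide describes.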
6. Dependent Variables: Higher-Order Metrics
- Combined measures present only a small window on behavior.
- However, one can also derive higher-order measures such as
  - the slope and intercept with respect to a control system
  - the intercept, rate, and asymptote of a learning curve.
- Such metrics let one summarize behavior even when variation across samples is not systematic.
- Conclusions about higher-order measures are more important than ones about basic or combined variables.
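One way to obtain the slope and intercept with respect to a control system is an ordinary least-squares fit of the system's measure against the control's measure on the same problems. The sketch below assumes this reading of the slide; `fit_line` and the timing data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical solution times (seconds) on four problems of growing difficulty.
control_times = [10, 20, 30, 40]  # control system
system_times = [7, 12, 17, 22]    # system under study

slope, intercept = fit_line(control_times, system_times)
print(slope, intercept)  # 0.5 2.0
```

In this made-up case a slope below 1 would indicate that the system's cost grows more slowly than the control's as problems get harder, while the intercept reflects fixed overhead.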
7. Independent Variables: Task Characteristics
- Independent variables in an experiment reflect factors thought to influence system behavior.
- An important class of factors is domain or task features like
  - the complexity of the environment
  - the difficulty of achieving a given task
  - the resources available for pursuing the task.
- Experiments that vary these factors reveal how the intelligent system's behavior depends on them.
- Synthetic domains let one alter such variables systematically, but it is crucial that they be similar to natural domains.
8. Independent Variables: System Characteristics
- Another important class of variables involves system features.
- Varying these factors leads to different types of experiments:
  - parametric studies (altering system parameters)
  - lesion studies (removing a system component)
  - replacement studies (replacing one module with another).
- Such experiments suggest ways that the intelligent system's behavior depends on its parameters and components.
- Studies that vary two or more factors can reveal interactions among them.
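To illustrate how a study that varies two factors can expose an interaction, here is a hedged sketch of a two-by-two lesion design. The agent, its `planner` and `memory` components, and all scores are invented stand-ins for a real system run; only the experimental design is the point.

```python
from itertools import product

def run_agent(use_planner, use_memory):
    """Hypothetical stand-in for running the full system; returns a score."""
    score = 40
    if use_planner:
        score += 20
    if use_memory:
        score += 10
    if use_planner and use_memory:
        score += 15  # the components help more together than separately
    return score

def factorial_lesion_study(run):
    """Run every combination of two components being present or lesioned."""
    results = {cond: run(*cond) for cond in product([False, True], repeat=2)}
    # Effect of the planner when memory is lesioned versus when it is present.
    effect_without_memory = results[(True, False)] - results[(False, False)]
    effect_with_memory = results[(True, True)] - results[(False, True)]
    interaction = effect_with_memory - effect_without_memory
    return results, interaction

results, interaction = factorial_lesion_study(run_agent)
print(results[(True, True)], interaction)  # 85 15
```

A nonzero interaction term means the benefit of one component depends on whether the other is present, which a pair of single-factor lesion studies would not reveal.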
9. Independent Variables: System Knowledge
- A third class of factors concerns the knowledge and experience of the intelligent system.
- One can adapt lesion and replacement studies to examine
  - the presence or absence of types of knowledge
  - the amount of knowledge about a given subject
  - the amount of experience with a class of tasks.
- Such experiments let one plot behavioral measures as a function of knowledge and experience (learning curves).
- They also let one compute higher-order measures such as rate of improvement and asymptotic performance.
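The higher-order measures of a learning curve can be estimated in many ways. The sketch below uses deliberately crude estimators, which are assumptions of this example and not the authors' method: the first score as the intercept, the mean of the last few scores as the asymptote, and the first trial that closes half the gap as a proxy for rate. The scores are hypothetical.

```python
def summarize_learning_curve(scores, tail=3):
    """Crude intercept, asymptote, and rate proxy for an increasing curve."""
    intercept = scores[0]                    # performance before any experience
    asymptote = sum(scores[-tail:]) / tail   # mean of the final `tail` scores
    halfway = intercept + (asymptote - intercept) / 2
    # Rate proxy: first trial whose score reaches half of the total gain.
    half_gain_trial = next(n for n, s in enumerate(scores) if s >= halfway)
    return intercept, asymptote, half_gain_trial

# Hypothetical scores after 0, 1, 2, ... units of experience.
curve = [20, 40, 55, 65, 72, 76, 78, 80, 80, 80]
intercept, asymptote, half_gain_trial = summarize_learning_curve(curve)
print(intercept, asymptote, half_gain_trial)  # 20 80.0 2
```

A smaller `half_gain_trial` corresponds to faster improvement. Fitting a parametric curve would give a more principled rate, at the cost of assuming a specific functional form.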
10. Repositories for Cognitive Systems
- Public repositories are now common among the AI subfields, and they offer clear advantages for research by
  - providing fast and cheap materials for experiments
  - supporting replication and standards for comparison.
- However, they can also produce undesirable side effects by
  - focusing attention on a narrow class of problems
  - encouraging a bake-off mentality among researchers.
- To support research on cognitive systems, we need testbeds and environments designed to evaluate general intelligence.
11. Desirable Characteristics of Testbeds
- Testbeds that are designed to support research on integrated cognitive systems should
  - include a variety of domains to ensure generality
  - be well documented and simple for researchers to use
  - have standard formats to ease interface with systems.
- However, these features are already present in many existing repositories, and more work is necessary.
12. Desirable Characteristics of Testbeds
- In addition, testbeds for integrated cognitive systems should
  - contain not data sets but task environments
    - which support agents that exist over time
    - at least some of which involve physical domains
  - provide an infrastructure to ease experimentation with
    - external databases (e.g., geographic information systems)
    - controlled capture, replay, and restart of scenarios
    - methods for recording performance measures.
- Also, environments should have little or no dependence on sensory processing.
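As a rough illustration of controlled capture, replay, and restart, the sketch below uses a seeded random generator so that a scenario can be rerun identically, and it records a trace from which performance measures can be computed. The `ScenarioEnv` class, the one-dimensional task, and the policy are all hypothetical and far simpler than any real testbed.

```python
import random

class ScenarioEnv:
    """Hypothetical toy task environment with seeded, replayable dynamics."""

    def __init__(self, seed):
        self.seed = seed
        self.restart()

    def restart(self):
        """Reset the scenario to its initial state with the same random stream."""
        self.rng = random.Random(self.seed)
        self.position = 0
        self.trace = []

    def step(self, action):
        """Apply an action plus a random disturbance and record the outcome."""
        disturbance = self.rng.choice([-1, 0, 1])
        self.position += action + disturbance
        self.trace.append((action, disturbance, self.position))
        return self.position

def run_episode(env, policy, steps):
    """Restart the scenario, run the policy, and return the captured trace."""
    env.restart()
    for _ in range(steps):
        env.step(policy(env.position))
    return list(env.trace)

def toward_goal(position, goal=5):
    """Hypothetical policy: move one unit toward the goal position."""
    return 1 if position < goal else -1

env = ScenarioEnv(seed=42)
first_run = run_episode(env, toward_goal, steps=20)
replayed_run = run_episode(env, toward_goal, steps=20)
print(first_run == replayed_run)  # True
```

Because the disturbance sequence depends only on the seed, two systems or two variants of one system can be compared on exactly the same scenario.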
13. Physical vs. Simulated Environments
- For domains that involve external settings, one can use either a physical or a simulated environment for evaluation.
- Simulated environments have many advantages, including
  - ability to vary domain parameters and physical layout
  - ease of recording traces of behavior and cognitive state.
- One can make simulated environments more realistic by
  - using simulators that support kinematics and dynamics
  - including data from real sensors in analogous locations.
- This approach combines the relevance of physical testbeds with the affordability of synthetic ones.
14. Some Promising Domains
- A number of domains hold promise for the experimental study of integrated cognitive systems:
  - urban search and rescue (Balakirsky & Messina, 2002)
  - flying aircraft on military missions (Jones et al., 1999)
  - driving a vehicle in a city (Choi et al., 2004)
  - playing strategy games (Aha & Molineaux, 2004)
  - general game playing (Genesereth, 2004).
- Each requires the integration of cognition, perception, and action in a complex, dynamical setting.
15. Goals of Scientific Experimentation
- Science aims not to show that one method is better than another, but to understand the reasons for complex behavior.
- This goal can best be achieved through experimental studies that
  - ask clear questions or test specific hypotheses
  - examine relations between behavior and independent factors
  - move beyond descriptions to explanations of phenomena.
- Good experiments provide insight into the reasons that underlie system behavior.
- Also, whether or not they support a hypothesis, they do not end the story, but rather suggest ideas for further studies.
16. Concluding Remarks
In this talk, we considered the experimental study of integrated cognitive systems, including
- challenges posed by their distinctive characteristics
- dependent measures that describe their behavior
- independent variables that influence this behavior
- the need for environments and testbeds that
  - exercise the full capabilities of integrated agents
  - evaluate their behavior at the system level
  - support studies of interactions among components.
Taking these into account will transform the study of integrated cognitive systems into a well-balanced experimental science.