Title: Mike Pazzani
1. How to Evaluate a Mixed-initiative System?
Mike Pazzani's caution:
- Don't lose sight of the goal.
- The metrics are just approximations of the goal.
- Optimizing the metric may not optimize the goal.
2. Question: What is the goal to be optimized?
Possible goals of mixed-initiative systems
General goal
Mixed-initiative systems integrate human and
automated reasoning to take advantage of their
complementary reasoning styles and computational
strengths.
More specific goal
Mixed-initiative systems combine the human's
experience, flexibility, and creativity with the
agent's speed, memory, and tirelessness to take
advantage of these complementary strengths.
Even more specific goal
Mixed-initiative systems increase a human's speed,
memory, accuracy, competence, and creativity.
Other goals
The more precise the goal, the easier it is to
evaluate its achievement.
3. Question: How to evaluate the goal (or claim)?
Claim: Mixed-initiative system X increases a human's
speed, memory, accuracy, competence, and creativity.
- Sub-questions:
  - How to define and measure the speed, memory,
    accuracy, competence, and creativity of the
    human-system combination?
  - How to measure the relative contribution of the
    human and the system to the emergent behavior?
    (Is the overall performance mostly due to a smart
    user, to a good system, or to both?)
4. Compare to baseline behavior?
Measure and compare speed, memory, accuracy,
competence, and creativity for solving a class of
problems in different settings:
- Human alone
- Agent alone
- Mixed-initiative human-agent system
- Non-mixed-initiative human-agent system
- Ablated mixed-initiative human-agent system
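The comparison across settings can be sketched in a few lines. This is only an illustration, assuming that speed is measured as time to solve each problem and that the same class of problems is used in every setting. The setting keys, the function name `speedup_over_baseline`, and all numbers are hypothetical and are not results from any real study.

```python
from statistics import mean

# Hypothetical solution times in minutes for the same class of problems.
# These values are made up for illustration only.
times = {
    "human_alone": [30.0, 42.0, 36.0],
    "agent_alone": [50.0, 55.0, 45.0],
    "mixed_initiative": [20.0, 25.0, 21.0],
    "non_mixed_initiative": [28.0, 35.0, 30.0],
    "ablated_mixed_initiative": [24.0, 30.0, 27.0],
}


def speedup_over_baseline(times, baseline="human_alone"):
    """Return the baseline mean time divided by each setting's mean time.

    A ratio above 1.0 means the setting was faster than the baseline
    on average; a ratio below 1.0 means it was slower.
    """
    base = mean(times[baseline])
    return {setting: base / mean(values) for setting, values in times.items()}


speedups = speedup_over_baseline(times)

# Print the settings from fastest to slowest relative to the baseline.
for setting, ratio in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{setting}: {ratio:.2f}x")
```

The same table could be built for accuracy or any other metric. A single ratio per setting hides the spread across problems and people, so it should be read as a first approximation of the goal rather than as the goal itself.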
5. Other complex questions
Consider the setting:
- Human alone (baseline)
- Mixed-initiative human-agent system
- How to account for human learning during baseline
  evaluation?
- Use other humans?
- How to account for human variability?
- Use many humans?
- How to pay for the associated cost?
- Replace a human with a simulation?
- How well does the simulation actually represent a
  human?
- Since the simulation is not perfect, how good is
  the result?
- How much does a good simulation cost?
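One way to make the human-variability question concrete is to compare two separate groups of people, which avoids the learning effect at the price of between-person variability. The sketch below assumes one group works alone and a different group uses the mixed-initiative system. The function name `difference_with_interval`, the group data, and the use of a normal-approximation 95% interval (the 1.96 factor) are all assumptions for illustration. With groups this small, a t-based interval would be more appropriate.

```python
from math import sqrt
from statistics import mean, variance


def difference_with_interval(baseline, treatment, z=1.96):
    """Compare two independent groups of participants.

    Returns (difference of means, standard error, (low, high)), where the
    interval is a rough normal-approximation 95% interval when z is 1.96.
    """
    diff = mean(baseline) - mean(treatment)
    # Standard error for independent groups with possibly unequal variances.
    se = sqrt(variance(baseline) / len(baseline)
              + variance(treatment) / len(treatment))
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical solution times in minutes, one value per participant.
human_alone_group = [30.0, 42.0, 36.0, 50.0, 28.0]
mixed_initiative_group = [22.0, 30.0, 31.0, 35.0, 25.0]

diff, se, (low, high) = difference_with_interval(
    human_alone_group, mixed_initiative_group
)
print(f"mean time saved: {diff:.1f} min, interval: [{low:.1f}, {high:.1f}]")
```

With only five hypothetical participants per group, the interval still includes zero even though the mixed-initiative group is faster on average. This is the cost question in miniature: narrowing the interval requires more humans.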
6. Evaluation Framework for MI systems
Currently no such framework exists, but it may
emerge from generalization of specific cases.
Specific problem: Knowledge authoring by subject
matter experts who do not have prior knowledge
engineering experience.
Specific case: Disciple learning agent taught by a
subject matter expert to become a knowledge-based
assistant.
The expert has knowledge but cannot formalize it
by himself.
The agent can help to formalize the knowledge.
Question: What are the characteristics of good
case studies?