Title: Edward Brent, Idea Works, Inc.
1 Representing Metadata with Intelligent Agents
An Initial Prototype
- Edward Brent, Idea Works, Inc.
- BrentE_at_missouri.edu www.ideaworks.com
- Albert F. Anderson, Public Data Queries, Inc.
- afa_at_pdq.com www.pdq.com
- and G. Alan Thompson, Idea Works, Inc.
- PDQ-Explore is being developed by Public Data
Queries, Inc., with funding, in part, from Small
Business Innovation Research (SBIR) and Small
Business Technology Transfer Research (STTR)
grants from the National Institute of Child
Health and Human Development (NICHD) and the
National Institute on Aging (NIA) of the National
Institutes of Health (NIH). The awards are R41
HD32222, R41 HD32220, R43 HD33633, R43 AG13832,
R43 HD37311, R43 HD37738, and R43 HD38216
(pending).
2 Introduction
- Digital representations of data make it possible
to have intelligent interactive social science
data capable of helping users formulate
questions, specify analyses, and interpret
findings.
- This presentation describes an intelligent user
interface now under development for the
PDQ-Explore information system that tries to
achieve some of these objectives.
3 Topics to be Covered
- The PDQ-Explore System
- Design Strategies
- Varieties of Help Offered
- Two Prototype Modules
4 PDQ-Explore
- The PDQ-Explore information system combines
parallel high-performance processors, data
cached in random access memory, and efficient
retrieval algorithms capable of processing tens
of millions of records per second.
5 PDQ-Explore (continued)
- Complex queries can be defined and executed in
real time to produce tabulations, summary
statistics, correlation matrices, and data
extracts.
- The system is structured as a client-server
architecture with a graphical user interface
accessible over the World-Wide Web.
6 PDQ-Explore Interface: Workspace Window
7 PDQ-Explore Interface: Query Setup Window
8 PDQ-Explore Interface: Query Results Window
9 PDQ-Explore Interface: Query Details Window
10 PDQ-Explore Interface: Custom Item Setup Window
11 PDQ-Explore Interface: Custom Item Assignment Window
12 A Demonstration Version
- A demonstration version of PDQ-Explore with a
preliminary Web-based interface is accessible
from the Public Data Queries, Inc. home page at
www.pdq.com.
- Example queries and the graphical client
program can also be downloaded from that site.
13 Design Strategies
- The Vision: Data as Agents
- Case-Based Reasoning
- Machine Learning
- Representation: Metadata, XML, and Ontologies
14 The Vision: Data as Agents
- Agents as Intelligent Interface Managers
- Agents as Personal Assistants
- Agents Behind the Scenes
- Agent-to-Agent Communication
15 An Agent-Enabled System Architecture
16 Agents
- a computer program capable of acting on behalf of
the user to carry out tasks that have been
delegated to it
- does not require the user to specify the task in
all of its detail
17 Agents (continued)
- able to take an admittedly vague description of
the task from the user and infer what the user
means
- translates a general description into what may be
many individual steps or tasks to perform
- Bradshaw (1997)
18 Case-Based Reasoning
- Find an old problem that is close in nature and
expected solution to what we anticipate for the
new problem, based on similarities in
- substantive problem
- data set
- specific items referenced
- methods used (tabulations, graphics, etc.)
- source of the query
- Then help the user tailor the solution to fit the
new problem
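The retrieval step above can be sketched as a nearest-case lookup. This is only an illustration under stated assumptions, not the PDQ-Explore implementation: the `QueryCase` fields, the equal weighting of the five similarity dimensions, and the example data set name `census_1990` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical field names mirroring the five similarity dimensions listed above.
FEATURES = ("problem", "data_set", "items", "method", "source")


@dataclass(frozen=True)
class QueryCase:
    name: str
    problem: str
    data_set: str
    items: frozenset
    method: str
    source: str


def similarity(a, b):
    """Return a score in [0, 1]: the unweighted mean of per-feature similarities."""
    scores = []
    for feature in FEATURES:
        x, y = getattr(a, feature), getattr(b, feature)
        if isinstance(x, frozenset):
            # Set-valued features use the Jaccard overlap.
            union = x | y
            scores.append(len(x & y) / len(union) if union else 1.0)
        else:
            # Scalar features either match exactly or not at all.
            scores.append(1.0 if x == y else 0.0)
    return sum(scores) / len(scores)


def retrieve(new_case, case_base):
    """Return the stored case most similar to the new problem."""
    return max(case_base, key=lambda case: similarity(new_case, case))


# Illustrative case base (invented examples, not real PDQ-Explore queries).
case_base = [
    QueryCase("education_by_city", "educational attainment", "census_1990",
              frozenset({"education", "city"}), "tabulation", "demo"),
    QueryCase("income_correlations", "income", "census_1990",
              frozenset({"income", "age"}), "correlation", "demo"),
]
target = QueryCase("new_query", "educational attainment", "census_1990",
                   frozenset({"education", "region"}), "tabulation", "user")
best = retrieve(target, case_base)
```

A real system would probably weight the dimensions differently and match variables through the metadata rather than by exact string equality; the sketch only shows the retrieve-then-adapt shape of the approach.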
19 Machine Learning
- Create a system that can grow over time and
evolve to meet the changing needs of users
- Identify successful queries and incorporate them
into the knowledge base as new examples for use
by future users
20 Representation: Metadata, XML, and Ontologies
- Make digitized information readily accessible to
a wide range of users and intelligent autonomous
agent programs
- Represent data using the Extensible Markup
Language (XML) and a standardized ontology
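As a rough illustration of variable metadata expressed in XML so that both people and agent programs can read it, the sketch below uses Python's standard `xml.etree.ElementTree` module. The element and attribute names (`variable`, `label`, `categories`, `category`, `level`, `code`) and the category codes are assumptions made for this example; the presentation does not specify a schema or ontology.

```python
import xml.etree.ElementTree as ET


def variable_to_xml(name, label, level, categories):
    """Build a <variable> element from plain Python values (hypothetical schema)."""
    var = ET.Element("variable", {"name": name, "level": level})
    ET.SubElement(var, "label").text = label
    cats = ET.SubElement(var, "categories")
    for code, text in categories.items():
        ET.SubElement(cats, "category", {"code": str(code)}).text = text
    return var


def xml_to_variable(var):
    """Read the same structure back into a dictionary an agent could query."""
    return {
        "name": var.get("name"),
        "level": var.get("level"),
        "label": var.findtext("label"),
        "categories": {c.get("code"): c.text for c in var.find("categories")},
    }


# Illustrative variable; the codes 1-4 are invented for the example.
region = variable_to_xml(
    "REGION", "Census region", "nominal",
    {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"},
)
xml_text = ET.tostring(region, encoding="unicode")
parsed = xml_to_variable(ET.fromstring(xml_text))
```

Because the metadata round-trips through text, the same description could be served to a browser, stored with the data set, or passed between agents.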
21 Varieties of Help Offered
- Interpret and Clarify the Query
- Identify Key Variables
- Identify Relevant Data Sets
- Point to Related Literature
- Similar Queries to Serve as Models
- Measurement, Indices, Recoding, and
Transformations
- Appropriate Tables or Analyses
- Check Assumptions
- Structured Tutorials
22 Interpret and Clarify the Query
- Tell the user how the program is interpreting
their problem statement
- Identify key variables and types of analysis
implied by that statement
- Ask the user for further clarifications as
required
23 Identify Key Variables
- Identify specific variables in existing data sets
and link them to the terms used by the user
- Provide extensive information on each variable
including its developmental history,
characteristics, and examples of its use in the
literature and in past queries
24 Identify Relevant Data Sets
- List available data sets that include the
variables identified
- Provide, at the user's request, extensive
information on the data set, including studies
from the literature using the data set as well as
previous queries.
- Permit the user to browse through the data sets,
searching by various indices
25 Point to Related Literature
- Intelligent agents would scan the literature,
identifying studies on a broad range of relevant
topics as well as studies using these specific
data sets
- The literature would be retrievable by data set,
by variable, by types of analysis, and by broad
topics
- Users could scan the literature and examine how
the authors analyzed the data and the key
methodological decisions they made.
26 Similar Queries to Serve as Models
- The system would automatically incorporate
subsequent queries into its knowledge base
- Relevant queries would be displayed to the user
to help them specify their own analysis
- Queries would be selected that examine the same
variables, use the same data sets, or employ
similar forms of analysis
27 Measurement, Indices, Recoding, and
Transformations
- The system would show users strategies that have
been used in the literature and in past queries
to handle common problems, such as
- developing indices for key concepts
- transforming data to assure normality
- handling missing data
- recoding
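One common recoding strategy of the kind listed above is collapsing sparse categories into a single residual category while leaving missing values untouched. The sketch below is a generic illustration, not a strategy documented for PDQ-Explore; the function name, the `min_count` threshold, and the use of `None` for missing data are assumptions.

```python
from collections import Counter


def collapse_sparse(values, min_count, other_label="Other"):
    """Recode categories seen fewer than min_count times to other_label.

    Missing values (None) are passed through unchanged so they can be
    handled separately.
    """
    counts = Counter(v for v in values if v is not None)
    keep = {v for v, n in counts.items() if n >= min_count}
    return [v if v is None or v in keep else other_label for v in values]


# Invented category codes for illustration.
raw = ["A", "A", "A", "B", "B", "C", None, "D"]
recoded = collapse_sparse(raw, min_count=2)
```

Here `A` and `B` are kept, the single occurrences of `C` and `D` become `Other`, and the missing value stays missing.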
28 Appropriate Tables or Analyses
- The system would identify relevant analyses or
tables that appear appropriate for the problem
- It would also point to examples of those in the
literature and in previous queries
29 Check Assumptions
- Once the user selects the tables or analyses to
perform, the system can automatically check
important assumptions for the analysis such as
normality, level of measurement, and the number
of categories.
- It can also point out other data issues such as
common transformations, missing data problems,
and so on.
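A minimal sketch of such automatic checks follows, covering only level of measurement and number of categories (a normality check is omitted for brevity). The rule set, the `max_categories` threshold of 20, and the variable dictionaries are hypothetical and are not taken from the prototype.

```python
# Levels of measurement in increasing order of strength.
LEVELS = ("nominal", "ordinal", "interval", "ratio")


def check_assumptions(analysis, variables, max_categories=20):
    """Return a list of warning strings; an empty list means no issue was found."""
    warnings = []
    for v in variables:
        too_weak = LEVELS.index(v["level"]) < LEVELS.index("interval")
        if analysis == "correlation" and too_weak:
            warnings.append(
                f"{v['name']}: correlation expects interval or ratio data, "
                f"got {v['level']}"
            )
        if analysis == "tabulation" and v["n_categories"] > max_categories:
            warnings.append(
                f"{v['name']}: {v['n_categories']} categories may make the "
                f"table hard to read; consider recoding"
            )
    return warnings


# Invented variable descriptions for illustration.
variables = [
    {"name": "EDUC", "level": "ordinal", "n_categories": 17},
    {"name": "AGE", "level": "ratio", "n_categories": 91},
]
tab_warnings = check_assumptions("tabulation", variables)
corr_warnings = check_assumptions("correlation", variables)
```

With these inputs, the tabulation check flags `AGE` for having too many categories and the correlation check flags `EDUC` for its ordinal level of measurement.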
30 Structured Tutorials
- The system can point the user to structured
tutorials available over the Internet including
multimedia presentations.
- These might include
- Multimedia presentations by experts
- Internet-Based Instructional Materials
- Multimedia Interactive Tutorials
31 Module One: Case-Based Reasoning to Identify
Relevant Example Queries
- Case-based reasoning strategies are used to
- 1) collect information from the user regarding
their objectives,
- 2) identify and display existing queries that are
similar, and
- 3) facilitate the user modifying the existing
query to accomplish their objectives.
32 User Objectives
- User objectives are specified on this form.
- For example, let's indicate that the user will
examine a tabulation of individuals in 1990 in
all households, looking at educational
attainment, comparing across groups defined by
region.
33 A Similar Query is Retrieved and Displayed
34 Modify the Query to Meet Objectives
- Note that this query is similar to the user's
objectives in that it also examines a tabulation
of individuals in 1990 in all households,
comparing across groups defined by region
(cities).
- The user can then change a few parameters of this
query to meet their objectives.
35 Module Two: Advice Regarding Recoding or
Transformations
- This module uses information from the user to
determine whether recoding is indicated and what
the objectives of the recoding should be.
36 The Recoding Control Panel
- Users select conditions that characterize their
study using this control panel.
37 Clicking a phrase shows details
38 Recommendations Can Be Detailed or Brief
39 Example: Recoding Race
40 Summary and Overview
- This framework provides a plan for developing an
interface that takes full advantage of digital
databases.
- These two prototypes illustrate how this program
will work
- We are proposing to develop the complete system
in a Phase II STTR grant from the National
Institute of Child Health and Human Development
to Public Data Queries, Inc.