Title: Lecture 04: Knowledge Representation
1 Lecture 04: Knowledge Representation
SIMS 202: Information Organization and Retrieval
- Prof. Ray Larson & Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 10:30 am - 12:00 pm
- Fall 2003
Credits to Warren Sack for some of the slides in
this lecture
2 Today
- Review of Categorization
- Knowledge Representation
- The Vocabulary Problem
- Commonsense
- Cyc
- Discussion Questions
- Phone Project Overview and Assignment 2
- Action Items for Next Time
4 Categorization
- Processes of categorization are fundamental to
human cognition
- Categorization is messier than our computer
systems would like
- Human categorization is characterized by
- Family resemblances
- Prototypes
- Basic-level categories
- Considering how human categorization functions is
important in the design of information
organization and retrieval systems
5 Categorization
- Classical categorization
- Necessary and sufficient conditions for
membership
- Generic-to-specific monohierarchical structure
- Modern categorization
- Characteristic features (family resemblances)
- Centrality/typicality (prototypes)
- Basic-level categories
6 Properties of Categorization
- Family Resemblance
- Members of a category may be related to one
another without all members having any property
in common
- Prototypes
- Some members of a category may be better
examples than others, i.e., prototypical
members
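The contrast between the classical and prototype views can be sketched in a few lines of Python. This is only an illustration: the feature sets, the animal examples, and the function names are invented here and do not come from the lecture, and real typicality judgments are far richer than a feature count.

```python
# Hypothetical feature sets, invented for illustration only.
REQUIRED = {"has_feathers", "lays_eggs"}  # stand-in for a classical definition
PROTOTYPE = {"has_feathers", "lays_eggs", "flies", "sings", "small"}

EXAMPLES = {
    "robin": {"has_feathers", "lays_eggs", "flies", "sings", "small"},
    "penguin": {"has_feathers", "lays_eggs", "swims"},
    "bat": {"flies", "has_fur"},
}


def classical_member(features, required=REQUIRED):
    """Classical view: in or out, decided by necessary and sufficient conditions."""
    return required <= features


def typicality(features, prototype=PROTOTYPE):
    """Prototype view: graded score, the share of prototype features present."""
    return len(features & prototype) / len(prototype)


if __name__ == "__main__":
    for name, features in EXAMPLES.items():
        print(name, classical_member(features), typicality(features))
```

Under the classical test the robin and the penguin are equally "birds"; only the graded score separates the better example from the poorer one.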
7 Basic-Level Categorization
- Perception
- Overall perceived shape
- Single mental image
- Fast identification
- Function
- General motor program
- Communication
- Shortest, most commonly used and contextually
neutral words
- First learned by children
- Knowledge Organization
- Most attributes of category members stored at
this level
- Tends to be in the middle of a classification
hierarchy
9 Information Hierarchy
Wisdom
Knowledge
Information
Data
11 Today's Thinkers/Tinkerers
George Furnas http://www.si.umich.edu/furnas/
Marvin Minsky http://web.media.mit.edu/minsky/
Doug Lenat http://www.cyc.com/staff.html
12 The Birth of AI
- Rockefeller-sponsored Institute at Dartmouth
College, Summer 1956
- John McCarthy, Dartmouth (->MIT->Stanford)
- Marvin Minsky, MIT (geometry)
- Herbert Simon, CMU (logic)
- Allen Newell, CMU (logic)
- Arthur Samuel, IBM (checkers)
- Alex Bernstein, IBM (chess)
- Nathan Rochester, IBM (neural networks)
- Etc.
13 Definition of AI
- "... artificial intelligence [AI] is the science
of making machines do things that would require
intelligence if done by humans" (Minsky, 1963)
14 The Goals of AI Are Not New
- Ancient Greece
- Daedalus' automata
- Judaism's myth of the Golem
- 18th century automata
- Singing, dancing, playing chess?
- Mechanical metaphors for mind
- Clock
- Telegraph/telephone network
- Computer
15 Some Areas of AI
- Knowledge representation
- Programming languages
- Natural language understanding
- Speech understanding
- Vision
- Robotics
- Planning
- Machine learning
- Expert systems
- Qualitative simulation
16 AI or IA?
- Artificial Intelligence (AI)
- Make machines as smart as (or smarter than)
people
- Intelligence Amplification (IA)
- Use machines to make people smarter
18 Furnas: The Vocabulary Problem
- People use different words to describe the same
things
- "If one person assigns the name of an item, other
untutored people will fail to access it on 80 to
90 percent of their attempts."
- "Simply stated, the data tell us there is no one
good access term for most objects."
19 The Vocabulary Problem
- How is it that we come to understand each other?
- Shared context
- Dialogue
- How can machines come to understand what we say?
- Shared context?
- Dialogue?
20 Vocabulary Problem Solutions?
- Furnas et al.
- Make the user memorize precise system meanings
- Have the user and system interact to identify the
precise referent
- Provide infinite aliases to objects
- Minsky and Lenat
- Give the system commonsense so it can
understand what the user's words can mean
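The aliasing option listed above can be pictured as a many-to-many index from user terms to system objects. The following is a minimal Python sketch under that assumption; the class name, method names, and sample commands are hypothetical and are not taken from Furnas et al.

```python
from collections import defaultdict


class AliasIndex:
    """Maps any number of user terms (aliases) to system objects."""

    def __init__(self):
        # term -> set of objects that the term may refer to
        self._by_term = defaultdict(set)

    def add(self, obj, *terms):
        """Register every term as an alias for obj (case-insensitive)."""
        for term in terms:
            self._by_term[term.lower()].add(obj)

    def lookup(self, term):
        """Return all objects reachable through this term, sorted by name."""
        return sorted(self._by_term.get(term.lower(), ()))


if __name__ == "__main__":
    index = AliasIndex()
    # Hypothetical commands and the words different users might try.
    index.add("delete_file", "delete", "remove", "erase", "trash")
    index.add("remove_user", "remove", "drop")
    print(index.lookup("Erase"))
    print(index.lookup("remove"))
```

Note that one term can lead to several objects (here "remove"), so more aliases improve recall but leave the system to disambiguate, for example through dialogue with the user.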
21 Lenat on the Vocabulary Problem
- "The important point is that users will be able
to find information without having to be familiar
with the precise way the information is stored,
either through field names or by knowing which
databases exist, and can be tapped."
22 Minsky on the Vocabulary Problem
- "To make our computers easier to use, we must
make them more sensitive to our needs. That is,
make them understand what we mean when we try to
tell them what we want. If we want our
computers to understand us, we'll need to equip
them with adequate knowledge."
24 Commonsense
- Commonsense is background knowledge that enables
us to understand, act, and communicate
- Things that most children know
- Minsky on commonsense
- "Much of our commonsense knowledge information
has never been recorded at all because it has
always seemed so obvious we never thought of
describing it."
25 Commonsense Example
- "I want to get inexpensive dog food."
- The food is not made out of dogs.
- The food is not for me to eat.
- Dogs cannot buy their own food.
- I am not asking to be given dog food.
- I am not saying that I want to understand why
some dog food is inexpensive.
- The dog food is not more than $5 per can.
26 Engineering Commonsense
- Use multiple ways to represent knowledge
- Acquire huge amounts of that knowledge
- Find commonsense ways to reason with it
(knowledge about how to think)
27 Multiple Representations
- Minsky
- "I think this is what brains do instead: Find
several ways to represent each problem and to
represent the required knowledge. Then when one
method fails to solve a problem, you can quickly
switch to another description."
- Furnas
- "But regardless of the number of commands or
objects in a system and whatever the choice of
their official names, the designer must make
many, many alternative verbal access routes to
each."
29 CYC
- Decades-long effort to build a commonsense
knowledge-base
- Storied past
- 100,000 basic concepts
- 1,000,000 assertions about the world
- The validity of Cyc's assertions is
context-dependent (default reasoning)
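One way to picture context-dependent assertions with defaults is a lookup that prefers the most specific context and falls back to more general ones. The Python sketch below is a toy illustration of that idea only; it is not how Cyc is implemented, and the class, method names, contexts, and facts are all hypothetical.

```python
class ContextKB:
    """Toy knowledge base whose assertions hold only relative to a context."""

    def __init__(self):
        self._parent = {}  # context -> more general context (or None)
        self._facts = {}   # (context, subject, predicate) -> value

    def add_context(self, name, parent=None):
        self._parent[name] = parent

    def tell(self, context, subject, predicate, value):
        self._facts[(context, subject, predicate)] = value

    def ask(self, context, subject, predicate):
        """Return the most specific answer, falling back to defaults.

        Walks from the given context up through its parents; the first
        context holding an assertion wins. Returns None if nothing is known.
        """
        while context is not None:
            key = (context, subject, predicate)
            if key in self._facts:
                return self._facts[key]
            context = self._parent.get(context)
        return None


if __name__ == "__main__":
    kb = ContextKB()
    kb.add_context("Everyday")
    kb.add_context("Cartoons", parent="Everyday")
    kb.tell("Everyday", "animals", "can_talk", False)  # default
    kb.tell("Cartoons", "animals", "can_talk", True)   # overrides the default
    print(kb.ask("Everyday", "animals", "can_talk"))
    print(kb.ask("Cartoons", "animals", "can_talk"))
```

The same question gets different answers depending on the context in which it is asked, while anything not overridden is inherited from the more general context.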
30 Cyc Examples
- Cyc can find the match between a user's query for
"pictures of strong, adventurous people" and an
image whose caption reads simply "a man climbing
a cliff"
- Cyc can notice if an annual salary and an hourly
salary are inadvertently being added together in
a spreadsheet
- Cyc can combine information from multiple
databases to guess which physicians in practice
together had been classmates in medical school
- When someone searches for "Bolivia" on the Web,
Cyc knows not to offer a follow-up question like
"Where can I get free Bolivia online?"
31 Cyc Applications
- Applications currently available or in
development
- Integration of Heterogeneous Databases
- Knowledge-Enhanced Retrieval of Captioned
Information
- Guided Integration of Structured Terminology
(GIST)
- Distributed AI
- WWW Information Retrieval
- Potential applications
- Online brokering of goods and services
- "Smart" interfaces
- Intelligent character simulation for games
- Enhanced virtual reality
- Improved machine translation
- Improved speech recognition
- Sophisticated user modeling
- Semantic data mining
32 Cyc's Top-Level Ontology
- Fundamentals
- Top Level
- Time and Dates
- Types of Predicates
- Spatial Relations
- Quantities
- Mathematics
- Contexts
- Groups
- "Doing"
- Transformations
- Changes Of State
- Transfer Of Possession
- Movement
- Parts of Objects
- Composition of Substances
- Agents
- Organizations
- Actors
- Roles
- Professions
- Emotion
- Propositional Attitudes
- Social
- Biology
- Chemistry
- Physiology
- General Medicine
- Materials
- Waves
- Devices
- Construction
- Financial
- Food
- Clothing
- Weather
- Geography
- Transportation
- Information
- Perception
- Agreements
- Linguistic Terms
- Documentation
http://www.cyc.com/cyc-2-1/toc.html
33 OpenCYC
- Cyc's knowledge-base is now coming online
- http://www.opencyc.org/
- How could Cyc's knowledge-base affect the design
of information organization and retrieval
systems?
35 Discussion Questions (Furnas)
- Alison Billings & Vijay Viswanathan on Furnas
- Are unlimited alias indexes an effective design
solution to the problem of precision in "term
based" searches? Is it possible to implement
such a system that could maintain an accurate
relation (category) to the designer's armchair
term with the existence of polysemy? Would the
adaptive nature of this solution propagate an all
inclusive alias category which could include all
accessible information in a particular index?
36 Discussion Questions (Furnas)
- Alison Billings & Vijay Viswanathan on Furnas
- Since this article was published in 1987, the
technological advances in information retrieval
over the past 16 years have been profound. Is the
Vocabulary Problem still a major issue in
human-system communication? Furnas et al.
provide some solutions to the Vocabulary Problem,
such as unlimited aliasing, keyword
harvesting, and adaptive indices. But now
there are WYSIWYG interfaces such as Windows that
may reduce the need for command line word
choices, search engines that harvest the content
from web pages, and services like Google that put
out "Did you mean xxxxx?" when search results are
sparse. Has the Vocabulary Problem been solved?
37 Discussion Questions (Minsky)
- Joseph Hall on Minsky
- Minsky talks a lot about commonsense. How would
you define what is within the commonsense? Do
you think that commonsense would be easy or
difficult to teach to a computer? Why? Is
commonsense a cross-cultural, basic-level
category in the sense of what Lakoff described?
Or is it more culturally specific (like "Don't
step in front of moving traffic.") and thus
harder to define? How would culturally-dependent
definitions of "commonsense" complicate Minsky's
theory?
- Are machines that learn such a good thing? For
example, I would like my computer to learn
certain things (like how to fix common errors)
but not others (like how to play the stock market
with my bank account). Are ethics (cyber and
otherwise) to be programmed into learning
computers?
38 Discussion Questions (Minsky)
- Joseph Hall on Minsky
- What Minsky describes is all fine and dandy...
but there seems to be a rather large gap between
the machines of today and the machines he is
postulating. To learn, machines would not only
have to be able to note (and take action) when
they are deviating from "operational parameter
space" (malfunctioning, blue screen of death,
etc.) but be able to decide on and implement a
solution to the problem at hand from a different
direction and/or using a different technique,
quickly.
39 Discussion Questions (Minsky)
- Joseph Hall on Minsky
- Do you think that building such a
commonsense-aware machine is possible today?
(That is, is Minsky's model of a
commonsense-based machine a reasonable goal or
just an ideal?) If not, what are some of the
impediments to the realization of one of Minsky's
machines?
- Do user expectations (reasonable or not) of what
a computer should be doing factor into this at
all?
40 Discussion Questions (Lenat)
- Rebecca Shapley on Lenat
- What does this article imply for best-practices
in information organization and retrieval? How
would you articulate the potential for a
commonsense knowledgebase to revolutionize
information retrieval? Does the premise of a
commonsense-base feeding efforts at machine
learning or natural language understanding make
sense to you? Which potential applications Lenat
mentions are compelling to you?
- This article is from 1995 - do we hear anything
more about this CYC? Did it revolutionize things?
Why does Minsky call for a huge commonsense
knowledgebase in 2000 when CYC was nearly
complete in 1995?
41 Discussion Questions (Lenat)
- Rebecca Shapley on Lenat
- How would you apply the conduit metaphor and
toolmaker's paradigms to describe, or perhaps
critique, the CYC project?
- If CYC is 'automating the whitespace in
documents' - capturing the context for
information, how would you describe the context
it is capturing? How would you describe where the
captured context is no longer applicable? How do
you feel about the notion that 10 people in Palo
Alto CA were able to describe your context? Do
you trust them with that task? Do you consider it
necessary that some shared automated context be
created? What challenges do you see for their
ostensible goal, or limitations do you see to
their approach?
42 Discussion Questions (Lenat)
- Rebecca Shapley on Lenat
- Anything in particular you can imagine yourself
unwilling to have represented a particular way in
the commonsensebase? Let's say you believe in
reincarnation but the assertions in the
commonsensebase don't leave any room for this
idea, and how to interpret what you might say to
a bereaved friend. How do you feel about the
ability to 'automatically' interpret your
expression being left out? Does it make you feel
invisible, relieved, angry? What would be
necessary to have it be culturally sensitive, and
would that be encodable?
43 Discussion Questions (Lenat)
- Rebecca Shapley on Lenat
- What can you piece together about how CYC is
implemented, how it makes decisions? What
questions do you still have about how it works?
- Do you think the tone of the article was
influenced by the fact that Lenat was writing as
President of Cycorp?
- So, can this common-sense-base 'think'? Is it
intelligent? Why and why not?
45 Assignment 0 Check-In
- Deliverables
- Personal web page
- Assignments page
- Email address
- Focus statement
- Online Questionnaire
46 Phone Project Overview
- In this project we will be creating, sharing, and
reusing mobile media and metadata
- You and your Project Group will design
application use scenarios and develop and refine
metadata frameworks for your photos
- Some of you may even choose to develop retrieval
applications for the photo database in the second
half of the course
- We will be using the Nokia 3650 mobile media
phone and software developed by Garage Cinema
Research
47 Phone Project Overview
- In the SIMS 202 Phone Project you and your
Project Group will
- Experience the actual process of information
organization and retrieval (especially as regards
metadata creation and use)
- Work in small, focused teams performing a variety
of tasks in image acquisition, description, and
application design
- Develop an ongoing resource for SIMS (an
annotated photo database) that can be used for
internal research and teaching, as well as for
external promotional and informational purposes
48 Phone Project Requirements
- Create engaging and useful application scenarios
and photos for use by your team and the entire
class
- The photos you take and the applications you will
design to use them should be interesting and
useful to you and your colleagues
- Create a shared, reusable resource of annotated
photos
- Design your metadata such that all photos are
accessible not only for the needs of your
particular application, but also for the
reusability of your photos and metadata by other
applications
49 Phone Project Assignments
- Photo Use Scenario & Application Idea (Assignment
2)
- You will brainstorm and storyboard an application
for a mobile media device that accesses a server
and facilitates the creation, sharing, and reuse
of media and metadata. You will develop user
personas and scenarios of how the application
works and how the user experiences it.
- Photo Capture and Annotation (Assignment 3)
- With the goals of your application and the
overall goals of the class project in mind, each
group member is required to take at least 5
pictures relevant to the scenario you specified
in the prior assignment. You will also get
hands-on experience in annotating photos using
the Mobile Media Metadata (MMM) framework, an
application available on the mobile phones. You
will also identify strengths and weaknesses of
the MMM framework.
50 Phone Project Assignments
- Photo Metadata Design (Assignment 4)
- Having your application and the overall project
goals in mind, you will design a suitable
metadata framework to annotate the photos in the
collection. You will also annotate more photos
using your metadata framework.
51 Phone Project Assignments
- Project Presentations (Assignment 6)
- In a special class session, your group will
present your application ideas, metadata
frameworks, and annotated photos to your fellow
students using the Flamenco browser. Each group
will have about 10 minutes to present their
innovative work.
- Metadata Consolidation (Assignment 8)
- You will consolidate your classification scheme
with those belonging to other groups. The entire
class will collaborate to create one overall
metadata framework which will be used for
Phase II of the project.
52 Phone Project Assignments
- Phone Project Phase II: Application Selection
(Assignment 10)
- The entire class will decide on an application to
implement from among the application ideas
presented by the various project groups as well
as from among any ideas you or your Project group
have come up with.
- Phone Project Phase II: Specification & Design
(Assignment 13)
- A group of class volunteers will draft
specifications and designs for the application
selected in the previous assignment.
- Phone Project Phase II: Implementation & Testing
(Assignment 14)
- A group of class volunteers will implement and
test the application selected in the previous
assignment.
53 Assignment 2: Process
- Brainstorm application ideas
- Evaluate your ideas and agree on one to pursue
- Come up with a persona and scenario for your
application idea
- Write a description of your application idea
involving one persona and one scenario
- Draw a storyboard with explanatory text
- Document the results of your brainstorming
- Create your group website
54 Assignment 2: Deliverables
- Brief description of the application idea you
selected
- Persona description
- Scenario description
- Annotated storyboard
- Work distribution table
- List all brainstorming ideas and reasons for
selecting or rejecting each
55 Assignment 2: Turning It In
- Submit an email to is202-ta_at_sims.berkeley.edu
with the following information (due September 16,
before class)
- Group name
- URL of your group website
- URL to description (application, persona,
scenario), storyboard, brainstorming results,
work distribution table
- Time it took you to complete the assignment
- Any comments on assignment (optional)
57 Homework (!)
- Read
- Word Association Norms, Mutual Information, and
Lexicography (Church, Kenneth and Hanks, Patrick)
- WordNet: An Electronic Lexical Database --
Introduction & Ch. 1 (C. Fellbaum, G.A. Miller)
(handout)
- Assignment 2: Photo Use Scenario
- Due by Tuesday, September 16
58 Next Time
- Lexical Relations and WordNet (RRL)