Title: Lecture 22: Interfaces for Information Retrieval I
1Lecture 22 Interfaces for Information Retrieval I
SIMS 202 Information Organization and Retrieval
- Prof. Ray Larson Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 10:30 am - 12:00 pm
- Fall 2002
- http://www.sims.berkeley.edu/academics/courses/is202/f03/
2Lecture Overview
- Review of Last Time
- Web Search Engines and Algorithms
- Interfaces for Information Retrieval
- Introduction to HCI
- Why Interfaces Don't Work
- Early Visions: Memex
- Discussion Questions
- Action Items for Next Time
Credit for some of the slides in this lecture
goes to Marti Hearst
4Directories vs. Search Engines
- Directories
- Hand-selected sites
- Search over the contents of the descriptions of the pages
- Organized in advance into categories
- Search Engines
- All pages in all sites
- Search over the contents of the pages themselves
- Organized after the query by relevance rankings or other scores
5Challenges for Web Searching: Data
- Distributed data
- Volatile data/Freshness: 40% of the web changes every month
- Exponential growth
- Unstructured and redundant data: 30% of web pages are near duplicates
- Unedited data
- Multiple formats
- Commercial biases
- Hidden data
6Challenges for Web Searching: Users
- Users unfamiliar with search engine interfaces (e.g., Does the query "apples oranges" mean the same thing on all of the search engines?)
- Users unfamiliar with the logical view of the data (e.g., Is a search for "Oranges" the same thing as a search for "oranges"?)
- Many different kinds of users
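The two example questions on this slide can be made concrete with a small sketch. It is illustrative only: the functions `tokenize` and `search`, the `mode` and `fold_case` parameters, and the three toy documents are assumptions made for this example and do not describe how any particular search engine behaves.

```python
def tokenize(text, fold_case=True):
    """Split on whitespace; optionally lowercase every token."""
    tokens = text.split()
    return [t.lower() for t in tokens] if fold_case else tokens


def search(query, docs, mode="AND", fold_case=True):
    """Return ids of docs that match the query under one interpretation.

    mode="AND" requires every query term; mode="OR" requires at least one.
    """
    terms = set(tokenize(query, fold_case))
    combine = all if mode == "AND" else any
    hits = []
    for doc_id, text in docs.items():
        words = set(tokenize(text, fold_case))
        if combine(t in words for t in terms):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical toy collection.
DOCS = {
    1: "Apples and oranges compared",
    2: "Growing apples at home",
    3: "Oranges from Florida",
}

print(search("apples oranges", DOCS, mode="AND"))  # [1]
print(search("apples oranges", DOCS, mode="OR"))   # [1, 2, 3]
print(search("Oranges", DOCS, fold_case=False))    # [3]
print(search("oranges", DOCS, fold_case=False))    # [1]
```

The same two-word query returns one document or three depending on whether the engine treats a space as AND or OR, and a case-sensitive engine returns different documents for "Oranges" and "oranges".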
7Web Search Queries
- Web search queries are SHORT
- 2.4 words on average (Aug 2000)
- Has increased, was 1.7 (1997)
- User expectations
- Many say "the first item shown should be what I want to see!"
- This works if the user has the most popular/common notion in mind
8Search Engines
- Crawling
- Indexing
- Querying
9Standard Web Search Engine Architecture
[Figure: architecture diagram. Labeled components: crawl the web; check for duplicates, store the documents; DocIds; create an inverted index; inverted index; search engine servers; user query; show results to user.]
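The pipeline named in this diagram (store documents with DocIds, build an inverted index, answer a user query) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names, the exact-text duplicate check, whitespace tokenization, AND-only query semantics, and the sample pages are all hypothetical and far simpler than a production engine.

```python
from collections import defaultdict


def store_documents(pages):
    """Assign DocIds to crawled pages, skipping exact duplicate texts."""
    seen, docs = set(), {}
    for _url, text in pages:
        if text in seen:
            continue  # duplicate content: do not store it again
        seen.add(text)
        docs[len(docs) + 1] = text
    return docs


def build_inverted_index(docs):
    """Map each lowercased term to the set of DocIds that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return sorted DocIds containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings))


# Hypothetical crawl results: (url, page text).
PAGES = [
    ("a.example", "toyota cars and trucks"),
    ("b.example", "used toyota trucks"),
    ("c.example", "toyota cars and trucks"),  # duplicate of a.example
    ("d.example", "fan club for cars"),
]

docs = store_documents(PAGES)
index = build_inverted_index(docs)
print(lookup(index, "toyota trucks"))  # [1, 2]
print(lookup(index, "cars"))           # [1, 3]
```

The index is built once, ahead of any query, so answering a query only requires intersecting a few posting sets rather than scanning every stored page.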
10Google
- Google maintains (currently) the world's largest Linux cluster (over 15,000 servers)
- These are partitioned between index servers and page servers
- Index servers resolve the queries (massively parallel processing)
- Page servers deliver the results of the queries
- Over 3 billion web pages are indexed and served by Google
11Starting Points: What is Really Being Used?
- Today's search engines combine these methods in various ways
- Integration of directories
- Today most web search engines integrate categories into the results listings
- Lycos, MSN, Google
- Link analysis
- Google uses it; others are also using it
- Words on the links seem to be especially useful
- Page popularity
- Many use DirectHit's popularity rankings
12Ranking: Link Analysis
- Assumptions
- If the pages pointing to this page are good, then this is also a good page
- The words on the links pointing to this page are useful indicators of what this page is about
- References: Page et al. 98, Kleinberg 98
13Ranking: Link Analysis
- Why does this work?
- The official Toyota site will be linked to by lots of other official (or high-quality) sites
- The best Toyota fan-club site probably also has many links pointing to it
- Lower-quality sites do not have as many high-quality sites linking to them
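The first assumption above ("if the pages pointing to this page are good, then this is also a good page") is circular, so it is usually computed iteratively. The sketch below is a simplified iteration in the spirit of Page et al. 98; it is not a description of what any search engine actually runs. The function name `link_scores`, the damping value 0.85, the iteration count, and the toy link graph are all assumptions made for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Iteratively score pages so that links from good pages count more.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for p in pages:
                    new[p] += damping * scores[page] / n
        scores = new
    return scores


# Hypothetical link graph for the Toyota example.
LINKS = {
    "official": ["dealer"],
    "dealer": ["official"],
    "news": ["official", "fanclub"],
    "fanclub": ["official"],
    "spam": ["official"],
}

scores = link_scores(LINKS)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph the page with the most incoming links from well-scored pages ends up on top, while a page that nobody links to keeps only the baseline score.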
15Drawing the Circles
24Human-Computer Interaction (HCI)
- Human
- The end-users of a program
- The others in the organization
- The designers of the program
- Computer
- The machines the programs run on
- Interaction
- The users tell the computers what they want
- The computers communicate results
- The computer may also tell users what the
computer wants them to do
25What is HCI?
26Shneiderman on HCI
- Well-designed interactive computer systems
- Promote
- Positive feelings of success
- Competence
- Mastery
- Allow users to concentrate on their work,
exploration, or pleasure, rather than on the
system or the interface
27Design Guidelines
- Set of design rules to follow
- Apply at multiple levels of design
- Are neither complete nor orthogonal
- Have psychological underpinnings (ideally)
28Shneiderman's Design Principles
- Provide informative feedback
- Permit easy reversal of actions
- Support an internal locus of control
- Reduce working memory load
- Provide alternative interfaces for expert and
novice users
29HCI for IR
- Information seeking is an imprecise process
- UI should aid users in understanding and expressing their information needs
- Help formulate queries
- Select among available information sources
- Understand search results
- Keep track of the progress of their search
30Provide Informative Feedback
- About
- The relationship between query specification and documents retrieved
- Relationships among retrieved documents
- Relationships between retrieved documents and metadata describing collections
31Reduce Working Memory Load
- Provide mechanisms for keeping track of choices made during the search process
- Allow users to
- Return to temporarily abandoned strategies
- Jump from one strategy to the next
- Retain information and context across search sessions
- Provide browsable information that is relevant to the current stage of the search process
- Related terms or metadata
- Search starting points (e.g., lists of sources, topic lists)
32Interfaces For Expert And Novice Users
- Simplicity vs. power tradeoffs
- Scaffolded user interface
- How much information to show the user?
- Number and complexity of user operations
- Variants of operations
- Inner workings of system itself
- System history
- Example
- Television remote control
33User Differences
- Abilities, preferences, predilections
- Spatial ability
- Memory
- Reasoning abilities
- Verbal aptitudes
- Personality differences
- Age, gender, ethnicity, class, sexuality, culture, education
- Modality preferences/restrictions
- Vision, audition, speech, gesture, haptics, locomotion
34Nielsen's Usability Slogans
- Your best guess is not good enough
- The user is always right
- The user is not always right
- Users are not designers
- Designers are not users
- Less is more
- Details matter
(from Nielsen's Usability Engineering)
35Who Builds UIs?
- A team of specialists (ideally)
- Graphic designers
- Interaction / interface designers
- Technical writers
- Marketers
- Test engineers
- Software engineers
- Ethnographers
- Cognitive psychologists
36How to Design and Build UIs
- Task analysis
- Rapid prototyping
- Evaluation
- Implementation
Iterate at every stage!
37Task Analysis
- Observe existing work practices
- Create examples and scenarios of actual use
- Try out new ideas before building software
38Rapid Prototyping
- Build a mock-up of design
- Low fidelity techniques
- Paper sketches
- Cut, copy, paste
- Video segments
- Interactive prototyping tools
- Visual Basic, HyperCard, Director, etc.
- UI builders
- NeXT, etc.
39Evaluation Techniques
- Qualitative vs. quantitative methods
- Qualitative (non-numeric, discursive, ethnographic)
- Focus groups
- Interviews
- Surveys
- User observation
- Participatory design sessions
- Quantitative (numeric, statistical, empirical)
- User testing
- System testing
40Qualitative Questions
- User experience
- User preferences
- User recommendations
- Design dialogue
41Quantitative Questions
- Precision
- Recall
- Time required to learn the system
- Time required to achieve goals on benchmark tasks
- Error rates
- Retention of the use of the interface over time
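The first two measures in this list have standard set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows; the function name `precision_recall` and the document ids are hypothetical, and returning 0.0 for an empty set is a convention chosen here rather than a universal rule.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: the system returned docs 1-4; docs 2, 4, 5, 6, 7 are relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5, 6, 7])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the five relevant documents were found (recall 0.40).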
43Why Interfaces Don't Work
- Because
- We still think of using the interface
- We still talk of designing the interface
- We still talk of improving the interface
- We need to aid the task, not the interface to the task.
- The computer of the future should be invisible.
44Norman on Design Priorities
- The user: what does the person really need to have accomplished?
- The task: analyze the task. How best can the job be done, taking into account the whole setting in which it is embedded, including the other tasks to be accomplished, the social setting, the people, and the organization?
- As much as possible, make the task dominate; make the tools invisible.
- Then, get the interaction right, making the right things visible, exploiting affordances and constraints, providing the proper mental models, and so on: the rules of good design for the user, written about many, many times in many, many places.
46What Dr. Bush Foresees
- Cyclops Camera
- Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.
- Microfilm
- It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.
- Vocoder
- A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.
- Thinking machine
- A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.
- Memex
- An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a trail of facts.
47Memex
48Memex Detail
49Cyclops Camera
50Vocoder Supersecretary
51Investigator at Work
- One can now picture a future investigator in his
laboratory. His hands are free, and he is not
anchored. As he moves about and observes, he
photographs and comments. Time is automatically
recorded to tie the two records together. If he
goes into the field, he may be connected by radio
to his recorder. As he ponders over his notes in
the evening, he again talks his comments into the
record. His typed record, as well as his
photographs, may be both in miniature, so that he
projects them for examination.
52Memex
- A memex is a device in which an individual
stores all his books, records, and
communications, and which is mechanized so that
it may be consulted with exceeding speed and
flexibility. It is an enlarged intimate
supplement to his memory.
53Associative Indexing
- associative indexing, the basic idea of
which is a provision whereby any item may be
caused at will to select immediately and
automatically another. This is the essential
feature of memex. The process of tying two items
together is the important thing.
54The WWW circa 1945
- It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. But it is more than this, for any item can be joined into numerous trails, the trails can bifurcate, and they can give birth to side trails.
- Wholly new forms of encyclopaedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.
55Selection
- The heart of the problem, and of the personal machine we have here considered, is the task of selection. And here, in spite of great progress, we are still lame.
- Selection, in the broad sense, is still a stone adze in the hands of a cabinetmaker.
- Memex Revisited (Bush 1965)
56Interaction Paradigms for IR
- Direct manipulation
- Query specification
- Query refinement
- Result selection
- Delegation
- Agents
- Recommender systems
- Filtering
57The Adaptive Memex
- In an adaptive Memex, the owner has delegated to
the machine the ability to propose or effect
changes in the stored information. By analogy to
business practice, the Memex is said to be
functioning as an agent (Kay, 1984). The machine
is playing an autonomous role within a restricted
charter to attempt a more effective organization
of the information based on observations of
actual use and topical similarities.
59Discussion Questions
- Alison Billings on MIR 10.1-10.3
- In section 10 of Modern Information Retrieval, Marti A. Hearst touches on the difficulty untrained users face in doing Boolean searches (i.e., the misinterpretation of OR and AND, nets being cast too wide or too narrow), so I thought it best to rely on both our experience and the reading to address the following questions: In doing the Boolean searches for assignment 8, did you use the KWIC search function to help you sort through the documents you retrieved? Did it help you find the information you needed? Did you have to reformat your Boolean queries several times in order for them to return the results you expected? Is it reasonable to expect users to continue to use Boolean searches when there are more effective search methods available?
60Discussion Questions
- danah boyd on Why Interfaces Don't Work
- While Norman frames his argument through users, tasks, invisible tools, and "make the right things visible," his examples are quite flawed.
- He spent the majority of the paper talking about the problems with set-up. Yesterday, i purchased a brand new 12" Mac to replace my battered one. Turned it on; it worked and connected to my wireless. Put a cable between it and my old one and sucked off all of the data, including the programs. I installed 2 new programs. Inserted them into the CD drive and dragged them from the disk to my Applications folder. They worked. Brand new machine and it was immediately functional and identical to my old one in less than 2 hours (copy time). Even the proprietary stuff like my Audible.com files just asked me if i wanted to assign them to this new machine.
- I opened up a Sidekick yesterday. Turned it on. It connected to T-Mobile, told me what my email address was, told me to sign on to AIM and voila it worked.
- Is set-up really the problem?
61Discussion Questions
- danah boyd on Why Interfaces Don't Work
- Norman argues to put the user first. What user? Can you really design a mass-produced item that takes into consideration all users who use it?
- Take the keyboard. What size is chosen? I have small fingers and yet it's hard to find small keyboards.
- What are the consequences of designing for an "average" user?
62Discussion Questions
- danah boyd on Why Interfaces Don't Work
- Norman argues for a comparison to RL tasks, making the task the priority.
- Users have vastly different sets of tasks that they want, but the majority of computer consumers use their computer to 1) communicate (email, IM, chatrooms, voice over IP) and 2) find information on the Web (surf).
- Neither of these tasks has a comparable off-line equivalent. How can you do a task-first analysis without an interface when you don't have an offline model to work with? What are the problems with modeling this behavior off of physical metaphors?
- For example, we've conceptualized email to be a metaphor to mail. This has created more problems than trying to design for an entirely new behavior.
63Discussion Questions
- Jeff Towle on As We May Think
- Vannevar Bush throws out quite a few ideas in this piece. A large portion of his piece is an analysis of instances where an idea was not feasible at the time, but was later built into something successful. Is this the case with Bush's ideas? They were clearly not feasible when he wrote this, but are they now?
- Many of Bush's proposals sound very familiar. His description of 'dry photography' seems to closely match digital photography technology. His description of information trails is quite similar to hypertext. But are we still missing some of Bush's great ideas?
64Discussion Questions
- Denise Green on Memex II
- Vannevar Bush suggests that Memex "...merely supplements a human memory, does so precisely and comprehensively, and aids the process of recollection." At the heart of Memex are information trails, which Bush believes are similar in nature to trails of association in our brains. How does this model compare with current ideas about how the brain works?
- How are Memex trails related to today's hypertext, as used commonly on the Internet? How are they dissimilar?
65Discussion Questions
- Ryan Shaw on Memex Revisited
- Bush emphasizes compression and rapid access as the two most important developments for data-handling technology. In retrospect, he seems to have given networking short shrift. Given Bush's uncanny vision, why did he overlook the importance of the network?
- While the Memex is a marvelous idea, Bush's article on the topic betrays certain biases in his views on who uses information and why. How might these biases have affected his beliefs about 1) the feasibility of actually building a Memex-like system and 2) the effects of such a system, were it to be built?
67Next Time: HCI for IR
- Browsing
- Visualizing collections and documents
- Navigating collections and documents
- Searching
- Formulating queries
- Visualizing results
- Navigating results
- Refining queries
- Selecting results
68Next Time: HCI for IR
- Interfaces for Information Retrieval
- Readings
- MIR 10.4-10.10