Lecture 22: Interfaces for Information Retrieval I

1
Lecture 22 Interfaces for Information Retrieval I
SIMS 202 Information Organization and Retrieval
  • Prof. Ray Larson Prof. Marc Davis
  • UC Berkeley SIMS
  • Tuesday and Thursday 10:30 am - 12:00 pm
  • Fall 2002
  • http://www.sims.berkeley.edu/academics/courses/is202/f03/

2
Lecture Overview
  • Review of Last Time
  • Web Search Engines and Algorithms
  • Interfaces for Information Retrieval
  • Introduction to HCI
  • Why Interfaces Don't Work
  • Early Visions: Memex
  • Discussion Questions
  • Action Items for Next Time

Credit for some of the slides in this lecture
goes to Marti Hearst
4
Directories vs. Search Engines
  • Directories
  • Hand-selected sites
  • Search over the contents of the descriptions of
    the pages
  • Organized in advance into categories
  • Search Engines
  • All pages in all sites
  • Search over the contents of the pages themselves
  • Organized after the query by relevance rankings
    or other scores

5
Challenges for Web Searching: Data
  • Distributed data
  • Volatile data/freshness: 40% of the web changes
    every month
  • Exponential growth
  • Unstructured and redundant data: 30% of web pages
    are near duplicates
  • Unedited data
  • Multiple formats
  • Commercial biases
  • Hidden data
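
The near-duplicate figure counts pages that are almost, but not exactly, the same. The slide does not say how that was measured; one common technique, assumed here, is to break each page into overlapping word sequences ("shingles") and compare the resulting sets with Jaccard similarity. A minimal Python sketch; the shingle width k=3, the 0.7 threshold, and the helper names are illustrative choices:

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}  # short page: treat the whole page as one shingle
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(page_a: str, page_b: str, k: int = 3,
                    threshold: float = 0.7) -> bool:
    """Flag two pages as near duplicates when their shingle sets overlap heavily."""
    return jaccard(shingles(page_a, k), shingles(page_b, k)) >= threshold


page1 = "the quick brown fox jumps over the lazy dog today"
page2 = "the quick brown fox jumps over the lazy dog tonight"
print(round(jaccard(shingles(page1), shingles(page2)), 2))  # 0.78 (7 shared of 9 shingles)
print(near_duplicates(page1, page2))                         # True
```

Comparing every pair of pages this way is quadratic in the number of pages, so large-scale systems usually compare small samples of each shingle set (for example, min-hash sketches) instead of the full sets.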

6
Challenges for Web Searching: Users
  • Users unfamiliar with search engine interfaces
    (e.g., does the query "apples oranges" mean the
    same thing on all of the search engines?)
  • Users unfamiliar with the logical view of the
    data (e.g., is a search for "Oranges" the same
    thing as a search for "oranges"?)
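
Both questions turn on choices the engine makes silently: whether words are case-folded, and whether the space between two words means AND or OR. A small Python sketch over a hypothetical three-document collection (the documents and the `search` helper are illustrative, not any real engine's behavior) shows how the same query returns different results under each choice:

```python
# Hypothetical mini-collection: doc id -> text.
DOCS = {
    1: "Apples and oranges are both fruit",
    2: "Oranges grow in Florida",
    3: "Apple pie recipes",
}


def terms(text: str, fold_case: bool) -> set[str]:
    """Split on whitespace, lowercasing each token if fold_case is set."""
    tokens = text.split()
    return {t.lower() for t in tokens} if fold_case else set(tokens)


def search(query: str, implicit_op: str = "AND", fold_case: bool = True) -> list[int]:
    """Return ids of documents that match the query under one interpretation."""
    q = terms(query, fold_case)
    hits = []
    for doc_id, text in DOCS.items():
        d = terms(text, fold_case)
        # AND: every query term must appear; OR: at least one must appear.
        matched = q <= d if implicit_op == "AND" else bool(q & d)
        if matched:
            hits.append(doc_id)
    return hits


print(search("apples oranges", "AND"))     # [1]
print(search("apples oranges", "OR"))      # [1, 2]
print(search("Oranges", fold_case=False))  # [2]
print(search("Oranges"))                   # [1, 2]
```

Document 3 never matches "apples" because this sketch does no stemming; whether "apple" and "apples" count as the same term is yet another hidden decision.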
  • Many different kinds of users

7
Web Search Queries
  • Web search queries are SHORT
  • 2.4 words on average (Aug 2000)
  • Up from 1.7 words (1997)
  • User expectations
  • Many say, "The first item shown should be what I
    want to see!"
  • This works if the user has the most
    popular/common notion in mind

8
Search Engines
  • Crawling
  • Indexing
  • Querying

9
Standard Web Search Engine Architecture
  • Crawl the web
  • Check for duplicates, store the documents
    (assign DocIds)
  • Create an inverted index
  • Search engine servers answer the user query
    using the inverted index
  • Show results to user
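
The stages of this architecture (crawl, drop duplicates and assign DocIds, build an inverted index, answer queries from it) can be sketched in a few lines of Python. This is a single-machine toy under stated assumptions: the `crawl` dictionary stands in for a crawler's output, the URLs are hypothetical, duplicates are caught only when page text is identical (via a SHA-1 hash), and queries are treated as conjunctive (all terms required):

```python
import hashlib
from collections import defaultdict


def build_index(pages: dict[str, str]) -> tuple[dict[int, str], dict[str, set[int]]]:
    """Deduplicate crawled pages, assign DocIds, and build an inverted index.

    pages maps URL -> page text and stands in for the crawler's output.
    Returns (DocId -> URL, term -> set of DocIds).
    """
    seen: set[str] = set()
    doc_urls: dict[int, str] = {}
    index: dict[str, set[int]] = defaultdict(set)
    for url, text in pages.items():
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:      # exact duplicate of a page already stored
            continue
        seen.add(digest)
        doc_id = len(doc_urls)  # DocIds are assigned in crawl order
        doc_urls[doc_id] = url
        for term in text.lower().split():
            index[term].add(doc_id)
    return doc_urls, dict(index)


def query(index: dict[str, set[int]], doc_urls: dict[int, str], q: str) -> list[str]:
    """Return URLs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in q.lower().split()]
    if not postings:
        return []
    return [doc_urls[d] for d in sorted(set.intersection(*postings))]


crawl = {
    "a.example/1": "inverted index basics",
    "b.example/2": "inverted index basics",  # mirror of a.example/1
    "c.example/3": "web crawler and index design",
}
doc_urls, index = build_index(crawl)
print(query(index, doc_urls, "index"))            # ['a.example/1', 'c.example/3']
print(query(index, doc_urls, "inverted design"))  # []
```

In a production engine the index is partitioned across many machines and the postings are ranked rather than merely intersected; the sketch only shows how the pieces named in the diagram fit together.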
10
Google
  • Google maintains (currently) the world's largest
    Linux cluster (over 15,000 servers)
  • These are partitioned between index servers and
    page servers
  • Index servers resolve the queries (massively
    parallel processing)
  • Page servers deliver the results of the queries
  • Over 3 billion web pages are indexed and served
    by Google

11
Starting Points: What Is Really Being Used?
  • Today's search engines combine these methods in
    various ways
  • Integration of directories
  • Today most web search engines integrate
    categories into the results listings
  • Lycos, MSN, Google
  • Link analysis
  • Google uses it; others are also using it
  • Words on the links seem to be especially useful
  • Page popularity
  • Many use DirectHit's popularity rankings
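
Using "words on the links" means crediting a page with the anchor text of links that point to it, even when those words never appear on the page itself. A minimal Python sketch, assuming a hypothetical list of (source, target, anchor text) triples; `anchor_text_index` is an illustrative name, not any engine's API:

```python
from collections import defaultdict

# Hypothetical links: (source page, target page, anchor text).
LINKS = [
    ("fan-club.example", "toyota.example", "Toyota official site"),
    ("news.example", "toyota.example", "Toyota"),
    ("news.example", "fan-club.example", "best Toyota fan club"),
]


def anchor_text_index(links: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each anchor-text term to the pages the links point TO.

    The target is credited with the words other authors use to describe it,
    whether or not those words appear on the target page.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return dict(index)


idx = anchor_text_index(LINKS)
print(sorted(idx["official"]))  # ['toyota.example']
print(sorted(idx["toyota"]))    # ['fan-club.example', 'toyota.example']
```

In a full engine these anchor terms would be merged with the page's own terms, typically with some extra weight; how much weight is not specified here.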

12
Ranking: Link Analysis
  • Assumptions
  • If the pages pointing to this page are good, then
    this is also a good page
  • The words on the links pointing to this page are
    useful indicators of what this page is about
  • References: Page et al. 98; Kleinberg 98
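
The first assumption is the core of PageRank (Page et al. 98): a page's score is the sum of the scores of the pages linking to it, each divided by that page's number of out-links, blended with a small uniform "random jump" term. A minimal power-iteration sketch in Python; the damping factor 0.85 is the commonly quoted value, while the iteration count, the dangling-page convention, and the toy link graph are assumptions for illustration, not Google's production settings:

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """PageRank by power iteration.

    graph maps every page to the list of pages it links to; every link target
    must also appear as a key. A page with no out-links spreads its score
    evenly over all pages (one common convention for "dangling" pages).
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the random-jump term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # a page passes score along its links
        rank = new_rank
    return rank


# Hypothetical link graph echoing the Toyota example.
web = {
    "toyota.example": ["fan-club.example"],
    "fan-club.example": ["toyota.example"],
    "dealer.example": ["toyota.example"],
    "blog.example": ["toyota.example", "fan-club.example"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:18} {score:.3f}")
```

The scores always sum to 1, and pages with no in-links keep only the baseline (1 - d) / n share, which is how the "good pages point to good pages" assumption shows up numerically.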

13
Ranking: Link Analysis
  • Why does this work?
  • The official Toyota site will be linked to by
    lots of other official (or high-quality) sites
  • The best Toyota fan-club site probably also has
    many links pointing to it
  • Lower-quality sites do not have as many
    high-quality sites linking to them
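
Kleinberg's HITS formalizes this intuition with two scores per page: authorities (such as the official site) are pages that many good hubs point to, and hubs (such as a good link list) are pages that point to many good authorities. A minimal Python sketch of the mutual-reinforcement iteration; the toy graph, page names, and iteration count are illustrative assumptions:

```python
import math


def hits(graph: dict[str, list[str]], iterations: int = 50
         ) -> tuple[dict[str, float], dict[str, float]]:
    """Hub and authority scores in the style of Kleinberg's HITS.

    A page's authority score sums the hub scores of pages linking to it; its hub
    score sums the authority scores of pages it links to. Both vectors are
    rescaled to unit length after each update so the values stay bounded.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        auth = {p: 0.0 for p in pages}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: link-list pages (hubs) pointing at car sites (authorities).
web = {
    "car-links.example": ["toyota.example", "fan-club.example", "honda.example"],
    "auto-guide.example": ["toyota.example", "fan-club.example"],
    "random-blog.example": ["toyota.example"],
}
hub, auth = hits(web)
print(max(auth, key=auth.get))  # toyota.example
print(max(hub, key=hub.get))    # car-links.example
```

Unlike the PageRank sketch, HITS is usually run on a query-specific subgraph of pages rather than on the whole web.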

15-23
Drawing the Circles (diagram sequence)
24
Human-Computer Interaction (HCI)
  • Human
  • The end-users of a program
  • The others in the organization
  • The designers of the program
  • Computer
  • The machines the programs run on
  • Interaction
  • The users tell the computers what they want
  • The computers communicate results
  • The computer may also tell users what the
    computer wants them to do

25
What is HCI?
26
Shneiderman on HCI
  • Well-designed interactive computer systems
  • Promote
  • Positive feelings of success
  • Competence
  • Mastery
  • Allow users to concentrate on their work,
    exploration, or pleasure, rather than on the
    system or the interface

27
Design Guidelines
  • Set of design rules to follow
  • Apply at multiple levels of design
  • Are neither complete nor orthogonal
  • Have psychological underpinnings (ideally)

28
Shneiderman's Design Principles
  • Provide informative feedback
  • Permit easy reversal of actions
  • Support an internal locus of control
  • Reduce working memory load
  • Provide alternative interfaces for expert and
    novice users

29
HCI for IR
  • Information seeking is an imprecise process
  • UI should aid users in understanding and
    expressing their information needs
  • Help formulate queries
  • Select among available information sources
  • Understand search results
  • Keep track of the progress of their search

30
Provide Informative Feedback
  • About
  • The relationship between query specification and
    documents retrieved
  • Relationships among retrieved documents
  • Relationships between retrieved documents and
    metadata describing collections
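
One concrete way to show the relationship between query specification and documents retrieved is to report how many documents each query term, and each combination of terms, would match, so a user can see which term is responsible for an empty or overlarge result set. A minimal Python sketch over a hypothetical inverted index (term to set of document ids); `term_feedback` is an illustrative name:

```python
from itertools import combinations


def term_feedback(index: dict[str, set[int]],
                  query_terms: list[str]) -> dict[tuple[str, ...], int]:
    """Count the documents matched by each query term and by each combination
    of terms, so the interface can show which term narrows or empties the result."""
    feedback = {}
    for size in range(1, len(query_terms) + 1):
        for combo in combinations(query_terms, size):
            matched = set.intersection(*(index.get(t, set()) for t in combo))
            feedback[combo] = len(matched)
    return feedback


# Hypothetical inverted index: term -> ids of documents containing it.
index = {"memex": {1, 2, 3}, "bush": {2, 3, 4, 5}, "hypertext": {3, 6}}
for combo, count in term_feedback(index, ["memex", "bush", "hypertext"]).items():
    print(" AND ".join(combo), "->", count)
```

The number of combinations doubles with each added term, which stays manageable because, as noted earlier, web queries average only a few words.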

31
Reduce Working Memory Load
  • Provide mechanisms for keeping track of choices
    made during the search process
  • Allow users to
  • Return to temporarily abandoned strategies
  • Jump from one strategy to the next
  • Retain information and context across search
    sessions
  • Provide browsable information that is relevant to
    the current stage of the search process
  • Related terms or metadata
  • Search starting points (e.g., lists of sources,
    topic lists)
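
A search history is the simplest mechanism for keeping track of choices made during a search. The Python sketch below records each query with its result count, lets the user jump back to an earlier query, and serializes the history so it can be restored in a later session. The class names, fields, and sample queries are hypothetical:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SearchStep:
    query: str
    result_count: int


@dataclass
class SearchSession:
    """A visible trail of the queries tried so far, so users need not hold them in memory."""
    steps: list[SearchStep] = field(default_factory=list)

    def record(self, query: str, result_count: int) -> None:
        self.steps.append(SearchStep(query, result_count))

    def history(self) -> list[str]:
        """Numbered lines the interface could display beside the results."""
        return [f"{i}. {s.query} ({s.result_count} results)"
                for i, s in enumerate(self.steps, start=1)]

    def revisit(self, number: int) -> str:
        """Return an earlier query so the user can resume an abandoned strategy."""
        if not 1 <= number <= len(self.steps):
            raise IndexError(f"no search step {number}")
        return self.steps[number - 1].query

    def to_json(self) -> str:
        """Serialize the history so it can be restored in a later session."""
        return json.dumps([asdict(s) for s in self.steps])

    @classmethod
    def from_json(cls, text: str) -> "SearchSession":
        return cls([SearchStep(**d) for d in json.loads(text)])


session = SearchSession()
session.record("memex associative trails", 12)
session.record("vannevar bush hypertext", 40)
print(session.history())
print(session.revisit(1))  # memex associative trails
restored = SearchSession.from_json(session.to_json())
```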

32
Interfaces For Expert And Novice Users
  • Simplicity vs. power tradeoffs
  • Scaffolded user interface
  • How much information to show the user?
  • Number and complexity of user operations
  • Variants of operations
  • Inner workings of system itself
  • System history
  • Example
  • Television remote control

33
User Differences
  • Abilities, preferences, predilections
  • Spatial ability
  • Memory
  • Reasoning abilities
  • Verbal aptitudes
  • Personality differences
  • Age, gender, ethnicity, class, sexuality,
    culture, education
  • Modality preferences/restrictions
  • Vision, audition, speech, gesture, haptics,
    locomotion

34
Nielsen's Usability Slogans
  • Your best guess is not good enough
  • The user is always right
  • The user is not always right
  • Users are not designers
  • Designers are not users
  • Less is more
  • Details matter

(from Nielsen's Usability Engineering)
35
Who Builds UIs?
  • A team of specialists (ideally)
  • Graphic designers
  • Interaction / interface designers
  • Technical writers
  • Marketers
  • Test engineers
  • Software engineers
  • Ethnographers
  • Cognitive psychologists

36
How to Design and Build UIs
  • Task analysis
  • Rapid prototyping
  • Evaluation
  • Implementation

Iterate at every stage!
37
Task Analysis
  • Observe existing work practices
  • Create examples and scenarios of actual use
  • Try out new ideas before building software

38
Rapid Prototyping
  • Build a mock-up of design
  • Low fidelity techniques
  • Paper sketches
  • Cut, copy, paste
  • Video segments
  • Interactive prototyping tools
  • Visual Basic, HyperCard, Director, etc.
  • UI builders
  • NeXT, etc.

39
Evaluation Techniques
  • Qualitative vs. quantitative methods
  • Qualitative (non-numeric, discursive,
    ethnographic)
  • Focus groups
  • Interviews
  • Surveys
  • User observation
  • Participatory design sessions
  • Quantitative (numeric, statistical, empirical)
  • User testing
  • System testing

40
Qualitative Questions
  • User experience
  • User preferences
  • User recommendations
  • Design dialogue

41
Quantitative Questions
  • Precision
  • Recall
  • Time required to learn the system
  • Time required to achieve goals on benchmark tasks
  • Error rates
  • Retention of the use of the interface over time
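
Precision and recall are the two measures on this list with standard formulas: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal Python sketch with hypothetical document ids; returning 0.0 for an empty denominator is a convention chosen here, not a universal rule:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    An empty denominator yields 0.0 (a convention chosen for this sketch).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"d1", "d2", "d3", "d4"}  # what the system returned (hypothetical)
relevant = {"d2", "d4", "d7"}         # what a judge marked relevant (hypothetical)
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The other items on the list (learning time, task time, error rates, retention) are measured in user studies rather than computed from a result set.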

43
Why Interfaces Don't Work
  • Because
  • We still think of using the interface
  • We still talk of designing the interface
  • We still talk of improving the interface
  • We need to aid the task, not the interface to
    the task.
  • The computer of the future should be invisible.

44
Norman on Design Priorities
  1. The user: what does the person really need to have
    accomplished?
  2. The task: analyze the task. How best can the job
    be done, taking into account the whole setting
    in which it is embedded, including the other
    tasks to be accomplished, the social setting, the
    people, and the organization?
  3. As much as possible, make the task dominate; make
    the tools invisible.
  4. Then, get the interaction right, making things
    (the right things) visible, exploiting affordances
    and constraints, providing the proper mental
    models, and so on: the rules of good design for
    the user, written about many, many times in many,
    many places.

46
What Dr. Bush Foresees
  • Cyclops Camera
  • Worn on forehead, it would photograph anything
    you see and want to record. Film would be
    developed at once by dry photography.
  • Microfilm
  • It could reduce Encyclopaedia Britannica to
    volume of a matchbox. Material cost: 5¢. Thus a
    whole library could be kept in a desk.
  • Vocoder
  • A machine which could type when talked to. But
    you might have to talk a special phonetic
    language to this mechanical supersecretary.
  • Thinking machine
  • A development of the mathematical calculator.
    Give it premises and it would pass out
    conclusions, all in accordance with logic.
  • Memex
  • An aid to memory. Like the brain, Memex would
    file material by association. Press a key and it
    would run through a trail of facts.

47
Memex
48
Memex Detail
49
Cyclops Camera
50
Vocoder Supersecretary
51
Investigator at Work
  • One can now picture a future investigator in his
    laboratory. His hands are free, and he is not
    anchored. As he moves about and observes, he
    photographs and comments. Time is automatically
    recorded to tie the two records together. If he
    goes into the field, he may be connected by radio
    to his recorder. As he ponders over his notes in
    the evening, he again talks his comments into the
    record. His typed record, as well as his
    photographs, may be both in miniature, so that he
    projects them for examination.

52
Memex
  • A memex is a device in which an individual
    stores all his books, records, and
    communications, and which is mechanized so that
    it may be consulted with exceeding speed and
    flexibility. It is an enlarged intimate
    supplement to his memory.

53
Associative Indexing
  • associative indexing, the basic idea of
    which is a provision whereby any item may be
    caused at will to select immediately and
    automatically another. This is the essential
    feature of memex. The process of tying two items
    together is the important thing.

54
The WWW circa 1945
  • It is exactly as though the physical items had
    been gathered together from widely separated
    sources and bound together to form a new book.
    But it is more than this for any item can be
    joined into numerous trails, the trails can
    bifurcate, and they can give birth to side
    trails.
  • Wholly new forms of encyclopaedias will appear,
    ready-made with a mesh of associative trails
    running through them, ready to be dropped into the memex
    and there amplified.
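
Read as a data structure, Bush's description asks for three things: any item can be tied to another, one item can sit on many trails, and trails can branch into side trails. The Python sketch below is one such reading, not anything Bush specified; the `Memex` class and its methods are hypothetical, and the sample items loosely echo the bow-and-arrow trail Bush describes in As We May Think:

```python
from collections import defaultdict


class Memex:
    """Toy model of associative trails: any item can be tied to another,
    an item can sit on many trails, and trails can branch into side trails."""

    def __init__(self) -> None:
        self.trails: dict[str, list[str]] = {}              # trail name -> items in order
        self.links: dict[str, set[str]] = defaultdict(set)  # item -> items tied to it

    def tie(self, trail: str, item: str, next_item: str) -> None:
        """Extend a trail by tying next_item after item (creating the trail if new)."""
        items = self.trails.setdefault(trail, [item])
        if items[-1] != item:
            raise ValueError(f"{item!r} is not the end of trail {trail!r}")
        items.append(next_item)
        self.links[item].add(next_item)
        self.links[next_item].add(item)

    def side_trail(self, parent: str, branch_at: str, name: str) -> None:
        """Start a new trail that branches off an existing trail at branch_at."""
        if branch_at not in self.trails.get(parent, []):
            raise ValueError(f"{branch_at!r} is not on trail {parent!r}")
        self.trails[name] = [branch_at]

    def run(self, trail: str) -> list[str]:
        """Replay a trail item by item, as pressing a key would."""
        return list(self.trails[trail])

    def trails_containing(self, item: str) -> list[str]:
        return sorted(n for n, items in self.trails.items() if item in items)


m = Memex()
m.tie("bows", "Turkish short bow", "English long bow")
m.tie("bows", "English long bow", "Crusades skirmish notes")
m.side_trail("bows", "English long bow", "elasticity")
m.tie("elasticity", "English long bow", "elasticity textbook")
print(m.run("bows"))
print(m.trails_containing("English long bow"))  # ['bows', 'elasticity']
```

The `links` map is what makes this more than a set of lists: it records that two items were tied, independent of which trail tied them, which is the "process of tying two items together" Bush calls the important thing.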

55
Selection
  • The heart of the problem, and of the personal
    machine we have here considered, is the task of
    selection. And here, in spite of great progress,
    we are still lame.
  • Selection, in the broad sense, is still a stone
    adze in the hands of a cabinetmaker.
  • Memex Revisited (Bush 1965)

56
Interaction Paradigms for IR
  • Direct manipulation
  • Query specification
  • Query refinement
  • Result selection
  • Delegation
  • Agents
  • Recommender systems
  • Filtering

57
The Adaptive Memex
  • In an adaptive Memex, the owner has delegated to
    the machine the ability to propose or effect
    changes in the stored information. By analogy to
    business practice, the Memex is said to be
    functioning as an agent (Kay, 1984). The machine
    is playing an autonomous role within a restricted
    charter to attempt a more effective organization
    of the information based on observations of
    actual use and topical similarities.
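
One simple reading of "observations of actual use", assumed here purely for illustration, is to count how often two items are consulted back to back and propose a tie once the count passes a threshold. The agent only proposes; the owner decides, which keeps it within a restricted charter. A Python sketch; `propose_links`, the access log, and `min_count` are hypothetical, and topical similarity is not modeled:

```python
from collections import Counter


def propose_links(access_log: list[str], existing: set[frozenset[str]],
                  min_count: int = 2) -> list[tuple[str, ...]]:
    """Propose ties between items the owner repeatedly consults back to back.

    Pairs that are already tied (listed in existing) are not proposed again.
    The caller, i.e. the owner, decides which proposals to accept.
    """
    pair_counts = Counter()
    for a, b in zip(access_log, access_log[1:]):
        if a != b:
            pair_counts[frozenset((a, b))] += 1
    return sorted(tuple(sorted(pair)) for pair, count in pair_counts.items()
                  if count >= min_count and pair not in existing)


log = ["bows", "elasticity", "bows", "elasticity", "crusades", "bows"]
print(propose_links(log, existing=set()))                                # [('bows', 'elasticity')]
print(propose_links(log, existing={frozenset(("bows", "elasticity"))}))  # []
```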

59
Discussion Questions
  • Alison Billings on MIR 10.1-10.3
  • In section 10 of Modern Information Retrieval,
    Marti A. Hearst touches on the difficulty
    untrained users face in doing Boolean searches
    (i.e., the misinterpretation of OR and AND, nets
    being cast too wide or too narrow), so I thought
    it best to rely on both our experience and the
    reading to address the following questions. In
    doing the Boolean searches for assignment 8, did
    you use the KWIC search function to help you sort
    through the documents you retrieved? Did it help
    you find the information you needed? Did you
    have to reformat your Boolean queries several
    times in order for them to return the results you
    expected? Is it reasonable to expect users to
    continue to use Boolean searches when there are more
    effective search methods available?
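
A KWIC (keyword-in-context) display shows each occurrence of a search term with a few words on either side, which lets a user judge quickly whether a Boolean query retrieved documents that use the term in the intended sense. A minimal Python sketch; the context width and sample text are illustrative, and it is not meant to reproduce the KWIC tool used in the assignment:

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: every occurrence of keyword (case-insensitive,
    ignoring attached punctuation) with up to `width` words on each side."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


sample = ("Boolean OR widens a search, while Boolean AND narrows it. "
          "Users often confuse OR with AND.")
for line in kwic(sample, "or"):
    print(line)
# Boolean [OR] widens a search,
# Users often confuse [OR] with AND.
```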

60
Discussion Questions
  • danah boyd on Why Interfaces Don't Work
  • While Norman frames his argument through users,
    tasks, invisible tools, and "make the right things
    visible," his examples are quite flawed.
  • He spent the majority of the paper talking about
    the problems with set-up. Yesterday, i purchased
    a brand new 12" Mac to replace my battered one.
    Turned it on; it worked and connected to my
    wireless. Put a cable between it and my old one
    and sucked off all of the data, including the
    programs. I installed 2 new programs. Inserted
    them into the CD drive and dragged them from the
    disk to my Applications folder. They worked.
    Brand new machine and it was immediately
    functional and identical to my old one in less
    than 2 hours (copy time). Even the proprietary
    stuff like my Audible.com files just asked me if
    i wanted to assign them to this new machine.
  • I opened up a Sidekick yesterday. Turned it on.
    It connected to T-Mobile, told me what my email
    address was, told me to sign on to AIM and voila
    it worked.
  • Is set-up really the problem?

61
Discussion Questions
  • danah boyd on Why Interfaces Don't Work
  • Norman argues to put the user first. What user?
    Can you really design a mass-produced item that
    takes into consideration all users who use it?
  • Take the keyboard. What size is chosen? I have
    small fingers and yet it's hard to find small
    keyboards.
  • What are the consequences of designing for an
    "average" user?

62
Discussion Questions
  • danah boyd on Why Interfaces Don't Work
  • Norman argues for a comparison to real-life (RL)
    tasks, making the task the priority.
  • Users have vastly different sets of tasks that
    they want, but the majority of computer consumers
    use their computer to 1) communicate (email, IM,
    chatrooms, voice over IP) 2) find information on
    the Web (surf).
  • Neither of these tasks has a comparable off-line
    equivalent. How can you do a task-first analysis
    without an interface when you don't have an
    offline model to work with? What are the
    problems with modeling this behavior off of
    physical metaphors?
  • For example, we've conceptualized email as a
    metaphor for mail. This has created more problems
    than trying to design for an entirely new
    behavior.

63
Discussion Questions
  • Jeff Towle on As We May Think
  • Vannevar Bush throws out quite a few ideas in
    this piece. A large portion of his piece is an
    analysis of instances where an idea was not
    feasible at the time, but was later built into
    something successful. Is this the case with
    Bush's ideas? They were clearly not feasible
    when he wrote this, but are they now?
  • Many of Bush's proposals sound very familiar.
    His description of 'dry photography' seems to
    closely match digital photography technology.
    His description of information trails is quite
    similar to hypertext. But are we still missing
    some of Bush's great ideas?

64
Discussion Questions
  • Denise Green on Memex II
  • Vannevar Bush suggests that Memex "...merely
    supplements a human memory, does so precisely
    and comprehensively, and aids the process of
    recollection." At the heart of Memex are
    information trails, which Bush believes are
    similar in nature to trails of association in our
    brains. How does this model compare with current
    ideas about how the brain works?
  • How are Memex trails related to today's
    hypertext, as used commonly on the Internet? How
    are they dissimilar?

65
Discussion Questions
  • Ryan Shaw on Memex Revisited
  • Bush emphasizes compression and rapid access as
    the two most important developments for
    data-handling technology. In retrospect, he seems
    to have given networking short shrift. Given
    Bush's uncanny vision, why did he overlook the
    importance of the network?
  • While the Memex is a marvelous idea, Bush's
    article on the topic betrays certain biases in his
    views on who uses information and why. How might
    these biases have affected his beliefs about 1)
    the feasibility of actually building a Memex-like
    system and 2) the effects of such a system, were
    it to be built?

67
Next Time: HCI for IR
  • Browsing
  • Visualizing collections and documents
  • Navigating collections and documents
  • Searching
  • Formulating queries
  • Visualizing results
  • Navigating results
  • Refining queries
  • Selecting results

68
Next Time: HCI for IR
  • Interfaces for Information Retrieval
  • Readings
  • MIR 10.4-10.10