Title: SI 503 Search and Retrieval
1SI 503 Search and Retrieval
- Prof. George W. Furnas
- Prof. Amy Warner
- Qiping Zhang
- Mark Handel
2SI 503 Search and Retrieval: Outline for the Day
- Welcome and brief intro to ourselves and the
course
- Mechanics of the course: syllabus, requirements,
etc.
- Exercise: Search is everywhere you look - Part 1
- -- Break 1 --
- Exercise: Search is everywhere you look - Part 2
- Exercise: Search, Scale and Structure - Parts
1, 2, 3
- -- Break 2 --
- Why SI students should care about search and
retrieval
- The Bigger Picture
- How different searches fit together
- How search fits with other activities
- Looking to next week...
3Welcome and Brief Intro to Ourselves and the
Course
- Welcome to 503!
- Who We Are
- Instructors
- Prof. George W. Furnas
- Prof. Amy Warner
- TAs
- Qiping Zhang
- Mark Handel
- About the course...
4Foundations Sequence
- Use of Information (501)--concepts, issues and
practices aimed at providing an understanding of the
actual use of information in real work settings
- Choice and Learning (502)--examines how
information affects rational choice making and
how rational choice theory can be applied to the
design and management of information systems
- Search and Retrieval (503)--looks at search and
retrieval in information systems as a continuous
process, ranging from concepts and procedures
integral to human-mediated search, to the basic
issues and mechanisms in collection search, to
the data structures and algorithms necessary to
automate the search and retrieval process
- Social Systems and Collections (504)--considers
collections of information resources in the
broadest sense of the term, and the fundamental
social processes within which such collections
are embedded and the processes that shape their
creation
- Design and Management of Information Systems and
Services (505)--prepares professionals to invent,
develop, and implement new systems and services
and manage their ongoing operation
5Background and Motivation for SI 503
- Serves as a gateway course for all
specializations--Library and Information Services
(LIS), Archives and Records Management (ARM),
Human-Computer Interaction (HCI), Multi-Agent
Systems Design (MAS), Economics of Information
(EI)
- Helps us determine the scope, magnitude and
specific content of Search and Retrieval in
this emerging, synergistic combination of fields
- Is primarily based on concepts, issues,
principles, and theories, rather than specific
tools, techniques and practices, which are
covered in advanced courses in specific
specializations
- Covers both professional and research literatures
and perspectives
6About Search and Retrieval
- Why are search and retrieval important?
- In its most general form, looking for and getting
things are significant parts of much human
activity
- from cavepersons hunting and gathering food
- to scholars seeking previous literature,
mathematicians seeking a proof, or engineers
seeking a good design
- Hierarchy of goals, reach an impasse, seek a
resolution
- We want to give foundations for understanding
the role of information technology and the
information professions in this activity
7Approach of this Course
- This course looks at search and retrieval from a
variety of perspectives
- Of use to professionals dedicated to
- making information, technology, and people work
together more successfully.
- Range from understanding
- how humans search the external visual world
- and their internal memories
- to fundamentals of both conceptual and
computational aspects
- of electronic information search and retrieval
- to navigational search
- to social and organizational memory and retrieval
processes.
8Mechanics of the course: syllabus, requirements,
etc.
- Online at course website
- http://madison.si.umich.edu/Transfer/503Lecture01-intro.ppt.sit
- Let's go look...
9Mechanics of the course: syllabus, requirements,
etc.
- IMPORTANT
- One more thing - always bring to class
- your copy of the readings, for discussion
- some blank paper and pen/pencil for exercises
10Exercise: Search Is Everywhere You Look
11Exercise: Search Is Everywhere You Look
- Part 1
- In pairs, brainstorm and write down as many
examples of search as you can come up with; be
as broad as you can, being inclusive of all
disciplinary and professional perspectives and
real life as well (15 min.)
- As a class, share our lists of examples, making a
combined list (10 min.)
12-- Break 1 --
- We will restart promptly in 10 minutes!
13Exercise: Search Is Everywhere You Look
- Part 2
- In pairs again, try to determine some general
categories or dimensions along which you would
group the search examples we have generated (10
min)
- As a class, share our findings (5 min)
14Discussion
- What makes search hard vs. easy?
- OPTIONAL: Discuss: How could info tech play a
role?
- OPTIONAL: Talk about search v. retrieval
- distinction
- examples
15Exercise: Search, Scale and Structure
16Exercise: Search, Scale and Structure
- Part 1 - Brute Force Search
- N + 1 volunteers
- 1 searcher
- N people to form collections of search items
- Collection: Line up. First 3 stand, rest
squat/sit...
- Searcher: Find the person whose last name would
come just before yours in alphabetical order
- Try again with 10 search items (10 standing)
17Exercise: Search, Scale and Structure
- Part 1 - The Brute Force Search (cont.)
- The Brute Force List-Search Algorithm
- 1. Go to beginning of line of people
- 2. Let your "best so far" be nothing
- 3. Ask person in front of you his/her name
- 4. If it is before you alphabetically, and either
"best so far" is nothing or it is closer to you
than "best so far",
- Remember the new name as the new "best so far"
- 5. If you are not at the end,
- Move to next person
- Go to Step 3
- If you are at the end, "best so far" is your
target (or if that is nothing, you are first in
the ordering)
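The steps above can be sketched in code. This is a minimal illustration, not part of the original exercise: the function name `find_predecessor_linear` and the sample names are assumptions, and plain string comparison stands in for "alphabetical order" (so it assumes consistently capitalized names).

```python
from typing import Optional, Sequence


def find_predecessor_linear(names: Sequence[str], me: str) -> Optional[str]:
    """Return the name that comes just before `me` alphabetically, or None.

    Hypothetical helper: walks the whole line once, as in Steps 1-5.
    """
    best_so_far = None  # Step 2: "best so far" starts as nothing
    for name in names:  # Steps 1, 3, 5: ask each person in turn
        # Step 4: keep the name if it is before `me` and closer than the best
        if name < me and (best_so_far is None or name > best_so_far):
            best_so_far = name
    # End of the line: None means `me` is first in the ordering
    return best_so_far


line = ["Warner", "Furnas", "Zhang", "Handel"]  # unsorted, as people line up
print(find_predecessor_linear(line, "Smith"))   # Handel
```

Every person is asked exactly once, so the work grows in direct proportion to N, and only one remembered name is needed regardless of N.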
18Exercise: Search, Scale and Structure
- Part 1 - The Brute Force Search (cont.)
- Discussing the Algorithm
- Structure
- setup, iteration, stopping condition
- Important properties
- Well-Definedness: Do all the steps have clear,
unambiguous meaning?
- Correctness: Does it do the right thing?
- Completeness: Does it work for all inputs?
- Complexity: How much resource does it take as the
size of the input, N, gets larger?
- Time
- Space
19Exercise: Search, Scale and Structure
- Part 2 - The Sort
- Collection: Everyone stand
- We are going to sort you alphabetically Right to
Left (our L-R)
- Parallel Sort Algorithm
- The set up
- 1. Count off from your right by twos (base 2:
0, 1, 0, 1, ...)
- 2. All 0s raise left hand, 1s raise right hand
- 3. Find the hand nearest yours
- 4. Hold it (and put your hands down)
20Exercise: Search, Scale and Structure
- Part 2 - The Sort (cont.)
- Now the actual sort part of the algorithm
- 5. Ask your partner her/his name
- 6. If you are out of alphabetical order, switch
places
- 7. If there were any switches...
- everyone hold your current partner's hand
- raise your free hand
- grab the nearest free hand (if there is one)
- drop your old partner
- you now have a new partner
- go to Step 5, and repeat
- If there were no switches, you are done!
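A sequential sketch of this pairing-and-switching procedure follows. The technique is commonly known as odd-even transposition sort. The function name is an assumption for illustration. One deliberate difference from the steps above: the sketch stops only after both pairings in a row produce no switches, because a single quiet pairing can leave a row such as B, D, C, E unsorted (the pairs B-D and C-E are each in order, yet D and C are not).

```python
from typing import List


def odd_even_transposition_sort(names: List[str]) -> List[str]:
    """Sort by comparing neighbouring pairs, alternating which neighbour is the partner.

    Hypothetical sketch: within one phase all pairs are disjoint, so in the
    classroom every pair can compare and switch at the same time.
    """
    row = list(names)   # copy, so the caller's list is left untouched
    start = 0           # 0: pairs (0,1),(2,3),...  1: pairs (1,2),(3,4),...
    quiet_phases = 0    # consecutive phases in which nobody switched
    while quiet_phases < 2:
        switched = False
        for i in range(start, len(row) - 1, 2):
            if row[i] > row[i + 1]:  # out of alphabetical order: switch places
                row[i], row[i + 1] = row[i + 1], row[i]
                switched = True
        quiet_phases = 0 if switched else quiet_phases + 1
        start = 1 - start  # drop the old partner, take the other neighbour
    return row


print(odd_even_transposition_sort(["Zhang", "Warner", "Handel", "Furnas"]))
# ['Furnas', 'Handel', 'Warner', 'Zhang']
```

Run by people in parallel, each phase takes one time unit and about N phases suffice; run sequentially as here, the same procedure does on the order of N squared comparisons.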
21Exercise: Search, Scale and Structure
- Part 2 - The Sort (cont.)
- Discussing the Algorithm
- Structure
- setup, iteration, stopping condition
- Important properties
- Well-Definedness: Do all the steps have clear,
unambiguous meaning?
- Correctness: Does it do the right thing?
- Completeness: Does it work for all inputs?
- Complexity: How much resource does it take as the
size of the input, N, gets larger?
- Time
- Space
22Exercise: Search, Scale and Structure
- Part 3 - The Search
- Binary Search of a Sorted List
- Searcher:
- 1. Go to the person in the middle of the standing
row
- 2. Ask his/her name
- 3. If he/she is before you alphabetically,
- tell all those before (but not including)
him/her to sit
- If he/she is after you alphabetically
- tell him/her and all those after to sit
- 4. If there is more than one person standing
- go to step 1
- If only one is standing, he/she is your
target!
- (If no one is standing, you are alphabetically
first.)
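The same halving idea can be sketched in code. This is an illustrative variant rather than a literal transcription: the function name is an assumption, and instead of leaving the middle person standing it remembers that person as the current candidate and then seats them too, which keeps the loop simple and handles the "you are first" case without a special check.

```python
from typing import Optional, Sequence


def find_predecessor_binary(sorted_names: Sequence[str], me: str) -> Optional[str]:
    """Return the name just before `me` in an already sorted row, or None.

    Hypothetical helper; people at positions lo..hi-1 are still "standing".
    """
    lo, hi = 0, len(sorted_names)
    best = None
    while lo < hi:
        mid = (lo + hi) // 2          # the person in the middle of the standing row
        if sorted_names[mid] < me:
            best = sorted_names[mid]  # a candidate; anyone closer stands to the right
            lo = mid + 1              # this person and all those before sit
        else:
            hi = mid                  # this person and all those after sit
    return best                       # None means `me` is alphabetically first


row = ["Furnas", "Handel", "Warner", "Zhang"]  # must already be sorted
print(find_predecessor_binary(row, "Smith"))   # Handel
```

Each question halves the number of people still standing, so the number of questions grows with log N rather than N.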
23Exercise: Search, Scale and Structure
- Part 3 - The Binary Search (cont.)
- Discussing the Algorithm
- Structure
- setup, iteration, stopping condition
- Important properties
- Well-Definedness: Do all the steps have clear,
unambiguous meaning?
- Correctness: Does it do the right thing?
- Completeness: Does it work for all inputs?
- Complexity: How much resource does it take as the
size of the input, N, gets larger?
- Time
- Space
24Exercise: Search, Scale and Structure
- Conclusions
- Scale Hurts - as N gets large, harder to find
things
- e.g., Brute force is O(N)
- 10 items takes approx. 10 time units
- 100 items takes approx. 100 time units
- 1,000 items takes approx. 1,000 time units
- 1,000,000 items takes approx. 1,000,000 time
units
- Organizing (e.g., sorting) takes up front effort
- But, can lead to much more efficient search
- e.g., binary search of sorted list is O(log N)
- 10 items takes approx. 3 time units
- 100 items takes approx. 7 time units
- 1,000 items takes approx. 10 time units
- 1,000,000 items takes approx. 20 time units
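The figures above can be reproduced with a few lines of code. This is a rough sketch under the assumption that one "time unit" is one name comparison, that brute force needs N of them, and that binary search needs about log base 2 of N, rounded to the nearest whole number; the function names are illustrative.

```python
import math


def brute_force_steps(n: int) -> int:
    """Approximate time units for a linear scan of n items."""
    return n


def binary_search_steps(n: int) -> int:
    """Approximate time units for binary search of n sorted items."""
    return round(math.log2(n))


for n in (10, 100, 1_000, 1_000_000):
    print(f"{n:>9,} items: brute force ~{brute_force_steps(n):,}, "
          f"binary search ~{binary_search_steps(n)}")
```

Multiplying the collection size by 1,000 multiplies the brute-force cost by 1,000 but adds only about 10 steps to the binary search.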
25-- Break 2 --
- We will restart promptly in 10 minutes!
26Why SI students should care about Search and
Retrieval
- Prep for future courses and specializations
- HCI
- Human mem/vis search
- Better HCI for large systems, helping users
- search for functionality, information,
- Design Space search
- CS/AI/ProblemSpace search for Intelligent
interfaces
- MultiAgent Systems
- Multiagent Information search
- CS/AI/ProblemSpace search
- Design space search
27Why Care? (cont.)
- Organizational Behavior
- How organizations maintain and access their
accumulated knowledge about how to conduct their
work
- Information Economics
- Optimization search
- Producer-consumer matching
- Economically optimized search multiagent search
- Cost structure of information and search
28Why Care? (cont.)
- The Collection Perspective
- Documents and collections are traditionally
represented by fairly static mechanisms (i.e.,
often by human or computer-generated surrogates).
What would happen if we used concepts and
methods outside this traditional paradigm to
visualize virtual documents and collections?
- Documents are traditionally organized within
collections on the basis of topical or
disciplinary similarity of items (LIS), on the
statistical correlation of the words they contain
(LIS), or on the basis of the organization of the
institution from which they came (ARM). What
would happen if we designed classification and
other organizational schemes based on what we
know about how human memory works?
29Why Care? (cont.)
- The Computer Science Perspective
- Designers and developers, as well as service
providers, of electronic information systems,
need to know about the fundamental ways of
- structuring and organizing information in a
system (data structures)
- searching basic structures and organizations
efficiently (algorithms)
- These same information professionals need to know
about some of the basic properties of structures
and algorithms
- This is part of algorithmic thinking, which is
fundamental to understanding both the feasibility
and method for implementing a particular logical
organization and search approach in an actual
information system
30The Bigger Picture
- How different searches fit together
- How search fits with other activities
31How Different Searches Fit Together in The Big
Picture of Search
- Mini EXERCISE (10 min)
- Look for interactions between the search examples
(or their close variants)
- Ways they dovetail
- Ways they compete/complement
- Pair (5 min) and share (5 min) list
32Why care about the Big Picture of Search?
- Rethinking the big picture in this age of change
- In a stable world, things get optimized over time
- established decomposition, compartmentalized,
routinized
- Physical libraries hold books, org'd and searched
a particular way
- e.g., Don't hold comic books, your personal mail,
picture archives
- In a changing world -- all thrown up in the air
- taking a new view of the big picture...
- new decompositions, new syntheses
- e.g., should these things be treated more
uniformly, integrated into other activities
differently...
33Why care about the Big Picture of Search? (cont.)
- You should understand the big picture and
the various interrelationships
- so the world can have more useful, integrated
support tools
- E.g.,
- Human Memory search (for search terms)
- Followed by computerized IR search (for docs
containing those terms)
- Followed by Human Visual search (of resulting doc
lists)
- Support it all with new IT design
- That is, so you can better
- develop more integrated support tools yourselves,
perhaps
- look for them, as others develop them
- evaluate them, as they are proposed
- use them better, once you have them
- teach and encourage others to do the same
34How to think about the Big Picture... A Quick
Intro to the MoRAS
- We live in a world of many Responsive Adaptive
Systems
- People's heads, HCI systems, organizations,
economies, culture, ...
- Each studied separately by disciplines
represented in the school
- Each system, by being a RAS, has rough equivalent
of
- motivational mechanisms, choice behavior,
sensors, effectors, and
- memory with storage,
- and (yes!) Search and Retrieval capabilities.
- Instructive to examine analogies
- The different RASs fit together
- Not on different planets
- but linked or coupled together in a single super
system we inhabit
- a Mosaic of Responsive Adaptive Systems (MoRAS)
35The MoRAS
- Example: New technology introduced, like
Caller-ID
- Because of couplings, many parts of the MoRAS
change
- Each is perturbed, and responds and adapts
- People's heads and behavior, laws, new technology,
marketplace, new businesses,...
- Moreover - essence of Coupling in the MoRAS is
Information
- so altering IT alters the fundamental structure
of the Mosaic
- If we want to design in this environment, we must
understand the MoRAS
- MoRAS search example
- Human Memory -> IR -> Human Visual search
- compete/complement each other...
- all of this is our design space
- Grander cost structure of information
36Role S&R Play in the Bigger Task Picture
(optional)
- Mini-EXERCISE (pair and share)
- What is not search?
- List some examples of Non-Search activities
- E.g., get inspiration from daily activities, or
from the other foundation courses
- Info Needs
- Choice, Learning
- Social Systems, Collections
- Design, Management
- Consider
- What is role of S&R in these?
- What is role of these in S&R?
37Looking to next week...
- Week 2--January 15, 1998 (AW)
- Topic: Collection Search and Retrieval -- I
- Discussion of the notions of
collection/document/information space from the LIS
and archives perspectives, including definitions of
what documents and collections are and how they are
currently represented in a variety of systems and
using a variety of mechanisms
- Readings
- Hagler & Simmons (1991) ch. 2 & 8
- Warwick Framework & Dublin Core
http://www.bibsys.no/warwick.html
- owley (1992) ch. 12
- Miller (ch. 2 & 3)
- Don't forget to bring readings, and paper and
pen/pencil
- Food Volunteers???