Slide 1. CS 4300 / INFO 4300 Information Retrieval
Discussion 10: User Interfaces and Visualization
Slide 2. Course Administration
Assignment 4 has been posted. If you have not received an automated email message giving a password for the Center for Advanced Computing and the Hadoop cluster, please send email to the course team.
Slide 3. Course Administration
Assignment 4: The mechanics of running a Hadoop program are unfortunately rather complex, and we strongly urge you to get started early on this assignment. See:
http://www.infosci.cornell.edu/courses/info4300/2008fa/HadoopHints.html
The materials that describe the Hadoop cluster are new. Please send corrections to the course team.
After the discussion class this evening there will be a short workshop for people who are new to Linux.
Slide 4. Discussion Classes
Format: For each question, ask a member of the class to answer, then provide an opportunity for others to comment.
When answering:
- Stand up.
- Give your name. Make sure that the TA hears it.
- Speak clearly so that the whole class can hear.
Suggestions:
- Do not be shy about presenting partial answers.
- Differing viewpoints are welcome.
Slide 5. Question 1: Information Visualization
What is the role of information visualization in information retrieval? Describe a typical use (good or bad) of:
- brushing and linking
- panning and zooming
- focus-plus-context
- magic lenses
- animation
Slide 6. Question 2: The Standard Model of Information Retrieval
- The author has several criticisms of the standard model. What are they?
- The author claims that this is the model used by Web search engines. Do you agree?
- Later she disagrees with this assessment. What does she say about learning during the search?
Slide 7. Question 3: The Berry Picking Model of Information Seeking
- The berry picking model is an alternative model of information seeking. It has two main components. What are they?
- How do browsing, querying, navigating, and scanning fit into this model?
- What does this model teach us about the design of information retrieval systems?
Slide 8. Search / Scan / Browse model
[Diagram with stages labeled: Return objects, Return hits, Browse content, Scan results, Search index]
Slide 9. Question 4: Query Formulation
- What are the problems faced in designing an interface for Boolean queries?
- What are the strengths and weaknesses of each of the following: command language, form fill-in, menu selection, direct manipulation, natural language?
- The author describes the following techniques for direct manipulation. Explain each of them and its effectiveness.
  - Venn diagrams
  - filter flow
  - magic lenses as filters
Slide 10. Question 5: Relevance Feedback
- Explain the concept of relevance feedback. What are the benefits and disadvantages of opaque or transparent interfaces to relevance feedback?
- Explain the concept of fetching relevant information in the background.
- What are group relevance judgments? How do they relate to PageRank?
- What is pseudo-relevance feedback?
Slide 11. Question 6: Graphical Overviews
- The author is an expert in information visualization, yet she has some reservations about visualization as a tool for exploring large collections of information.
- Describe some of the strengths of visualization.
- Describe some of the problems.
- How much do the problems appear to be fundamental, and how much are they short-term difficulties with current systems?
Slide 12. Question 7: Future Directions
- In the final section, the author expresses her ideas about future directions of research into user interfaces and visualization. What trends does she see?
- The chapter was published in 1999. How do you think these ideas might be different if the chapter were written today?
Slide 13. Introduction to using Linux and the Hadoop cluster
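Slide 14. The Web Lab Hadoop Cluster
[Diagram: a Personal Computer, the File Server (cacfs01.cac.cornell.edu), and the Hadoop Cluster (wl01.cac.cornell.edu). The personal computer mounts a local directory from the file server and reaches the cluster over an ssh connection.]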
Slide 15. Notes on using Linux for beginners
Information about the Hadoop Cluster is at http://www.infosci.cornell.edu/hadoop/
In particular, see the documentation on:
- Preparing a MapReduce job from a Macintosh computer
- Preparing a MapReduce job from a Windows computer
- Hadoop hints for CS/Info 4300
These guides explain how to open a Linux shell (terminal) window on Windows or Macintosh computers.
A useful guide on Linux for beginners is http://www.cgi101.com/help/unixhelp.html
Slide 16. Basic File Operations
pwd                    print working directory
ls                     list the current directory
ls William             list the directory called William
man ls                 show the manual pages for ls (space for next page, q to quit manual)
ls -la                 list with -l (long) and -a (all files)
cd William/Program08   change the working directory to William/Program08
cd /                   change to the root (/) directory
cd                     change to the user's home directory
cd ..                  change to the parent directory
.                      the current directory
Slide 17. Basic File Operations
cat file1      list file1
more file1     list file1 one screen at a time (space for next page, q to quit)
Repeating commands:
!!             repeat the previous command
!c             repeat the last command beginning with "c"
up-arrow or down-arrow
               retrieve previous commands for editing
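The commands on slides 16 and 17 can be practiced safely in a throwaway directory. The sketch below assumes a bash shell, either on the cluster or on your own Linux or Macintosh machine; the scratch directory, the file file1, and its contents are made up for practice, and William/Program08 simply mirrors the example path above.

```shell
#!/bin/bash
# Practice the basic file operations in a throwaway directory,
# so nothing in your real home directory is touched.
scratch=$(mktemp -d)               # create an empty temporary directory
cd "$scratch"

mkdir -p William/Program08         # example directories from slide 16
echo "hello from file1" > William/Program08/file1   # a made-up practice file

pwd                                # print working directory (the scratch directory)
ls                                 # list the current directory: William
ls -la William                     # long listing of William, including . and ..
cd William/Program08               # change the working directory
cat file1                          # list the contents of file1
cd ..                              # change to the parent directory (William)
cd ..                              # and back up to the scratch directory
pwd                                # confirm where we ended up
```

History shortcuts such as !! and !c work only when you type at an interactive shell, so they are not part of the script. When you are done, cd back to your home directory and remove the scratch directory with rm -r.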
Slide 18. Running Programs
To run a program, give the name of an executable file and a list of arguments. Suppose that the hadoop executable is in the file hd/bin/hadoop. Then a typical command to run a Hadoop job is:
hd/bin/hadoop jar Indexer.jar setup.Indexer docstiny temp
rm -r temp
    remove recursively (-r) the directory temp and all files in it
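To see how the shell splits that command into an executable and its arguments, the following sketch builds a hypothetical stand-in for hd/bin/hadoop that only prints what it receives. The stand-in script and the temporary directory are illustrations, not part of the real Hadoop installation.

```shell
#!/bin/bash
# Hypothetical stand-in for hd/bin/hadoop: it only reports the
# arguments the shell passed to it. The real launcher does much more.
workdir=$(mktemp -d)
mkdir -p "$workdir/hd/bin"
cat > "$workdir/hd/bin/hadoop" <<'EOF'
#!/bin/sh
echo "argument count: $#"
for arg in "$@"; do
  echo "argument: $arg"
done
EOF
chmod +x "$workdir/hd/bin/hadoop"   # without this the shell refuses to run it

cd "$workdir"
# First word: the executable file. Remaining words: its arguments.
out=$(hd/bin/hadoop jar Indexer.jar setup.Indexer docstiny temp)
echo "$out"

cd - > /dev/null                    # return to the previous directory
rm -r "$workdir"                    # remove the demo directory recursively (-r)
```

The stand-in reports five arguments: jar, Indexer.jar, setup.Indexer, docstiny, and temp. The real hadoop launcher receives exactly the same list and treats it as a request to run the class setup.Indexer from Indexer.jar with the remaining arguments.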
Slide 19. Copying files to the cluster
See the system diagram on Slide 14. There are two ways to copy a file to your local directory on the cluster.
- The scp command:
  scp Indexer.jar wya2@wl01.cac.cornell.edu:indtemp.jar
  securely copies (scp) the file Indexer.jar from the current directory to user wya2 at the named system, where it is called indtemp.jar. See http://www.infosci.cornell.edu/courses/info4300/2008fa/scp.html
- Mount the file server on your desktop. See the instructions for Windows or Macintosh computers.
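If you copy files often, a small shell function saves retyping the host name. This is a hypothetical helper, not part of the course tools: copy_to_cluster, CLUSTER_USER, and the SCP override are names introduced for this sketch, and wya2 is only the example account from the slide.

```shell
#!/bin/bash
# Hypothetical helper around scp. CLUSTER_USER defaults to the example
# account wya2 from the slide; set it to your own user name.
CLUSTER_USER=${CLUSTER_USER:-wya2}
CLUSTER_HOST=wl01.cac.cornell.edu

# copy_to_cluster LOCAL_FILE [REMOTE_NAME]
# Copies LOCAL_FILE into your home directory on the cluster.
# REMOTE_NAME defaults to the local file's base name.
copy_to_cluster() {
  local src=$1
  local dest=${2:-$(basename "$src")}
  # SCP can be overridden (for example SCP=echo) to print the command instead of running it.
  ${SCP:-scp} "$src" "${CLUSTER_USER}@${CLUSTER_HOST}:${dest}"
}

# Example (not run here): copy_to_cluster Indexer.jar indtemp.jar
```

For example, copy_to_cluster Indexer.jar indtemp.jar runs the same scp command shown on this slide. Leaving off the second argument keeps the original file name.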
Slide 20. Using the Hadoop cluster
ssh wya2@wl01.cac.cornell.edu
    connect using the ssh protocol to user wya2 at wl01.cac.cornell.edu
You now have access to two file systems:
- The local file system (which is actually stored on the file server). Use standard Linux commands, such as those on slides 16 and 17.
- The Hadoop distributed file system. (See next slide.)
Slide 21. The Hadoop distributed file system (dfs)
hadoop dfs
    list the dfs options
hadoop dfs -ls
    list the current directory
hadoop dfs -ls /info4300
    list the directory /info4300
hadoop dfs -rmr temp
    remove recursively the directory temp and all files within it
hadoop dfs -copyFromLocal file1 file2
    copy file1 from the local file system to the dfs file system and call it file2
hadoop dfs -copyToLocal file1 file2
    copy file1 from the dfs file system to the local file system and call it file2
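The dfs commands are often used in pairs, for example staging an input file and then checking that it arrived. A sketch of two hypothetical wrapper functions, put_input and get_output, assuming the hadoop launcher is on your PATH; set HADOOP=hd/bin/hadoop if it lives elsewhere, as on slide 18.

```shell
#!/bin/bash
# Hypothetical helpers that wrap the dfs commands from this slide.
# HADOOP names the launcher; override it (for example HADOOP=hd/bin/hadoop,
# or HADOOP=echo for a dry run that only prints the commands).
HADOOP=${HADOOP:-hadoop}

# put_input LOCAL_FILE DFS_NAME: copy a file into the dfs, then list it.
put_input() {
  $HADOOP dfs -copyFromLocal "$1" "$2" &&
    $HADOOP dfs -ls "$2"
}

# get_output DFS_NAME LOCAL_FILE: copy a file out of the dfs.
get_output() {
  $HADOOP dfs -copyToLocal "$1" "$2"
}
```

Setting HADOOP=echo prints the commands without running them, which is a cheap way to check the arguments before touching the dfs.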
Slide 22. Running a Hadoop job
To run a job directly:
hadoop jar Indexer.jar setup.Indexer /info4300/docstiny temp
- Indexer.jar is the name of the jar file
- setup.Indexer is the name of the main class
- the other fields are arguments to the program (here /info4300/docstiny and temp)
Please use Hadoop on Demand for all small jobs and class assignments. See next slide.
Slide 23. Hadoop on Demand
Create a shell script containing these two lines:
hadoop dfs -rmr temp
hadoop jar Indexer.jar setup.Indexer /info4300/docstiny temp
One way is to use the vim editor. See http://www.vim.org
vim hoddemo
    create a file called hoddemo
chmod +x hoddemo
    change permissions to make hoddemo executable
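If you prefer not to use an editor, the same script can be created from the shell with a here-document. A minimal sketch, assuming bash and that hadoop is on your PATH on the cluster. The #!/bin/sh first line and the comments inside the script are additions not on the slide. Removing temp first matters because Hadoop usually refuses to write into an output directory that already exists.

```shell
#!/bin/bash
# Write the Hadoop on Demand demo script to a file called hoddemo.
# The quoted 'EOF' stops the shell from expanding anything inside.
cat > hoddemo <<'EOF'
#!/bin/sh
# Remove output left by any earlier run (Hadoop will not overwrite it).
hadoop dfs -rmr temp
# Run the indexer on the tiny collection, writing results to temp.
hadoop jar Indexer.jar setup.Indexer /info4300/docstiny temp
EOF

chmod +x hoddemo   # make hoddemo executable
ls -l hoddemo      # the x bits in the listing confirm the change
```

On a first run, hadoop dfs -rmr temp may report that temp does not exist; the script carries on to the job regardless.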
Slide 24. Hadoop on Demand
Run Hadoop on Demand with the shell script:
hod script -d /hdfs/wya2 -n 8 -s hoddemo
    -d specifies the home directory of the user (/hdfs/wya2)
    -n specifies the number of cluster nodes (8)
    -s specifies the shell script (hoddemo)
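Forgetting the chmod +x step from slide 23 is an easy mistake. The hypothetical run_hod function below checks that the script is executable before handing it to hod, using the same -d, -n and -s options as this slide. The function name, its argument order, the default of 8 nodes and the HOD override are assumptions made for this sketch.

```shell
#!/bin/bash
# Hypothetical wrapper around the hod command on this slide.
# Usage: run_hod USER SCRIPT [NODES]
# HOD can be overridden (for example HOD=echo) to print the command instead.
run_hod() {
  local user=$1
  local script=$2
  local nodes=${3:-8}              # the slide's example asks for 8 nodes
  # Catch a forgotten "chmod +x" before submitting anything.
  if [ ! -x "$script" ]; then
    echo "run_hod: $script is missing or not executable (try chmod +x $script)" >&2
    return 1
  fi
  ${HOD:-hod} script -d "/hdfs/$user" -n "$nodes" -s "$script"
}
```

For example, run_hod wya2 hoddemo issues hod script -d /hdfs/wya2 -n 8 -s hoddemo, provided hoddemo exists and is executable.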