Title: Lecture 02: Information
Lecture 02: Information
IS 202 Information Organization and Retrieval
- Prof. Ray Larson and Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 10:30 am to 12:00 pm
- Fall 2003
- http://www.sims.berkeley.edu/academics/courses/is202/f03/
Lecture Outline
- What Is Information?
- History of Information Search and Organization
- Discussion Questions
- Action Items for Next Time
What Is Information?
- There is no correct definition
- Can involve philosophy, psychology, signal processing, physics
- Cookie Monster's definition:
- "News or facts about something"
What Is Information?
- Oxford English Dictionary
- Information: informing, telling; thing told, knowledge, items of knowledge, news
- Knowledge: knowing, familiarity gained by experience; a person's range of information; a theoretical or practical understanding of a subject; the sum of what is known
Assignment 1: Discussion
- What is information, according to your background
or area of expertise?
What Is Information?
- Relating data to a context (situational interpretation)
- Anything that is important to anyone (significance)
- World → data → information → knowledge
- Requires a community of interpretation
- All information is dependent on context
- Capable of being recorded, stored, and transmitted (also in physical form, e.g., fossils)
- Information must be recorded
- Information is a record of something that can be reused
- Information is a commodity
What Is Information?
- Negentropy
- Potential energy to become knowledge
- Potential for it to be built upon
- Does information have to be related to true data?
- Can information be downgraded to data if it is forgotten?
Types of Information
- Differentiation by form
- Differentiation by content
- Differentiation by quality
- Differentiation by associated information
Information Properties
- Information can be communicated electronically
- Broadcasting
- Networking
- Information can be easily duplicated and shared
- Problems of ownership
- Problems of control
Adapted from Silicon Dreams by Robert W. Lucky
Intuitive Notion (Losee 97)
- Information must:
- Be something, although the exact nature (substance, energy, or abstract concept) is not clear
- Be new: repetition of previously received messages is not informative
- Be true: false or counterfactual information is misinformation
- Be about something
- This human-centered approach emphasizes the meaning and use of a message
Information from the Human Perspective
- Levels in cognitive processing
- Perception
- Observation/attention
- Reasoning, assimilating, forming inferences
- Knowledge: justified true belief
- Belief: an idea held on the basis of some support; an internally accepted statement, the result of inductive processes combining observed facts with a reasoning process
Information from the Human Perspective
- Does information require a human mind?
- Communication and information transfer among ants
- A tree falls in the forest: is there information there?
- Existence of quarks
Meaning vs. Form
- Form of information as the information itself
- Meaning of a signal vs. the signal itself
- What aspects of a document are information?
- Representation (Norman 93)
- Why do we write things down?
- Socrates thought writing would obliterate serious thought
- Sounds and gestures fade away
- Artifacts help us to reason
- Anything not present in the representation can be ignored
- Things left out of the representation are often what we don't know how to represent
Information
- Consider Borges' infinite Library of Babel
- It has all possible data: every combination of letters
- Does it therefore contain all possible information?
- What about all possible knowledge?
- What about wisdom?
- Is the Internet a prototype Library of Babel?
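The scale of Borges' library can be checked with a little arithmetic. The story describes books of 410 pages, 40 lines per page, and about 80 letters per line, written with 25 orthographic symbols; the sketch below assumes those figures are exact and counts the distinct books as 25 raised to the number of characters per book. Writing that number out would take well over a million digits, so the sketch reports only its digit count (via a logarithm) and the maximum number of bits one book could carry if every symbol were equally likely.

```python
import math

# Figures from Borges' story, treated as exact for this sketch.
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80
ALPHABET_SIZE = 25  # 22 letters, space, comma, period


def chars_per_book() -> int:
    """Number of symbol positions in one book."""
    return PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE


def distinct_books_digits(alphabet: int, length: int) -> int:
    """Decimal digits in alphabet**length, computed without building the number."""
    return math.floor(length * math.log10(alphabet)) + 1


def bits_per_book(alphabet: int, length: int) -> float:
    """Upper bound on information per book, assuming equiprobable symbols."""
    return length * math.log2(alphabet)


if __name__ == "__main__":
    n = chars_per_book()
    print(f"characters per book: {n:,}")
    print(f"distinct books: 25**{n:,}, a number with "
          f"{distinct_books_digits(ALPHABET_SIZE, n):,} digits")
    print(f"capacity per book: {bits_per_book(ALPHABET_SIZE, n):,.0f} bits")
```

Under these assumptions each book has 1,312,000 symbol positions and the count of distinct books has 1,834,098 decimal digits: vast, but finite.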
Information Theory
- Claude Shannon, 1940s, studying communication
- Ways to measure information
- Communication: producing the same message at its destination as that seen at its source
- Problem: a noisy channel can distort the message
- Between transmitter and receiver, the message must be encoded
- Semantic aspects are irrelevant to the engineering problem
[Figure: Shannon's model of communication: Message Source → Transmitter → Channel (with Noise) → Receiver → Destination]
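Shannon's measure of information is entropy: for a source whose symbols occur with probabilities p_i, H = -Σ p_i log2 p_i bits per symbol. The sketch below estimates H from symbol frequencies in a sample string, assuming each symbol is independent of its neighbours (real text has context that lowers its true entropy). It also computes the self-information -log2 p of a single outcome, which echoes the "must be new" intuition: an outcome that is certain carries zero bits.

```python
import math
from collections import Counter


def entropy_bits(message: str) -> float:
    """Shannon entropy in bits per symbol, estimated from symbol frequencies.

    Treats symbols as drawn independently from the observed distribution
    (an order-0 estimate that ignores context).
    """
    if not message:
        return 0.0
    total = len(message)
    # p * log2(1/p) equals -p * log2(p) but avoids returning -0.0.
    return sum((c / total) * math.log2(total / c) for c in Counter(message).values())


def self_information_bits(p: float) -> float:
    """Bits carried by one outcome that had probability p."""
    return math.log2(1 / p)


if __name__ == "__main__":
    print(entropy_bits("aaaa"))        # 0.0: one symbol, no uncertainty
    print(entropy_bits("abab"))        # 1.0: two equally likely symbols
    print(entropy_bits("abcd"))        # 2.0: four equally likely symbols
    print(self_information_bits(0.5))  # 1.0: a fair coin flip
```

Entropy depends only on the probabilities, never on what the symbols mean, which is the sense in which semantic aspects drop out of the engineering problem.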
Information Theory
- Better called Technical Communication Theory
- Communication may be over time and space
Human Communication Theory?
Communication Theory
- Encompasses a vast array of disciplines
- Mass communications, literary and media theory, rhetoric, sociology, psychology, linguistics, law, cognitive science, information science, engineering, etc.
- Questions
- What and how we communicate
- Why we communicate
- What happens when communication works and when it doesn't
- How to improve communication
Why Study Communication Theory?
- Our understanding of what, how, and why we communicate informs our:
- Theory of information and practice of information production
- Analysis, design, and evaluation of information systems and applications
- How we work together in teams
- How we read texts and talk with one another in this course
- Law and public policy
Etymology of Communication
- Communication: c.1384, from O.Fr. communicacion, from L. communicationem (nom. communicatio), from communicare "to impart, share," lit. "to make common," from communis (see common).
- Common: 13c., from O.Fr. comun, from L. communis "shared by all or many," from L. com- "together" + munia "public duties," those related to munia "office." Alternate etymology is that Fr. got it from P.Gmc. gamainiz (cf. O.E. gemæne), from PIE kom-moini "shared by all," from base moi-, mei- "change, exchange."
- Remuneration: c.1400, from L. remunerationem, from remunerari "to reward," from re- "back" + munerari "to give," from munus (gen. muneris) "gift, office, duty." Remunerative is from 1677.
What and How Do We Communicate?
- What gifts do we give each other?
- What do we do with these gifts?
- How does this gift exchange bring us together (or
not)?
The Conduit Metaphor
- Language functions like a conduit, transferring thoughts bodily from one person to another
- In writing and speaking, people insert their thoughts or feelings in the words
- Words accomplish the transfer by containing the thoughts or feelings and conveying them to others
- In listening or reading, people extract the thoughts and feelings once again from the words
Conduit Metaphor: Minor Frameworks
- Thoughts and feelings are ejected by speaking or writing into an external idea space
- Thoughts and feelings are reified in this external space, so they exist independent of any need for living beings to think or feel them
- These reified thoughts and feelings may, or may not, find their way back into the heads of living humans
Toolmaker's Paradigm
Semantic Pathology
- Semantic pathology arises whenever two or more incompatible senses capable of figuring meaningfully in the same context develop around the same name
- Example: "This text is confusing."
- Text(1): The layout/font of the text is confusing.
- Text(2): The argument of the text is confusing.
- Question: Where is Text(2)?
Lecture Outline
- What Is Information?
- History of Information Search and Organization
- Discussion Questions
- Action Items for Next Time
Origins: Physical Representations
- Very early history of content representation
- Sumerian tokens and envelopes
- Alexandria - pinakes
- Indices
Origins: Mental Representations
- Rhetorical mnemonic theory and practice (memoria)
- Memory palaces
- An organization and retrieval technology for concepts that combines physical and virtual places (loci)
- Examples
- Simonides of Ceos
- Cicero's testes
Origins: Bibliographic Representations
- Biblical indexes and concordances
- Hugo de St. Caro, 1247 A.D.: 500 monks, KWOC (keyword out of context)
- Book indexes (Nuremberg Chronicle)
- Library catalogs
- Journal indexes
- Information explosion following WWII
- Bush and Memex
- Cranfield studies of indexing languages and information retrieval
- Development of bibliographic databases
- Index Medicus production and MEDLARS searching
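A concordance maps each word to every place it occurs; a KWOC (keyword out of context) index lists each keyword followed by the full entry in which it appears. The sketch below builds both from a few passages keyed by hypothetical verse ids. The tokenization (lowercase, letters only) and the tiny stop-word list are assumptions made for illustration; this is not a reconstruction of how Hugo's monks actually worked.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real indexes use much longer ones.
STOP_WORDS = {"a", "and", "in", "of", "the"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_concordance(passages: dict[str, str]) -> dict[str, list[str]]:
    """Map each non-stop word, alphabetically, to the sorted ids of passages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for pid, text in passages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(pid)
    return {word: sorted(ids) for word, ids in sorted(index.items())}


def kwoc_lines(passages: dict[str, str]) -> list[str]:
    """KWOC style: the keyword, then the id and full text of each entry containing it."""
    lines = []
    for word, ids in build_concordance(passages).items():
        for pid in ids:
            lines.append(f"{word.upper():<12} {pid}: {passages[pid]}")
    return lines


if __name__ == "__main__":
    sample = {  # hypothetical ids and passages
        "Gen 1:1": "In the beginning God created the heaven and the earth.",
        "Gen 1:3": "And God said, Let there be light: and there was light.",
    }
    for line in kwoc_lines(sample):
        print(line)
```

The concordance is what a modern retrieval system calls an inverted index; the KWOC layout trades space for letting the reader see each entry whole under every keyword.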
How Much Information Today?
- See the report by Hal Varian and Peter Lyman: http://www.sims.berkeley.edu/research/projects/how-much-info/
- Total annual information production, including print, film, magnetic media, etc.:
- Upper bound: 2,120,539 terabytes (10^12 bytes)
- Lower bound: 635,480 terabytes
- I.e., between 1 and 2 exabytes per year (10^18 bytes)
- How do we organize THIS?
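The bounds are given in terabytes while the summary is in exabytes. A quick conversion, assuming the decimal (SI) prefixes implied by the 10^12 and 10^18 figures, puts both bounds on the same scale.

```python
TERABYTE = 10**12  # bytes, decimal (SI) prefix
EXABYTE = 10**18   # bytes, decimal (SI) prefix


def tb_to_eb(terabytes: int) -> float:
    """Convert terabytes to exabytes (1 EB = 1,000,000 TB)."""
    return terabytes * TERABYTE / EXABYTE


UPPER_TB = 2_120_539  # upper bound from the slide
LOWER_TB = 635_480    # lower bound from the slide

if __name__ == "__main__":
    print(f"upper bound: {tb_to_eb(UPPER_TB):.2f} EB")
    print(f"lower bound: {tb_to_eb(LOWER_TB):.2f} EB")
```

This prints about 2.12 EB for the upper bound and 0.64 EB for the lower bound.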
Lecture Outline
- What Is Information?
- History of Information Search and Organization
- Discussion Questions
- Action Items for Next Time
Discussion Questions (Borges)
- Yuri Takhteyev on Borges
- How does Borges' view of information compare to Shannon's (information as reducing uncertainty)?
- Why does Borges arrange the books randomly? What difference would it make in the story? (This question is also raised by Dennett in "The Library of Mendel," so we may want to leave it until that discussion.)
- What leads the Librarians to postulate the existence of the Man of the Book? Does that logic make sense?
Discussion Questions (Borges)
- Yuri Takhteyev on Borges
- What is the significance of the sentence "I cannot combine some characters - htcmrlchtdj - which the divine library has not foreseen and which in one of its secret tongues do not contain a terrible meaning"?
- What is the significance of the Librarian's conclusion that the Library is unlimited and cyclical?
Discussion Questions (Dennett)
- Joshua Solomin on Dennett
- It is mentioned that books over 500 pages in
length can be represented in the Library by
having them span multiple Library volumes and
that by doing this, some Library volumes will be
reused. But Dennett (from Quine) reduces this
case to the case where the entire Library can be
represented by a 1 and a 0, simply reused in
different combinations. I would argue that this
reductive case is no longer useful, because you
then have to store the formulae for reproducing
each book from your 1 and 0, which would be just
as bad as storing the volumes themselves. So,
does this strategy of reducing the content of a
volume and re-using volumes help with the volume
of information at all? If so, at what point
between the 500-page volume and the 1-character
volume will the strategy break down? Or would it
be argued that it doesn't break down, but rather
the strategy is still useful when condensed to a
1 or 0?
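The trade-off in this question can be explored with a toy experiment: split a text into fixed-size "volumes," shelve each distinct volume once, and record the book as a list of shelf positions (the "formula" for reproducing it). This is a sketch under simplifying assumptions (one character per symbol, storage counted naively), not Dennett's or Quine's own construction, and the function names are hypothetical. As volumes shrink, fewer distinct volumes are needed, but the recipe grows.

```python
def shelve(text: str, volume_size: int) -> tuple[list[str], list[int]]:
    """Split text into fixed-size volumes and keep each distinct volume once.

    Returns (shelf, recipe): shelf holds the distinct volumes; recipe lists
    the shelf positions that reproduce the text in order. The final volume
    may be shorter than volume_size.
    """
    shelf: list[str] = []
    position: dict[str, int] = {}
    recipe: list[int] = []
    for start in range(0, len(text), volume_size):
        volume = text[start:start + volume_size]
        if volume not in position:
            position[volume] = len(shelf)
            shelf.append(volume)
        recipe.append(position[volume])
    return shelf, recipe


def rebuild(shelf: list[str], recipe: list[int]) -> str:
    """Reassemble the text by following the recipe through the shelf."""
    return "".join(shelf[i] for i in recipe)


if __name__ == "__main__":
    book = "to be or not to be " * 50  # hypothetical, highly repetitive book
    for size in (1, 3, 19, 100):
        shelf, recipe = shelve(book, size)
        assert rebuild(shelf, recipe) == book
        stored = sum(len(v) for v in shelf)
        print(f"volume size {size:>3}: {len(shelf):>3} distinct volumes, "
              f"{stored:>4} chars shelved, recipe length {len(recipe)}")
```

With one-character volumes the shelf shrinks to the seven distinct characters of the book, but the recipe becomes exactly as long as the book itself, which is the breakdown the question anticipates. Reuse pays off only when volumes are large enough to repeat as whole units.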
Discussion Questions (Dennett)
- Joshua Solomin on Dennett
- Dennett mentions that even finding one readable
volume in this huge storehouse is "unlikely in the
extreme." If no parse-able information can be
gleaned from a given volume (or piece of data),
is it still useful? Can it be said that some
piece of data is absolutely useless, or is it
more that we simply haven't yet developed an
encoding system that corresponds to it (that
would allow us to decode meaning from it)? Or
perhaps some third option? What could be a
possible strategy for declaring some volumes
useless, in order to reduce the scope of the
Library to something easier to deal with?
Discussion Questions (Dennett)
- Joshua Solomin on Dennett
- It is observed that while Borges did not order
his Library, attempting to do so would have its
own problems associated with it. Dennett's
solution is a kind of alphabetizing, organized in
multiple dimensions. Is there some better way to
perform this sorting? Assuming that we didn't
want to have 1,000,000 dimensions to our file
cabinet (the number of characters per volume),
could we perform some kind of intelligent
grouping of volumes? What kind of metadata could
be developed from this sorted Library to
facilitate searching -- e.g., a section devoted
to books about whales, with subsections on books
involving sea captains as well as books involving
wooden boys who become human? Would this save us
anything over Dennett's alphabetizing?
Discussion Questions (Reddy)
- Katherine Ahern and Brooke Maury on Reddy
- Is there any model of communication other than the conduit metaphor and the toolmaker's paradigm? Do these two visions leave any aspects of communication out?
- If information is not actually stored in the 'signal', then is the only value in this transmitted matter how one interprets it?
Discussion Questions (Reddy)
- Katherine Ahern and Brooke Maury on Reddy
- What is the value of information (ideas, data, facts, etc.) without someone to receive, decode, and interpret that information?
- Reddy seems to put the responsibility for correct interpretation on the user or consumer of information. However, are there tools that can be 'packaged' with the information to assist in this unpacking?
- How does one develop a common context from which we can establish the rules or semantics of information exchange?
Discussion Questions (Reddy)
- Katherine Ahern and Brooke Maury on Reddy
- Reddy suggests that the increase in signals
(i.e., libraries, recordings, and mass
communication) has resulted in less culture,
because the skill of reconstructing or
extracting ideas is neglected. What are the
implications for information organization and
retrieval? Is it our job to somehow facilitate
this reconstruction? Does Reddy's analysis even
allow the possibility of facilitating extraction
of ideas? If so, how does one encode information
in such a way as to minimize the confusion and
lack of clarity around its meaning during
transmission and upon reception?
Discussion Questions (Reddy)
- Katherine Ahern and Brooke Maury on Reddy
- Is Reddy's analogy of the evil magician
representing language appropriate? Are
subscribers to the conduit metaphor doomed to
think others hostile or insane? Perhaps the 'evil
magician' is our own laziness or failure to do
the work of communication.
Lecture Outline
- What Is Information?
- History of Information Search and Organization
- Discussion Questions
- Action Items for Next Time
Homework (!)
- Read the Introduction and Chapters 1 and 2 of George Lakoff's Women, Fire, and Dangerous Things
- Create your SIMS home page
Next Time
Sign Up for Office Hours
- Prof. Marc Davis
- Thursdays 2:00 pm to 4:00 pm
- 314 South Hall