Title: Citation Indexing
1Citation Indexing
- Nitish Mathew
- Thanks to
- Dr. C. Lee Giles
- Dr. Paul Cohen
2Outline
- Introduction to Citation Indexing
- What is Citation Indexing
- Concept
- Web of Science
- Bias
- Autonomous Citation Indexing
- Future Application
- Technology Forecasting
- Summary
3Why do a literature search?
- Avoid unwitting duplication of research
- Wasted time, effort, and funds
- Plagiarism issues
4Concept of Citations
- Citations symbolize the conceptual association of
scientific ideas as recognized by publishing
research authors.
- By the references they cite in their papers,
authors make explicit linkages between their
current research and prior work in the archive of
scientific literature.
5Distinction between "citation" and "reference"
- If Paper R contains a bibliographic footnote
using and describing Paper C, then
- R contains a reference to C,
- C has a citation from R.
- The number of references a paper has is measured
by the number of items in its bibliography as
endnotes, footnotes, etc.
- The number of citations a paper has is found by
looking it up in a citation index and seeing
how many other papers mention it.
Source: Price, D. J. D. Little Science, Big
Science...and Beyond. New York: Columbia
University Press, 1986.
6Paper R
...To start, it is important to clarify the
terminological distinction between "citation" [6]
and "reference". In his classic book Little
Science, Big Science, Derek Price gave a clear
definition of both terms. He said: "It seems to
me a great pity to waste a good technical term by
using the words citation and reference
interchangeably. I therefore propose and adopt
the convention that if Paper R contains a
bibliographic footnote using and describing Paper
C, then R contains..."
R contains a reference to C.
[6] The concept of citation indexing: A unique
and innovative tool for navigating the
research literature. Current Contents, January 3,
1994.
Paper C
Little science, big science...and
beyond. This is my first Current Contents
(CC) essay under the rubric of Citation
Comments. As discussed in last week's CC, this
new monthly feature will focus on the
applications of the Institute for Scientific
Information's (ISI's) databases. 1 An appropriate
topic to launch this new series is perhaps the
most rudimentary -- the basic concept of citation
indexing. To start, it is important to clarify
the terminological distinction between "citation"
and "reference". In his classic book Little
Science, Big Science, Derek Price gave a clear
definition of both terms. He said "It seems to
me a great pity to waste a good technical term by
using the words citation and reference
interchangeably. I therefore propose and adopt
the convention that if Paper R contains a
bibliographic footnote using and describing Paper
C, then R contains a.
C has a citation from R.
7Citation Index
Paper C
- Paper X
- Paper Y
- Paper R
- Paper Q
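A citation index like the one pictured can be obtained by inverting reference lists. The sketch below is only an illustration: the reference lists are hypothetical (only the paper names come from the slide), and `build_citation_index` is a name introduced here, not part of any real system.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert {citing paper: [cited papers]} into {cited paper: [citing papers]}."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            # Count each citing paper once, even if it lists the same work twice.
            if citing not in index[cited]:
                index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


# Hypothetical reference lists: each key is a citing paper,
# each value is that paper's bibliography.
references = {
    "Paper X": ["Paper C"],
    "Paper Y": ["Paper C", "Paper X"],
    "Paper R": ["Paper C"],
    "Paper Q": ["Paper C", "Paper R"],
}

citation_index = build_citation_index(references)
print(citation_index["Paper C"])       # papers that cite Paper C
print(len(references["Paper Q"]))      # number of references in Paper Q
print(len(citation_index["Paper C"]))  # number of citations to Paper C
```

Looking up a paper in `references` answers "what does it refer to?", while looking it up in `citation_index` answers "who cites it?", which is the distinction Price draws.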
8Citation Indexing
- A citation index indexes the citations an article
makes, linking the article with cited works.
- Originally designed mainly for literature search,
allowing researchers to find subsequent articles that
cite a given article.
- Invented by Dr. Eugene Garfield
- Example of a citation indexing firm: Institute
for Scientific Information (ISI)
9Institute for Scientific Information (ISI)
- Indexes the linkages by listing both the cited and
citing works.
- The ISI databases
- Science Citation Index (SCI)
- Social Sciences Citation Index (SSCI)
- Arts & Humanities Citation Index (A&HCI)
- Multidisciplinary: they cover virtually all
disciplines, whereas traditional indexing and
abstracting services are limited to a single
field.
10Web of Knowledge
- ISI Web of Knowledge: a dynamic, integrated,
Web-based environment
- ISI Web of Science provides access to
- Science Citation Index (over 3,200 journals)
- Social Sciences Citation Index (1,400 journals)
- Arts & Humanities Citation Index
- Updated weekly.
- Journals from 1986 onward are available to Penn State
users
- Previous years of each index are available in
print at the Libraries.
12Web of Science
- Search current and retrospective
multidisciplinary information from nearly 8,500
research journals in the world.
- Users can navigate forward, backward, and through
the literature, searching all disciplines and
time spans to uncover information relevant
to their research.
13Advantages
- Compared to traditional indexing:
- No subjective judgments to be made about relevant
descriptors
- Faster
- No limit to index terms: all cited references
are indexed.
14Problems with ISI Databases
- Require manual effort during indexing
- Expensive
- Bias issues
- One possible solution: Autonomous Citation
Indexing
Adapted from: Citation Indexing - Its Theory and
Application in Science, Technology, and
Humanities, by Eugene Garfield
15Bias in Citation Databases
- Bibliometric indicators do not represent all
publishing: though these databases have
international coverage, they have a certain
amount of bias.
- They contain more minor US journals than minor
European journals.
- Non-English language journals are not as
comprehensively indexed.
- From a non-English speaking world perspective,
bibliometric indicators represent only
international level, predominantly English
language, higher impact, peer-reviewed, publicly
available research output.
Source: Bibliometric Indicators and the Social
Sciences, prepared for ESRC, J. Sylvan Katz, SPRU,
University of Sussex, UK, December 1999
16Bias in Citation Databases
- One of the recurrent criticisms: journal
selection is biased by the internal management
decisions of ISI.
- Only journals are indexed; monographs are left
out.
- A lack of correlation between the most highly
cited authors based on the journal sample and
those based on the monograph sample suggests that
there may be two distinct populations of highly
cited authors.
Source: Blaise Cronin and Herbert W. Snyder.
Comparative citation rankings of authors in
monographic and journal literature: a study of
sociology. Journal of Documentation, 53(3):263-273,
1997.
17ResearchIndex/CiteSeer
- ResearchIndex: a scientific literature digital
library that incorporates
- Autonomous citation indexing
- Citation context
- Full-text indexing
- Related document identification
- Query-sensitive summaries
- Awareness and tracking
- Citation graph analysis
- http://citeseer.nj.nec.com/cs
Source: Presentation on Searching the World Wide
Web: General and Scientific Information Access,
Steve Lawrence
18CiteSeer: How does it work?
- Download papers from the Web
- Convert to text and parse
- Obtain citations; do full-text indexing
- Store them in a database
- Query by citations or keywords
Source: CiteSeer: An Automatic Citation Indexing
System (1998), C. Lee Giles, Kurt D. Bollacker,
Steve Lawrence. Digital Libraries 98 - The Third
ACM Conference on Digital Libraries
19CiteSeer - Document Acquisition
- Web search engines are used for crawling
- Heuristics are used to locate papers
- Pages containing words such as "publications", "papers",
"postscript", etc.
- Locates and downloads PostScript files identified
by .ps, .ps.Z, or .ps.gz extensions.
- URLs and PostScript files that are duplicates of
those already found are detected and skipped.
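The extension filter and duplicate check described on this slide might look roughly like the sketch below. It is a simplified illustration, not CiteSeer's actual crawler: the function name and the example URLs are hypothetical, and only duplicate URLs are detected here (detecting duplicate file contents would need an additional step, such as comparing checksums).

```python
# Extensions named on the slide as identifying PostScript files.
POSTSCRIPT_SUFFIXES = (".ps", ".ps.Z", ".ps.gz")


def select_new_postscript_urls(candidate_urls, seen=None):
    """Keep URLs that look like PostScript files and have not been seen before."""
    seen = set() if seen is None else seen
    selected = []
    for url in candidate_urls:
        # Ignore any query string or fragment when checking the extension.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if not path.endswith(POSTSCRIPT_SUFFIXES):
            continue
        if url in seen:
            continue  # duplicate URL: skip it
        seen.add(url)
        selected.append(url)
    return selected


# Hypothetical URLs, as might be collected from a "publications" page.
candidates = [
    "http://example.edu/papers/a.ps",
    "http://example.edu/papers/b.ps.gz",
    "http://example.edu/papers/a.ps",
    "http://example.edu/index.html",
    "http://example.edu/papers/c.ps.Z",
    "http://example.edu/papers/d.pdf",
]

to_download = select_new_postscript_urls(candidates)
print(to_download)
```

Passing the same `seen` set across several calls lets later crawls skip URLs found earlier.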
20Document Parsing
- The downloaded PostScript files are first
converted into text
- Information extracted includes: URL, header,
abstract, introduction, citations, citation
context, and full text
- Issues in citation parsing include
- Natural language citations
- Citations to the same article in different forms
(affects citation statistics)
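One simple way to group citations that refer to the same article is to compare the words they share. The sketch below uses Jaccard word overlap with a greedy grouping pass. This is an assumed stand-in technique chosen for brevity, not the method the CiteSeer authors describe, and the threshold of 0.7 and the example citation strings are illustrative.

```python
import re


def citation_words(citation):
    """Lowercase a citation string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", citation.lower()))


def word_overlap(a, b):
    """Jaccard similarity between the word sets of two citation strings."""
    words_a, words_b = citation_words(a), citation_words(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def group_citations(citations, threshold=0.7):
    """Greedily add each citation to the first group whose first member it resembles."""
    groups = []
    for citation in citations:
        for group in groups:
            if word_overlap(citation, group[0]) >= threshold:
                group.append(citation)
                break
        else:
            groups.append([citation])
    return groups


# Illustrative citation strings: the first two are variant forms of one article.
citations = [
    "Giles, C.L., Bollacker, K., Lawrence, S. CiteSeer: An automatic citation indexing system. 1998.",
    'C. L. Giles, K. Bollacker, and S. Lawrence, "CiteSeer: an automatic citation indexing system," 1998',
    "Garfield, E. Citation indexing: its theory and application in science, technology, and humanities.",
]

groups = group_citations(citations)
for group in groups:
    print(len(group), "citation(s):", group[0])
```

Counting group sizes instead of raw strings keeps variant forms from splitting one article's citation count.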
21Querying and Browsing
- First query: keyword search returns a
list of citations matching the query, or a list of
articles.
- Finds related documents: a combination of weighted
similarity measures is used
- http://citeseer.nj.nec.com/cs
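Combining several similarity measures with weights can be sketched as a weighted average. The measure names ("text", "citations"), the weights, and the scores below are hypothetical values chosen for illustration. The slide does not say which measures or weights CiteSeer uses.

```python
def combined_similarity(scores, weights):
    """Weighted average of the individual similarity scores for one document."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    return sum(weights[name] * scores[name] for name in scores) / total_weight


def rank_related(candidates, weights, top_n=3):
    """Return candidate document names ordered by combined similarity, best first."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: combined_similarity(item[1], weights),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Hypothetical weights: shared citations count twice as much as text similarity.
weights = {"text": 1.0, "citations": 2.0}

# Hypothetical per-measure similarity scores against the document being viewed.
candidates = {
    "Doc A": {"text": 0.2, "citations": 0.8},
    "Doc B": {"text": 0.6, "citations": 0.1},
    "Doc C": {"text": 0.9, "citations": 0.7},
}

print(rank_related(candidates, weights))
```

Changing the weights shifts the ranking toward whichever measure is trusted more for a given collection.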
22Advantages of CiteSeer
- Completely autonomous: cheaper and more
widely available
- More up-to-date databases: not limited to a
pre-selected set of journals or held back by publication
delays
- Literature search based on the context of
citations
- Ability to recognize variant forms of citations
- No bias from subjective selection of
journals
- Not restricted to papers: preprints, technical
reports, and conference proceedings are also indexed.
- User feedback on each article
Source: Autonomous Citation Matching (1999), Steve
Lawrence, C. Lee Giles, Kurt Bollacker. Proceedings
of the Third International Conference on
Autonomous Agents
23Areas of Improvement
- 1. Does not cover the significant journals
comprehensively.
- (might be less of a disadvantage over time as
more journals become available online)
- 2. Cannot distinguish subfields as accurately
- (e.g. CiteSeer will not disambiguate two authors
with the same name.)
- 3. Similar document retrieval system could be
enhanced and improved.
- 4. Heuristics used to locate articles could be
improved
24Future prospects: Technology Forecasting
- DIVA (Database Information Visualization and
Analysis system): bibliometric analysis of
collections of scientific literature and patents
for technology forecasting.
- Documents, drawn from the technological field of
interest, are visualized as clusters on a
two-dimensional map, permitting exploration of the
relationships among the documents and document
clusters
- Can yield insight into trends in the
technological field of interest.
Source: DIVA: A Visualization System for
Exploring Document Databases for Technology
Forecasting, by Steven Morris, Zheng Wu, Camille
DeYong, Sinan Salman, Dagmawi Yemenu. Computers
and Industrial Engineering, Vol. 43, No. 4
25Clustering of documents
26Document Maps
27Document timelines
28Document timelines
29Document timelines
30Document timelines
Polymers cluster report showing a plot of links
to all other clusters by year
31Document timelines
Polymers cluster report showing a plot of links
to each other cluster by year.
32A comment on bibliometric analysis
- Compared to a drunk who is looking for his
keys under a street lamp.
- When asked by a passer-by why he is
looking there, he replies: "This is where the
lamp is."
33A comment on bibliometric analysis
- Critics say that publications (and citations)
just provide easy data and that the assessment
of real quality needs more qualitative
considerations.
34Summary
- Citation indexing is more than 40 years old.
- Simple concept; far-reaching influences and
applications
- Many possibilities for
- Improvement of existing systems
- Developing new uses in the networked world