Citation Indexing - PowerPoint PPT Presentation

Slides: 35
Provided by: nitish1

1
Citation Indexing
  • Nitish Mathew
  • Thanks to
  • Dr. C. Lee Giles
  • Dr. Paul Cohen

2
Outline
  • Introduction to Citation Indexing
  • What is Citation Indexing
  • Concept
  • Web of Science
  • Bias
  • Autonomous Citation Indexing
  • Future Application
  • Technology Forecasting
  • Summary

3
Why do a literature search?
  • Avoid unwitting duplication of research
  • Wasted time, effort, and funds
  • Plagiarism issues

4
Concept of Citations
  • Citations symbolize the conceptual association of
    scientific ideas as recognized by publishing
    research authors.
  • By the references they cite in their papers,
    authors make explicit linkages between their
    current research and prior work in the archive of
    scientific literature.

5
Distinction between "citation" and "reference"
  • If Paper R contains a bibliographic footnote
    using and describing Paper C, then
  • R contains a reference to C,
  • C has a citation from R.
  • The number of references a paper has is measured
    by the number of items in its bibliography as
    endnotes, footnotes, etc.,
  • The number of citations a paper has is found by
    looking it up in a citation index and seeing
    how many other papers mention it.

Source: Price, D. J. D. Little Science, Big
Science...and Beyond. New York: Columbia
University Press, 1986.
6
Paper R
"... To start, it is important to clarify the
terminological distinction between "citation" [6]
and "reference". In his classic book Little
Science, Big Science, Derek Price gave a clear
definition of both terms. He said: "It seems to
me a great pity to waste a good technical term by
using the words citation and reference
interchangeably. I therefore propose and adopt
the convention that if Paper R contains a
bibliographic footnote using and describing Paper
C, then R contains ..."
[6] The concept of citation indexing: A unique
and innovative tool for navigating the
research literature. Current Contents, January 3,
1994.
R contains a reference to C.
Paper C
Little science, big science...and
beyond. "This is my first Current Contents
(CC) essay under the rubric of Citation
Comments. As discussed in last week's CC, this
new monthly feature will focus on the
applications of the Institute for Scientific
Information's (ISI's) databases. [1] An appropriate
topic to launch this new series is perhaps the
most rudimentary -- the basic concept of citation
indexing. ..."
C has a citation from R.
7
Citation Index
Paper C
  1. Paper X
  2. Paper Y
  3. Paper R
  4. Paper Q

8
Citation Indexing
  • A citation index indexes the citations an article
    makes, linking the article with cited works.
  • Originally designed mainly for literature search,
    so researchers can find subsequent articles that
    cite a given article.
  • Invented by Dr. Eugene Garfield
  • Example of a citation indexing firm: Institute
    for Scientific Information (ISI)
9
Institute for Scientific Information (ISI)
  • Indexes the linkages by listing both the cited and
    citing works.
  • The ISI databases:
  • Science Citation Index (SCI)
  • Social Sciences Citation Index (SSCI)
  • Arts & Humanities Citation Index (A&HCI)
  • Multidisciplinary: they cover virtually all
    disciplines, whereas traditional indexing and
    abstracting services are limited to a single
    field.

10
Web of Knowledge
  • ISI Web of Knowledge: a dynamic, integrated,
    Web-based environment
  • ISI Web of Science provides access to:
  • Science Citation Index (over 3,200 journals)
  • Social Sciences Citation Index (1,400 journals)
  • Arts & Humanities Citation Index
  • Updated weekly.
  • Journals from 1986 onward are available to Penn
    State users.
  • Earlier years of each index are available in
    print at the Libraries.

11
(No Transcript)
12
Web of Science
  • Search current and retrospective
    multidisciplinary information from nearly 8,500
    research journals worldwide.
  • Users can navigate forward, backward, and through
    the literature, searching all disciplines and
    time spans to uncover information relevant
    to their research.
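
One way to picture forward and backward navigation is as a walk over the citation graph. The sketch below is an assumption for illustration, not how Web of Science is implemented: navigate, cites, and cited_by are hypothetical names, and the three-paper graph is invented. Backward follows a paper's references to older work; forward follows its citations to later work.

```python
from collections import deque

def navigate(start, neighbors, max_hops):
    """Breadth-first walk; returns {paper: hops from start}, excluding start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if seen[paper] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(paper, []):
            if nxt not in seen:
                seen[nxt] = seen[paper] + 1
                queue.append(nxt)
    del seen[start]
    return seen

# Hypothetical graph: A cites B, B cites C, D cites B.
cites = {"A": ["B"], "B": ["C"], "D": ["B"]}

# Reverse the edges to get, for each paper, the papers that cite it.
cited_by = {}
for citing, cited_list in cites.items():
    for cited in cited_list:
        cited_by.setdefault(cited, []).append(citing)

backward = navigate("B", cites, max_hops=2)     # older work B builds on
forward = navigate("B", cited_by, max_hops=2)   # later work that cites B
```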

13
Advantages
  • Compared to traditional indexing:
  • no subjective judgments to be made about relevant
    descriptors
  • faster
  • no limit to index terms - all cited references
    are indexed.

14
Problems with ISI Databases
  • Require manual effort during indexing
  • Expensive
  • Bias issues
  • One possible solution: Autonomous Citation
    Indexing

Adapted from: Citation Indexing - Its Theory and
Application in Science, Technology, and
Humanities, by Eugene Garfield
15
Bias in Citation Databases
  • Bibliometric indicators do not represent all
    publishing. Though these databases have
    international coverage, they have a certain
    amount of bias:
  • They contain more minor US journals than minor
    European journals
  • Non-English language journals are not as
    comprehensively indexed
  • From a non-English speaking world perspective,
    bibliometric indicators represent only
    international level, predominantly English
    language, higher impact, peer-reviewed, publicly
    available research output.

Source: Bibliometric Indicators and the Social
Sciences, prepared for ESRC, J. Sylvan Katz, SPRU,
University of Sussex, UK, December 1999
16
Bias in Citation Databases
  • One of the recurrent criticisms: journal
    selection is biased by the internal management
    decisions of ISI.
  • Only journals are indexed; monographs are left
    out.
  • A lack of correlation between the most highly
    cited authors based on the journal sample and
    those based on the monograph sample suggests that
    there may be two distinct populations of highly
    cited authors.

Source: Blaise Cronin and Herbert W. Snyder.
Comparative citation rankings of authors in
monographic and journal literature: a study of
sociology. Journal of Documentation, 53(3):263-273,
1997.
17
ResearchIndex/CiteSeer
  • ResearchIndex: a scientific literature digital
    library that incorporates:
  • Autonomous citation indexing
  • Citation context
  • Full-text indexing
  • Related document identification
  • Query-sensitive summaries
  • Awareness and tracking
  • Citation graph analysis
  • http://citeseer.nj.nec.com/cs

Source: Presentation on Searching the World Wide
Web: General and Scientific Information Access,
Steve Lawrence
18
CiteSeer: How does it work?
  1. Downloads papers from the Web
  2. Converts them to text and parses them
  3. Obtains citations and does full-text indexing
  4. Stores them in a database
  5. Answers queries by citation or keyword

Source: CiteSeer: An Automatic Citation Indexing
System (1998), C. Lee Giles, Kurt D. Bollacker,
Steve Lawrence, Digital Libraries 98 - The Third
ACM Conference on Digital Libraries
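
A toy version of the parse, index, store, and query steps might look like the following. It is a minimal sketch under stated assumptions, not CiteSeer's code: the class TinyIndex, the function parse, the sample texts, and the rule that everything after the word "References" is a citation list are all hypothetical simplifications. It also assumes papers have already been downloaded and converted to plain text.

```python
import re
from collections import defaultdict

def parse(text):
    """Split a paper's plain text into body and a list of citation strings.

    Assumes the citation list follows the first occurrence of "References".
    """
    body, _, ref_section = text.partition("References")
    citations = [line.strip() for line in ref_section.splitlines() if line.strip()]
    return body, citations

class TinyIndex:
    def __init__(self):
        self.words = defaultdict(set)   # word -> ids of papers containing it
        self.citations = {}             # paper id -> citation strings

    def add(self, paper_id, text):
        body, citations = parse(text)
        for word in re.findall(r"[a-z]+", body.lower()):
            self.words[word].add(paper_id)      # full-text indexing
        self.citations[paper_id] = citations    # store extracted citations

    def search(self, keyword):
        return sorted(self.words.get(keyword.lower(), set()))

index = TinyIndex()
index.add("p1", "Neural networks for indexing.\nReferences\n[1] A. Author. Some Title. 1998.")
index.add("p2", "Indexing the web.\nReferences\n")
hits = index.search("Indexing")
```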
19
CiteSeer - Document Acquisition
  • Web search engines used for crawling
  • Heuristics used to locate papers
  • (e.g. pages containing the words "publications",
    "papers", "postscript", etc.)
  • Locates and downloads PostScript files identified
    by .ps, .ps.Z, or .ps.gz extensions.
  • URLs and PostScript files that are duplicates of
    those already found are detected and skipped.
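
The acquisition heuristics above can be sketched as follows. The hint words and file extensions come from the slide; everything else is an assumption for illustration. In particular, the slide does not say how duplicates are detected, so the SHA-1 content digest used here is a stand-in, and Deduplicator, looks_like_paper_page, and is_postscript_url are hypothetical names.

```python
import hashlib

PAPER_EXTENSIONS = (".ps", ".ps.Z", ".ps.gz")
HINT_WORDS = ("publications", "papers", "postscript")

def looks_like_paper_page(page_text):
    """True if the page mentions any word that hints at a list of papers."""
    text = page_text.lower()
    return any(word in text for word in HINT_WORDS)

def is_postscript_url(url):
    """True if the URL ends in one of the PostScript extensions."""
    return url.endswith(PAPER_EXTENSIONS)

class Deduplicator:
    """Skips URLs and file contents that have been seen before."""

    def __init__(self):
        self.seen_urls = set()
        self.seen_digests = set()

    def is_new(self, url, content):
        # Assumption: identical bytes mean the same file; content is bytes.
        digest = hashlib.sha1(content).hexdigest()
        if url in self.seen_urls or digest in self.seen_digests:
            return False
        self.seen_urls.add(url)
        self.seen_digests.add(digest)
        return True
```

A crawler would call is_new only for URLs that pass is_postscript_url, so a mirrored copy of the same file at a different address is downloaded once and then skipped.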

20
Document Parsing
  • The downloaded PostScript files are first
    converted into text
  • Information extracted includes: URL, header,
    abstract, introduction, citations, citation
    context, and full text
  • Issues in citation parsing include:
  • Natural language citations
  • Identifying citations to the same article
    (affects citation statistics)
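
To see why citations to the same article matter for statistics, consider grouping citation strings that differ only in formatting. The sketch below uses a simple word-overlap (Jaccard) test with an arbitrary threshold; this is a swapped-in illustrative heuristic, not the matching method CiteSeer uses, and the sample citations and function names are invented.

```python
import re

def tokens(citation):
    """Lowercased set of alphanumeric tokens, ignoring punctuation and order."""
    return set(re.findall(r"[a-z0-9]+", citation.lower()))

def same_article(a, b, threshold=0.6):
    """Treat two citation strings as one article if their token sets overlap enough."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold

def group_citations(citations, threshold=0.6):
    """Greedily place each citation in the first group whose first member matches."""
    groups = []
    for citation in citations:
        for group in groups:
            if same_article(citation, group[0], threshold):
                group.append(citation)
                break
        else:
            groups.append([citation])
    return groups

# Hypothetical citation strings: the first two are variants of one article.
a = "J. Smith and A. Jones. Learning to index papers. Proc. of XYZ, 1997."
b = "Smith, J., Jones, A. (1997) Learning to Index Papers, Proc. XYZ."
c = "K. Lee. Graph methods for search. 1995."
groups = group_citations([a, b, c])
```

Without grouping, the article cited by a and b would be counted as two separate works with one citation each.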

21
Querying and Browsing
  • First query: keyword search returns a list of
    citations matching the query, or a list of
    articles.
  • Finds related documents: a combination of
    weighted similarity measures is used
  • http://citeseer.nj.nec.com/cs
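
A combination of weighted similarity measures could look like the sketch below. The slide does not name the measures or the weights, so the two used here (cosine similarity over word counts and Jaccard overlap of cited works) and the equal weights are assumptions chosen for illustration; relatedness and the document dictionaries are hypothetical.

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def jaccard(set_a, set_b):
    """Share of cited works that two documents have in common."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def relatedness(doc_a, doc_b, w_text=0.5, w_cites=0.5):
    """Weighted sum of text similarity and citation overlap."""
    return (w_text * cosine(doc_a["text"], doc_b["text"])
            + w_cites * jaccard(doc_a["cites"], doc_b["cites"]))

# Hypothetical documents with their text and the works they cite.
doc1 = {"text": "citation indexing", "cites": {"c1", "c2"}}
doc2 = {"text": "citation indexing", "cites": {"c1", "c2"}}
doc3 = {"text": "polymer chemistry", "cites": {"c9"}}
score_same = relatedness(doc1, doc2)
score_unrelated = relatedness(doc1, doc3)
```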

22
Advantages of CiteSeer
  • Completely autonomous - cheaper and more widely
    available
  • More up-to-date databases - not limited to a
    pre-selected set of journals or held back by
    publication delays
  • Literature search based on the context of
    citations
  • Ability to recognize variant forms of citations
  • No bias from subjective selection of
    journals
  • Not restricted to papers: preprints, technical
    reports, and conference proceedings are also
    indexed.
  • User feedback on each article

Source: Autonomous Citation Matching (1999), Steve
Lawrence, C. Lee Giles, Kurt Bollacker, Proceedings
of the Third International Conference on
Autonomous Agents
23
Areas of Improvement
  • 1. Does not cover the significant journals
    comprehensively.
  • (might be less of a disadvantage over time as
    more journals become available online)
  • 2. Cannot distinguish subfields as accurately
  • (e.g. CiteSeer will not disambiguate two authors
    with the same name.)
  • 3. The similar-document retrieval system could
    be improved.
  • 4. Heuristics used to locate articles could be
    improved.

24
Future prospects: Technology Forecasting
  • DIVA (for Database Information Visualization and
    Analysis system) - bibliometric analysis of
    collections of scientific literature and patents
    for technology forecasting.
  • Documents, drawn from the technological field of
    interest, are visualized as clusters on a
    two-dimensional map, permitting exploration of the
    relationships among the documents and document
    clusters.
  • Can yield insight into trends in the
    technological field of interest.

Source: DIVA: A Visualization System for
Exploring Document Databases for Technology
Forecasting, by Steven Morris, Zheng Wu, Camille
DeYong, Sinan Salman, Dagmawi Yemenu, Computers
and Industrial Engineering, Vol. 43, No. 4
25
Clustering of documents
26
Document Maps
27
Document timelines
28
Document timelines
29
Document timelines
30
Document timelines
Polymers cluster report showing a plot of links
to all other clusters by year
31
Document timelines
Polymers cluster report showing a plot of links
to each other cluster by year.
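
The data behind such a cluster report can be approximated by counting links per year. This is a rough sketch and not DIVA's method: it assumes a link is a citation from a paper in one cluster to a paper in another, dated by the citing paper's year, and the cluster names, papers, and links_by_year helper are invented.

```python
from collections import Counter

def links_by_year(links, cluster_of, source_cluster):
    """Count links from papers in source_cluster to each other cluster, per year."""
    counts = Counter()
    for citing, cited, year in links:
        src, dst = cluster_of[citing], cluster_of[cited]
        if src == source_cluster and dst != source_cluster:
            counts[(dst, year)] += 1
    return counts

# Hypothetical cluster assignments and (citing, cited, year) links.
cluster_of = {"p1": "polymers", "p2": "polymers",
              "q1": "catalysts", "r1": "coatings"}
links = [("p1", "q1", 1998), ("p2", "q1", 1998),
         ("p2", "r1", 1999), ("p1", "p2", 1999)]

per_cluster = links_by_year(links, cluster_of, "polymers")

# Summing over target clusters gives the "all other clusters" series.
totals = Counter()
for (cluster, year), n in per_cluster.items():
    totals[year] += n
```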
32
A comment on bibliometric analysis
  • Compared to a drunk who is looking for his
    keys under a street lamp.
  • When asked by a passer-by why he is
    looking there, he replies: "This is where the
    lamp is."

33
A comment on bibliometric analysis
  • Critics say that publications (and citations)
    just provide easy data and that the assessment
    of real quality needs more qualitative
    considerations.

34
Summary
  • Citation indexing is more than 40 years old.
  • Simple concept, far-reaching influence and
    applications
  • Many possibilities for:
  • Improvement of existing systems
  • Developing new uses in the networked world