Delineating the Citation Impact of Scientific Discoveries



1
Delineating the Citation Impact of Scientific
Discoveries
  • Chaomei Chen (1), Jian Zhang (1), Weizhong Zhu (1),
    Michael Vogeley (2)
  • (1) College of Information Science and Technology,
    Drexel University
  • (2) Department of Physics, Drexel University

This work is supported by the National Science
Foundation under Grant No. 0612129. Thomson ISI
provides the bibliographic data for the analysis.
2
As We May Think, by Vannevar Bush
  • There is a growing mountain of research. But
    there is increased evidence that we are being
    bogged down today as specialization extends. The
    investigator is staggered by the findings and
    conclusions of thousands of other
    workers - conclusions which he cannot find time to
    grasp, much less to remember, as they appear. Yet
    specialization becomes increasingly necessary for
    progress, and the effort to bridge between
    disciplines is correspondingly superficial.

3
An Increasingly Strong Trend in Science (Gray &
Szalay 2004)
  • Massive scientific data are being collected by
    one group of scientists and analyzed by another
    group of scientists.
  • Two notable examples:
  • 1. The SDSS project in astrophysics
  • 2. The human genome project in biomedicine

4
Sloan Digital Sky Survey: The most ambitious
astronomical survey ever undertaken
There is an increasingly strong trend in science
that massive scientific data are being collected
by one group of scientists and analyzed by
another group of scientists (Gray & Szalay 2004).
Two notable examples: the SDSS project in
astrophysics and the human genome project in
biomedicine.
  • Sloan Survey Data
  • June 2006, Data Release Five: 8000 square
    degrees, 1,048,960 spectra.
  • June 2005, Data Release Four: 6670 square
    degrees, 806,400 spectra.
  • September 2004, Data Release Three: 5282 square
    degrees, 528,640 spectra.
  • March 2004, Data Release Two: 3324 square
    degrees, 367,360 spectra.
  • April 2003, Data Release One: 2099 square
    degrees, 186,240 spectra.
  • June 2001, Early Data Release: 462 square
    degrees, 52,896 spectra.
  • SDSS Literature
  • Total number of articles: 1,478
  • Total citations: 47,282
  • H-index on June 18, 2007: 95
  • H-index on January 30, 2007: 89

Time Slice    Space    Node     Link
2001-2001      1699     300     7249
2002-2002      2703     519    14808
2003-2003      4294    1036    40133
2004-2004      5580    1218    43398
2005-2005      6692    1685    76009
2006-2006     10279    2815   139300
2007-2007      3136     496    15259
5
Integrating Microscopic and Macroscopic
perspectives
  • Connecting text-level patterns (microscopic) and
    paper-level citation impacts (macroscopic)
  • improve our understanding of science in the
    making
  • develop data mining and visual analytics
    algorithms

7
Figure 3. Prominent keywords assigned by authors
and burst terms extracted from titles and
abstracts (2002-2006).
8
[Figure: the Hc, Ht split, with papers labeled Class I and Class II]
9
Fast-Growing SDSS Literature
  • 1,400 papers
  • 40,000 citations
  • The total citation number doubled in the past 1.5
    years.
  • H-index of SDSS literature: rose from 89 to 95

10
As of June 18, 2007, 95 SDSS papers have 95 or
more citations. It was 89 in January 2007.
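
The definition used on this slide (h papers with h or more citations each) can be sketched in a few lines. This is a minimal illustration; the function name `h_index` and the toy citation counts are hypothetical and are not SDSS data.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only get smaller from here on
    return h


# Hypothetical citation counts for six papers.
toy_counts = [10, 8, 5, 4, 3, 0]
print(h_index(toy_counts))  # 4
```

Four papers have at least 4 citations, but there are not five papers with at least 5, so the toy H-index is 4.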
11
Measuring the Citation Impact
  • Sc discounts citations accumulated over a long
    period of time.
  • Sc is adjusted for publication age.
  • St measures the recent impact.
  • St gives heavier weights to relatively recent
    citations than earlier citations.
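
The slides do not state the formulas for Sc and St. The sketch below is an assumption: the Sc column in the table on the following slide is reproduced by Sc = 4 * cites / (2007 - year + 1), so the code uses that form with hypothetical parameter names `gamma`, `delta`, and `now`. For St it assumes the same discount applied to each citation by the year of the citing paper; this cannot be checked against the slides because the per-year citation data are not shown.

```python
def contemporary_score(cites, pub_year, now=2007, gamma=4.0, delta=1.0):
    """Sc (assumed form): total citations discounted by the paper's age."""
    age = now - pub_year + 1
    return gamma * cites / age ** delta


def trend_score(citing_years, now=2007, gamma=4.0, delta=1.0):
    """St (assumed form): each citation discounted by the citing paper's age."""
    return sum(gamma / (now - year + 1) ** delta for year in citing_years)


# Cites and years taken from the table on the following slide.
print(round(contemporary_score(404, 2004), 2))  # 404.0
print(round(contemporary_score(455, 1995), 2))  # 140.0
print(round(contemporary_score(307, 2001), 2))  # 175.43

# Hypothetical citing years: recent citations contribute the most.
print(trend_score([2007, 2007, 2006, 2004]))  # 11.0
```

Under these assumptions a 1995 paper with 455 citations scores lower on Sc than a 2004 paper with 404, which matches the stated intent of adjusting for publication age.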

12
Year | Title | Cites | Sc | St
2004 | Cosmological parameters from SDSS and WMAP | 404 | 404.00 | 367.00
1995 | The FIRST Survey - Faint Images of the Radio Sky at 20 Centimeters | 455 | 140.00 | 301.64
2003 | Stellar population synthesis at the resolution of 2003 | 371 | 296.80 | 263.47
2001 | Evidence for reionization at z ~ 6: Detection of a Gunn-Peterson trough in a z = 6.28 quasar | 307 | 175.43 | 255.07
2001 | The luminosity function of galaxies in SDSS commissioning data | 250 | 142.86 | 196.73
2003 | A survey of z > 5.7 quasars in the Sloan Digital Sky Survey. II. Discovery of three additional quasars at z > 6 | 195 | 156.00 | 175.80
2001 | A survey of z > 5.8 quasars in the Sloan Digital Sky Survey. I. Discovery of three new quasars and the spatial density of luminous quasars at z ~ 6 | 226 | 129.14 | 174.87
2002 | Evolution of the ionizing background and the epoch of reionization from the spectra of z ~ 6 quasars | 211 | 140.67 | 170.00
2001 | Composite quasar spectra from the Sloan Digital Sky Survey | 221 | 126.29 | 168.21
2004 | The three-dimensional power spectrum of galaxies from the Sloan Digital Sky Survey | 224 | 224.00 | 167.00
13
Hg Indices and Splits
  • The 1,293 records
  • H-index: 65, including 3 papers that have 65
    citations
  • Hc index 52
  • Ht index 53
  • The H split
  • 67 papers in the highly cited group
  • 1,226 remaining papers in the second group
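
A split of this kind can be sketched as follows. The rule used here is an assumption: every paper whose score is at least h goes into the highly cited group, which is consistent with 67 papers in that group when h is 65 and several papers tie at the threshold. The same function can be applied to raw citation counts or to Sc and St scores. The names `h_index` and `h_split` and the toy scores are hypothetical.

```python
def h_index(scores):
    """Largest h such that at least h items have a score of h or more."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def h_split(scores):
    """Return (h, high, low): indices of papers at or above h, and the rest.

    Assumed rule: a paper belongs to the high group when score >= h.
    """
    h = h_index(scores)
    high = [i for i, score in enumerate(scores) if score >= h]
    low = [i for i, score in enumerate(scores) if score < h]
    return h, high, low


# Hypothetical scores: three papers tie at the threshold value 3.
toy_scores = [9, 7, 3, 3, 3, 1]
h, high, low = h_split(toy_scores)
print(h, len(high), len(low))  # 3 5 1
```

Ties at the threshold make the high group larger than h itself, as in the toy output where h is 3 but five papers fall in the high group.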

14
[Figure: papers labeled Class I and Class II]
15
Significant Noun Phrases
  • 22,665 noun phrases identified by a
    part-of-speech tagging and pattern matching
    process.
  • 290 of them are selected based on their
    log-likelihood ratios.

Total terms: 22,665

                 A(Sc)    G(Sc)    A(St)    G(St)
Pivotal value    11.70    11.06    11.46     8.61
High               379      379      328      401
Low                914      914      965      892
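
The slides do not say which log-likelihood statistic or which pivots were used. As a hedged sketch, the code below assumes the common G-squared statistic on a 2x2 table (term present or absent, high or low group) and reads A(...) and G(...) as arithmetic-mean and geometric-mean pivots. All function names and numbers are hypothetical.

```python
import math


def log_likelihood_ratio(k_high, n_high, k_low, n_low):
    """G-squared for a term found in k_high of n_high and k_low of n_low records."""
    observed = [
        [k_high, n_high - k_high],
        [k_low, n_low - k_low],
    ]
    total = n_high + n_low
    row_sums = [n_high, n_low]
    col_sums = [k_high + k_low, total - k_high - k_low]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean; every value must be positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical term: in 30 of 100 high-group records, 10 of 200 low-group records.
g2 = log_likelihood_ratio(30, 100, 10, 200)
print(g2 > 6.63)  # True; 6.63 is the chi-square cutoff for p < 0.01, 1 d.f.

# Hypothetical scores: the geometric-mean pivot sits below the arithmetic one.
print(arithmetic_mean([1.0, 4.0, 16.0]))
print(round(geometric_mean([1.0, 4.0, 16.0]), 6))
```

A term would be kept when its statistic exceeds the chosen cutoff, and a paper would be labeled High when its score exceeds the pivot.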
16
Figure 4. An overview of a decision tree
generated based on 216 terms selected by
log-likelihood ratio values (p < 0.01) and a
geometric mean split (74.44% classification
accuracy). The tree should be read from the root
downwards.
17
Figure 5. A part of the tree shown in Figure 4.
The presence (> 0) or absence (< 0) of a term is
associated with a citation status group, i.e. the
highly and timely cited group.
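
How a tree node ties the presence or absence of a term to a citation group can be illustrated by choosing a single root test. This is a minimal sketch using information gain on toy data; it is not the tool or the algorithm that produced the figures, and the terms, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, labels, term):
    """Entropy reduction from splitting records on presence of a term."""
    present = [lab for rec, lab in zip(records, labels) if term in rec]
    absent = [lab for rec, lab in zip(records, labels) if term not in rec]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder


def best_root_term(records, labels, vocabulary):
    """Term whose presence/absence test best separates the classes."""
    return max(vocabulary, key=lambda t: information_gain(records, labels, t))


# Hypothetical records (sets of noun phrases) and citation groups.
records = [
    {"dark energy", "quasar"},
    {"dark energy"},
    {"quasar"},
    {"galaxy"},
]
labels = ["high", "high", "low", "low"]
vocabulary = ["dark energy", "quasar", "galaxy"]
print(best_root_term(records, labels, vocabulary))  # dark energy
```

In this toy data the term "dark energy" appears in both highly cited records and in neither of the others, so testing for it at the root separates the two groups completely.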
18
Figure 6. An ADTree derived from the data
selected with the same selection criteria, with
70.55% accuracy.
19
Figure 7. A decision tree with 95.82%
classification accuracy derived from 721 terms
and 1,267 records.
20
Figure 10. The citation history of timeliness
papers shows that recently published papers move
up in the rankings.
21
Future Work
  • Unsupervised ontology construction to smooth the
    feature space
  • Incremental classification of incoming new data
    and scholarly publications
  • Self-directed optimization of existing decision
    trees based on new evidence
  • Full-text analysis that can model associative
    relations between hypotheses and evidence and
    between facts and opinions
