Title: Delineating the Citation Impact of Scientific Discoveries
1 Delineating the Citation Impact of Scientific Discoveries
- Chaomei Chen1, Jian Zhang1, Weizhong Zhu1, Michael Vogeley2
- 1 College of Information Science and Technology, Drexel University
- 2 Department of Physics, Drexel University
This work is supported by the National Science
Foundation under Grant No. 0612129. Thomson ISI
provides the bibliographic data for the analysis.
2 As We May Think, by Vannevar Bush
- There is a growing mountain of research. But
there is increased evidence that we are being
bogged down today as specialization extends. The
investigator is staggered by the findings and
conclusions of thousands of other
workers - conclusions which he cannot find time to
grasp, much less to remember, as they appear. Yet
specialization becomes increasingly necessary for
progress, and the effort to bridge between
disciplines is correspondingly superficial.
3 An Increasingly Strong Trend in Science (Gray & Szalay 2004)
- Massive scientific data are being collected by one group of scientists and being analyzed by another group of scientists.
- Two notable examples:
  1. The SDSS project in astrophysics
  2. The human genome project in biomedicine
4 Sloan Digital Sky Survey: The most ambitious astronomical survey ever undertaken
There is an increasingly strong trend in science
that massive scientific data are being collected
by one group of scientists and being analyzed by
another group of scientists (Gray & Szalay 2004).
Two notable examples are the SDSS project in
astrophysics and the human genome project in
biomedicine.
- Sloan Survey Data
  - June 2006, Data Release Five: 8000 square degrees, 1,048,960 spectra.
  - June 2005, Data Release Four: 6670 square degrees, 806,400 spectra.
  - September 2004, Data Release Three: 5282 square degrees, 528,640 spectra.
  - March 2004, Data Release Two: 3324 square degrees, 367,360 spectra.
  - April 2003, Data Release One: 2099 square degrees, 186,240 spectra.
  - June 2001, Early Data Release: 462 square degrees, 52,896 spectra.
- SDSS Literature
  - Total number of articles: 1,478
  - Total citations: 47,282
  - H-index on June 18, 2007: 95
  - H-index on January 30, 2007: 89
Time Slice | Space | Node | Link
2001-2001 | 1699 | 300 | 7249
2002-2002 | 2703 | 519 | 14808
2003-2003 | 4294 | 1036 | 40133
2004-2004 | 5580 | 1218 | 43398
2005-2005 | 6692 | 1685 | 76009
2006-2006 | 10279 | 2815 | 139300
2007-2007 | 3136 | 496 | 15259
5 Integrating Microscopic and Macroscopic Perspectives
- Connecting text-level patterns (microscopic) and paper-level citation impacts (macroscopic)
- Improve our understanding of science in the making
- Develop data mining and visual analytics algorithms
7 Figure 3. Prominent keywords assigned by authors
and burst terms extracted from titles and
abstracts (2002-2006).
8 (Figure: Hc, Ht split; Class I and Class II)
9 Fast-Growing SDSS Literature
- 1,400 papers
- 40,000 citations
- The total number of citations doubled in the past 1.5 years.
- H-index of SDSS literature: rose from 89 to 95
10 As of June 18, 2007, 95 SDSS papers have 95 or
more citations (an H-index of 95). The H-index was 89 in January 2007.
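The H-index quoted on this slide follows directly from per-paper citation counts: it is the largest h such that h papers have at least h citations each. A minimal sketch in Python, using made-up citation counts rather than the SDSS data (the function name `h_index` is illustrative):

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only get smaller from here on
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Applied to the SDSS records, the same computation would return 95 for the June 2007 data and 89 for the January 2007 data.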
11 Measuring the Citation Impact
- Sc discounts citations accumulated over a long period of time.
  - Sc is adjusted for publication age.
- St measures the recent impact.
  - St gives heavier weights to relatively recent citations than to earlier citations.
12
Year | Title | Cites | Sc | St
2004 | Cosmological parameters from SDSS and WMAP | 404 | 404.00 | 367.00
1995 | The FIRST Survey - Faint Images of the Radio Sky at 20 Centimeters | 455 | 140.00 | 301.64
2003 | Stellar population synthesis at the resolution of 2003 | 371 | 296.80 | 263.47
2001 | Evidence for reionization at z ~ 6: Detection of a Gunn-Peterson trough in a z = 6.28 quasar | 307 | 175.43 | 255.07
2001 | The luminosity function of galaxies in SDSS commissioning data | 250 | 142.86 | 196.73
2003 | A survey of z > 5.7 quasars in the Sloan Digital Sky Survey. II. Discovery of three additional quasars at z > 6 | 195 | 156.00 | 175.80
2001 | A survey of z > 5.8 quasars in the Sloan Digital Sky Survey. I. Discovery of three new quasars and the spatial density of luminous quasars at z ~ 6 | 226 | 129.14 | 174.87
2002 | Evolution of the ionizing background and the epoch of reionization from the spectra of z ~ 6 quasars | 211 | 140.67 | 170.00
2001 | Composite quasar spectra from the Sloan Digital Sky Survey | 221 | 126.29 | 168.21
2004 | The three-dimensional power spectrum of galaxies from the Sloan Digital Sky Survey | 224 | 224.00 | 167.00
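The slides do not give a formula for Sc, but the Sc column above is reproduced by dividing the raw citation count by 1 + 0.25 × (2004 − publication year). The sketch below checks that relation against the ten rows of the table. The formula, the reference year 2004, and the 0.25 step are all inferred from the numbers and may not be the authors' actual definition. St is not reconstructed here because the table does not determine it.

```python
def age_discounted(cites, year, ref_year=2004, step=0.25):
    """Age-adjusted citation score.

    ASSUMPTION: this form is inferred from the table values, not stated
    in the slides. ref_year and step are fitted, hypothetical parameters.
    """
    return cites / (1.0 + step * (ref_year - year))


# (year, cites, Sc as printed in the table)
rows = [
    (2004, 404, 404.00),
    (1995, 455, 140.00),
    (2003, 371, 296.80),
    (2001, 307, 175.43),
    (2001, 250, 142.86),
    (2003, 195, 156.00),
    (2001, 226, 129.14),
    (2002, 211, 140.67),
    (2001, 221, 126.29),
    (2004, 224, 224.00),
]

for year, cites, sc_reported in rows:
    sc = age_discounted(cites, year)
    print(f"{year}  cites={cites:3d}  computed={sc:7.2f}  reported={sc_reported:7.2f}")
```

Under this reading, a 1995 paper with 455 citations scores lower on Sc than a 2004 paper with 224, which matches the stated intent of discounting citations accumulated over a long period.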
13 Hg Indices and Splits
- The 1,293 records
  - H-index: 65, including 3 papers that have 65 citations
  - Hc index: 52
  - Ht index: 53
- The H split
  - 67 papers in the highly cited group
  - 1,226 remaining papers in the second group
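One reading of the H split that agrees with these numbers is: compute the H-index h, then place every paper with at least h citations in the highly cited group. Ties at exactly h citations then make the group larger than h, which would explain 67 papers for an H-index of 65. The sketch below illustrates that reading on made-up counts. The `>= h` rule is an assumption, since the slides do not spell out the split.

```python
def h_split(citations):
    """Split citation counts into (h, highly_cited, remaining).

    ASSUMPTION: a paper is "highly cited" if it has at least h citations,
    where h is the H-index of the whole set.
    """
    ranked = sorted(citations, reverse=True)
    # In descending order, cites >= rank holds for exactly the first h ranks.
    h = sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)
    if h == 0:
        return 0, [], ranked
    highly_cited = [c for c in ranked if c >= h]
    remaining = ranked[len(highly_cited):]
    return h, highly_cited, remaining


# Hypothetical counts: h is 3, but three papers tie at 3 citations.
h, high, low = h_split([9, 7, 3, 3, 3, 1])
print(h, len(high), len(low))  # 3 5 1
```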
14 (Figure: Class I and Class II)
15 Significant Noun Phrases
- 22,665 noun phrases identified by a part-of-speech tagging and pattern matching process.
- 290 of them are selected based on their log-likelihood ratios.

Total terms: 22,665
 | A(Sc) | G(Sc) | A(St) | G(St)
Pivotal value | 11.70 | 11.06 | 11.46 | 8.61
High | 379 | 379 | 328 | 401
Low | 914 | 914 | 965 | 892
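The pivotal values in this table appear to be mean scores used as cut points, with A and G presumably standing for the arithmetic and geometric mean (a geometric mean split is mentioned in the Figure 4 caption). A minimal sketch of such a split follows. It uses the ten Sc values listed on slide 12 purely as sample input, and it assumes that scores equal to the pivot count as High and that all scores are positive.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # ASSUMPTION: all values are strictly positive; zero scores would
    # need separate handling that the slides do not describe.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def split_at(values, pivot):
    """Return (high, low): values at or above the pivot, and values below it."""
    high = [v for v in values if v >= pivot]
    low = [v for v in values if v < pivot]
    return high, low


# Sample input only: the Sc column of the ten papers on slide 12.
sc_scores = [404.00, 140.00, 296.80, 175.43, 142.86,
             156.00, 129.14, 140.67, 126.29, 224.00]

for label, mean_fn in (("A", arithmetic_mean), ("G", geometric_mean)):
    pivot = mean_fn(sc_scores)
    high, low = split_at(sc_scores, pivot)
    print(f"{label}: pivot={pivot:.2f}  high={len(high)}  low={len(low)}")
```

Because the geometric mean never exceeds the arithmetic mean, the G pivot is the lower of the two, consistent with the pivotal values in the table.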
16 Figure 4. An overview of a decision tree
generated based on 216 terms selected by
log-likelihood ratio values (p < 0.01) and a
geometric mean split (74.44% classification
accuracy). The tree should be read from the root
downwards.
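The caption says terms were selected by log-likelihood ratio at p < 0.01 but does not say which formulation was used. One common choice is the G² statistic over a 2×2 table of term presence by citation group, compared with the chi-square critical value for one degree of freedom. The sketch below assumes that formulation. The term counts are hypothetical, and only the group sizes (379 High, 914 Low) come from the slides.

```python
import math

# Chi-square critical value for 1 degree of freedom at p = 0.01.
CRITICAL_P01 = 6.635


def log_likelihood_ratio(a, b, c, d):
    """G^2 for a 2x2 contingency table.

    a, b: records containing the term in group 1 and group 2
    c, d: records lacking the term in group 1 and group 2
    """
    n = a + b + c + d
    observed = (a, b, c, d)
    row_totals = (a + b, a + b, c + d, c + d)
    col_totals = (a + c, b + d, a + c, b + d)
    g2 = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        if obs > 0:  # an empty cell contributes nothing to the sum
            expected = row * col / n
            g2 += obs * math.log(obs / expected)
    return 2.0 * g2


# Hypothetical term: present in 40 of 379 High and 20 of 914 Low records.
g2 = log_likelihood_ratio(40, 20, 379 - 40, 914 - 20)
print(f"G2 = {g2:.2f}, selected = {g2 > CRITICAL_P01}")
```

A term that occurs at the same rate in both groups scores close to zero and would be dropped under this criterion.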
17 Figure 5. A part of the tree shown in Figure 4.
The presence (> 0) or absence (< 0) of a term is
associated with a citation status group, i.e. the
highly and timely cited group.
18 Figure 6. An ADTree derived from the data
selected with the same selection criteria, with
70.55% accuracy.
19 Figure 7. A decision tree with 95.82%
classification accuracy derived from 721 terms
and 1,267 records.
20 Figure 10. The citation history of timeliness
papers shows that recently published papers move
up in the rankings.
21 Future Work
- Unsupervised ontology construction to smooth the feature space
- Incremental classification of incoming new data and scholarly publications
- Self-directed optimization of existing decision trees based on new evidence
- Full-text analysis that can model associative relations between hypotheses and evidence and between facts and opinions