Title: Jenny Fry and Mike Thelwall
1. Jenny Fry and Mike Thelwall
- Measuring the Impact of e-Research: Accounting for Disciplinary Differential in Patterns of Diffusion
2. New ways of working
Collective level
e.g. bioinformatics
e.g. chemistry
Tools
Solving problems in new fields
Accompanied by new techniques
Old ways of measuring
Individual level
Outputs
Impact
Inputs
Collaboration
Co-authorship
Publications
Funding
Citations
3. Disciplinary cultures
4. Differential uptake and use of e-Research
- Different intellectual communities: different ways of working
- Different relations with the object of research: some more technically mediated than others
- Innovation introduces uncertainty into a social system; uncertainty has a special role in science
- Current initiatives by NCeSS, and others, to facilitate user engagement through education, training and awareness raising
- As yet, we do not know if or how e-research technologies might diffuse across the social sciences and beyond
5. Focus
- What metrics are appropriate for new ways of working?
- What is the appropriate unit of analysis?
- project, intellectual field, discipline
- What new techniques do we need for measuring these traces?
- What are the factors (variables) that shape these patterns of diffusion?
- How to account for disciplinary difference?
6. The Case Study
- Corpus-based linguistics
- CLARIN consortium
- 32 partners from 22 countries
- Aims to build a federation of trusted archive centers providing access to resources and tools through web services
- Fragmented information landscape
- Linguistics and language resources communities already using the web to disseminate tools and provide access to resources, but in a decentralized, ad hoc way
7. The challenge
8. Methodology
- From bibliometrics to webometrics (hyperlink analysis)
- Interpreting hyperlinks
- Informal communication, liminal traces of esteem
- Different disciplines have different hyperlinking patterns
- What might hyperlinking patterns tell us about e-research communities?
- Rate of appropriation
- Disciplinary constitution
- Centrality of infrastructure
- Effects of competing technologies
- Emergence of new communities
- Hyperlinks are a limited source of evidence: researchers may use online resources but not link to them
9. Starting Point?
- Creating a seed set of URLs
- Initial URLs provided by CLARIN members
- Differentiated resources and tools
- Resources: structured data objects, e.g. corpora and dictionaries
- Tools: technologies by which data are processed, e.g. lemmatisers and parsers
- Sorting levels of granularity
- Thinking about variables
10. Capturing inlinks
- Generating a list of pages containing links that point to URLs in the seed set
- Website domain name of each URL was extracted and a Yahoo! search generated for pages outside that domain, but linking to the URL
- Using the advanced search link command
- link:http://ucrel.lancs.ac.uk/IIwizard.html -site:lancs.ac.uk
- Via Yahoo!'s Applications Programming Interface, allowing automatic searches
- Yahoo! only returned the first 1,000 results, so the query splitting technique (Thelwall, 2008) was used to retrieve additional URLs
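The query construction described on this slide can be sketched as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the `link:` and `-site:` operator punctuation is assumed from the slide's example query, and the domain extraction is a naive heuristic rather than a full public-suffix lookup. No search API is called.

```python
from urllib.parse import urlparse

# Second-level labels treated as part of the suffix under a two-letter
# country code (illustrative assumption, far from exhaustive).
_SECOND_LEVEL = {"ac", "co", "edu", "gov", "org"}


def site_domain(url: str) -> str:
    """Naively reduce a URL's host name to its web site domain."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    country_style = (
        len(labels) >= 3
        and len(labels[-1]) == 2
        and labels[-2] in _SECOND_LEVEL
    )
    keep = 3 if country_style else 2
    return ".".join(labels[-keep:])


def build_inlink_query(url: str) -> str:
    """Build a query for pages linking to `url` from outside its own site."""
    return f"link:{url} -site:{site_domain(url)}"


print(build_inlink_query("http://ucrel.lancs.ac.uk/IIwizard.html"))
```

A production version would more likely use a maintained public-suffix list to find the site domain, since the heuristic above misclassifies many host names.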
11. Illustration of the link counting process
[Diagram: pages a.com/1.htm, a.com/2.htm and a.com/3.htm (site a.com), b.com/1.htm (site b.com) and c.com/1.htm (site c.com), linking to the resources/tools CLARIN1 and CLARIN2]
The diagram shows two sites linking to the CLARIN1 resource/tool and two sites linking to the CLARIN2 resource/tool. Although there are three pages in a.com linking to CLARIN1, they collectively count as one site link.
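The site-level counting rule in the illustration can be sketched as follows. The function name is hypothetical, and the example data assume that b.com links to both CLARIN1 and CLARIN2, which is one reading of the diagram consistent with the caption.

```python
from collections import defaultdict
from urllib.parse import urlparse


def count_site_inlinks(page_links):
    """Count distinct linking sites per target, ignoring repeat pages."""
    sites_per_target = defaultdict(set)
    for source_page, target in page_links:
        # urlparse only recognises the host if a scheme or leading // is present.
        normalised = source_page if "//" in source_page else "//" + source_page
        sites_per_target[target].add(urlparse(normalised).hostname)
    return {target: len(sites) for target, sites in sites_per_target.items()}


# Page-level links as read from the diagram (b.com -> both is an assumption).
links = [
    ("a.com/1.htm", "CLARIN1"),
    ("a.com/2.htm", "CLARIN1"),
    ("a.com/3.htm", "CLARIN1"),
    ("b.com/1.htm", "CLARIN1"),
    ("b.com/1.htm", "CLARIN2"),
    ("c.com/1.htm", "CLARIN2"),
]

print(count_site_inlinks(links))  # {'CLARIN1': 2, 'CLARIN2': 2}
```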
12. Interlinking patterns
- All seed URLs were crawled using the research web crawler SocSciBot
- All links from the crawled pages were extracted
- Links within the same web site were discarded
- Links outside of the seed set were also discarded
- Remaining URLs were condensed to just the domain name and duplicate links removed
- Interlinking could then be visualized in a single network diagram
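The filtering steps above can be sketched as a small function over (source page, target URL) pairs. This is not SocSciBot's own processing: the function names and the `.example` URLs are hypothetical, and the seed-set test is applied at the level of host names, which is an assumption about how "outside of the seed set" was judged.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the host name of a URL (empty string if none)."""
    return urlparse(url).hostname or ""


def interlink_edges(crawled_links, seed_urls):
    """Reduce page-level links to unique site-to-site links within the seed set."""
    seed_sites = {host_of(u) for u in seed_urls}
    edges = set()
    for source, target in crawled_links:
        src, dst = host_of(source), host_of(target)
        if src == dst:
            continue  # link within the same web site
        if dst not in seed_sites:
            continue  # link pointing outside the seed set
        edges.add((src, dst))  # the set removes duplicate site-level links
    return edges


seeds = ["http://corpus.example/index.html", "http://parser.example/tool.html"]
crawled = [
    ("http://corpus.example/index.html", "http://corpus.example/about.html"),
    ("http://corpus.example/index.html", "http://parser.example/tool.html"),
    ("http://corpus.example/links.html", "http://parser.example/docs.html"),
    ("http://parser.example/tool.html", "http://elsewhere.example/"),
]

print(interlink_edges(crawled, seeds))  # {('corpus.example', 'parser.example')}
```

The resulting set of directed pairs is the edge list that a network diagram could be drawn from.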
13. Network density
Colour-coded by country
19 web sites not connected to any others and not
shown
14. Explanation of network density
- Diagram contains 22 nodes, each representing a web site
- An additional 19 web sites are not represented because they did not have any links to or from them
- The theoretical maximum number of links between the total 41 (22 + 19) web sites is 41 x 41 - 41 = 1640
- But there were only 29 links in total, counting bidirectional arrows as two links
- This means that the link density is 29/1640 = 0.02 (rounded)
- In other words, the diagram shows about 2% of the links that could theoretically exist between pairs of CLARIN web sites
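The arithmetic on this slide can be checked with a few lines of code. The function name is hypothetical; the figures (22 shown sites, 19 isolated sites, 29 links) are taken from the slide.

```python
def directed_density(n_sites, n_links):
    """Share of possible directed site-to-site links that actually exist."""
    max_links = n_sites * n_sites - n_sites  # every ordered pair, no self-links
    return n_links / max_links


shown, isolated, observed_links = 22, 19, 29
n = shown + isolated

print(n * n - n)                                      # 1640
print(round(directed_density(n, observed_links), 4))  # 0.0177
```

The unrounded value is slightly under 0.018, which the slide reports as roughly 2%.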
15. Benchmarking network density
CLARIN 2%: reflects the more specialist character of the CLARIN network?
Ackland, R., Fry, J., and Schroeder, R. (2007). Scoping the online visibility of e-research by means of e-research tools. Proceedings of the 3rd International e-Social Science Conference, Ann Arbor, Michigan, 7-9 October 2007.
16. Top five linked-to websites
17. Resource to Tool Linking
18. Academic homepage
19. Lay audience
20. Lay audience
21. Commercial content
22. Blogs
23. Link Context
- Minimal tool/resource to tool/resource interlinking (even between those URLs categorized as being within linguistics)
- Majority of links appeared to be unreciprocated, indicating a unidirectional dependency on the CLARIN tools/resources
- Representation of a lay audience
- Presence of commercial content
- Wide range of page types, e.g. homepages, project, blog, portal
- High proportion of links were contextualised, but some generalisation, particularly in relation to the BNC
- Links to Wavesurfer highly contextualised
24. Sampling Links
- Five-step process to generate a random sample of URLs
- URLs of pages linking to CLARIN resources/tools were obtained from the Yahoo! API searches
- A list was constructed of all the different web sites linking to each resource/tool
- The lists of linking web sites were merged together to give one long list
- 100 web sites were randomly selected from the list, using Excel's random number generator
- For each selected web site, a page within the web site that links to a CLARIN resource/tool was selected at random
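The five-step sampling process can be sketched as below, with Python's `random` module standing in for Excel's random number generator. The function name, the input structure and the `.example` URLs are hypothetical, and the sketch assumes each web site appears only once in the merged list, which the slide does not state.

```python
import random
from urllib.parse import urlparse


def sample_linking_pages(inlinks_by_target, k=100, seed=None):
    """Pick up to k linking sites at random, then one linking page per site."""
    rng = random.Random(seed)

    # Steps 1-3: group the linking pages by web site and merge across all
    # resources/tools into one list (assumption: one entry per site).
    pages_by_site = {}
    for pages in inlinks_by_target.values():
        for page in pages:
            pages_by_site.setdefault(urlparse(page).hostname, []).append(page)

    # Step 4: random selection of web sites.
    sites = sorted(pages_by_site)
    chosen = rng.sample(sites, min(k, len(sites)))

    # Step 5: one random linking page within each selected site.
    return {site: rng.choice(pages_by_site[site]) for site in chosen}


inlinks = {
    "CLARIN1": ["http://a.example/1.htm", "http://a.example/2.htm", "http://b.example/1.htm"],
    "CLARIN2": ["http://b.example/1.htm", "http://c.example/1.htm"],
}

sample = sample_linking_pages(inlinks, k=2, seed=0)
for site, page in sample.items():
    print(site, page)
```

Passing a fixed `seed` makes the sample reproducible, which is useful when the same pages need to be revisited for content analysis.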
25. Conclusions
- Can hyperlinking patterns actually tell us anything about patterns of diffusion?
- Can give indications, but need additional analyses to confirm, e.g. content analysis
- Tentative hypotheses
- Density related to large-scale resources of value to multiple intellectual fields and multiple research problems
- Interconnectedness (e.g. branching off of clusters) related to small-scale tools that address a specific research problem across multiple intellectual fields
26. Future work
- In it for the long haul
- Longitudinal data capture and analysis
- Possible modelling and simulation
- Need to identify discipline-specific factors that shape diffusion of e-research
- In particular
- The influence of interdependence and uncertainty on patterns of diffusion across intellectual fields