Jenny Fry and Mike Thelwall - PowerPoint PPT Presentation

1
Jenny Fry and Mike Thelwall
  • Measuring the Impact of e-Research: Accounting
    for Disciplinary Differential in Patterns of
    Diffusion

2
New ways of working
Collective level
e.g. bioinformatics
e.g. chemistry
Tools
Solving problems in new fields
Accompanied by new techniques
Old ways of measuring
Individual level
Outputs
Impact
Inputs
Collaboration
Co-authorship
Publications
Funding
Citations
3
Disciplinary cultures
4
Differential uptake and use of e-Research
  • Different intellectual communities; different
    ways of working
  • Different relations with the object of research;
    some more technically mediated than others
  • Innovation introduces uncertainty into a social
    system; uncertainty has a special role in science
  • Current initiatives by NCeSS, and others, to
    facilitate user engagement through education,
    training and awareness raising
  • As yet, we do not know whether or how e-research
    technologies might diffuse across the social
    sciences and beyond

5
Focus
  • What metrics are appropriate for new ways of
    working?
  • What is the appropriate unit of analysis:
    project, intellectual field, or discipline?
  • What new techniques do we need for measuring
    these traces?
  • What are the factors (variables) that shape these
    patterns of diffusion?
  • How to account for disciplinary difference?

6
The Case Study
  • Corpus-based linguistics
  • CLARIN consortium - 32 partners from 22 countries
  • Aims to build a federation of trusted archive
    centers providing access to resources and tools
    through web services
  • Fragmented information landscape
  • Linguistics and language resources communities
    already using web to disseminate tools and
    provide access to resources, but in a
    decentralized ad hoc way

7
The challenge
8
Methodology
  • From bibliometrics to webometrics (hyperlink
    analysis)
  • Interpreting hyperlinks
  • Informal communication: liminal traces of esteem
  • Different disciplines have different hyperlinking
    patterns
  • What might hyperlinking patterns tell us about
    e-research communities?
  • Rate of appropriation
  • Disciplinary constitution
  • Centrality of infrastructure
  • Effects of competing technologies
  • Emergence of new communities
  • Hyperlinks are a limited source of evidence:
    researchers may use online resources but not link
    to them

9
Starting Point?
  • Creating a seed set of URLs
  • Initial URLs provided by CLARIN members
  • Differentiated resources and tools
  • Resources: structured data objects, e.g. corpora
    and dictionaries
  • Tools: technologies by which data are processed,
    e.g. lemmatisers and parsers
  • Sorting levels of granularity
  • Thinking about variables

10
Capturing inlinks
  • Generating a list of pages containing links that
    point to URLs in the seed set
  • Website domain name of each URL was extracted and
    a Yahoo! search generated for pages outside that
    domain, but linking to the URL
  • Using the advanced search link command, e.g.
  • link:http://ucrel.lancs.ac.uk/IIwizard.html
    -site:lancs.ac.uk
  • Via Yahoo!'s Applications Programming Interface,
    allowing automatic searches
  • Yahoo! only returned the first 1,000 results, so
    the query splitting technique (Thelwall, 2008) was
    used to retrieve additional URLs
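The query construction described on this slide can be sketched roughly as follows. This is only an illustration: the function names, the small suffix table, and the "-site:" exclusion syntax are assumptions made for the sketch, and it builds the query string without calling any search API.

```python
from urllib.parse import urlparse

# Assumption: a tiny hand-picked set of two-part public suffixes.
# A real study would need a complete public-suffix list.
TWO_PART_SUFFIXES = {"ac.uk", "co.uk", "org.uk", "gov.uk"}


def site_domain(url: str) -> str:
    """Return the site-level domain of a URL, e.g. lancs.ac.uk."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Keep three labels when the host ends in a known two-part suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])


def inlink_query(url: str) -> str:
    """Build a query for pages that link to url from outside its own domain."""
    return f"link:{url} -site:{site_domain(url)}"


print(inlink_query("http://ucrel.lancs.ac.uk/IIwizard.html"))
```

Keeping the domain extraction separate from the query builder makes it easier to swap in a proper public-suffix lookup later.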

11
Illustration of the link counting process
[Diagram: pages a.com/1.htm, a.com/2.htm and a.com/3.htm (site a.com), b.com/1.htm (site b.com) and c.com/1.htm (site c.com), linking to the resources/tools CLARIN1 and CLARIN2]
The diagram shows two sites linking to the
CLARIN1 resource/tool and two sites linking to
the CLARIN2 resource/tool. Although there are
three pages in a.com linking to CLARIN1, they
collectively count as one site link.
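A minimal sketch of the site-level counting rule follows. The page-to-target pairs are a guess at the diagram (in particular, b.com linking to both targets is an assumption), and treating the host name as the "site" is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlparse


def count_site_links(page_links):
    """Count distinct linking sites per target from (source_page, target) pairs."""
    sites_by_target = defaultdict(set)
    for source_page, target in page_links:
        # Prefix "//" when no scheme is given so urlparse can find the host.
        url = source_page if "//" in source_page else "//" + source_page
        sites_by_target[target].add(urlparse(url).hostname)
    return {target: len(sites) for target, sites in sites_by_target.items()}


# Hypothetical reading of the diagram: b.com is assumed to link to both targets.
page_links = [
    ("a.com/1.htm", "CLARIN1"),
    ("a.com/2.htm", "CLARIN1"),
    ("a.com/3.htm", "CLARIN1"),
    ("b.com/1.htm", "CLARIN1"),
    ("b.com/1.htm", "CLARIN2"),
    ("c.com/1.htm", "CLARIN2"),
]
print(count_site_links(page_links))  # {'CLARIN1': 2, 'CLARIN2': 2}
```

Storing sites in a set per target is what makes the three a.com pages collapse into a single site link.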
12
Interlinking patterns
  • All seed URLs were crawled using the research web
    crawler SocSciBot
  • All links from the crawled pages extracted
  • Links within the same web site were discarded
  • Links outside of the seed set were also discarded
  • Remaining URLs were condensed to just the domain
    name and duplicate links removed
  • Interlinking could then be visualized in a single
    network diagram
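The filtering steps above might look something like this in code. It is a sketch only: it assumes the crawled links have already been condensed to (source domain, target domain) pairs, the function name and example domains are hypothetical, and it does not reproduce how SocSciBot itself stores links.

```python
def interlink_edges(crawled_links, seed_sites):
    """Reduce (source_site, target_site) pairs to seed-to-seed site edges."""
    edges = set()
    for source, target in crawled_links:
        if source == target:
            continue  # link within the same web site
        if target not in seed_sites:
            continue  # link pointing outside the seed set
        edges.add((source, target))  # the set drops duplicate links
    return edges


# Hypothetical example domains, not taken from the CLARIN seed set.
seed_sites = {"corpus.example", "parser.example", "lexicon.example"}
crawled_links = [
    ("corpus.example", "corpus.example"),   # same site, discarded
    ("corpus.example", "parser.example"),
    ("corpus.example", "parser.example"),   # duplicate, removed
    ("parser.example", "news.example"),     # outside the seed set, discarded
    ("parser.example", "corpus.example"),   # reverse direction, kept separately
]
print(sorted(interlink_edges(crawled_links, seed_sites)))
```

The result is a set of directed edges, so a reciprocated pair appears as two entries, which matches counting bidirectional arrows as two links.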

13
Network density
Colour-coded by country
19 web sites not connected to any others and not
shown
14
Explanation of network density
  • Diagram contains 22 nodes each representing a web
    site
  • An additional 19 web sites are not represented
    because they did not have any links to or from
    them
  • The theoretical maximum number of links between
    the total 41 (22 + 19) web sites is
    41 x 41 - 41 = 1640
  • But, there were only 29 links in total, counting
    bidirectional arrows as two links
  • This means that the link density is 29/1640, or
    about 0.02
  • In other words the diagram shows about 2% of the
    links that could theoretically exist between
    pairs of CLARIN web sites
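The density arithmetic on this slide can be checked with a few lines of code. The function name is illustrative; the figures (22 sites shown, 19 isolated, 29 links) are the ones given on the slide.

```python
def directed_density(n_sites: int, n_links: int) -> float:
    """Share of possible directed links (excluding self-links) that are present."""
    max_links = n_sites * n_sites - n_sites
    return n_links / max_links


shown, isolated = 22, 19
n_sites = shown + isolated           # 41 web sites in total
print(n_sites * n_sites - n_sites)   # 1640 possible directed links
print(round(directed_density(n_sites, 29), 2))  # 0.02
```

Subtracting n_sites removes the self-links, which is why the maximum is 1640 and not 41 squared.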

15
Benchmarking network density
CLARIN 2%: reflects the more specialist character
of the CLARIN network?
Ackland, R., Fry, J., and Schroeder, R. (2007)
Scoping the online visibility of e-research by
means of e-research tools. Proceedings of the 3rd
International e-Social Science Conference. Ann
Arbor, Michigan, 7-9 October 2007
16
Top five linked-to websites
17
Resource to Tool Linking
18
Academic homepage
19
Lay audience
20
Lay audience
21
Commercial content
22
Blogs
23
Link Context
  • Minimal tool/resource to tool/resource
    interlinking (even between those URLs categorized
    as being within linguistics)
  • Majority of links appeared to be unreciprocated
    indicating a unidirectional dependency on the
    CLARIN tools/resources
  • Representation of a lay audience
  • Presence of commercial content
  • Wide range of page types e.g. homepages, project,
    blog, portal
  • High proportion of links were contextualised, but
    some generalisation particularly in relation to
    the BNC
  • Links to Wavesurfer highly contextualised

24
Sampling Links
  • Five-step process to generate a random sample of
    URLs
  • URLs of pages linking to CLARIN resources/tools
    were obtained from the Yahoo! API searches
  • A list was constructed of all the different web
    sites linking to each resource/tool
  • The lists of linking web sites were merged
    together to give one long list
  • 100 web sites were randomly selected from the
    list, using Excel's random number generator
  • For each selected web site a page within the web
    site that links to a CLARIN resource/tool was
    selected at random
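A rough sketch of the last two sampling steps follows. It swaps Python's random module in for Excel's random number generator, assumes the merged list has already been reduced to one entry per linking site, and uses hypothetical names and data throughout.

```python
import random


def sample_linking_pages(pages_by_site, n_sites=100, seed=None):
    """Pick n_sites sites at random, then one linking page per chosen site."""
    rng = random.Random(seed)
    sites = sorted(pages_by_site)  # sorted so a fixed seed is reproducible
    chosen = rng.sample(sites, min(n_sites, len(sites)))
    return {site: rng.choice(sorted(pages_by_site[site])) for site in chosen}


# Hypothetical merged list: 150 linking sites with three linking pages each.
pages_by_site = {
    f"site{i}.example": [f"site{i}.example/page{j}.htm" for j in range(3)]
    for i in range(150)
}
sample = sample_linking_pages(pages_by_site, n_sites=100, seed=1)
print(len(sample))  # 100
```

Sampling sites first and pages second means a site with many linking pages is no more likely to be chosen than a site with one.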

25
Conclusions
  • Can hyperlinking patterns actually tell us
    anything about patterns of diffusion?
  • Can give indications, but need additional
    analyses to confirm e.g. content analysis
  • Tentative hypotheses
  • Density related to large-scale resources of value
    to multiple intellectual fields and multiple
    research problems
  • Interconnectedness (e.g. branching off of
    clusters) related to small-scale tools that
    address a specific research problem across
    multiple intellectual fields

26
Future work
  • In it for the long haul
  • Longitudinal data capture and analysis
  • Possible modelling and simulation
  • Need to identify discipline specific factors that
    shape diffusion of e-research
  • In particular
  • The influence of interdependence and uncertainty
    on patterns of diffusion across intellectual
    fields