The Grid: Bringing data producers and consumers closer

Transcript and Presenter's Notes

1
The Grid: Bringing data producers and consumers
closer?
  • Mark Gahegan
  • GeoVISTA Center
  • Penn State University, USA

2
Status quo
  • Producers and consumers of data have become far
    removed from each other
  • Data producers cannot anticipate what consumers
    might do with the data
  • Example: conference signs
  • The burden of metadata production is significant
  • Documenting standards ignored to various degrees
  • Are metadata standards failing?
  • Many realities: the way we describe the world
    keeps changing

According to Heraclitus, panta rhei, everything
is in flux. But what gives that flux its form is
the logos, the words or signs that enable us to
perceive patterns in the flux, remember them,
talk about them, and take action upon them even
while we ourselves are part of the flux we are
acting in and on. --(John Sowa)
3
What is The Grid (e-Science)? e.g. The
geosciences network (GEON) in the USA
Following David Ribes, GEON meta researcher
4
How does the Grid change anything?
  • The Grid brings data producers and data consumers
    closer than ever before
  • Grid and web service architectures broker the
    access to data, so provide opportunities for
    gathering and deploying new kinds of metadata

5
Ola Ahlqvist, Penn State
6
Example: the metadata Big Five--uncertainty
  • Sinton (1978) and many others since defined the
    important aspects of metadata to be positional
    error, temporal error, measurement or thematic
    error, consistency and completeness
  • Sometimes lineage is included

7
Are they practical to gather?
  • In some cases, the answer is yes, e.g. primary
    surveying data producers
  • Means of recording accuracy in space, time and
    value/theme do exist
  • Means of recording consistency, completeness and
    lineage are for the most part experimental; there
    are no accepted standards

8
Are they practical to use?
  • Clearly there are examples of metadata fields
    that are very useful and easy to deploy, e.g. map
    projection datum
  • But often the answer is no, at least not in a
    quantitative way
  • The propagation of error through data combination
    is extremely difficult to achieve in practice
  • Communicating uncertainty, either statistically
    or visually, is still a research topic
  • Communicating lineage is likewise experimental
  • Metadata would help more if there were more direct
    interaction with it as part of analysis in
    existing GIS
  • Uncertainty metadata may help in a qualitative
    way, but the current onus is on the user: how should
    they make sense of it?
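
To make the propagation point concrete, here is a minimal sketch of the easiest possible case: combining two layers whose errors are assumed to be independent and zero-mean, so that their variances add. The layer names and RMSE values are hypothetical. Real data combinations (correlated errors, categorical themes, nonlinear operations) break these assumptions, which is where the practical difficulty lies.

```python
import math

def combined_rmse(*rmse_values):
    """Root-sum-of-squares of error components.

    Valid only under the assumption that the component errors are
    independent and zero-mean, so that their variances add.
    """
    return math.sqrt(sum(v * v for v in rmse_values))

# Hypothetical positional RMSE values (metres) for two overlaid layers.
roads_rmse = 3.0
parcels_rmse = 4.0

overlay_rmse = combined_rmse(roads_rmse, parcels_rmse)
print(overlay_rmse)  # 5.0
```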

9
If data could talk?
Is the data suitable / optimal for my current
task?
Do I trust the people who produced this data?
  • Will I get sued?

If I dig here, might I hit a gas main?
  • Has it been used in this way before, and if so
    was it a success? By whom?

Where are the problems / missing values?
Will this work? Will I use it right?
10
Approaches to producing metadata
  • Use existing and emerging metadata
    standards (perspective of data producer, onus on
    data producer)
  • User ranking and feedback
  • What works? What is missing? What is known?
    What is unknown?
  • Use-case logging: monitor use via a web portal /
    library, warehouse
  • Use counts by web domain to differentiate
    user communities, and to measure impact and
    value to the intended user communities
  • Use-case mining and analysis
  • Discover significant usage patterns and use these
    to infer relevance, e.g. recommender systems
  • Genesis, derivation, workflows
  • By exposing, analyzing and documenting the means
    by which the dataset was produced
  • Ontology mining
  • Ontology creation from either schema (metadata)
    or content (data)

11
User ranking and feedback
  • Virtual Reality in Geography (Geographic
    Information Systems Workshop) by Peter Fisher
    (Editor), et al. (Hardcover, January 15, 2002).
    (Rate this item) Usually ships in 24 hours. List
    price 99.95. Buy new 99.95.

Hypothesis: users have valuable things to say
about the products they use
12
(No Transcript)
13
Use case logging
  • Log each interaction with the cyberinfrastructure
    that provides access to data... keep tallies

Hypothesis: knowing what is popular is helpful
when making choices
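
A minimal sketch of what such logging might look like, assuming each access passes through a single broker that can call a `record` function. The `UsageLog` class, the dataset identifiers and the host names are all hypothetical, and reducing a host name to its last two labels is only a crude stand-in for identifying a user community.

```python
from collections import Counter
from datetime import datetime, timezone

class UsageLog:
    """Append-only interaction log with running tallies (illustrative only)."""

    def __init__(self):
        self.events = []
        self.by_dataset = Counter()         # how often each dataset is used
        self.by_dataset_domain = Counter()  # usage split by web domain

    def record(self, dataset_id, client_host, action):
        # Crude community proxy: keep the last two labels of the host name.
        domain = ".".join(client_host.lower().split(".")[-2:])
        self.events.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "dataset": dataset_id,
            "domain": domain,
            "action": action,
        })
        self.by_dataset[dataset_id] += 1
        self.by_dataset_domain[(dataset_id, domain)] += 1

usage = UsageLog()
usage.record("dem_30m", "ws12.geog.psu.edu", "download")
usage.record("dem_30m", "gis.usgs.gov", "preview")
usage.record("fault_map", "lab.geog.psu.edu", "download")
usage.record("dem_30m", "lab.geog.psu.edu", "download")

print(usage.by_dataset.most_common())                   # most popular first
print(usage.by_dataset_domain[("dem_30m", "psu.edu")])  # uses from one domain
```

The tallies answer the "what is popular?" question directly, while the raw event list is kept so that later mining is still possible.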
14
Use-case mining and analysis
  • Learn from the use-cases (Recommender systems)
  • Who created this resource?
  • When was it created?
  • How often has it been used?
  • Has it been modified recently?
  • Who has used it?
  • What has it been used with?

Hypothesis: I can learn from the actions of others
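
One way to answer "What has it been used with?" is to count how often resources appear together in the same session. The sketch below is a hedged illustration of that idea, not the system described in the talk; the session contents and resource names are invented.

```python
from collections import defaultdict
from itertools import combinations

def co_usage(sessions):
    """Count, for every pair of resources, the sessions that used both."""
    counts = defaultdict(lambda: defaultdict(int))
    for resources in sessions:
        for a, b in combinations(sorted(set(resources)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def used_with(counts, resource, top_n=3):
    """Resources most often used alongside `resource` (ties broken by name)."""
    ranked = sorted(counts[resource].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Hypothetical sessions: each list is what one user touched in one sitting.
sessions = [
    ["gravity_grid", "fault_map", "kriging_tool"],
    ["gravity_grid", "fault_map"],
    ["gravity_grid", "dem_30m"],
    ["fault_map", "dem_30m"],
]

counts = co_usage(sessions)
print(used_with(counts, "gravity_grid"))
```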
15
Define situations of use
16
Mining association rules from use-case logs
  • Association rules are mined from user action logs
  • Uses the WEKA (Waikato Environment for Knowledge
    Analysis) API, which implements the Apriori
    algorithm (Agrawal, R. and Srikant, R., 1994)
  • Tools added for data preprocessing and
    classifying
  • The attribute selector allows the user to select
    a subset of data attributes
  • Data filters allow the user to define filters that
    convert String, Time and Numeric data in any
    attribute column to nominal data for association
    mining
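
The pipeline above can be sketched in plain Python. This is not the WEKA API and not the tool shown in the talk: it is a small stand-in that assumes log rows of (domain, theme, hour), converts the numeric hour to a nominal label, and then runs a textbook Apriori pass. All data, function names and thresholds are hypothetical.

```python
from itertools import combinations

def to_nominal(value, cut_points, labels):
    """Map a number to a nominal label (stand-in for a numeric data filter)."""
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]

def frequent_itemsets(transactions, min_support):
    """Apriori: grow candidates level by level, keeping only frequent ones."""
    n = len(transactions)
    support = {}
    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: k / n for c, k in counts.items() if k / n >= min_support}
        support.update(kept)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        size = len(next(iter(kept))) + 1 if kept else 0
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, size - 1))]
    return support

def association_rules(support, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(all) / support(lhs)."""
    found = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / support[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found

# Hypothetical action log: (user domain, dataset theme, hour of day).
log = [("edu", "gravity", 9), ("edu", "gravity", 10), ("edu", "geology", 14),
       ("gov", "gravity", 22), ("edu", "gravity", 11)]
parts_of_day = ["morning", "afternoon", "evening"]
transactions = [
    {f"domain={d}", f"theme={th}",
     "time=" + to_nominal(h, [12, 18], parts_of_day)}
    for d, th, h in log
]

support = frequent_itemsets(transactions, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(support, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} conf={conf:.2f}")
```

The two thresholds play the role of the sensitivity settings: lowering `min_support` or `min_confidence` yields more, weaker rules.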

17
Data mining tools (association rules)
Results sensitivity settings
Data Filter - String
Attribute Selector
Design
Data Filter - Numeric
Data Filter - Time
18
Applying results of mining, e.g. musicplasma.com
19
Genesis, derivation, workflows
  • Define how a dataset was created

Hypothesis: we may not be able to quantify
uncertainty for your use-case, but we can show
you exactly what we did!
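
As a rough illustration of "showing exactly what we did", the sketch below wraps each processing step so that its name, parameters, and short digests of its input and output are appended to a lineage record. The decorator, the step functions and the sample values are all hypothetical; a real workflow system would record far more (software versions, operators, timestamps).

```python
import functools
import hashlib
import json

LINEAGE = []  # ordered record of the steps applied to the data

def digest(obj):
    """Short content fingerprint so that steps can be chained and checked."""
    text = json.dumps(obj, sort_keys=True)
    return hashlib.sha256(text.encode()).hexdigest()[:12]

def traced(func):
    """Record each call of a processing step in LINEAGE."""
    @functools.wraps(func)
    def wrapper(data, **params):
        result = func(data, **params)
        LINEAGE.append({
            "step": func.__name__,
            "params": params,
            "input_digest": digest(data),
            "output_digest": digest(result),
        })
        return result
    return wrapper

@traced
def drop_missing(values, sentinel=-9999):
    return [v for v in values if v != sentinel]

@traced
def rescale(values, factor=1.0):
    return [v * factor for v in values]

raw = [12.0, -9999, 15.5, 14.0]  # hypothetical readings with a no-data value
clean = rescale(drop_missing(raw, sentinel=-9999), factor=0.1)

for step in LINEAGE:
    print(step["step"], step["params"],
          step["input_digest"], "->", step["output_digest"])
```

Because each step's output digest matches the next step's input digest, a consumer can confirm that the recorded chain is unbroken.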
20
Ontology mining
  • Recent work has shown that ontologies can be
    built by mining database schema (e.g. GEON
    portal)

Ontologies can also be built by analyzing the
data itself. Having a domain ontology is useful
for anchoring.
Hypothesis: metadata descriptions that are
inferred are useful, consistent and not
burdensome, since they can be mined
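
A very small sketch of the schema-mining idea, assuming the data sit in a relational database: tables become candidate concepts, columns become properties, and foreign keys become relations. It uses SQLite only because it ships with Python; the schema, the triple vocabulary and the function name are invented for illustration and do not describe how the GEON portal works.

```python
import sqlite3

def schema_to_triples(conn):
    """Turn a database schema into simple (subject, predicate, object) triples."""
    triples = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for table in tables:
        triples.append((table, "is_a", "Concept"))
        # PRAGMA table_info rows: cid, name, type, notnull, dflt_value, pk
        for _cid, column, col_type, *_ in conn.execute(
                f"PRAGMA table_info('{table}')"):
            triples.append((table, "has_property", f"{column}:{col_type}"))
        # PRAGMA foreign_key_list rows: id, seq, referenced table, from, to, ...
        for row in conn.execute(f"PRAGMA foreign_key_list('{table}')"):
            triples.append((table, "related_to", row[2]))
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rock_unit (id INTEGER PRIMARY KEY, name TEXT, age_ma REAL);
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        unit_id INTEGER REFERENCES rock_unit(id),
        density REAL
    );
""")

triples = schema_to_triples(conn)
for triple in triples:
    print(triple)
```

Flat triples like these are only a starting point; anchoring them to a domain ontology would still require matching the mined names to shared concepts.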
21
Codex: ways of understanding a data resource
22
implemented as a web portal
Bill Pike, PSU
http://hero.geog.psu.edu/codex
23
Concept maps (gravitational anomaly)
24
extend to data and methods
25
and to people
Also to articles / papers using Citeseer metadata
26
A learning activity integrated with semantic
search
Conceptual Space
List of concepts
Embedded learning activity (resource)
Searching Digital Libraries for content relating
to selected concepts (DLESE, ADL)
27
Conclusions
  • Assertion 1: Current attempts to gather and
    utilize metadata are failing
  • Assertion 2: The burden of tagging existing and
    future data with user-relevant metadata is
    overwhelming. We cannot realistically expect data
    producers to carry this burden alone.
  • Many different approaches to metadata creation
    are open to us.
  • Some are new, facilitated by grid and web
    service brokered access to e-resources.
  • We need to try some of these on a large scale.
  • Incentives and rewards