Title: Diverse data to diverse visualization systems end to end
1. Diverse data to diverse visualization systems end to end
- Julian Gallop
- CCLRC Rutherford Appleton Laboratory
2. Outline
- The present situation
- Introduction to solution
- More details
- Assessment and further work
- Acknowledgements
3. Diversity
- Diversity of data sources
  - text, de facto, legacy
  - Comma Separated Values (CSV)
    - Text values allowed/disallowed
    - Missing values (see the sketch after this slide)
  - netCDF
  - HDF5
  - FEA data
  - Growth of XML-based data
- Diversity of visualization systems
  - VisAD
  - Matlab
  - AVS
  - IRIS Explorer
  - vtk
  - gnuplot
  - IDL
  - PV3
  - ArcInfo
  - XMDV
  - Excel
  - R
- Although we refer mainly to visualization systems, most of what follows applies to more general data analysis tools too
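As a small illustration of why even plain CSV needs to be described rather than assumed, the sketch below reads the same text under two missing-value conventions. The "NA" sentinel, the column names and the helper function are invented for this example.

```python
# Two readings of the same CSV text under different missing-value conventions.
# The "NA" sentinel and the empty-field convention are examples only; real
# data sources document (or fail to document) their own choices.
import csv
import io

raw = "station,temp\nA,12.5\nB,\nC,NA\n"

def read_temps(text, missing_tokens):
    values = []
    for record in csv.DictReader(io.StringIO(text)):
        field = record["temp"]
        values.append(None if field in missing_tokens else float(field))
    return values

print(read_temps(raw, {"", "NA"}))   # [12.5, None, None]
try:
    read_temps(raw, {""})            # only empty fields treated as missing
except ValueError as err:
    print("stricter convention rejects the file:", err)
```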
4. Effect of the Grid on diversity
- Grid developments aim to bring about more effective use of data
  - Find and access any data that you are entitled to use
  - Trend towards using XML for descriptive purposes
- However, the diversity of data structures remains
  - Valuable (even irreplaceable) legacy data holdings will continue
  - XML developments initiated within application domains (e.g. marine data, earth science)
  - Grid Virtual Organisations (VOs) form, change and disperse
- Suppose we have a collaborating group
  - Multidisciplinary knowledge of multiple data sources
  - Multiple preferred visualization systems
5.
- So, there is still a gap to be bridged between
  - multiple data formats and models, and
  - multiple preferred visualization systems
- Conventional approaches to this
  - "No problem, I only use one combination"
  - or "That's easy, I'll write a converter"
  - or the collaborating team agrees to use just one viz system
- But a Grid-enabled VO encourages teams that
  - form, change and disperse
  - and are multidisciplinary
[Diagram: example data sources (precious legacy data; satellite data in HDF5; new data; Joe Bloggs' data; application-oriented XML) on one side, and example visualization systems (programming/script oriented, e.g. Matlab; MVE, e.g. IRIS Explorer; toolkit, e.g. VisAD, PV3) on the other, with every pairing potentially needing its own converter]
6. Many characterisations of data sources
- legacy / current / being planned
- de facto / self-describing non-XML / XML
- metadata: none / some / good
- application dependent / independent
- text / binary
- access: private (by intent / by default) / restricted / public
- spatial / non-spatial
- regular / irregular
- references: none / rich (e.g. FEA cells, GIS, networks)
- dimensions: single / three / many
- DBMS / or not
- defined by API / format
7. Many characterisations of visualization and data analysis systems
- fixed function
- adaptable by API / scripts / visual networks
- API: C / Java / Python / etc.
- dimensions: single only / volume / multivariate
- regular only / irregular possible
- formats readable: native / other popular / netCDF / HDF(5) / limited XML
- purchase cost: none / cheap / expensive
8.
- Investigate whether we can do this instead
[Diagram: the same example data sources and visualization systems as before, now connected through a common bridge rather than by pairwise converters]
- Investigate moving from m×n to m+n. In more detail: from A×m×n to B×m + C×n + D (and avoid making B, C and D too big). For example, with m = 5 data sources and n = 6 visualization systems, pairwise conversion needs 30 converters, while bridging needs only 11 components.
9. A possible framework
- This work is investigating a possible framework.
- Some general principles
  - Make use of XML for description: wide acceptance, and supported by a wide range of conversion tools
  - Convert the description to XML as soon as possible in the chain
  - No requirement for tagging each datum with XML (unless the data source does this already)
  - Adopt a descriptive approach, not a prescriptive one (the approach followed by BinX, DFDL, ESML)
  - Decompose transformations into single-purpose components, which could potentially be located in different places
  - Use existing tools such as XSLT or, when more complexity is required, XQuery (see the sketch after this slide)
  - Avoid gross loss of speed, e.g. avoid repeated conversions of very large datasets
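A rough illustration of the "convert the description to XML early, then use standard tools" principle: the sketch below applies an XSLT stylesheet to a small description using the lxml library. The element and attribute names (dataset, array, gridInput, field) are invented for this example and are not part of any of the markup languages mentioned here.

```python
# Minimal sketch: transform a (hypothetical) data-source description into a
# vis-system-oriented description using XSLT, via the lxml library.
from lxml import etree

source_desc = etree.XML("""
<dataset name="population">
  <array name="density" type="float32" rows="1800" cols="3600"/>
</dataset>
""")

# Stylesheet that rewrites the description for a (hypothetical) reader component.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/dataset">
    <gridInput>
      <xsl:for-each select="array">
        <field name="{@name}" elementType="{@type}">
          <extent><xsl:value-of select="@rows"/> x <xsl:value-of select="@cols"/></extent>
        </field>
      </xsl:for-each>
    </gridInput>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(source_desc), pretty_print=True).decode())
```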
10. Two approaches
- (1) Use an intermediate form
  - Data source → bridge → ready for vis system
  - The presence of an intermediate form may make this easier to understand
  - Suitable for small amounts of data
- (2) Convert in one transformation
  - Requires analysis of both the data source and the vis system
  - Requires creating a converter instance
  - A single transformation may be suitable for a large data object
- (A toy illustration of the difference follows this slide)
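The toy stand-in below contrasts the two approaches on a deliberately trivial conversion; the separator choices and function names are invented for illustration only.

```python
# Toy stand-in for the two approaches: converting ';'-separated text rows (the
# "data source") into the tab-separated text a hypothetical vis system reads.

def via_intermediate(raw: str) -> str:
    # (1) Explicit intermediate form: a list of row tuples.  Easy to inspect,
    #     but the whole dataset is materialised a second time in memory.
    intermediate = [tuple(line.split(";")) for line in raw.splitlines()]
    return "\n".join("\t".join(row) for row in intermediate)

def one_transformation(raw: str) -> str:
    # (2) A single transformation: each row is rewritten directly, which
    #     matters more as the data object grows.
    return "\n".join(line.replace(";", "\t") for line in raw.splitlines())

sample = "lat;lon;pop\n51.5;-1.3;1200"
assert via_intermediate(sample) == one_transformation(sample)
print(one_transformation(sample))
```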
11. Converting metadata
- Metadata has several aspects, which include
  - Essential information about the data
    - e.g. the circumstances in which the data was obtained
    - Other work focusses on these aspects
    - Here, we provide a mechanism for delivering them to the visualization/analysis system
  - Structure of the data
    - We focus on this here
12.
[Diagram: data source expertise on one side, visualization/analysis expertise on the other. Metadata from the data source is converted into DataBridgeML; everything except the structure information is converted and passed through as input to the visualization/analysis system, while the structure information is retained for the next step. The data object itself is not yet converted.]
- Metadata is converted using a bridging markup language, referred to at present as DataBridgeML. Conversion of the data object is deferred. (A sketch of the split follows.)
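The sketch below shows the split described above: structure information is kept back to drive the data-object converter, while everything else is delivered straight to the visualization/analysis system. The element and attribute names are invented for illustration; they are not the actual DataBridgeML markup.

```python
# Split a (hypothetical) bridge document into structure info and the rest.
import xml.etree.ElementTree as ET

bridge_doc = ET.fromstring("""
<dataBridge>
  <provenance instrument="satellite-X" acquired="1995"/>
  <units>persons per grid cell</units>
  <structure>
    <array name="population" elementType="float32" missingValue="-9999">
      <dimension name="latitude" size="1800"/>
      <dimension name="longitude" size="3600"/>
    </array>
  </structure>
</dataBridge>
""")

structure_info = bridge_doc.find("structure")                        # drives the converter later
other_metadata = [el for el in bridge_doc if el.tag != "structure"]  # delivered as-is

print(ET.tostring(structure_info, encoding="unicode"))
for element in other_metadata:
    print(element.tag, element.attrib, element.text or "")
```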
13. Converting the large data object
- Next, we deal with converting the large data object.
- For performance reasons, we wish to avoid converting it twice.
14.
[Diagram: the structure information extracted from DataBridgeML, together with a description of the visualization system's capabilities, drives a converter that transforms the data object from the data source into input for the visualization/analysis system.]
- Conversion of the large data object: the converter depends on the structure information and on the visualization system's capabilities.
15. Converting the data structure
- Actions of the data converter include
  - Resequence
  - Extract
  - Convert ASCII/binary
  - Convert the representation of member elements
  - Manage separators
  - Split/combine files
- In each specific instance, the actions required depend on
  - A description of the data structure of the data source
  - A description of the data capabilities of the visualization system
  - A description of the required subset
- Recognise easy cases, e.g.
  - no conversion required
  - no resequencing required
- Note that the conversion itself does not require XML processing; XML is only used to describe it. (A sketch follows this slide.)
- Current status: still under investigation
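The sketch below is a toy converter instance driven by a structure description (data-source side) and a capability description (vis-system side); both dictionaries and their keys are invented for illustration. It handles a few of the actions listed above (ASCII-to-binary conversion, missing-value substitution, resequencing), and, as noted, touches no XML at this stage.

```python
# Toy data-object converter driven by invented structure/capability descriptions.
import struct

source_structure = {"rows": 2, "cols": 3, "missing": -9999.0, "order": "row-major"}
vis_capabilities = {"element": "float32", "order": "column-major", "missing": float("nan")}

ascii_data = "1 2 -9999\n4 5 6\n"

# Convert ASCII to numbers and substitute the missing-value sentinel.
values = [float(v) for v in ascii_data.split()]
values = [vis_capabilities["missing"] if v == source_structure["missing"] else v
          for v in values]

# Resequence row-major -> column-major only if the two descriptions disagree.
if source_structure["order"] != vis_capabilities["order"]:
    r, c = source_structure["rows"], source_structure["cols"]
    values = [values[row * c + col] for col in range(c) for row in range(r)]

binary = struct.pack(f"<{len(values)}f", *values)   # little-endian float32
print(len(binary), "bytes ready for the vis system")
```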
16. Use of XML
- So, we need XML languages for
  - the data bridge
  - specifying the subset to be delivered
  - specifying the visualization system's capabilities for reading data
17. Candidates for DataBridgeML
- Requires
  - Ability to describe a wide range of data sources
  - A high-level description of the data structure (e.g. arrays and tables, not just the low-level detail). It needs to support the data-object converter instance, so experimentation with the markup is needed here.
  - A way to specify how the dataset is accessed, e.g. ftp, DODS, user/password required, HTML table (or this could be separated out). Current Grid middleware developments could simplify this (e.g. OGSA-DAI).
- Some relevant existing markup languages
  - BinX
  - DFDL, being discussed within the Global Grid Forum
  - XDF, developed at NASA Goddard to convert their archives
- Currently using slightly modified XDF in the interim, but also tracking DFDL developments. (An illustrative fragment follows this slide.)
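The fragment below is an invented, roughly XDF-flavoured illustration of the two things such a language has to capture: how to reach the dataset and what its high-level structure is. None of these element or attribute names are the real XDF or DFDL markup, and the URI is a placeholder.

```python
# Parse an invented data-source description covering access and structure.
import xml.etree.ElementTree as ET

description = ET.fromstring("""
<dataSource name="gridded_population">
  <access protocol="ftp" uri="ftp://example.org/gpw/africa.asc" credentials="anonymous"/>
  <array elementType="float32" encoding="ascii">
    <axis name="latitude"  size="1800"/>
    <axis name="longitude" size="3600"/>
    <missingValue>-9999</missingValue>
  </array>
</dataSource>
""")

access = description.find("access").attrib
array = description.find("array")
axes = {axis.get("name"): int(axis.get("size")) for axis in array.findall("axis")}
print(access["protocol"], access["uri"])
print(axes, array.get("encoding"), array.findtext("missingValue"))
```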
18. Data source example
- Gridded Population of the World, from CIESIN at Columbia University
- Contains the population for each latitude/longitude cell
- Available as a directory of files via FTP
  - A description file for the whole dataset
  - The dataset divided into files by continent
- For each continent file
  - 2 metadata files, whose contents include the extent of the data in longitude and latitude, the number of entries on each axis, and the projection
  - 1 file containing header and data
    - The header includes the value used for missing data (a reading sketch follows this slide)
- Although comparatively simple, it raises issues regarding metadata: it requires knowledge of the circumstances
  - Some metadata is duplicated, but not identical
  - No units easily extractable (000s? 000000s?)
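The sketch below reads the kind of continent file described above, assuming an ESRI-ASCII-grid-style layout ("keyword value" header lines followed by rows of numbers); the real GPW file layout may differ in detail, and the sample values are invented. The point is that the header's missing-data value drives how the grid is interpreted.

```python
# Read a small grid file with a keyword/value header and a missing-data value.
import io

sample_file = io.StringIO(
    "ncols 4\n"
    "nrows 2\n"
    "cellsize 0.5\n"
    "NODATA_value -9999\n"
    "12 7 -9999 3\n"
    "0 -9999 5 9\n"
)

header, grid = {}, []
for line in sample_file:
    tokens = line.split()
    if tokens[0][0].isalpha():              # header lines start with a keyword
        header[tokens[0]] = float(tokens[1])
    else:                                   # remaining lines are grid rows
        grid.append([float(v) for v in tokens])

missing = header["NODATA_value"]
cells = [[None if v == missing else v for v in row] for row in grid]
print(header)
print(cells)
```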
19. Assessment (1/2)
- The diversity of data sources and visualization systems is likely to continue.
- A framework for transforming metadata and data structure from data sources to visualization systems is presented here. It splits the knowledge required between data source expertise and visualization system expertise.
- The concept appears to be feasible, but further work is needed (next slide).
- The framework could be populated with a (distributed) catalogue of data source descriptions and visualization system descriptions.
20. Assessment (2/2)
- Further work
  - Further prototyping needed, particularly on specifying the conversion of data objects
  - Further tests of XML-based descriptions, and investigation of the relation to other initiatives such as BinX and DFDL (daffodil)
  - Need to validate feasibility with more complex cases (more complex data sources; adaptable visualization systems)
21. Acknowledgements
- This investigation has been part of the gViz project (Visualization Middleware for e-Science)
- A project in the e-Science core programme, which ended on 31 July 2004
- Partners
  - Universities of Leeds, Oxford and Oxford Brookes; CCLRC RAL; IBM; NAG; and Streamline Computing
- Major work in the project included
  - Visualization for computational steering on the Grid (talk by Ken Brodlie in the Mini Workshop "Computational Steering and Visualisation on the Grid: Practice and Experience")
  - Generalization of data flow networks for visualization, using an XML-based language (sKML)
22. Invitation
- I am interested in evaluating the approach on different classes of data source
- If you use or are responsible for datasets which may be of interest, please contact me
- Email: Julian.Gallop_at_rl.ac.uk
- or see me at this meeting
- Thank you for your attention