Title: Diverse data to diverse visualization systems end to end
1. Diverse data to diverse visualization systems end to end
- Julian Gallop
- CCLRC Rutherford Appleton Laboratory
2. Outline
- The present situation
- Introduction to solution
- More details
- Assessment and further work
- Acknowledgements
3. Diversity
- Diversity of data sources
  - text, de facto, legacy
  - Comma Separated Values (CSV)
    - Text values allowed/disallowed
    - Missing values (see the sketch after this slide)
  - netCDF
  - HDF5
  - FEA data
  - Growth of XML-based data
- Diversity of visualization systems
  - VisAD
  - Matlab
  - AVS
  - IRIS Explorer
  - vtk
  - gnuplot
  - IDL
  - PV3
  - ArcInfo
  - XMDV
  - Excel
  - R
- Although we refer mainly to visualization systems, most of what follows applies to more general data analysis tools too
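As a small illustration of why even plain CSV needs to be described rather than assumed, the sketch below reads the same text under two missing-value conventions. The "NA" sentinel, the column names and the helper function are invented for this example.

```python
# Two readings of the same CSV text under different missing-value conventions.
# The "NA" sentinel and the empty-field convention are examples only; real
# data sources document (or fail to document) their own choices.
import csv
import io

raw = "station,temp\nA,12.5\nB,\nC,NA\n"

def read_temps(text, missing_tokens):
    values = []
    for record in csv.DictReader(io.StringIO(text)):
        field = record["temp"]
        values.append(None if field in missing_tokens else float(field))
    return values

print(read_temps(raw, {"", "NA"}))   # [12.5, None, None]
try:
    read_temps(raw, {""})            # only empty fields treated as missing
except ValueError as err:
    print("stricter convention rejects the file:", err)
```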
4. Effect of the Grid on diversity
- Grid developments aim to bring about more effective use of data
  - Find and access any data that you are entitled to use
  - Trend towards using XML for descriptive purposes
- However, the diversity of data structures remains
  - Valuable (even irreplaceable) legacy data holdings will continue
  - XML developments initiated within application domains (e.g. marine data, earth science)
  - Grid Virtual Organisations (VOs) form, change and disperse
- Suppose we have a collaborating group
  - Multidisciplinary knowledge of multiple data sources
  - Multiple preferred visualization systems
5.
- So, there is still a gap to be bridged between
  - multiple data formats and models, and
  - multiple preferred visualization systems
- Conventional approaches to this
  - "No problem, I only use one combination"
  - or "That's easy, I'll write a converter"
  - or the collaborating team agrees to use just one viz system
- But a Grid-enabled VO encourages teams that
  - form, change and disperse
  - and are multidisciplinary
[Diagram: example data sources (precious legacy data; satellite data in HDF5; new data; Joe Bloggs' data; application-oriented XML) on one side, and example visualization systems (programming/script oriented, e.g. Matlab; MVE, e.g. IRIS Explorer; toolkit, e.g. VisAD, PV3) on the other, with every pairing potentially needing its own converter]
6. Many characterisations of data sources
- legacy / current / being planned
- de facto / self-describing non-XML / XML
- metadata: none / some / good
- application dependent / independent
- text / binary
- access: private (by intent / by default) / restricted / public
- spatial / non-spatial
- regular / irregular
- references: none / rich (e.g. FEA cells, GIS, networks)
- dimensions: single / three / many
- DBMS / or not
- defined by API / format
7. Many characterisations of visualization and data analysis systems
- fixed function
- adaptable by API / scripts / visual networks
- API: C / Java / Python / etc.
- dimensions: single only / volume / multivariate
- regular only / irregular possible
- formats readable: native / other popular / netCDF / HDF(5) / limited XML
- purchase cost: none / cheap / expensive
8.
- Investigate whether we can do this instead
[Diagram: the same example data sources and visualization systems as before, now connected through a common bridge rather than by pairwise converters]
- Investigate moving from m×n to m+n. In more detail: from A×m×n to B×m + C×n + D (and avoid making B, C and D too big). For example, with m = 5 data sources and n = 6 visualization systems, pairwise conversion needs 30 converters, while bridging needs only 11 components.
9. A possible framework
- This work is investigating a possible framework.
- Some general principles
  - Make use of XML for description: wide acceptance, and supported by a wide range of conversion tools
  - Convert the description to XML as soon as possible in the chain
  - No requirement for tagging each datum with XML (unless the data source does this already)
  - Adopt a descriptive approach, not a prescriptive one (the approach followed by BinX, DFDL, ESML)
  - Decompose transformations into single-purpose components, which could potentially be located in different places
  - Use existing tools such as XSLT or, when more complexity is required, XQuery (see the sketch after this slide)
  - Avoid gross loss of speed, e.g. avoid repeated conversions of very large datasets
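A rough illustration of the "convert the description to XML early, then use standard tools" principle: the sketch below applies an XSLT stylesheet to a small description using the lxml library. The element and attribute names (dataset, array, gridInput, field) are invented for this example and are not part of any of the markup languages mentioned here.

```python
# Minimal sketch: transform a (hypothetical) data-source description into a
# vis-system-oriented description using XSLT, via the lxml library.
from lxml import etree

source_desc = etree.XML("""
<dataset name="population">
  <array name="density" type="float32" rows="1800" cols="3600"/>
</dataset>
""")

# Stylesheet that rewrites the description for a (hypothetical) reader component.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/dataset">
    <gridInput>
      <xsl:for-each select="array">
        <field name="{@name}" elementType="{@type}">
          <extent><xsl:value-of select="@rows"/> x <xsl:value-of select="@cols"/></extent>
        </field>
      </xsl:for-each>
    </gridInput>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(source_desc), pretty_print=True).decode())
```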
10. Two approaches
- (1) Use an intermediate form
  - Data source → bridge → ready for vis system
  - The presence of an intermediate form may make this easier to understand
  - Suitable for small amounts of data
- (2) Convert in one transformation
  - Requires analysis of both the data source and the vis system
  - Requires creating a converter instance
  - A single transformation may be suitable for a large data object
- (A toy illustration of the difference follows this slide)
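The toy stand-in below contrasts the two approaches on a deliberately trivial conversion; the separator choices and function names are invented for illustration only.

```python
# Toy stand-in for the two approaches: converting ';'-separated text rows (the
# "data source") into the tab-separated text a hypothetical vis system reads.

def via_intermediate(raw: str) -> str:
    # (1) Explicit intermediate form: a list of row tuples.  Easy to inspect,
    #     but the whole dataset is materialised a second time in memory.
    intermediate = [tuple(line.split(";")) for line in raw.splitlines()]
    return "\n".join("\t".join(row) for row in intermediate)

def one_transformation(raw: str) -> str:
    # (2) A single transformation: each row is rewritten directly, which
    #     matters more as the data object grows.
    return "\n".join(line.replace(";", "\t") for line in raw.splitlines())

sample = "lat;lon;pop\n51.5;-1.3;1200"
assert via_intermediate(sample) == one_transformation(sample)
print(one_transformation(sample))
```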
11. Converting metadata
- Metadata has several aspects, which include
  - Essential information about the data
    - e.g. the circumstances in which the data was obtained
    - Other work focusses on these aspects
    - Here, we provide a mechanism for delivering them to the visualization/analysis system
  - Structure of the data
    - We focus on this here
12.
[Diagram: data source expertise on one side, visualization/analysis expertise on the other. Metadata from the data source is converted into DataBridgeML; everything except the structure information is converted and passed through as input to the visualization/analysis system, while the structure information is retained for the next step. The data object itself is not yet converted.]
- Metadata is converted using a bridging markup language, referred to at present as DataBridgeML. Conversion of the data object is deferred. (A sketch of the split follows.)
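The sketch below shows the split described above: structure information is kept back to drive the data-object converter, while everything else is delivered straight to the visualization/analysis system. The element and attribute names are invented for illustration; they are not the actual DataBridgeML markup.

```python
# Split a (hypothetical) bridge document into structure info and the rest.
import xml.etree.ElementTree as ET

bridge_doc = ET.fromstring("""
<dataBridge>
  <provenance instrument="satellite-X" acquired="1995"/>
  <units>persons per grid cell</units>
  <structure>
    <array name="population" elementType="float32" missingValue="-9999">
      <dimension name="latitude" size="1800"/>
      <dimension name="longitude" size="3600"/>
    </array>
  </structure>
</dataBridge>
""")

structure_info = bridge_doc.find("structure")                        # drives the converter later
other_metadata = [el for el in bridge_doc if el.tag != "structure"]  # delivered as-is

print(ET.tostring(structure_info, encoding="unicode"))
for element in other_metadata:
    print(element.tag, element.attrib, element.text or "")
```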
13. Converting the large data object
- Next, we deal with converting the large data object.
- For performance reasons, we wish to avoid converting it twice.
14.
[Diagram: the structure information extracted from DataBridgeML, together with a description of the visualization system's capabilities, drives a converter that transforms the data object from the data source into input for the visualization/analysis system.]
- Conversion of the large data object: the converter depends on the structure information and on the visualization system's capabilities.
15. Converting the data structure
- Actions of the data converter include
  - Resequence
  - Extract
  - Convert ASCII/binary
  - Convert the representation of member elements
  - Manage separators
  - Split/combine files
- In each specific instance, the actions required depend on
  - A description of the data structure of the data source
  - A description of the data capabilities of the visualization system
  - A description of the required subset
- Recognise easy cases, e.g.
  - no conversion required
  - no resequencing required
- Note that the conversion itself does not require XML processing; XML is only used to describe it. (A sketch follows this slide.)
- Current status: still under investigation
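The sketch below is a toy converter instance driven by a structure description (data-source side) and a capability description (vis-system side); both dictionaries and their keys are invented for illustration. It handles a few of the actions listed above (ASCII-to-binary conversion, missing-value substitution, resequencing), and, as noted, touches no XML at this stage.

```python
# Toy data-object converter driven by invented structure/capability descriptions.
import struct

source_structure = {"rows": 2, "cols": 3, "missing": -9999.0, "order": "row-major"}
vis_capabilities = {"element": "float32", "order": "column-major", "missing": float("nan")}

ascii_data = "1 2 -9999\n4 5 6\n"

# Convert ASCII to numbers and substitute the missing-value sentinel.
values = [float(v) for v in ascii_data.split()]
values = [vis_capabilities["missing"] if v == source_structure["missing"] else v
          for v in values]

# Resequence row-major -> column-major only if the two descriptions disagree.
if source_structure["order"] != vis_capabilities["order"]:
    r, c = source_structure["rows"], source_structure["cols"]
    values = [values[row * c + col] for col in range(c) for row in range(r)]

binary = struct.pack(f"<{len(values)}f", *values)   # little-endian float32
print(len(binary), "bytes ready for the vis system")
```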
16. Use of XML
- So, we need XML languages for
  - the data bridge
  - specifying the subset to be delivered
  - specifying the visualization system's capabilities for reading data
17. Candidates for DataBridgeML
- Requires
  - Ability to describe a wide range of data sources
  - A high-level description of the data structure (e.g. arrays and tables, not just the low-level detail). It needs to support the data-object converter instance, so experimentation with the markup is needed here.
  - A way to specify how the dataset is accessed, e.g. ftp, DODS, user/password required, HTML table (or this could be separated out). Current Grid middleware developments could simplify this (e.g. OGSA-DAI).
- Some relevant existing markup languages
  - BinX
  - DFDL, being discussed within the Global Grid Forum
  - XDF, developed at NASA Goddard to convert their archives
- Currently using slightly modified XDF in the interim, but also tracking DFDL developments. (An illustrative fragment follows this slide.)
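The fragment below is an invented, roughly XDF-flavoured illustration of the two things such a language has to capture: how to reach the dataset and what its high-level structure is. None of these element or attribute names are the real XDF or DFDL markup, and the URI is a placeholder.

```python
# Parse an invented data-source description covering access and structure.
import xml.etree.ElementTree as ET

description = ET.fromstring("""
<dataSource name="gridded_population">
  <access protocol="ftp" uri="ftp://example.org/gpw/africa.asc" credentials="anonymous"/>
  <array elementType="float32" encoding="ascii">
    <axis name="latitude"  size="1800"/>
    <axis name="longitude" size="3600"/>
    <missingValue>-9999</missingValue>
  </array>
</dataSource>
""")

access = description.find("access").attrib
array = description.find("array")
axes = {axis.get("name"): int(axis.get("size")) for axis in array.findall("axis")}
print(access["protocol"], access["uri"])
print(axes, array.get("encoding"), array.findtext("missingValue"))
```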
18. Data source example
- Gridded Population of the World, from CIESIN at Columbia University
- Contains the population for each latitude/longitude cell
- Available as a directory of files via FTP
  - A description file for the whole dataset
  - The dataset divided into files by continent
- For each continent file
  - 2 metadata files, whose contents include the extent of the data in longitude and latitude, the number of entries on each axis, and the projection
  - 1 file containing header and data
    - The header includes the value used for missing data (a reading sketch follows this slide)
- Although comparatively simple, it raises issues regarding metadata: it requires knowledge of the circumstances
  - Some metadata is duplicated, but not identical
  - No units easily extractable (000s? 000000s?)
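The sketch below reads the kind of continent file described above, assuming an ESRI-ASCII-grid-style layout ("keyword value" header lines followed by rows of numbers); the real GPW file layout may differ in detail, and the sample values are invented. The point is that the header's missing-data value drives how the grid is interpreted.

```python
# Read a small grid file with a keyword/value header and a missing-data value.
import io

sample_file = io.StringIO(
    "ncols 4\n"
    "nrows 2\n"
    "cellsize 0.5\n"
    "NODATA_value -9999\n"
    "12 7 -9999 3\n"
    "0 -9999 5 9\n"
)

header, grid = {}, []
for line in sample_file:
    tokens = line.split()
    if tokens[0][0].isalpha():              # header lines start with a keyword
        header[tokens[0]] = float(tokens[1])
    else:                                   # remaining lines are grid rows
        grid.append([float(v) for v in tokens])

missing = header["NODATA_value"]
cells = [[None if v == missing else v for v in row] for row in grid]
print(header)
print(cells)
```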
19. Assessment (1/2)
- The diversity of data sources and visualization systems is likely to continue.
- A framework for transforming metadata and data structure from data sources to visualization systems is presented here. It splits the knowledge required between data source expertise and visualization system expertise.
- The concept appears to be feasible, but further work is needed (next slide).
- The framework could be populated with a (distributed) catalogue of data source descriptions and visualization system descriptions.
20. Assessment (2/2)
- Further work
  - Further prototyping needed, particularly on specifying the conversion of data objects
  - Further tests of XML-based descriptions, and investigation of the relation to other initiatives such as BinX and DFDL (daffodil)
  - Need to validate feasibility with more complex cases (more complex data sources; adaptable visualization systems)
21. Acknowledgements
- This investigation has been part of the gViz project (Visualization Middleware for e-Science)
- A project in the e-Science core programme, which ended on 31 July 2004
- Partners
  - Universities of Leeds, Oxford and Oxford Brookes; CCLRC RAL; IBM; NAG; and Streamline Computing
- Major work in the project included
  - Visualization for computational steering on the Grid (talk by Ken Brodlie in the Mini Workshop "Computational Steering and Visualisation on the Grid: Practice and Experience")
  - Generalization of data flow networks for visualization, using an XML-based language (sKML)
22. Invitation
- I am interested in evaluating the approach on different classes of data source
- If you use or are responsible for datasets which may be of interest, please contact me
- Email: Julian.Gallop_at_rl.ac.uk
- or see me at this meeting
- Thank you for your attention