Title: Introduction to the BinX Library
1 Introduction to the BinX Library
- eDIKT project team
- Ted Wen tedwen_at_nesc.ac.uk
- Robert Carroll robertc_at_nesc.ac.uk
- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
- Dr Bob Mann
- Dr Chris Maynard
- Discussion
3 About the BinX project
4 The problem
- XML is useful to represent metadata
- Scientific datasets can be too large in XML
- Most scientific data are in binary files
- Binary data files are not all standardized
- Binary data files are platform-dependent
5 BinX: a solution
- Initially designed for the Grid environment
- Annotate data schema for any binary file
- Data elements are marked up in XML
- Describe three levels of features in a binary file
- Underlying physical representation (byte order)
- Primitive data types (integer, float)
- Structure of the dataset (array, table)
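To illustrate why the physical representation matters, the sketch below packs the same 32-bit integer in both byte orders using Python's standard `struct` module. It is an illustration only and does not use the BinX library; the value is an arbitrary example.

```python
import struct

value = 1  # arbitrary example value

big = struct.pack(">i", value)     # big-endian 32-bit integer
little = struct.pack("<i", value)  # little-endian 32-bit integer

print(big.hex())     # 00000001
print(little.hex())  # 01000000

# Reading big-endian bytes with the wrong byte order gives a different number.
misread = struct.unpack("<i", big)[0]
print(misread)  # 16777216
```

Without a description of the byte order, a reader of the file has no way to tell which of the two interpretations is intended.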
6 The BinX project at eDIKT
- Implementing a software library for BinX
- Develop a series of tools based on the library
- Choose C++ for performance
- Write portable code for different platforms
- Robust and easy to use
7 Development status
- Requirement gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version complete in April 2003
- Beta version to be released in June 2003
8 The deliverables
- The BinX library
- Compiled code on different platforms
- Source code with Open Source license
- Documentation
- User's guide
- Developer's guide
- Utilities and examples
9 The BinX Language
10 What is BinX?
- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures and attributes such as byte order
- A BinX document is an XML file with metadata of a binary data file
11 A BinX document
<dataset byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </file>
</dataset>
- Root element: <dataset>
- Data class section: <definitions>
- Abstract data type: <defineType>
- Data instance section: <file>
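As a rough illustration of what this document describes, the following Python sketch reads a file laid out as ten 8-bit characters followed by one big-endian 32-bit integer. It assumes that an `indexTo` of 9 denotes indices 0 to 9 (ten elements) and that elements are stored back to back without padding. The function name `read_myfile` and the sample bytes are hypothetical and not part of the BinX library.

```python
import struct

# Layout assumed from the BinX document: big-endian byte order,
# an array of 10 character-8 values (indices 0..9), then integer-32 "X".
LAYOUT = ">10si"  # ">" = big-endian, standard sizes, no padding

def read_myfile(data: bytes) -> dict:
    """Decode one record laid out as described above (hypothetical helper)."""
    chars, x = struct.unpack_from(LAYOUT, data)
    return {"myTyp": chars.decode("ascii"), "X": x}

# Sample bytes standing in for the contents of myfile.bin.
sample = b"HELLOWORLD" + (258).to_bytes(4, "big")
print(read_myfile(sample))  # {'myTyp': 'HELLOWORLD', 'X': 258}
```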
12 Data elements
- Primitive data elements
- Byte, character, integer, real
- Complex data elements
- Arrays, struct, union
- User-defined data elements
13 Primitive data types
- Bit
- <bit-1>
- Character
- <character-8>
- <unicodeCharacter-16>
- <unicodeCharacter-32>
- Integer
- <byte-8>
- <short-16>, <unsignedShort-16>
- <integer-32>, <unsignedInteger-32>
- <longInteger-64>, <unsignedLongInteger-64>
- Real
- <ieeeFloat-32>
- <ieeeDouble-64>
- <ieeeQuadruple-128>
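The names of the primitive types carry their width in bits. The sketch below maps some of them onto format codes of Python's `struct` module to show the sizes involved. The mapping is an assumption made for illustration (in particular, treating `byte-8` as signed is a guess), and types without a direct `struct` equivalent (`bit-1`, the Unicode characters, `ieeeQuadruple-128`) are left out.

```python
import struct

# Assumed mapping from BinX primitive type names to struct format codes.
STRUCT_CODE = {
    "byte-8": "b",  # signedness is an assumption
    "character-8": "c",
    "short-16": "h",
    "unsignedShort-16": "H",
    "integer-32": "i",
    "unsignedInteger-32": "I",
    "longInteger-64": "q",
    "unsignedLongInteger-64": "Q",
    "ieeeFloat-32": "f",
    "ieeeDouble-64": "d",
}

def size_in_bytes(type_name: str) -> int:
    """Size of a primitive type when stored without padding."""
    # A leading ">" selects standard sizes, independent of the platform.
    return struct.calcsize(">" + STRUCT_CODE[type_name])

for name in STRUCT_CODE:
    print(f"{name}: {size_in_bytes(name)} byte(s)")
```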
14 Complex data types
- Arrays
- Repetitive collection of any data element
- Multidimensional
- Three types of arrays
- Fixed length array
- Variable-length array
- Streamed array
- Struct
- A sequence of data elements
- Union
- One of a group of possible data elements, selected by the discriminant
- Streamed array
    <arrayStreamed>
      <byte-8/>
      <dimStreamed/>
    </arrayStreamed>
- Fixed-length array
    <arrayFixed>
      <ieeeDouble-64/>
      <dim indexTo="3" name="X"/>
      <dim indexTo="4" name="Y"/>
      <dim indexTo="5" name="Z"/>
    </arrayFixed>
- Variable-length array
    <arrayVariable sizeRef="byte-8">
      <ieeeFloat-32/>
      <dim indexTo="7"/>
      <dimVariable/>
    </arrayVariable>
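A variable-length array stores its length in the data itself. The sketch below shows the general idea of a length-prefixed array in Python: one count byte followed by that many 32-bit floats. This is a simplified one-dimensional stand-in, not the exact layout that the BinX element above describes, and `read_counted_floats` is a hypothetical helper.

```python
import struct

def read_counted_floats(data: bytes, byte_order: str = ">") -> list:
    """Read a 1-byte count, then that many 32-bit IEEE floats."""
    (count,) = struct.unpack_from("B", data, 0)
    return list(struct.unpack_from(f"{byte_order}{count}f", data, 1))

# Three floats that are exactly representable in 32 bits.
sample = bytes([3]) + struct.pack(">3f", 1.0, 2.5, -4.0)
print(read_counted_floats(sample))  # [1.0, 2.5, -4.0]
```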
    <struct>
      <short-16 varName="ID"/>
      <integer-32 varName="Count"/>
      <ieeeDouble-64 varName="Var"/>
    </struct>
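The struct above holds a 16-bit, a 32-bit and a 64-bit value in sequence. As an illustration, the Python sketch below decodes such a record, assuming big-endian data stored without padding (14 bytes). Native compilers often insert padding between such fields, which is one reason an explicit description of the layout is useful. The sample values are made up.

```python
import struct

RECORD = ">hid"  # short-16 ID, integer-32 Count, ieeeDouble-64 Var

sample = struct.pack(RECORD, 7, 1000, 0.5)
print(len(sample))  # 14

rec_id, count, var = struct.unpack(RECORD, sample)
print(rec_id, count, var)  # 7 1000 0.5

# With native alignment ("@") the same fields may occupy more bytes,
# because the platform's padding rules apply.
print(struct.calcsize("@hid") >= struct.calcsize(RECORD))  # True
```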
    <union>
      <discriminant>
        <byte-8/>
      </discriminant>
      <case discriminantValue="32">
        <ieeeFloat-32/>
      </case>
      <case discriminantValue="64">
        <ieeeDouble-64/>
      </case>
      <case discriminantValue="0">
        <void-0/>
      </case>
    </union>
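One way to read the union above is as a tagged value: a one-byte discriminant selects which payload follows. The Python sketch below follows that reading, assuming the discriminant is stored immediately before the payload with no padding. `read_union` is a hypothetical helper, not a BinX library call.

```python
import struct

def read_union(data: bytes, byte_order: str = ">"):
    """Return (value, bytes_consumed) for the union sketched above."""
    discriminant = data[0]  # byte-8 discriminant
    if discriminant == 32:
        (value,) = struct.unpack_from(byte_order + "f", data, 1)
        return value, 5
    if discriminant == 64:
        (value,) = struct.unpack_from(byte_order + "d", data, 1)
        return value, 9
    if discriminant == 0:
        return None, 1  # void-0: no payload
    raise ValueError(f"unknown discriminant {discriminant}")

print(read_union(bytes([32]) + struct.pack(">f", 1.5)))   # (1.5, 5)
print(read_union(bytes([64]) + struct.pack(">d", 0.25)))  # (0.25, 9)
print(read_union(bytes([0])))                              # (None, 1)
```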
18 User-defined data type
<defineType typeName="HeaderStruct">
  <struct>
    <character-8 varName="A"/>
    <character-8 varName="B"/>
    <integer-32 varName="Length"/>
  </struct>
</defineType>
19 Data elements as instances
<file src="myfile.bin">
  <short-16 varName="id"/>
  <arrayFixed varName="name">
    <character-8/>
    <dim indexTo="7"/>
  </arrayFixed>
  <struct varName="record">
    <short-16/>
    <ieeeFloat-32/>
  </struct>
</file>
20 Reference defined elements
<definitions>
  <defineType typeName="A">
    <struct>
      <short-16/>
      <integer-32/>
    </struct>
  </defineType>
</definitions>
<file src="myfile.bin">
  <useType typeName="A" varName="FirstUse"/>
  <useType typeName="A" varName="SecondUse"/>
</file>
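The lookup from a useType element to its definition can be sketched with Python's standard `xml.etree.ElementTree`. The example wraps the two sections in a `<dataset>` root and ignores XML namespaces; both are simplifying assumptions, and this is not how the BinX library itself is implemented.

```python
import xml.etree.ElementTree as ET

BINX_DOC = """
<dataset>
  <definitions>
    <defineType typeName="A">
      <struct>
        <short-16/>
        <integer-32/>
      </struct>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="A" varName="FirstUse"/>
    <useType typeName="A" varName="SecondUse"/>
  </file>
</dataset>
"""

root = ET.fromstring(BINX_DOC)

# Index the defined types by name; d[0] is the element inside defineType.
defined = {d.get("typeName"): d[0] for d in root.iter("defineType")}

# Resolve each useType in the file section to the members of its definition.
resolved = {}
for use in root.find("file").iter("useType"):
    body = defined[use.get("typeName")]
    resolved[use.get("varName")] = [member.tag for member in body]

print(resolved)
# {'FirstUse': ['short-16', 'integer-32'], 'SecondUse': ['short-16', 'integer-32']}
```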
21 The BinX Library
22 Fundamental requirements
- Access to data elements in binary files via BinX
- Parse the BinX document
- Build in-memory data structures
- Read data values from the binary file
- Automatic conversion
- Byte ordering
- Padding
- Producing BinX document and binary data
- Generate BinX document for data structures
- Save assigned data values into binary files
23 General use cases
- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays to one)
- Data presentation (browse, pure XML)
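The data conversion use case can be illustrated with a small Python sketch that re-encodes a buffer of 32-bit integers from one byte order to the other. It uses only the standard `struct` module; `convert_int32_byte_order` is a hypothetical helper and handles just this one element type.

```python
import struct

def convert_int32_byte_order(data: bytes, src: str = ">", dst: str = "<") -> bytes:
    """Re-encode a buffer of 32-bit integers from one byte order to another."""
    if len(data) % 4:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    count = len(data) // 4
    values = struct.unpack(f"{src}{count}i", data)
    return struct.pack(f"{dst}{count}i", *values)

big = struct.pack(">3i", 1, 2, 3)
little = convert_int32_byte_order(big)
print(little.hex())  # 010000000200000003000000
```

A general converter would need the element types and structure of the file, which is the information a BinX document provides.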
24 BinX Components
- The library has core functionality to support generic utilities and applications
- BinX library core: parse BinX document, read binary data
- Generic tools: data conversion, extraction
- Applications: domain-specific
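25 The BinX library core
- Input: SchemaBinX, binary data file
- Output: DataBinX, in-memory dataset
[Figure: a SchemaBinX document (<dataset> ... </dataset>) and a binary data file enter the BinX library, which outputs DataBinX (for example <short-16>100</short-16>) and an in-memory data structure (values loaded on ...)]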
26 The BinX Utilities
- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer
27 DataBinX generator
- Put binary data inside XML
- For browsing, web service return, query result
[Figure: the BinX library reads a BinX document (<dataset> ... </dataset>) and binary data, and writes DataBinX such as <short-16>100</short-16>]
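The idea of putting binary data inside XML can be sketched in a few lines of Python. The example decodes a single big-endian 16-bit value and wraps it in element names taken from the slide; `to_databinx` is a hypothetical helper, and the actual DataBinX output of the utility may differ in detail.

```python
import struct
import xml.etree.ElementTree as ET

def to_databinx(data: bytes) -> str:
    """Wrap one big-endian short-16 value in XML (hypothetical helper)."""
    (value,) = struct.unpack(">h", data)
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "short-16").text = str(value)
    return ET.tostring(dataset, encoding="unicode")

print(to_databinx(b"\x00\x64"))
# <dataset><short-16>100</short-16></dataset>
```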
28 DataBinX splitter
- The reverse of DataBinX generator
- Generate binary file for testing, transportation
- Cross-platform (byte order)
[Figure: the BinX library reads DataBinX such as <short-16>100</short-16>, and writes a BinX document (<dataset> ... </dataset>) and binary data]
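The reverse direction can be sketched the same way: parse the XML, collect the values, and write them out in the requested byte order. `from_databinx` is a hypothetical helper that handles only `<short-16>` elements; it illustrates the idea and is not the splitter utility itself.

```python
import struct
import xml.etree.ElementTree as ET

def from_databinx(xml_text: str, byte_order: str = ">") -> bytes:
    """Turn <short-16> values in a DataBinX-like document back into bytes."""
    root = ET.fromstring(xml_text)
    values = [int(e.text) for e in root.iter("short-16")]
    return struct.pack(f"{byte_order}{len(values)}h", *values)

doc = "<dataset><short-16>100</short-16></dataset>"
print(from_databinx(doc).hex())       # 0064
print(from_databinx(doc, "<").hex())  # 6400
```

Choosing the byte order at write time is what makes the output usable on a platform with a different native byte order.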
29 SchemaBinX creator
- GUI and Web-based utilities
- Build BinX document interactively
- Create a BinX document based on another
30 Binary file indexer
- Generating indices for binary data files
- Such indices can be used for fast data access
[Figure: the BinX library with a BinX document (<dataset> ... </dataset>)]
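31 Applications for astronomy
- FITS and VOTable conversion
[Figure: a FITS file (header from SIMPLE = T to END, followed by binary data) is processed by the DataBinX utility, built on the BinX library core, to give a VOTable document (<VOTABLE> ... </VOTABLE>)]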
32 FITS → DataBinX → VOTable
- FITS to VOTable conversion
- Components shown: SchemaBinX, DataBinX utility, XSLT transformer
- VOTable to FITS conversion
- Components shown: XSLT transformer, DataBinX utility, SchemaBinX, binary data, post processor, FITS header
34 FITS-VOTable experiment
- Sample FITS file
- A data table of 82 rows x 20 fields
- File size: 37 KB
- DataBinX generated by the DataBinX utility
- Time spent: 268 ms
- DataBinX document size: 1.2 MB
- VOTable transformed by MSXML
- Time spent: about 1 second
- VOTable document size: 51 KB
35 Possible future releases
- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
- For comments, application-specific tags
- Text file support
36 Features or issues to consider
- Converting floating point numbers
- 80-bit, 96-bit, 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
- Use cases in place of DOM parsing
- Built in the library or as add-on component?
- Database support
- Annotating database tables?
- Query database tables through BinX?
- Java version of the library
- Keeping exactly the same features as the C++ version?
- Supporting XQuery
- Query binary data files with XQuery on BinX
- For problems of usage
- http://www.edikt.org/binx (coming soon)
- support_at_edikt.org
- For requirements and suggestions
- tedwen_at_edikt.org
- robertc_at_edikt.org