Title: XML, Bioinformatics, and Data Integration
1. XML, Bioinformatics, and Data Integration
- Presentation on the review paper:
  Achard, F., Vaysseix, G., and Barillot, E. (2001).
  XML, bioinformatics and data integration.
  Bioinformatics, 17(2), 115-125.
December 22, 2006
2. Questions to Answer
- What is XML?
- How is it applicable to bioinformatics?
- What are some possible alternatives, and how does XML compare with them?
3. What is it?
- A markup language derived from SGML
- Developed to bring the flexibility of SGML to the growing needs of the internet, with many advantages over HTML
- XML documents consist of elements with start and end tags, attributes, and data, and may follow the structure defined in a Document Type Definition (DTD)
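To make the vocabulary of elements, attributes, and data concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document itself is hypothetical: the element and attribute names (sequence, name, residues, id, type) are made up for illustration and do not come from any published DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; names are illustrative, not from a real DTD.
DOC = """<sequence id="seq1" type="dna">
  <name>example fragment</name>
  <residues>ATGGCC</residues>
</sequence>"""

root = ET.fromstring(DOC)

# The element name comes from the start and end tags.
print(root.tag)

# Attributes are the name="value" pairs inside the start tag.
print(root.attrib)

# Data is the text between a start tag and its matching end tag.
print(root.find("residues").text)
```

Note that ElementTree only checks that the document is well-formed; it does not validate it against a DTD.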
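4. Other XML-Related Terminology
- XSLT: Extensible Stylesheet Language Transformations, a stylesheet language for transforming XML documents
- XLink (formerly XLL): XML Linking Language, for hyperlinks in XML
- XQuery: query language for XML (earlier proposals included XQL)
- XHTML: reformulation of HTML in XML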
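The idea behind an XML query language can be sketched with a path expression. The example below is not XQuery; it uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module, and the document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical gene list; structure and names are assumptions for illustration.
DOC = """<genes>
  <gene id="g1" organism="human"><symbol>BRCA1</symbol></gene>
  <gene id="g2" organism="mouse"><symbol>Trp53</symbol></gene>
  <gene id="g3" organism="human"><symbol>TP53</symbol></gene>
</genes>"""

root = ET.fromstring(DOC)

# Select the <symbol> child of every <gene> whose organism attribute is "human".
human_symbols = [
    symbol.text
    for symbol in root.findall("./gene[@organism='human']/symbol")
]

print(human_symbols)
```

A full query language adds joins, sorting, and construction of new documents on top of this kind of path selection.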
5. Why XML?
- Easy to parse and to generate
- Supports more complex structures and semantics than HTML
- Widely implemented and supported by
  - All major web browsers
  - Many programming languages
  - Projects like Apache XML
  - Groups defining DTDs for specific fields
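As a rough illustration of the "easy to generate" point, the sketch below builds a small record in memory and serializes it with Python's standard library. The helper build_record and the element names are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET


def build_record(seq_id, residues):
    """Build a hypothetical <sequence> element with one child element."""
    record = ET.Element("sequence", attrib={"id": seq_id})
    ET.SubElement(record, "residues").text = residues
    return record


# Serialize to a string; encoding="unicode" returns str rather than bytes.
xml_text = ET.tostring(build_record("seq1", "ATGGCC"), encoding="unicode")
print(xml_text)

# Parsing the generated text recovers the same structure.
round_trip = ET.fromstring(xml_text)
print(round_trip.get("id"), round_trip.find("residues").text)
```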
6. DTDs Related to Science and Math
- CML: Chemical Markup Language
- MathML: Mathematical Markup Language
- BSML: Bioinformatic Sequence Markup Language
- BioML: Biopolymer Markup Language
- DTDs have also been developed for gene ontologies, taxonomy, etc.
7. General Qualities of Bioinformatics Data
- Quickly evolving: must be able to handle the addition of new types of data, and relate them to previously existing data
- Large numbers of data objects, which tend only to increase, and all of this data usually needs to be archived
- Often multiple users
- All of this points to a need for scalability and expressiveness
8. Advantages of XML
- Flexible: both XML data files and DTDs are human-readable, which makes modifications easy
- Allows for creation of data format standards for storage and interchange
- Easy to implement
9. Possible Alternatives
- Abstract Syntax Notation One (ASN.1)
  - Binary data format with structure description
- CORBA
  - Object-oriented data server offering platform and language independence
- Java Remote Method Invocation (RMI)
  - Allows clients to invoke methods on Java objects stored elsewhere
- Object-oriented databases
10. How does it Compare?
Table 1 from Achard, F., Vaysseix, G., and Barillot, E. (2001). XML, bioinformatics and data integration. Bioinformatics, 17(2), 115-125.
Note: the XML query language XQuery has been adopted and developed further by the W3C
11. Conclusions
- XML has some weak spots in terms of biological data modeling
- XML, while supporting semantics more than HTML, may still fail to provide the level of expressiveness required for some applications
- No inheritance, limited constraints, limited type support
- Thus the optimal solution may be XML combined with an OODBMS
  - XML becomes the interface
- Will be most useful when DTD standards are developed and utilized
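The "XML becomes the interface" idea can be sketched as follows: typed objects with inheritance live in the application (standing in here for an object-oriented database), and XML is used only for interchange. This is an illustrative sketch, not the design proposed in the paper; the classes Sequence and ProteinSequence and the helpers to_xml and from_xml are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Sequence:
    seq_id: str
    length: int


@dataclass
class ProteinSequence(Sequence):  # inheritance lives in the object model
    organism: str


# Maps element names back to classes when reading XML.
CLASSES = {"Sequence": Sequence, "ProteinSequence": ProteinSequence}


def to_xml(seq):
    """Serialize an object; every field becomes plain text in XML."""
    elem = ET.Element(type(seq).__name__)
    for name, value in vars(seq).items():
        ET.SubElement(elem, name).text = str(value)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Rebuild an object; types must be restored by the application."""
    elem = ET.fromstring(text)
    fields = {child.tag: child.text for child in elem}
    fields["length"] = int(fields["length"])  # XML text carries no int type
    return CLASSES[elem.tag](**fields)


protein = ProteinSequence(seq_id="p1", length=393, organism="human")
restored = from_xml(to_xml(protein))
print(restored)
```

The explicit int conversion in from_xml shows where the limited type support noted above has to be compensated for in application code.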
12. Happy Holidays!
13. Example XML Application: Pedigrees
- Using XML for pedigrees could make them easier to
read, and can certainly be more descriptive than -
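
As a minimal sketch of what a pedigree encoded in XML might look like, the example below stores individuals as elements and records parentage through attributes that refer to other individuals by id. The element and attribute names (pedigree, individual, father, mother, affected) and the helper parents are assumptions made for illustration, not an established pedigree format.

```python
import xml.etree.ElementTree as ET

# Hypothetical pedigree; names and structure are illustrative only.
PEDIGREE = """<pedigree>
  <individual id="I1" sex="male" affected="no"/>
  <individual id="I2" sex="female" affected="yes"/>
  <individual id="II1" sex="female" affected="yes" father="I1" mother="I2"/>
</pedigree>"""

root = ET.fromstring(PEDIGREE)

# Index individuals by id so parent references can be resolved.
by_id = {ind.get("id"): ind for ind in root.findall("individual")}


def parents(ind_id):
    """Return the ids of the recorded parents (founders have none)."""
    ind = by_id[ind_id]
    return [ind.get(role) for role in ("father", "mother") if ind.get(role)]


print(parents("II1"))
print(parents("I1"))
```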