Title: Macromolecular Structure Middleware
1Macromolecular Structure Middleware
- OpenMMS
- An Ontology Driven Architecture
2Overview
- The mmCIF Ontology
- OpenMMS Toolkit
- Macromolecular Structure (MMS) Metamodel
- Parser, XML
- SQL / Corba Servers and Clients
- Corba
- UML and the future...
3How do we Enable Science?
- Promote well-defined Macromolecular Structure (MMS) Specifications
- Distribution and Open Interfaces
- Now
- flat files
- W3 browsing and searching
- Future
- XML, SQL, CORBA
4Why OpenMMS?
- Allow programmers to more easily create efficient, high-performance, and robust applications.
- A Java-only toolkit that creates XML, CORBA, and relational DB representations of the mmCIF Macromolecular Structure Data.
- Source code is publicly available, so users can easily modify the metamodel or create an entirely new one.
5What Do We Mean by an Ontology Driven
Architecture?
What do we mean by an Ontology?
A bridge between Our World of Natural
Language and the World of Machines.
6mmCIF Dictionary and Data Files
- Based on the Ontology for Macromolecular Structure defined by the International Union of Crystallography
- Replaces the older 80-Column PDB files
- mmCIF Dictionary contains over 140 Category and 1600 Item definitions
- Open, Extensible
- Provides a well-defined reference standard for
data distribution
7OpenMMS Toolkit Data Flow
8Metamodel Information Flow
mmCIF Dictionary
mmCIF Ontology Metamodel
Metamodel Framework
Corba IDL, SQL Schema, XML DTD, Java Data Loaders, JDBC Loaders
9What can OpenMMS do?
- PDBase program will load any or all PDB files into any SQL-92 compatible database (Oracle, MySQL, Sybase...)
- Translate any PDB file into an XML file.
- Contains two Corba servers
- Reference server will cache and serve data read from PDB flat files.
- DB server will cache and serve data read from a SQL database (very quickly...)
- All source code written in Java and publicly available.
10Some Advantages of Using an Ontology Driven
Architecture
- Scales to very large Ontologies
- More reliable and maintainable code
- Transfer between representations
- Scientific Correctness of representation
- Help in maintaining backward compatibility
11How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview)
[Diagram: Root, Module, Interface, Struct, and Field nodes, with a Visitor abstract class and a Visitor subclass]
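A minimal sketch of how a visitor can walk a metamodel like the one on this slide and emit more than one representation from it. Everything here is an assumption made for illustration: the node classes (ModuleNode, StructNode, FieldNode), the two visitors, and the type mapping are hypothetical and are not the OpenMMS classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical metamodel nodes named after the slide's labels; not the OpenMMS API.
final class MetamodelSketch {

    /** Abstract visitor: the default methods only walk down the tree. */
    abstract static class Visitor {
        void visitModule(ModuleNode m) { for (StructNode s : m.structs) s.accept(this); }
        void visitStruct(StructNode s) { for (FieldNode f : s.fields) f.accept(this); }
        void visitField(FieldNode f) { }
    }

    static final class FieldNode {
        final String name, type;
        FieldNode(String name, String type) { this.name = name; this.type = type; }
        void accept(Visitor v) { v.visitField(this); }
    }

    static final class StructNode {
        final String name;
        final List<FieldNode> fields = new ArrayList<>();
        StructNode(String name) { this.name = name; }
        StructNode field(String n, String t) { fields.add(new FieldNode(n, t)); return this; }
        void accept(Visitor v) { v.visitStruct(this); }
    }

    static final class ModuleNode {
        final String name;
        final List<StructNode> structs = new ArrayList<>();
        ModuleNode(String name) { this.name = name; }
        ModuleNode struct(StructNode s) { structs.add(s); return this; }
        void accept(Visitor v) { v.visitModule(this); }
    }

    /** Visitor subclass that emits one CREATE TABLE statement per struct. */
    static final class SqlSchemaVisitor extends Visitor {
        final StringBuilder out = new StringBuilder();
        private boolean firstField;
        @Override void visitStruct(StructNode s) {
            out.append("CREATE TABLE ").append(s.name).append(" (");
            firstField = true;
            super.visitStruct(s);               // visits the fields
            out.append(");\n");
        }
        @Override void visitField(FieldNode f) {
            if (!firstField) out.append(", ");
            firstField = false;
            // Assumed type mapping, for illustration only.
            out.append(f.name).append(' ').append(f.type.equals("float") ? "REAL" : "VARCHAR(255)");
        }
    }

    /** Visitor subclass that emits an IDL-like module from the same tree. */
    static final class IdlVisitor extends Visitor {
        final StringBuilder out = new StringBuilder();
        @Override void visitModule(ModuleNode m) {
            out.append("module ").append(m.name).append(" {\n");
            super.visitModule(m);
            out.append("};\n");
        }
        @Override void visitStruct(StructNode s) {
            out.append("  struct ").append(s.name).append(" {\n");
            super.visitStruct(s);
            out.append("  };\n");
        }
        @Override void visitField(FieldNode f) {
            out.append("    ").append(f.type).append(' ').append(f.name).append(";\n");
        }
    }

    static ModuleNode example() {
        return new ModuleNode("Demo").struct(
            new StructNode("atom_site").field("id", "string").field("occupancy", "float"));
    }

    public static void main(String[] args) {
        SqlSchemaVisitor sql = new SqlSchemaVisitor();
        IdlVisitor idl = new IdlVisitor();
        example().accept(sql);
        example().accept(idl);
        System.out.print(sql.out);
        System.out.print(idl.out);
    }
}
```

Adding another output format in this arrangement means adding one more Visitor subclass; the node classes stay unchanged.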
12mmCIF Parsers
- General Purpose, Low-level access to data
- Parsers available in many languages
- OpenMMS toolkit includes Java Parser
- Uses Builder Design Pattern
- An application subclasses Abstract Builder class
and stores data into its data structures
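A rough sketch of the Builder arrangement described above: the parser fires callbacks, and the application subclasses an abstract builder to store values in its own structures. The class and method names (AbstractBuilder, beginDataBlock, item, MapBuilder) are hypothetical, not the OpenMMS API, and the parser handles only "data_" lines and single-line "_category.item value" pairs, which is far less than real mmCIF syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CifBuilderSketch {

    /** Callbacks fired by the parser; hypothetical names. */
    abstract static class AbstractBuilder {
        void beginDataBlock(String name) { }
        abstract void item(String category, String item, String value);
    }

    /** Simplified parser: loops, quoting, and multi-line values are out of scope. */
    static void parse(String text, AbstractBuilder builder) {
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("data_")) {
                builder.beginDataBlock(line.substring("data_".length()));
            } else if (line.startsWith("_")) {
                String[] parts = line.split("\\s+", 2);   // tag, then the rest as value
                int dot = parts[0].indexOf('.');
                if (parts.length < 2 || dot < 0) continue; // skip what this sketch cannot handle
                builder.item(parts[0].substring(1, dot), parts[0].substring(dot + 1), parts[1]);
            }
        }
    }

    /** Application-side builder that stores items in a map. */
    static final class MapBuilder extends AbstractBuilder {
        String block;
        final Map<String, String> values = new LinkedHashMap<>();
        @Override void beginDataBlock(String name) { block = name; }
        @Override void item(String category, String item, String value) {
            values.put(category + "." + item, value);
        }
    }

    public static void main(String[] args) {
        MapBuilder b = new MapBuilder();
        // Made-up input, not taken from a real entry.
        parse("data_DEMO\n_cell.length_a 10.0\n_cell.length_b 20.0\n", b);
        System.out.println(b.block + " " + b.values);
    }
}
```

The parser never sees the application's data structures; a different application would supply a different AbstractBuilder subclass and reuse the same parse method.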
13MMS in XML
- Large Flat Files (open and close tags)
- Tables can be grouped by rows or columns
- XML from SQL Query
- Many requests from Web browsers don't really need or want all the data
- Software available from DB vendors and ISVs for creating XML files from SQL result sets
- Smaller files load faster
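As a small illustration of producing XML from a query result grouped by rows, the sketch below turns a list of rows into an XML fragment. It is an assumption-laden stand-in: rows are plain maps rather than a JDBC result set, the element names ("row" and the column names) are invented here, and column names are assumed to be valid XML element names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowsToXmlSketch {

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One <row> element per row, one child element per column. */
    static String toXml(String table, List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(table).append(">\n");
        for (Map<String, String> row : rows) {
            sb.append("  <row>");
            for (Map.Entry<String, String> e : row.entrySet()) {
                sb.append('<').append(e.getKey()).append('>')
                  .append(escape(e.getValue()))
                  .append("</").append(e.getKey()).append('>');
            }
            sb.append("</row>\n");
        }
        sb.append("</").append(table).append(">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();  // keeps column order
        row.put("id", "1");
        row.put("type_symbol", "N");
        System.out.print(toXml("atom_site", List.of(row)));
    }
}
```

Because only the selected rows and columns are serialized, the resulting file is as small as the query that produced it.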
14Relational DB Expression
- SQL-92 Compatible
- Schemas for all the standard DB vendors
- Fast and Flexible Keyword searches
- PDBase loader allows structures to be selectively loaded
- Oracle Instance Tested
- 14,556 Structures
- 16GB, 88 Million Atom Records
15A very high-level (and very rough) classification of communication
- Person-to-Person communication
- email
- Person-to-Machine communication
- HTTP/HTML
- Machine-to-Machine communication
- CORBA, SQL, .NET, SOAP
- Not Communications -> Data Formats
- XML, mmCIF (STAR), many more
16What is CORBA?
- Common Object Request Broker Architecture
- Defines a family of open software interface specifications for distributed object computing.
- http://www.omg.org
17What is an Object? A Data Structure with an
Attitude
- Programs = Algorithms + Data Structures
- Object Oriented Programming Principle
- Partition the parts of algorithms with the
data structures they use
18Side View of a Distributed Application
[Diagram: a client (e.g. a Java applet) and a server (e.g. a mainframe computer), each with IDL and middleware layers, connected by the network (Internet, TCP/IP)]
19The Hourglass view of the Internet
Applications
HTTP, Corba, .NET
Reliable Bitstream
TCP, RTP,...
IP
Unreliable Datagrams
Copper, Glass, Radio Spectrum
(ATM, Ethernet, V.90, SONET...)
20Where is Corba?
- Inside every Java Runtime Environment.
- Commonly used in middle tier and backend (e.g. database) connections.
- Open Source and Commercial Implementations Available
- Usually buried deep inside the software
- Difficult or impossible to tell when it is being used
21What is Distributed Object Computing?
- Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks.
- Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.
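Location transparency can be sketched without an ORB: client code is written against an interface, and a stub implementing the same interface turns each call into a message for the real object. This is only an illustration under stated assumptions. The interface EntryCounter, the "atomCount:<id>" message format, and the in-process transport function are all invented here; a CORBA stub and skeleton would be generated from IDL and would talk over the network instead.

```java
import java.util.Map;
import java.util.function.Function;

final class LocationTransparencySketch {

    /** The interface the client programs against (hypothetical). */
    interface EntryCounter {
        int atomCount(String entryId);
    }

    /** The real object; in a distributed system it lives in another process. */
    static final class LocalCounter implements EntryCounter {
        private final Map<String, Integer> counts = Map.of("DEMO", 3); // made-up data
        @Override public int atomCount(String entryId) {
            return counts.getOrDefault(entryId, 0);
        }
    }

    /** Server side: decodes a request message and calls the real object. */
    static Function<String, String> skeleton(EntryCounter target) {
        return request -> {
            String[] parts = request.split(":", 2);   // "atomCount:<id>"
            if (parts.length != 2 || !parts[0].equals("atomCount")) {
                throw new IllegalArgumentException("unknown request: " + request);
            }
            return Integer.toString(target.atomCount(parts[1]));
        };
    }

    /** Client side: same interface, but each call becomes a message. */
    static final class RemoteStub implements EntryCounter {
        private final Function<String, String> transport; // stands in for the network
        RemoteStub(Function<String, String> transport) { this.transport = transport; }
        @Override public int atomCount(String entryId) {
            return Integer.parseInt(transport.apply("atomCount:" + entryId));
        }
    }

    /** Client code: it cannot tell whether the counter is local or remote. */
    static int report(EntryCounter counter) {
        return counter.atomCount("DEMO");
    }

    public static void main(String[] args) {
        EntryCounter local = new LocalCounter();
        EntryCounter remote = new RemoteStub(skeleton(local));
        System.out.println(report(local));
        System.out.println(report(remote));
    }
}
```

The report method is identical for both objects, which is the point of the term: only the object handed to it differs.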
22Advantages of Distributed Object Computing
- Easier (and faster) for programmers to create distributed applications
- Increases Reliability
- Increases Maintainability
- Increases Portability
- Increases Extensibility
23The Alphabet Soup
- OMG: Object Management Group. Consortium of 800 companies founded in 1989.
- IDL: Interface Definition Language
24Boundaries, Interfaces
- The key is to focus on boundaries, interfaces, how things fit together
- Not on the internal details of how they're built; assume that will be diverse and changing
25Boundaries, Interfaces
- The Interface to an object can be distributed
over a network
Shape of boundary is defined in IDL
26Corba Independence
- Open Standard for Distributed Object Oriented Design
- Independent of Hardware Platform
- Independent of Operating System
- Independent of Programming Language
- Independent of Object Location
27Object Request Broker
- ORBs mediate between objects and things that use
them (clients)
Object Request Broker
28Terminology
- IIOP
- The Internet Inter-ORB Protocol, defined in the
Spec as a vendor-independent, wire-level network
protocol on top of TCP/IP. This allows ORB
implementations of different vendors to
interoperate.
29ORBs Medium for Integration
ORB
ORB
ORB
30Corba Facilities: Industry Standards in Vertical Markets
- Manufacturing
- Finance
- Life Sciences Research
- C4I
- Many others...
31Using Corba to access Macromolecular Structure Data
- No Parsing of Flat Files
- Direct Access to Binary Data Structures
- Strongly Typed Data
- Granularity of Access
- Indices and Presence Flags Pre-computed
- Highest Performance
32OMG/LSR Macromolecular Structure Adoption Process
- August 1999: RFP issued
- March 2000: Initial Submission
- September 2000: Revised Submission
- February 2001: Adopted Spec by the OMG
- 4Q 2001: OpenMMS LSR/MMS1.0 compliant implementation source code publicly available
- February 2002: Approved as a Formal OMG Available Specification.
33Using the CORBA MMS Server
An excerpt from a legacy PDB-formatted file, ATOM records (4hhb.ent)
...
ATOM  6  CG1  VAL  A  1   7.009  20.127  5.418  6.00  61.79 ...
ATOM  7  CG2  VAL  A  1   5.246  18.533  5.681  6.00  80.12 ...
ATOM  8  N    LEU  A  2   9.096  18.040  3.857  7.00  26.44 ...
ATOM  9  CA   LEU  A  2  10.600  17.889  4.283  6.00  26.32 ...
ATOM 10  C    LEU  A  2  11.265  19.184  5.297  6.00  32.96 ...
ATOM 11  O    LEU  A  2  10.813  20.177  4.647  8.00  31.90 ...
ATOM 12  CB   LEU  A  2  11.099  18.007  2.815  6.00  29.23 ...
ATOM 13  CG   LEU  A  2  11.322  16.956  1.934  6.00  37.71 ...
ATOM 14  CD1  LEU  A  2  11.468  15.596  2.337  6.00  39.10 ...
ATOM 15  CD2  LEU  A  2  11.423  17.268   .300  6.00  37.47 ...
...
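For contrast with the typed CORBA access that follows, here is a minimal sketch of the kind of flat-file parsing a client otherwise has to do for records like these. It is an assumption-heavy simplification: it splits on whitespace, which works for the excerpt as shown, whereas real PDB files are fixed-column and fields can touch or be blank. The Atom class and its field names are hypothetical, and the last two numeric columns are deliberately left uninterpreted.

```java
final class AtomLineSketch {

    /** Hypothetical holder for the fields this sketch extracts. */
    static final class Atom {
        final int serial;
        final String name, residue, chain;
        final int residueNumber;
        final double x, y, z;
        Atom(int serial, String name, String residue, String chain,
             int residueNumber, double x, double y, double z) {
            this.serial = serial; this.name = name; this.residue = residue;
            this.chain = chain; this.residueNumber = residueNumber;
            this.x = x; this.y = y; this.z = z;
        }
    }

    /** Whitespace-splitting parser; not a fixed-column PDB reader. */
    static Atom parse(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 9 || !t[0].equals("ATOM")) {
            throw new IllegalArgumentException("not an ATOM record: " + line);
        }
        return new Atom(
            Integer.parseInt(t[1]), t[2], t[3], t[4], Integer.parseInt(t[5]),
            Double.parseDouble(t[6]), Double.parseDouble(t[7]), Double.parseDouble(t[8]));
    }

    public static void main(String[] args) {
        Atom a = parse("ATOM 6 CG1 VAL A 1 7.009 20.127 5.418 6.00 61.79");
        System.out.println(a.serial + " " + a.name + " " + a.residue + a.residueNumber
            + " (" + a.x + ", " + a.y + ", " + a.z + ")");
    }
}
```

Every value arrives as text and has to be converted and validated by the client, which is the work a strongly typed interface removes.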
34LSR/MMS ATOM Record
DsLSRMacromolecularStructure.idl excerpt
struct AtomSite {
    string id;
    IndexId type_symbol;
    AtomIndex label;
    IndexId label_entity;
    VectorXYZ cartn;
    float occupancy;
    float b_iso_or_equiv;
};
35Example Code and Resulting Output
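// Fetch one entry from the server, then read its atom sites as typed structs.
Entry e = entryFactory.get_entry_from_id("4hhb");
AtomSite[] a = e.get_atom_site_list();
// Print id, element symbol, and Cartesian coordinates for each atom.
for (int i = 0; i < a.length; i++)
    System.out.println(a[i].id + " " + a[i].type_symbol.id + " ("
        + a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");

produces

1 N (11.065, 7.352, 9.598)
2 C (12.436, 7.764, 9.902)
3 C (12.883, 7.09, 11.208)
4 O (12.088, 7.0, 12.147)
5 C (12.611, 9.264, 10.06)
...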
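The loop in this example can be tried without an ORB by substituting plain stand-in classes for the generated ones. The sketch below is a guess at the minimum shape needed: only the fields the loop touches are modeled, the type of IndexId.id is assumed to be a string, and the real classes generated from the IDL will differ.

```java
final class AtomSiteLoopSketch {

    // Stand-ins for the IDL-generated types; only the fields the loop reads.
    static final class IndexId {
        final String id;                       // assumed type
        IndexId(String id) { this.id = id; }
    }

    static final class VectorXYZ {
        final float x, y, z;
        VectorXYZ(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    static final class AtomSite {
        final String id;
        final IndexId type_symbol;
        final VectorXYZ cartn;
        AtomSite(String id, String symbol, float x, float y, float z) {
            this.id = id;
            this.type_symbol = new IndexId(symbol);
            this.cartn = new VectorXYZ(x, y, z);
        }
    }

    /** Same string-building expression as the loop body on the slide. */
    static String format(AtomSite a) {
        return a.id + " " + a.type_symbol.id + " ("
            + a.cartn.x + ", " + a.cartn.y + ", " + a.cartn.z + ")";
    }

    public static void main(String[] args) {
        // First two atoms from the slide's output, used as local test data.
        AtomSite[] a = {
            new AtomSite("1", "N", 11.065f, 7.352f, 9.598f),
            new AtomSite("2", "C", 12.436f, 7.764f, 9.902f),
        };
        for (int i = 0; i < a.length; i++) {
            System.out.println(format(a[i]));
        }
    }
}
```

With a real server, the array would come from get_atom_site_list() and the formatting code would stay the same.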
36What are the alternatives to Corba?
- TCP/IP Sockets - Byte stream
- DCOM, COM, OLE, .NET (Microsoft Only)
- DCOM <-> Corba bridges are available from several vendors
- SOAP (Simple Object Access Protocol)
- XML Based
37Unified Modeling Language (UML): What do all those arrows and boxes mean?
- Schematic Language for Defining SW
- Graphics Representations
- UML Things, Relations and Diagrams
- 9 types of Diagrams
- The most commonly used diagram is the Class
Diagram
38 UML Class Diagram Example
EntryFactory
get_version()
get_entry_id_list()
get_entry_modification_dates()
native_formats_supported()
get_native_entry_representation()
ModificationDate
Entry_id: EntryId
date: TimeBase::TimeT
39UML Class Diagram Basics
Class_Name: underlined for class instances, italics for abstract classes
Variables: var1 Type, var2 Type
Methods: method1(), method2(), method3()
Details may be omitted if not important
40UML Relationships
Dependency
0..1
Association
Generalization (Inheritance)
Aggregation
41 UML Example
EntryFactory
get_version()
get_entry_id_list()
get_entry_modification_dates()
native_formats_supported()
get_native_entry_representation()
ModificationDate
Entry_id: EntryId
Date: TimeBase::TimeT
42XMI: XML Metadata Interchange
- UML is a graphical representation; need some way to exchange UML models between applications
- XMI is used to store and transmit UML models
- XML based
- Defines XML tags for classes, relationships between classes, etc.
43OMG MDA
- Platform Independent Models (PIMs) that define the interface are defined in UML
- The PIMs are translated to Platform Specific Models (PSMs) such as Corba, SOAP, .NET or XML Schemas
- The Corba servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML
44MDA: Platform Independent to Platform Dependent Translation
UML
.NET
Corba
SOAP
XML
45Thanks and Acknowledgments
- Phil Bourne
- John Westbrook
- David Benton
- Karl Konnerth
- Lynn TenEyck