Memops: Data modelling and automatic code generation

Provided by: timst7
1
Memops: Data modelling and automatic code generation
  • Edinburgh, 9 September 2008

2
Memops - main points
  • Code generation framework
  • Data access subroutine libraries
  • Fully automatic code generation from model
  • Several programming languages in parallel
  • Precise, detailed, validated data

3
Memops
  • Introduction
  • Code generation
  • Generated libraries
  • Applications of Memops

4
The CCPN Project
  • Collaborative Computing Project for NMR
  • Since 1999
  • Unifying platform for NMR software similar to
    CCP4 for X-ray crystallography
  • Community-based, open-source, software
    development
  • Code generation, data model, applications,
    meetings

5
NMR Structural Biology Pipeline
  • [Diagram: Sample Preparation; NMR Machine; Data Processing; Spectrum Analysis; Structure Calculation; Repository Database]
  • Slow, complex, interactive

6
Native Anarchy
  • [Diagram: several Task1, Task2 and Task3 programs linked to one another by many separate Convert steps]

7
With Data Standard
  • [Diagram: the Task1, Task2 and Task3 programs, each linked by a Convert step to a single DataStandard]
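
A back-of-envelope sketch of the contrast between the two diagrams, assuming every program has its own native format and every converter is one-way (both are assumptions for illustration, as are the function names). Direct pairwise conversion then needs up to n(n-1) converters, while a shared standard needs 2n: one import and one export per program.

```python
def pairwise_converters(n):
    """One-way converters needed to link every pair of n formats directly."""
    return n * (n - 1)


def standard_converters(n):
    """One-way converters needed when each format only talks to the standard."""
    return 2 * n


for n in (3, 9, 30):
    print(n, pairwise_converters(n), standard_converters(n))
```

With three programs the two counts are equal; the gap opens quickly as more programs join the pipeline.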

8
Data standard - objectives
  • Lossless data transfer between programs with
    different approaches and architectures
  • All data needed for pipeline software
  • Creating data, not analysing end results
  • Intermediate results needed
  • Comprehensive, detailed, complex
  • Completeness, integrity of changing data
  • Precisely defined standard
  • A single central description
  • Validation directly against standard

9
CCPN approach
  • Standard API, no stable format: easier to maintain as the model changes
  • Abstract data model
  • Exact correspondence to APIs
  • API implementations for several languages
  • Transparent access to XML or DB storage
  • Complete validation of model rules and
    constraints

10
Memops
  • Introduction
  • Code generation
  • Generated libraries
  • Applications of Memops

11
Automatic Code generation
  • Model will change over time
  • Several parallel implementations
  • Synchronisation between APIs and model
  • Maintenance and debugging
  • Resources are limited
  • Automatic Code Generation
  • Write and debug once and for all
  • Any domain, from Astrophysics to Zoology
  • Quick and simple to extend model
  • E.g. Application-specific packages

12
Code Generation Framework

13
Code Generation
  • [Diagram: edit UML; On-disk model (XML file); In-Memory Model (Python objects); MetaModel; API code, schemas, mappings etc.]
  • Legend: CCPN code, Off-the-shelf, files, CCPN generated

14
API generator
  • Written in Python
  • Modular
  • Different generators share code
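
As a rough illustration of generating API code from a model, the sketch below turns a small in-memory model description into Python source for a class with get and set functions. The model layout (`MODEL`), the `generate_class` helper and the generated function names are hypothetical; this is not the Memops metamodel or generator.

```python
# Hypothetical sketch: NOT the Memops generator. It only illustrates
# producing API source code from a declarative model description.

MODEL = {
    "name": "Atom",
    "attributes": [
        {"name": "elementName", "type": "str"},
        {"name": "serial", "type": "int"},
    ],
}


def generate_class(model):
    """Return Python source for a class with type-checked get/set functions."""
    lines = [f"class {model['name']}:", "    def __init__(self):"]
    for attr in model["attributes"]:
        lines.append(f"        self._{attr['name']} = None")
    for attr in model["attributes"]:
        name, typ = attr["name"], attr["type"]
        cap = name[0].upper() + name[1:]
        lines += [
            "",
            f"    def get{cap}(self):",
            f"        return self._{name}",
            "",
            f"    def set{cap}(self, value):",
            f"        if not isinstance(value, {typ}):",
            f"            raise TypeError('{name} must be of type {typ}')",
            f"        self._{name} = value",
        ]
    return "\n".join(lines) + "\n"


source = generate_class(MODEL)
namespace = {}
exec(source, namespace)  # run the generated source to define the class
Atom = namespace["Atom"]

atom = Atom()
atom.setElementName("C")
print(atom.getElementName())
```

Adding an attribute to `MODEL` and regenerating is all it takes to extend the class, which is the point of keeping the model as the single source.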

15
Memops
  • Introduction
  • Code generation
  • Generated libraries
  • Applications of Memops

16
Model features
  • Packages to subdivide model, code, and data files
  • Objects. Unique context, compare-by-identity
  • Complex data types. Different contexts,
    compare-by-value
  • Simple data types, e.g. PositiveInt, enumerations
  • Attributes and links
  • Cardinality, frozen/modifiable, derived
  • Unique/ordered collections (sets, lists, unique
    lists)
  • Ad-hoc constraints on attributes, simple and
    complex datatypes, and objects.
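
A minimal sketch of how such model rules could be checked at runtime, assuming a constraint is a plain function and cardinality is a (lo, hi) pair. `AttributeSpec`, `is_positive_int` and the `seqCodes` attribute are hypothetical names for illustration, not part of the CCPN model.

```python
# Hypothetical sketch of model-level rules; all names are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional


def is_positive_int(value):
    """Constraint for a PositiveInt-style simple data type."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    constraint: Callable[[object], bool]
    lo: int = 1                # minimum number of values
    hi: Optional[int] = 1      # maximum number of values, None = unlimited
    unique: bool = False       # True: duplicate values are not allowed
    is_frozen: bool = False    # True: value cannot change once it is set

    def validate(self, values):
        """Raise ValueError if the list of values breaks any rule."""
        if len(values) < self.lo:
            raise ValueError(f"{self.name}: needs at least {self.lo} value(s)")
        if self.hi is not None and len(values) > self.hi:
            raise ValueError(f"{self.name}: allows at most {self.hi} value(s)")
        if self.unique and len(set(values)) != len(values):
            raise ValueError(f"{self.name}: values must be unique")
        for value in values:
            if not self.constraint(value):
                raise ValueError(f"{self.name}: invalid value {value!r}")

    def check_change(self, already_set):
        """Raise ValueError when a frozen attribute would be modified."""
        if self.is_frozen and already_set:
            raise ValueError(f"{self.name}: attribute is frozen")


seq_codes = AttributeSpec("seqCodes", is_positive_int, lo=0, hi=None, unique=True)
seq_codes.validate([1, 2, 3])  # passes silently
```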

17
Molstructure model package

18
CCPN APIs
  • Application Programming Interface
  • Object oriented
  • Data accessed in memory as if stored in the data
    model
  • Implementations come with
  • Integrated, transparent I/O (file or database)
  • Complete validity checking
  • Protection against casual change (data
    encapsulation)
  • Versioning and backwards compatibility
  • Event notifier system
  • Slot for application-specific data
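
The sketch below shows, under stated assumptions, how a set function can combine validity checking, encapsulation and event notification. The `Residue` class, its `registerNotify` signature and the `applicationData` slot are hypothetical stand-ins, not the generated CCPN API.

```python
# Hypothetical sketch: a setter that validates, stores, then notifies.
class Residue:
    _notifiers = {}  # function name -> list of callbacks, shared by the class

    def __init__(self, seqCode):
        self._seqCode = None
        self.applicationData = {}  # slot for application-specific data
        self.setSeqCode(seqCode)

    @classmethod
    def registerNotify(cls, callback, funcName):
        """Ask for callback(obj) to be called after funcName has run."""
        cls._notifiers.setdefault(funcName, []).append(callback)

    def getSeqCode(self):
        return self._seqCode

    def setSeqCode(self, value):
        if not isinstance(value, int):
            raise TypeError("seqCode must be an int")
        self._seqCode = value
        for callback in self._notifiers.get("setSeqCode", []):
            callback(self)


changes = []
Residue.registerNotify(lambda obj: changes.append(obj.getSeqCode()), "setSeqCode")
residue = Residue(5)
residue.setSeqCode(6)
print(changes)
```

Because the callback is registered before the object is created, it fires for the initial value as well as for the later change.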

19
PythonXML at runtime
  • User application
  • Python API: data get, set; validity check
  • XML I/O code: generic XML read/write
  • XML I/O mappings: what to do for which element
  • XML parser
  • Data storage (XML files): user data in CCPN XML format

20
JavaDB at runtime
  • Presentation layer
  • HQL (optional): custom queries (Hibernate Query Language)
  • Java API
  • Hibernate mappings
  • Hibernate
  • Database Schema
  • Database
  • Legend: CCPN code, Off-the-shelf, Application code, files, CCPN generated

21
Now Available
  • Version 2.0 just released
  • PythonXML, JavaXML, CXML, JavaDB (with
    Hibernate)
  • Available under GPL license from SourceForge or
    www.ccpn.ac.uk
  • CCPN Data Standard
  • NMR, Macromolecules, LIMS
  • 46 packages
  • 552 classes and data types
  • PythonXML implementation: 800,000 lines of code

22
Memops
  • Introduction
  • Code generation
  • Generated libraries
  • Applications of Memops

23
CcpNmr Suite
  • Analysis: interactive NMR analysis
  • FormatConverter: converts between 30 NMR and structure formats
  • Built on top of CCPN model (PythonXML)
  • Version 2.0 released
  • Widely used in macromolecular NMR

24
CcpNmr Analysis

25
ExtendNMR NMR pipeline
  • Integrated macromolecular NMR pipeline, from
    sample to structure
  • Pre-existing programs from 8 groups
  • In-memory conversion to internal data structures
  • Integrated versions released
  • ARIA (NMR structure generation)
  • Bruker TOPSPIN (manufacturer's processing/analysis
    package)

26
BIOXDM
  • Software pipeline for on-synchrotron
    crystallography
  • Exploit new technology (? goniometers)
  • Experiment optimisation, acquisition, and on-line
    processing
  • Independent data model, with Memops machinery
  • JavaDB implementation for runtime concurrent
    access

27
EUROCarbDB
  • Distributed deposition database
  • Glycobiology and glycomics
  • NMR, MS, HPLC and topology
  • Java. Database storage using Hibernate
  • CCPN model JavaDB implementation slots in as-is

28
Funding acknowledgements
  • BBSRC CCPN grants
  • European Union grants
  • EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and
    TEMBLOR contracts
  • Industry support
  • AstraZeneca, Dupont Pharma (now BMS), Genentech,
    GlaxoSmithKline
  • Peter Keller (BIOXDM) thanks Synchrotron
    Soleil, the Global Phasing Consortium and EU
    FP6 BIOXHIT

29
People
  • Authors: Prof. Ernest Laue, Wayne Boucher,
    Rasmus Fogh, Tim Stevens, John Ionides, Wim
    Vranken (EBI), Peter Keller (Global Phasing)
  • Collaborators at U. Cambridge: Dan O'Donovan,
    Wolfgang Rieping, Alan da Silva, Darima
    Lamazhapova
  • Collaborators at EBI (MSD), Hinxton: Kim
    Henrick, Anne Pajon, Chris Penkett
  • Special thanks to: Bruker Biospin GmbH
    (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang
    (EUROCarbDB FP6 contract RIDS-CT-2004-01195)
30
END

31
Overview
  • Packages
  • The Implementation package
  • Objects
  • DataTypes and DataObjTypes
  • Access control

32
ARIA structure generation from NMR data
  • [Diagram: Application; ARIA Data Model; ARIA XML; Custom conversion; CCPN Data Model; CCPN XML]
  • ARIA imports: Peak Lists, Constraints, Sequences, Chemical shifts
  • ARIA exports: Peak Assignments, Filtered Constraints, Violations, Structures
33
API functions
  • get and set (Attributes and links)
  • add and remove (Collection attributes and
    links)
  • sorted (Unordered collection links)
  • findFirst and findAll (Collection links)
  • Simple filtering (attribute value)
  • create and new (Objects)
  • Normal and factory function object creation
  • delete (Objects)
  • Delete function cascades to objects rendered
    invalid by deletion
  • checkValid, checkAllValid (Objects)
  • API classes are strongly coupled. For efficiency
    reasons object-to-object links are two-way.
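
A minimal sketch of how these function families might look in use, assuming a parent `Chain` with child `Residue` objects. The classes and function names below are hand-written, hypothetical stand-ins for illustration, not CCPN-generated code.

```python
# Hypothetical sketch: parent-child objects with two-way links,
# simple attribute filtering, and a cascading delete.
class Chain:
    def __init__(self, code):
        self.code = code
        self._residues = []  # parent -> children direction of the link
        self.isDeleted = False

    def newResidue(self, seqCode, ccpCode):
        """Factory function: create a child and set both link directions."""
        return Residue(self, seqCode, ccpCode)

    def getResidues(self):
        return tuple(self._residues)  # a copy protects the internal list

    def findAllResidues(self, **conditions):
        """Return all children whose attributes equal the given values."""
        return [r for r in self._residues
                if all(getattr(r, key) == val for key, val in conditions.items())]

    def findFirstResidue(self, **conditions):
        found = self.findAllResidues(**conditions)
        return found[0] if found else None

    def delete(self):
        """Cascade: children cannot exist without their parent."""
        for residue in list(self._residues):
            residue.delete()
        self.isDeleted = True


class Residue:
    def __init__(self, chain, seqCode, ccpCode):
        self.chain = chain  # child -> parent direction of the link
        self.seqCode = seqCode
        self.ccpCode = ccpCode
        self.isDeleted = False
        chain._residues.append(self)

    def delete(self):
        self.chain._residues.remove(self)
        self.isDeleted = True


chain = Chain("A")
ala = chain.newResidue(seqCode=1, ccpCode="Ala")
gly = chain.newResidue(seqCode=2, ccpCode="Gly")
print(chain.findFirstResidue(ccpCode="Gly").seqCode)
chain.delete()
print(gly.isDeleted)
```

Storing the link in both directions means a lookup from either end is a plain attribute access, at the cost of keeping the two ends in step on every change.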

34
FormatConverter - The NMR Translator
  • Input data: Peaks, Chemical shifts, Acquisition parameters (formats shown: XEasy, NmrView, Bruker, Varian, ...)
  • Format-specific readers
  • Generic converters: peak, chemical shift, acquisition parameters
  • Data model entry into the CCPN Data Model
  • Format-specific writers
  • Output data: Chemical shifts, Peaks, Processing parameters (formats shown: XEasy, NmrView, NMRPipe, Azara, ...)
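
The reader, common model, writer arrangement can be sketched as follows. The two text layouts ("comma" and "space") are invented for illustration and are not the real XEasy or NmrView formats; `Shift`, `READERS`, `WRITERS` and `convert` are likewise hypothetical names.

```python
# Hypothetical sketch: format-specific readers and writers around one
# shared in-memory representation. The file layouts are invented.
from dataclasses import dataclass


@dataclass
class Shift:
    atomName: str
    value: float  # chemical shift in ppm


def read_comma(text):
    """Reader for the invented 'comma' layout: name,value on each line."""
    shifts = []
    for line in text.strip().splitlines():
        name, value = line.split(",")
        shifts.append(Shift(name.strip(), float(value)))
    return shifts


def read_space(text):
    """Reader for the invented 'space' layout: value name on each line."""
    shifts = []
    for line in text.strip().splitlines():
        value, name = line.split()
        shifts.append(Shift(name, float(value)))
    return shifts


def write_comma(shifts):
    return "\n".join(f"{s.atomName},{s.value:.3f}" for s in shifts)


def write_space(shifts):
    return "\n".join(f"{s.value:.3f} {s.atomName}" for s in shifts)


READERS = {"comma": read_comma, "space": read_space}
WRITERS = {"comma": write_comma, "space": write_space}


def convert(text, source, target):
    """Read into the shared model, then write out in the target layout."""
    return WRITERS[target](READERS[source](text))


print(convert("HA,4.32\nCA,55.1", "comma", "space"))
```

Supporting a further layout means adding one reader and one writer; no existing converter has to change.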

35
ExtendNMR ARIA
  • Structure generation from macromolecular NMR
    data, ambiguous distance constraints
  • One of two leading programs
  • Python and scripts, with CNS dynamics engine
  • All input and output integrated to CCPN standard

36
ARIA CCPN object selection

37
ExtendNMR Bruker TOPSPIN
  • NMR processing program of major NMR instrument
    company
  • Java. In-memory conversion to CCPN JavaXML
    implementation
  • CCPN output in current TOPSPIN release, expanded
    in upcoming release

38
Data Model v. Data Format
  • Abstract model (UML)
  • Relational Database tables:
    Atom (Atom_ID, elementName)
    Atom_Bond_Connect (Bond_ID, Atom_ID)
    Bond (Bond_ID, bondOrder)
  • XML:
    <Atom ID="AT1" elementName="C">
      <BondList>
        <Bond IDREF="BD1"/>
        ...
      </BondList>
    </Atom>
    <Bond ID="BD1" bondOrder="1.0">
      <Atom1 IDREF="AT1"/>
      <Atom2 IDREF="AT2"/>
      ...
    </Bond>
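
To make the distinction concrete, the sketch below stores one small abstract model (atoms and a bond) in two different formats: XML and relational tables. It uses only the Python standard library; the `Molecule` root element and the helper names are assumptions for illustration, and the layout is not the CCPN XML format or database schema.

```python
# Hypothetical sketch: one abstract model, two storage formats.
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Atom:
    id: str
    elementName: str


@dataclass
class Bond:
    id: str
    bondOrder: float
    atoms: tuple  # the Atom objects joined by this bond


def to_xml(atoms, bonds):
    """Serialise the model as XML text, linking objects with IDREF."""
    root = ET.Element("Molecule")  # root element name is an assumption
    for atom in atoms:
        ET.SubElement(root, "Atom", ID=atom.id, elementName=atom.elementName)
    for bond in bonds:
        elem = ET.SubElement(root, "Bond", ID=bond.id,
                             bondOrder=str(bond.bondOrder))
        for i, atom in enumerate(bond.atoms, start=1):
            ET.SubElement(elem, f"Atom{i}", IDREF=atom.id)
    return ET.tostring(root, encoding="unicode")


def to_database(atoms, bonds):
    """Store the same model in three tables of an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Atom (Atom_ID TEXT PRIMARY KEY, elementName TEXT)")
    db.execute("CREATE TABLE Bond (Bond_ID TEXT PRIMARY KEY, bondOrder REAL)")
    db.execute("CREATE TABLE Atom_Bond_Connect (Bond_ID TEXT, Atom_ID TEXT)")
    db.executemany("INSERT INTO Atom VALUES (?, ?)",
                   [(a.id, a.elementName) for a in atoms])
    db.executemany("INSERT INTO Bond VALUES (?, ?)",
                   [(b.id, b.bondOrder) for b in bonds])
    db.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                   [(b.id, a.id) for b in bonds for a in b.atoms])
    return db


atoms = [Atom("AT1", "C"), Atom("AT2", "O")]
bonds = [Bond("BD1", 1.0, (atoms[0], atoms[1]))]
print(to_xml(atoms, bonds))
db = to_database(atoms, bonds)
print(db.execute("SELECT * FROM Atom_Bond_Connect ORDER BY Atom_ID").fetchall())
```

Application code that works with `Atom` and `Bond` objects does not need to know which of the two storage forms is in use.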

39
Packages

40
Packages
  • Partition model, code, and data
  • Import each other
  • Can be omitted
  • All import Implementation and AccessControl
  • Each has a TopObject
  • No links between data from rival TopObjects
    (different extents of data)

41
Root and TopObjects

42
TopObjects
  • One in every package
  • Ultimate parent to all objects in package
  • Have globally unique identifier (guid)
  • currentXyz links from root
  • Links can constrain links between descendants
  • In file implementations
  • Hold links to storage and backup locations
  • Live in Implementation as almost empty shell

43
Overview
  • Packages
  • The Implementation package
  • Objects
  • DataTypes and DataObjTypes
  • Access control

44
CcpNmr Analysis
  • NMR Assignment Program
  • Inspired by ANSIG and Sparky
  • Demonstrates CCPN approach
  • Modern interface and scripting
  • Scalable and extensible
  • Operating Systems: Linux, Sun, SGI, OSX, Windows
  • Languages
  • Python: data model interaction, Tk graphical interface, scripting
  • C: OpenGL/Tk contours, structure display, mathematical operations

45
Implementation Package
  • Model and Code
  • Supertypes that define all objects
  • Objects
  • DataTypes
  • DataObjTypes
  • Basic data types
  • Data on how to access the real data
  • Data location pointers
  • Current-package pointers
  • Implementation data are not part of the data set,
    and are not in the database.
  • Represent view or session?

46
Data Location

47
Objects and their Supertypes

48
Simple Data Types

49
Complex Data Types