XML and DANSE - PowerPoint PPT Presentation

About This Presentation
Title:

XML and DANSE


Slides: 45
Provided by: cacrCa
Tags: danse | xml | attribute


Transcript and Presenter's Notes

Title: XML and DANSE


1
XML and DANSE
  • Michael McKerns
  • DANSE Software Workshop
  • Caltech Materials Science

2
a Brief XML Overview
  • PART I

3
What is XML?
  • XML: eXtensible Markup Language
  • a subset of SGML (Standard Generalized Markup Language)
  • a 'metalanguage' used to describe other languages
  • a language designed to describe data
  • a language where element names and document structure are not predefined
  • a 'clear-text' format language
  • a cross-platform, software- and hardware-independent tool for storing, processing, and transmitting information

4
XML is not...
  • designed to display data. If you want to display data over the web, you should try HTML.
  • intended to 'do' anything. It is a language that acts as a container for information: it describes the structure in which the information is stored.
  • a replacement for SGML; it is a subset. XML retains much of the functionality of SGML while removing many of the options and complexities.
  • difficult to learn or read. XML is written in clear text, and has relatively few rules (or exceptions).

5
XML components: Elements
  • XML is built from elements. An element is composed of a start-tag and an end-tag, with text and/or more markup (sub-elements) between them. A simple example is <title>Python and XML</title>.
  • There is always exactly one root element, but there can be many nested sub-elements. Elements are built to have parent-child relationships.
  • Elements are (usually) named to describe the information that is contained between the tags.
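A minimal sketch of the root/child structure described above, using Python 3's standard-library xml.etree.ElementTree (a modern module chosen for illustration; the talk itself does not use it):

```python
import xml.etree.ElementTree as ET

# One root element (<book>) containing nested child elements.
doc = """<book>
  <title>Python and XML</title>
  <author>Christopher A. Jones</author>
  <author>Fred L. Drake, Jr.</author>
</book>"""

root = ET.fromstring(doc)      # parse the text into a tree of elements
print(root.tag)                # the single root element: book

# Each child is an element with a tag name and the text between its tags.
for child in root:
    print(child.tag, "->", child.text)
```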

6
XML components: Attributes
  • Elements can have attributes in the start tag. The syntax for an attribute is attribute='description'. Quotes may be single or double.
  • Attributes are used to provide additional information about elements. While elements are used to store and describe data, attributes should be used to store and describe metadata.
  • Attributes cannot describe structures (they have no children), and cannot contain multiple values.
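A small sketch of the attribute rules above, again assuming Python 3's xml.etree.ElementTree purely for illustration: quote style does not matter, each attribute is a single string value, and repeating an attribute name is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Single and double quotes are interchangeable around attribute values.
elem = ET.fromstring("""<text language='english' format="textbook">...</text>""")
print(elem.attrib)                  # a flat dict of name -> string value
print(elem.get("language"))         # english
print(elem.get("edition", "n/a"))   # absent attribute: falls back to the default

# An attribute holds one string value; repeating a name is a well-formedness error.
try:
    ET.fromstring("<text language='english' language='french'/>")
    duplicate_rejected = False
except ET.ParseError as err:
    duplicate_rejected = True
    print("rejected:", err)
```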

7
A basic XML file
    <?xml version='1.0'?>
    <!-- A basic XML file -->
    <book>
      <title>Python and XML</title>
      <author>Christopher A. Jones</author>
      <author>Fred L. Drake, Jr.</author>
      <publisher email='corporate@oreilly.com'>O&apos;Reilly</publisher>
      <text language='english' format='textbook'>
        <preface> ... </preface>
        <chapter1> ... </chapter1>
        ...
      </text>
    </book>

8
Making XML more specific
  • The actual element names are unimportant in a well-formed XML document, and can be replaced as long as the nesting (parent-child) structure is maintained:
    <book>
      <banana>Python and XML</banana>
      <kijjipt lgne='english'> ... </kijjipt>
    </book>
  • However, to help make sense for the developer and user, a schema (a set of naming and structure rules) can be imposed on an XML document.
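To illustrate that well-formedness depends only on proper nesting, not on meaningful names, here is a sketch using Python 3's xml.etree.ElementTree; the helper name is_well_formed is hypothetical, and it checks well-formedness only, not any schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if `text` parses as XML; checks well-formedness only, not any schema."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

renamed = "<book><banana>Python and XML</banana><kijjipt lgne='english'> ... </kijjipt></book>"
misnested = "<book><banana>Python and XML</book></banana>"

print(is_well_formed(renamed))    # True: odd names, but tags nest properly
print(is_well_formed(misnested))  # False: </book> closes before </banana>
```

Checking names and structure against agreed rules is exactly the gap a schema fills.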

9
Why use a Schema?
  • A schema defines the legal building blocks of an XML document through a list of legal elements.
  • A schema defines default and fixed values for elements and attributes, as well as the order and number of child elements.
  • A schema greatly aids the sender in describing information in a way that the receiver can understand.
  • A schema can aid in finding errors in a (well-formed) XML document.

10
DTD vs XSD
  • A DTD (Document Type Definition) is a very simple type of schema. However, a shortcoming of DTD is that it is neither extensible nor written in XML. Also, only a single DTD can be applied to an XML document.
  • XSD (XML Schema Definition) is a schema language that is written in XML. XSD is not only extensible, but it inherits all the features of XML. Further, through the use of namespaces, a single XML document can draw on many XSDs.

11
What are Namespaces?
  • Namespaces in XML are declared with special attributes (xmlns) and are used to qualify element names. Namespaces allow resolution of element name conflicts caused by reusing the same element name in different schemas.
  • A namespace declared in a start tag is in scope for that element and all of its child elements. The standard is to use a Uniform Resource Identifier (URI) to give the namespace a unique name; however, no information is looked up at the URI.

12
An XML file with Namespaces
    <?xml version='1.0'?>
    <dt:data_transformation
        xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
        xsi:schemaLocation='http://arcs.caltech.edu/jonny/ dtr.xsd'
        xmlns:dt='http://arcs.caltech.edu/datatrans'>
      <dt:description>
        <dt:author>Mike McKerns after Jonny Lin</dt:author>
        <dt:comments>calculate Bose factor</dt:comments>
      </dt:description>
      <dt:input dt:name='Energy' dt:type='Array'/>
      <dt:input dt:name='Intensity' dt:type='Array'/>
      <dt:output dt:name='Energy' dt:type='Array'/>
      <dt:output dt:name='Phonon DOS' dt:type='Array'/>
    </dt:data_transformation>
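A sketch of how a namespace-aware parser sees the dt: file above, assuming Python 3's xml.etree.ElementTree (the schemaLocation attribute is omitted for brevity). The parser replaces each prefix with its full URI, so lookups go through a prefix-to-URI map.

```python
import xml.etree.ElementTree as ET

DT = "http://arcs.caltech.edu/datatrans"   # namespace URI bound to the dt: prefix
ns = {"dt": DT}                            # prefix map used only for searching

doc = """<dt:data_transformation xmlns:dt='http://arcs.caltech.edu/datatrans'>
  <dt:description>
    <dt:author>Mike McKerns after Jonny Lin</dt:author>
    <dt:comments>calculate Bose factor</dt:comments>
  </dt:description>
  <dt:input dt:name='Energy' dt:type='Array'/>
  <dt:input dt:name='Intensity' dt:type='Array'/>
  <dt:output dt:name='Energy' dt:type='Array'/>
  <dt:output dt:name='Phonon DOS' dt:type='Array'/>
</dt:data_transformation>"""

root = ET.fromstring(doc)
# The prefix is replaced by the full URI: {uri}localname.
print(root.tag)

# Prefixed attributes are keyed the same way, e.g. "{http://...datatrans}name".
name_key = f"{{{DT}}}name"
inputs = [e.get(name_key) for e in root.findall("dt:input", ns)]
outputs = [e.get(name_key) for e in root.findall("dt:output", ns)]
print("inputs:", inputs)
print("outputs:", outputs)
```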

13
Processing an XML file
  • Processing XML is broken into two parts: a parser and an application.
  • To check the structure and format of an XML document against a schema, the XML must first be read into a parser.
  • If an XML document complies with the rules of the schema (is validated), then it is parsed into a form that can be processed by the application.

14
What does a parser actually do?
  • A parser is responsible for reading the raw bytes of data that make up the serialized XML document, reacting to markup-specific characters ('<', '&', ...), and creating a representation for the elements and attributes that compose the conceptual XML document.
  • 60 98 111 111 107 62 ... (raw bytes) become <book>... (markup)
  • Parsers typically output data into either an event-based representation or a tree-based representation.

15
Event-based parsing
    <?xml version='1.0'?><!-- A basic XML file --><book><title>Python and XML</title><author>Christopher A. Jones</author><author>Fred L. Drake, Jr.</author><publisher email='corporate@oreilly.com'>O&apos;Reilly</publisher><text language='en
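An event-based parser walks this stream once and fires a callback for each piece of markup. A minimal sketch with Python 3's standard xml.sax module (a shortened copy of the file is used; the handler class EventCollector is illustrative):

```python
import xml.sax

class EventCollector(xml.sax.ContentHandler):
    """Record the events a SAX parser fires as it reads the byte stream."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():            # ignore whitespace-only text
            self.events.append(("text", content))

doc = (b"<?xml version='1.0'?><!-- A basic XML file --><book>"
       b"<title>Python and XML</title>"
       b"<publisher email='corporate@oreilly.com'>O'Reilly</publisher>"
       b"</book>")

handler = EventCollector()
xml.sax.parseString(doc, handler)     # the parser calls back into the handler
for event in handler.events:
    print(event)
```

Nothing is kept in memory except what the handler chooses to record, which is why SAX is light on memory.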

16
Tree-based parsing

17
SAX vs DOM
  • The Simple API for XML (SAX) is event-based, while the Document Object Model (DOM) is tree-based.
  • SAX is simpler to learn and implement, requires far less memory, and is more resistant to format changes than DOM.
  • SAX has more difficulty searching XML and producing user-understandable code. Further, SAX cannot modify an XML document, while DOM easily adds/deletes nodes.
  • SAX should be used for simple translations and filters, while DOM should be used for searching and for interactive or complex translations.
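The DOM advantage mentioned above, editing the tree in place, can be sketched with Python 3's xml.dom.minidom (used here for illustration only): one node is added and one is deleted, then the tree is serialized again.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>Python and XML</title><author>Christopher A. Jones</author></book>"
)
book = doc.documentElement

# Add a node: a new <author> element holding a text node.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Fred L. Drake, Jr."))
book.appendChild(author)

# Delete a node: detach <title> from the tree.
title = book.getElementsByTagName("title")[0]
book.removeChild(title)

print(book.toxml())
```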

18
Transform XML to...
  • Stylesheets hold translation templates for transforming the markup in a document into another markup language or a dialect of the same language.
  • Stylesheets were originally intended to format a document for display in a browser.
  • CSS is a simple stylesheet language for displaying HTML in a browser. XSL is the stylesheet language for XML.
  • However, to view XML in a browser, XSL is typically used to transform the XML into XHTML, to which CSS can then be applied.

19
XSL... more than just a stylesheet
  • XSL (eXtensible Stylesheet Language) is composed of a formatting application (XSL-FO) and XSLT (a transformation application).
  • The formatting portion of XSL is basically a translation-specific XML schema language, while an XSLT processor is composed of parsers and serializers built around a processing engine.
  • Since XSLT is extensible, it can be used to transform XML into any language that an XSL stylesheet can hold a template for (XML, HTML, XHTML, LaTeX, PDF via XSL-FO, ...).

20
XSLT
  • XSLT uses statements like <xsl:template match='phonon'> to define parts of the source document that match one or more of the predefined templates.
  • When a match is found, XSLT transforms the matching part of the source document into the result document.
  • Element names in XSL are very similar to reserved words in a standard programming language (xsl:for-each, xsl:value-of, ...).
  • The parts of the source document that do not match any template are handled by XSLT's built-in rules, which pass their text content (but not their tags) through to the result document.

21
The Big Picture (so far)

22
Working outside your computer
  • Even though you can very happily 'do everything' on your own PC, consider the convenience of being able to access datasets and applications that are external to your computer and integrate them with your favorite local application...
  • Then, once you have an RPC client and access to a server, you have a whole set of new tools available to you without ever having to worry about insufficient memory to run them, or to fight through the trouble of installing them or finding disk space for them on your PC.

23
What are Remote Procedure Calls?
  • A procedure call is the name of a procedure, its parameters, and the result it returns; a remote procedure call (RPC) is a call made to a remote machine.
  • RPC is a communication mechanism that allows cross-platform distributed computing. The RPC protocols discussed here (XMLRPC and SOAP) typically use HTTP as the transport and XML as the encoding.
  • Especially when called from Python, a very few lines of code can make an RPC and return the result of a calculation done on a remote computer.
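A self-contained sketch of the "few lines of code" claim, using Python 3's xmlrpc.server and xmlrpc.client. To keep it runnable anywhere, the "remote" machine is a server started on localhost in a background thread; the procedure name square is purely illustrative.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def square(x):
    """The 'remote' procedure: it executes inside the server."""
    return x * x

# Server: bind to a free localhost port (port 0) and serve from a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(square, "square")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: the call reads like a local function call, but its arguments and
# result travel as XML inside HTTP POST requests.
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    result = proxy.square(12)
print(result)   # 144

server.shutdown()       # stop serve_forever()
server.server_close()   # release the socket
```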

24
XMLRPC
  • XMLRPC is a simple and effective means to request and receive information. Many of the element names and much of the structure are fixed.
  • Beyond its scalar types, XMLRPC has only two compound types: structs (an anonymous set of name-value pairs) and arrays (an anonymous grouping of elements with no limits on type mixing).
  • The simplicity of XMLRPC is both a strength and a weakness. It has difficulty passing an object as an argument to a function, specifying what portion of a receiving application the message is intended for, ...
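A sketch of the two compound types and of the object-passing weakness, using Python 3's xmlrpc.client.dumps/loads (marshalling only, no network). The method name describe and the class Sample are hypothetical. Python's marshaller flattens an arbitrary object into a struct of its attributes, so the receiver gets a plain dict and the class is lost.

```python
import xmlrpc.client

# A struct (name-value pairs) and an array whose items mix types freely.
params = ({"name": "Energy", "units": "meV"}, [1, 2.5, "mixed", True])
payload = xmlrpc.client.dumps(params, methodname="describe")
print(payload)                    # <struct>/<member> and <array>/<data> markup

decoded, method = xmlrpc.client.loads(payload)
print(method, decoded)            # structs come back as dicts, arrays as lists

# An arbitrary object is flattened to a struct of its attributes.
class Sample:
    def __init__(self):
        self.name = "Ni"
        self.temperature = 300

(flattened,), _ = xmlrpc.client.loads(
    xmlrpc.client.dumps((Sample(),), methodname="describe")
)
print(type(flattened).__name__, flattened)
```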

25
SOAP
  • SOAP is written in XML, and thus is extensible. SOAP makes extensive use of namespacing and attribute specification tags. SOAP is relatively complex, somewhat unstable, and documentation is scarce.
  • Even though SOAP is only a submission at W3C, it is currently used by MS as the core of the .NET framework and is being used by IBM as the transport protocol for the Grid.
  • SOAP is more secure than XMLRPC, implementing greater controls on what and how the message is sent and received, including message-specific processing control, the ability to specify the recipient, ...
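To show how heavily SOAP leans on namespaces, here is a sketch that builds a minimal SOAP 1.1 envelope with Python 3's xml.etree.ElementTree. The envelope namespace is the standard SOAP 1.1 one; the service namespace urn:example:states and the element names getStateName/stateNumber are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
EXAMPLE = "urn:example:states"                            # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("ex", EXAMPLE)

envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")          # header blocks (routing, security) go here
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, f"{{{EXAMPLE}}}getStateName")  # the application payload
ET.SubElement(call, f"{{{EXAMPLE}}}stateNumber").text = "41"

xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```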

26
Some simple XMLRPC code
    POST /RPC2 HTTP/1.0
    Host: betty.userland.com
    Content-Type: text/xml
    Content-length: 181

    <?xml version='1.0'?>
    <!-- Simple XMLRPC code -->
    <methodCall>
      <methodName>examples.getStateName</methodName>
      <params>
        <param><value><i4>41</i4></value></param>
      </params>
    </methodCall>
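The request body above does not have to be written by hand. A sketch using Python 3's xmlrpc.client.dumps, which emits <int> where the slide has <i4> (the XML-RPC specification treats the two as synonyms); the printed byte count is what would go in the Content-length header.

```python
import xmlrpc.client

# Marshal the call examples.getStateName(41) into a <methodCall> document.
body = xmlrpc.client.dumps((41,), methodname="examples.getStateName")
print(body)
print("Content-length:", len(body.encode("utf-8")))  # byte count for the HTTP header

# Unmarshal it again, as the receiving server would.
params, method = xmlrpc.client.loads(body)
print(method, params)   # examples.getStateName (41,)
```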

27
XMLRPC from Python
    >>> import xmlrpclib
    >>> server_url = 'http://betty.userland.com/RPC2'
    >>> server = xmlrpclib.Server(server_url)
    >>> server.examples.getStateName(41)
    'South Dakota'
  • (In Python 3, xmlrpclib is named xmlrpc.client, and Server is an alias of ServerProxy.)

28
The Bigger Picture

29
Python XML Resources
  • xmlrpclib: http://www.pythonware.com/products/xmlrpc
  • PyXML: http://pyxml.sourceforge.net
  • 4Suite: http://4suite.org
  • SOAPy: http://soapy.sourceforge.net
  • Python & XML by Jones & Drake (O'Reilly)
  • Python Cookbook by Martelli & Ascher (O'Reilly)

30
the DANSE Client and Server
  • PART II

31
Distributed Computing Services
  • The server can provide access to the best combination of hardware and software
  • Most experimental data and analysis codes reside on the servers, so little bandwidth is needed
  • Computing resources can be changed without affecting the user
  • Computation can be local or non-local
  • Clean separation of GUI from analysis codes
  • One web portal for all neutron instruments (?)

32
Two Key Concepts
  • Components: pre-compiled Python objects called and re-arranged by the Python interpreter.
  • Data Streams: a standard communication protocol between components. Standard streams can connect components located anywhere.

33
Data Analysis Execution
  • User hits Run
  • Client interprets the wiring diagram as XMLRPC commands
  • Server receives the commands, arranges a Python script, and data processing commences.

34
core of the DANSE Server
  • DTServer.py: top-level server
  • DataTransformations.py: manages requests for information and executes DTs
  • RPC2.py: called by Apache for XMLRPC; hands off to the appropriate modules and writes the result (XML) to HTTP
  • util_RPC2.py: converts URI to URL, makes the RPC call
  • UserDir.py: manages changes to user files
  • UserHash.py: manages session hash, security, encryption
  • CustomerProfile.py: manages user profiles

35
conceptual DANSE Server

36
core of the DANSE Client
  • ViPEr: visual programming environment (GUI)
  • loginGUI.py: login dialogue box
  • Cobra.py: top level; loads and shows libraries
  • RPC/Remote.py: manages login and remote library calls
  • Local/Library.py: manages local library calls
  • Network: abstract components for networking
  • RPC/DataTransformation.py: executes local DTs and requests remote DTs

37
conceptual DANSE Client

38
Adding/Modifying a DataTransformation
  • One of the first questions commonly asked about any new software is: How can I use it to do MY research?
  • Well, (currently) if you can provide a pure Python program (or a program wrapped in Python) that transforms data in the way you desire, then it can be easily added to the DANSE architecture.

39
Demo
  • Now, let's add a new DataTransformation...

40
library.xml
    <?xml version="1.0"?>
    <Library name="Default" owner="commune"
        xmlns:xlink="http://www.w3.org/1999/xlink">
      <Shelf name="ARCS" owner="commune"
          xlink:type="simple"
          xlink:href="file://kittel.caltech.edu/mmckerns/RPC2/RPC2.py?UserDir.getFile=commune/libraries/ARCS.xml" />
    </Library>

41
Shelf.xml (ARCS.xml)
    <?xml version="1.0"?>
    <Shelf name="ARCS Data" owner="commune">
      <DataTransformation name="ARCS.ReduceNi" owner="commune" />
      <DataTransformation name="ARCS.Bose_Factor" owner="commune" />
      <DataTransformation name="ARCS.Born_von_Karman" owner="commune" />
    </Shelf>
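The slides do not show how the DANSE client reads a shelf file, so the following is only a sketch of one way to list a shelf's contents, using Python 3's xml.etree.ElementTree on the ARCS.xml content above.

```python
import xml.etree.ElementTree as ET

shelf_xml = """<Shelf name="ARCS Data" owner="commune">
  <DataTransformation name="ARCS.ReduceNi" owner="commune" />
  <DataTransformation name="ARCS.Bose_Factor" owner="commune" />
  <DataTransformation name="ARCS.Born_von_Karman" owner="commune" />
</Shelf>"""

shelf = ET.fromstring(shelf_xml)
# Collect the name of every DataTransformation listed on the shelf.
names = [dt.get("name") for dt in shelf.findall("DataTransformation")]
print(shelf.get("name"), names)
```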

42
DataTF (Bose_Factor)
    <?xml version="1.0"?>
    <dt:data_transformation
        ...
        dt:name="ARCS.Bose_Factor"
        dt:owner="commune"
        dt:type="basic_python"
        dt:address="file://kittel.caltech.edu/mmckerns/RPC2/RPC2.py?UserDir.getFile=commune/ARCS/Bose_Factor.py">
      <dt:input dt:name="Energy" dt:type="Array"/>
      <dt:input dt:name="Intensity" dt:type="Array"/>
      <dt:output dt:name="Energy" dt:type="Array"/>
      <dt:output dt:name="Phonon DOS" dt:type="Array"/>
    </dt:data_transformation>

43
DataTF.py (Bose_Factor.py)
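    import math

    def doTransformation(inputs, outputs):
        # inputs[0] holds the energies, inputs[1] the measured intensities
        energy = inputs[0].value
        intensity = inputs[1].value
        outputs[0].value = []
        outputs[1].value = []
        for i in range(len(energy)):
            # keep only positive energy transfers
            if energy[i] > 0.0:
                outputs[0].value.append(energy[i])
                # 1 - exp(-E/kT); 25.3 is presumably kT in meV (about room temperature)
                factor = 1 - math.exp(-energy[i] / 25.3)
                outputs[1].value.append(intensity[i] * energy[i] * factor)
        return 1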
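The slides do not show how DANSE constructs the inputs and outputs objects; the only thing doTransformation relies on is a .value list. This sketch uses a hypothetical Port stand-in with made-up sample arrays, so the transformation can be exercised outside DANSE.

```python
import math

class Port:
    """Hypothetical stand-in for a DANSE input/output: only a .value list."""
    def __init__(self, value=None):
        self.value = [] if value is None else value

def doTransformation(inputs, outputs):
    energy = inputs[0].value
    intensity = inputs[1].value
    outputs[0].value = []
    outputs[1].value = []
    for i in range(len(energy)):
        if energy[i] > 0.0:                      # keep positive energy transfers only
            outputs[0].value.append(energy[i])
            factor = 1 - math.exp(-energy[i] / 25.3)
            outputs[1].value.append(intensity[i] * energy[i] * factor)
    return 1

# Made-up sample arrays, purely to exercise the function.
inputs = [Port([-5.0, 0.0, 10.0, 25.3]), Port([1.0, 1.0, 2.0, 4.0])]
outputs = [Port(), Port()]
status = doTransformation(inputs, outputs)
print(status, outputs[0].value)    # 1 [10.0, 25.3]
print(outputs[1].value)
```

Non-positive energies are dropped, so the two outputs stay the same length as each other but can be shorter than the inputs.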

44
adding to DTServer vs Local
  • If the new DataTransform is being added to DANSE locally, then that's it!
  • However, if the DataTransform is to be added to the DANSE Server, then there is one more step: add the new DataTransform to the DTServer database by using xforms.py:
    python xforms.py -i Bose_Factor.xml