1
Metadata Services on the GRID
  • Nuno Santos
  • ACAT05
  • May 25th, 2005

2
Contents
  • Metadata on the GRID
  • ARDA-gLite Metadata Interface
  • The ARDA Implementation
  • Performance study - SOAP vs TCP Streaming

3
Metadata on the GRID
  • Metadata is data about data
  • Metadata on the GRID
  • Mainly information about files
  • Other information necessary for running jobs
  • Usually living on DBs
  • Need simple interface for Metadata access
  • Advantages
  • Easier to use by clients - no SQL, only metadata
    concepts
  • Common interface - clients don't have to reinvent
    the wheel
  • Must be integrated in the File Catalogue
  • Also suitable for storing information about other
    resources

4
ARDA-gLite Metadata Interface
  • ARDA proposed an interface for Metadata access on
    the GRID
  • Designed jointly with the gLite/EGEE team
  • Incorporates feedback from GridPP
  • Endorsed by the EGEE standards committee (PTF)
  • Being implemented in gLite File Catalog (FiReMan)
  • Interface concepts
  • Metadata - Key-value pairs
  • Entry - Entities to which metadata is attached
  • Attribute - Holds information about an entry
  • Schema - A collection of attributes
  • Type - The type (int, float, string, ...)
  • Name/Key - The name of the attribute
  • Value - Value of an entry's attribute
  • Entries are associated with schemas
  • Think of schemas as tables, attributes as
    columns, entries as rows
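
The table analogy can be made concrete with a small sketch. It assumes SQLite as the storage layer and uses made-up names (`jobs`, `owner`, `events`, `job-001`); it only illustrates the mapping and is not the layout used by any real implementation.

```python
import sqlite3

# Hypothetical schema "jobs" with two typed attributes (names are illustrative).
schema_name = "jobs"
attributes = [("owner", "TEXT"), ("events", "INTEGER")]

db = sqlite3.connect(":memory:")

# Schema -> table, attribute -> column. The entry name is used as the row key.
columns = ", ".join(f"{name} {sql_type}" for name, sql_type in attributes)
db.execute(f"CREATE TABLE {schema_name} (entry TEXT PRIMARY KEY, {columns})")

# Entry -> row; each cell holds the entry's value for one attribute.
db.execute("INSERT INTO jobs VALUES (?, ?, ?)", ("job-001", "alice", 5000))

# The metadata of one entry, read back as key-value pairs.
row = db.execute(
    "SELECT owner, events FROM jobs WHERE entry = ?", ("job-001",)
).fetchone()
metadata = dict(zip(("owner", "events"), row))
print(metadata)
```

Here `metadata` is the set of key-value pairs attached to the entry `job-001`, which is what a client would see without writing any SQL.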

5
Interface Operations
  • Schema management
  • void createSchema(String schemaName, Attribute[] attributes)
  • void dropSchema(String schemaName)
  • void removeSchemaAttributes(String schemaName, String[] attributeNames)
  • void addSchemaAttributes(String schemaName, Attribute[] attributes)
  • Entry management
  • void createEntry(MDEntry[] entries, String[] schemas)
  • void removeEntry(String query)
  • int setAttributes(String query, Attribute[] attributes)
  • Attribute[] listAttributes(String entry)
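
To show how these operations fit together, here is a toy in-memory model in Python. It is a sketch under several assumptions that the slide does not state: a "query" is treated as an exact entry name, attributes are passed as plain dictionaries, and `set_attributes` returns the number of attributes written. The class name `MetadataCatalog` is made up.

```python
class MetadataCatalog:
    """Toy in-memory model of the schema and entry operations."""

    def __init__(self):
        self.schemas = {}   # schema name -> {attribute name: type name}
        self.entries = {}   # entry name -> (schema name, {attribute: value})

    def create_schema(self, schema_name, attributes):
        if schema_name in self.schemas:
            raise ValueError(f"schema exists: {schema_name}")
        self.schemas[schema_name] = dict(attributes)

    def drop_schema(self, schema_name):
        del self.schemas[schema_name]
        # Entries of a dropped schema are removed with it (an assumption).
        self.entries = {
            name: e for name, e in self.entries.items() if e[0] != schema_name
        }

    def add_schema_attributes(self, schema_name, attributes):
        self.schemas[schema_name].update(attributes)

    def remove_schema_attributes(self, schema_name, attribute_names):
        for name in attribute_names:
            self.schemas[schema_name].pop(name, None)
            for schema, values in self.entries.values():
                if schema == schema_name:
                    values.pop(name, None)

    def create_entry(self, entry, schema_name):
        if schema_name not in self.schemas:
            raise KeyError(schema_name)
        self.entries[entry] = (schema_name, {})

    def remove_entry(self, entry):
        del self.entries[entry]

    def set_attributes(self, entry, attributes):
        schema_name, values = self.entries[entry]
        unknown = set(attributes) - set(self.schemas[schema_name])
        if unknown:
            raise KeyError(f"not in schema: {sorted(unknown)}")
        values.update(attributes)
        return len(attributes)  # assumed meaning of the int return value

    def list_attributes(self, entry):
        return dict(self.entries[entry][1])


catalog = MetadataCatalog()
catalog.create_schema("jobs", {"owner": "string", "events": "int"})
catalog.create_entry("job-001", "jobs")
changed = catalog.set_attributes("job-001", {"owner": "alice", "events": 5000})
print(changed, catalog.list_attributes("job-001"))
```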

6
Interface Operations
  • Searching and retrieving entries
  • MDResult query(MDQuery query)
  • MDResult nextQuery(String token, MDQuery query)
  • void endQuery(String token)
  • Datatypes
  • → Allows either stateful or stateless server
    implementations
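
One way to read the stateless option: because nextQuery() receives the query again along with the token, a server could encode the position in the token and keep no session at all. The sketch below assumes exactly that. `MDResult` is modeled as a `(chunk, token)` tuple, a token of `None` means "no more data", and `ROWS` stands in for a real query result; all of these are illustrative choices, not part of the interface definition.

```python
ROWS = [f"entry-{i}" for i in range(10)]  # stand-in for the result of a query
CHUNK = 4                                 # entries returned per call


def next_query(token, query):
    """Return (chunk, next_token); the token is simply the next offset."""
    offset = int(token)
    chunk = ROWS[offset:offset + CHUNK]   # a real server would re-run `query`
    end = offset + len(chunk)
    return chunk, (str(end) if end < len(ROWS) else None)


def query(query_text):
    """First call: same as next_query starting at offset 0."""
    return next_query("0", query_text)


def end_query(token):
    """Nothing to release: the stateless server holds no session."""
    return None


# Client loop: keep calling next_query until the server stops issuing tokens.
chunk, token = query("all entries")
results = list(chunk)
while token is not None:
    chunk, token = next_query(token, "all entries")
    results.extend(chunk)
print(len(results))
```

A stateful server would instead keep an open cursor per token and ignore the repeated query; the client loop stays the same in both cases.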

7
ARDA Prototype
  • Validate proposed interface
  • Architecture
  • Metadata organized in a hierarchy
  • Schemas can contain sub-schemas
  • Can inherit attributes
  • Analogy to file system
  • Schema ↔ Directory, Entry ↔ File
  • Stability with large responses
  • Send large responses in chunks
  • Otherwise preparing large responses could crash
    server
  • Stateful server
  • DB → Server - Data streamed using DB cursors
  • Server → Client - Response sent in chunks
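
The DB-to-server half of this design can be sketched with a cursor that is read in fixed-size chunks, so the full result set is never held in memory at once. The example assumes SQLite and a made-up `entries` table; the helper name `stream_rows` and the chunk size are illustrative.

```python
import sqlite3


def stream_rows(db, sql, chunk_size=100):
    """Yield the result of `sql` chunk by chunk using a DB cursor."""
    cursor = db.execute(sql)
    while True:
        chunk = cursor.fetchmany(chunk_size)
        if not chunk:       # cursor exhausted
            break
        yield chunk         # each chunk could be sent to the client right away


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (name TEXT, size INTEGER)")
db.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(f"file-{i}", i) for i in range(250)],
)

chunk_sizes = [len(chunk) for chunk in stream_rows(db, "SELECT * FROM entries")]
print(chunk_sizes)  # [100, 100, 50]
```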

8
ARDA Implementation
  • Backends
  • Currently Oracle, PostgreSQL, SQLite
  • Two frontends
  • TCP Streaming
  • Chosen for performance
  • SOAP
  • Formal requirement of EGEE
  • Compare SOAP with TCP Streaming
  • Also implemented as standalone Python library
  • Data stored on filesystem

9
TCP Streaming Frontend
  • Text based protocol (like SMTP, POP3, ...)
  • Data streamed to client in single connection
  • Implementation
  • Server - C, multiprocess
  • Clients - C, Java, Python, Perl, Ruby
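
The slide does not give the protocol's commands, so the following is only a generic sketch of a line-based text protocol that streams a multi-line reply over one connection. The `list` command, the `0`/`1` status line, and the lone `.` terminator (borrowed from POP3-style multi-line responses) are all assumptions made for illustration.

```python
import socket
import threading


def serve(conn, entries):
    """Answer one line-based command, streaming the reply on the same socket."""
    with conn, conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        command = stream.readline().strip()
        if command == "list":
            stream.write("0\n")               # status line, 0 = OK (made up)
            for entry in entries:
                stream.write(entry + "\n")    # one entry per line
        else:
            stream.write("1 unknown command\n")
        stream.write(".\n")                   # end-of-response marker
        stream.flush()


# A connected socket pair stands in for a real TCP connection.
client, server = socket.socketpair()
worker = threading.Thread(target=serve, args=(server, ["a", "b", "c"]))
worker.start()

with client, client.makefile("rw", encoding="utf-8", newline="\n") as stream:
    stream.write("list\n")
    stream.flush()
    status = stream.readline().strip()
    received = []
    # Read entries as they arrive until the terminator (or EOF) is seen.
    while (line := stream.readline()) not in (".\n", ""):
        received.append(line.rstrip("\n"))

worker.join()
print(status, received)
```

The client can process each line as soon as it arrives, which is the property the streaming frontend relies on for large responses.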

10
SOAP Frontend
  • Most operations in interface implemented as
    simple SOAP calls
  • query() - based on iterators
  • Initial request - create session
  • Open cursor on DB
  • Return initial chunk of data and session token
  • Subsequent requests
  • Client calls nextQuery() using session token
  • Termination - session closed when
  • End of data
  • Client calls endQuery()
  • Client timeout
  • Implementations
  • Server - gSOAP (C)
  • Clients - Tested WSDL with gSOAP, ZSI (Python),
    AXIS (Java)
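
The session life cycle described on this slide can be sketched as a small server-side session table. This is an illustration under assumptions, not the gSOAP server: the class name `QuerySessions`, the `(chunk, token)` return shape, the use of a plain Python iterator in place of a DB cursor, and the timeout value are all made up, and `next_query` omits the query argument that the interface passes alongside the token.

```python
import itertools
import time
import uuid


class QuerySessions:
    """Toy session table for iterator-style queries."""

    def __init__(self, chunk_size=3, timeout=30.0):
        self.chunk_size = chunk_size
        self.timeout = timeout
        self.sessions = {}  # token -> (row iterator, time of last access)

    def _next_chunk(self, token):
        rows, _ = self.sessions[token]
        chunk = list(itertools.islice(rows, self.chunk_size))
        if len(chunk) < self.chunk_size:
            del self.sessions[token]          # end of data closes the session
            return chunk, None
        self.sessions[token] = (rows, time.monotonic())
        return chunk, token

    def query(self, rows):
        """Initial request: create a session and return the first chunk."""
        token = uuid.uuid4().hex
        self.sessions[token] = (iter(rows), time.monotonic())
        return self._next_chunk(token)

    def next_query(self, token):
        """Subsequent request: continue the session named by the token."""
        return self._next_chunk(token)

    def end_query(self, token):
        """Client-initiated termination."""
        self.sessions.pop(token, None)

    def expire(self):
        """Drop sessions whose client has been silent for too long."""
        now = time.monotonic()
        for token, (_, last) in list(self.sessions.items()):
            if now - last > self.timeout:
                del self.sessions[token]


server = QuerySessions(chunk_size=3)
chunk, token = server.query(range(7))
received = list(chunk)
while token is not None:
    chunk, token = server.next_query(token)
    received.extend(chunk)
print(received, len(server.sessions))
```

Each loop iteration corresponds to one SOAP call, so the chunk size directly controls how many round trips a large result costs.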

11
Current Uses of the ARDA prototype
  • Evaluated by LHCb-bookkeeping
  • Migrated bookkeeping metadata to ARDA prototype
  • 20M entries, 15 GB
  • Feedback valuable in improving interface and
    fixing bugs
  • Interface found to be complete
  • ARDA prototype showing good scalability
  • Ganga (LHCb, ATLAS)
  • User analysis job management system
  • Stores job status on ARDA prototype
  • Highly dynamic metadata

12
Performance Study
  • SOAP increasingly used as standard protocol for
    GRID computing
  • Promising web services standard -
    Interoperability
  • Some potential weaknesses
  • XML encoding increases message size (4x to 10x
    typical)
  • XML processing is compute and memory intensive
  • How significant are these weaknesses? What is the
    cost of using SOAP?
  • ARDA metadata implementation ideal for comparing
    SOAP with a traditional RPC protocol
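
The size cost of XML encoding is easy to see on a toy request. The sketch below wraps a few key-value fields in a minimal SOAP 1.1 envelope and compares the byte count with the same request written as one line of text. The field names and the plain-text format are invented for the comparison, HTTP headers are left out, and the ratio it prints applies to this toy message only; it is not one of the measurements from the talk.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def soap_message(operation, fields):
    """Wrap key-value fields in a minimal SOAP 1.1 envelope (no HTTP headers)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, operation)
    for key, value in fields.items():
        ET.SubElement(call, key).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def plain_message(operation, fields):
    """The same request as a single text line (hypothetical format)."""
    parts = [operation] + [f"{key}={value}" for key, value in fields.items()]
    return (" ".join(parts) + "\n").encode("utf-8")


fields = {"owner": "alice", "events": 5000, "site": "CERN"}
soap_size = len(soap_message("setAttributes", fields))
plain_size = len(plain_message("setAttributes", fields))
print(f"SOAP {soap_size} B, plain {plain_size} B, "
      f"ratio {soap_size / plain_size:.1f}x")
```

The fixed envelope cost dominates small messages, which is consistent with the overhead being largest for near-empty requests such as a ping.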

13
Benchmark Description
  • Protocols
  • TCP-S - TCP Streaming
  • SOAP - Clients with gSOAP (C), Axis (Java) and
    ZSI (Python)
  • Operations
  • ping - A null RPC
  • add - Adds an entry
  • get - Gets all attributes of an entry
  • get (bulk) - Gets all attributes of several
    entries in a single operation
  • Entries
  • 60 attributes (ints, floats and strings)
  • 700 bytes on average
  • HTTP Keepalive/Persistent connections
  • HTTP Keepalive increases HTTP performance. Should
    improve SOAP performance.
  • gSOAP supports Keepalive. Axis and ZSI don't.
  • TCP-S uses persistent TCP connections to compare
    with HTTP Keepalive

14
SOAP Data Overhead
  • Measure size overhead of XML encoding
  • Ping
  • 1000 requests
  • Minimal payload - less than 5 bytes per request
  • SOAP overhead around 8 times
  • Get attributes in bulk
  • Retrieve 1000 entries
  • Around 800KB of application data
  • Streaming in TCP
  • Iterators with SOAP - 4 KB average SOAP packet
    payload
  • With keepalive
  • SOAP overhead around 2.5 times

[Chart: total data transferred (in KB)]

15
SOAP Toolkits performance
  • Test protocol performance
  • No work done on the backend
  • Switched 100 Mbit/s LAN
  • Language comparison
  • TCP-S with similar performance in all languages
  • SOAP performance varies strongly with toolkit
  • Protocols comparison
  • Keepalive improves performance significantly
  • On Java and Python, SOAP is several times slower
    than TCP-S

16
Single client results (LAN)
  • Compare performance of different operations
  • C clients (gSOAP)
  • When backend must do work, differences between
    gSOAP and TCP-S are small
  • Bulk operations very important for performance
  • getBulk 4x faster than get

17
Single client results (WAN)
  • Client at CERN, server in Taiwan
  • 300 ms latency
  • Results dominated by latency
  • Execution time at server irrelevant
  • Large performance boost from latency hiding
    techniques
  • Keepalive - fewer TCP handshakes
  • Bulk operations - fewer client/server interactions
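
A back-of-the-envelope model shows why these two techniques matter so much on a high-latency link. It assumes the 300 ms figure is the round-trip time, that each request costs one round trip, that opening a connection costs one extra round trip for the TCP handshake, and that a bulk request fits in a single exchange. Server time and bandwidth are ignored, so the numbers are illustrative, not the measured results.

```python
RTT = 0.300  # seconds; assumes the 300 ms latency is the round-trip time


def total_time(n_entries, bulk, keepalive, rtt=RTT):
    """Latency-only estimate for fetching n_entries entries."""
    requests = 1 if bulk else n_entries
    # Without keepalive every request opens a new connection and pays one
    # extra round trip for the handshake; with keepalive it is paid once.
    handshakes = 1 if keepalive else requests
    return (requests + handshakes) * rtt


for bulk in (False, True):
    for keepalive in (False, True):
        seconds = total_time(100, bulk, keepalive)
        print(f"bulk={bulk!s:5} keepalive={keepalive!s:5} -> {seconds:6.1f} s")
```

Under these assumptions, fetching 100 entries one at a time takes about 60 s without keepalive and about 30 s with it, while a single bulk request takes well under a second.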

18
Scalability with Multiple Clients - Pings
  • Measure scalability of protocols
  • Switched 100 Mbit/s LAN
  • TCP-S 3x faster than gSOAP (with keepalive)
  • Poor performance without keepalive
  • Around 1,000 ops/sec (both gSOAP and TCP-S)

19
Scalability with Multiple Clients - getAttr
  • Measure scalability with realistic payload
  • Switched 100 Mbit/s LAN
  • All tests with keepalive
  • Smaller difference between gSOAP and TCP-S
  • TCP-S 2x faster (1000 vs 500 entries/sec)
  • Poor performance of non-bulk operations
  • 100 entries/sec

20
Conclusions
  • A common Metadata Interface was developed by ARDA
    and gLite
  • Endorsed by the EGEE standards committee
  • Interface validated by ARDA prototype
  • Prototype in use by LHCb (bookkeeping, Ganga) and
    ATLAS (Ganga)
  • SOAP performance studied using ARDA
    implementation
  • Toolkit performance varies widely
  • Large SOAP overhead (over 100%)