Title: CMS Data Analysis: Current Status and Future Strategy
1 CMS Data Analysis: Current Status and Future Strategy
- On behalf of the CMS Collaboration
- Lassi A. Tuura
- Northeastern University, Boston
2 Overview
- The Context: CMS Analysis Today
- Data Analysis Environment Architecture
  - Overview
  - COBRA
  - IGUANA
  - GRID/Production
- Tomorrow and Beyond
  - Leveraging current frameworks in the Grid-enriched analysis environment
  - Clarens client-server prototype
  - Other prototype activities
3 Context
- Challenges: Complexity; Geographic Dispersion; Direct Access To Data; Migration from Reconstruction to Trigger
- Environments: Real-Time Event Filter, Online Monitoring; Pre-emptive Simulation, Reconstruction, Analysis; Interactive Statistical Analysis
4 Current CMS Production
5 Complexity of Production 2002
6 Interactive Analysis
Lizard Qt plotter
7 Behind the Scenes: Frameworks
Data Browser
Generic analysis Tools
GRID
Distributed Data Store Computing Infrastructure
Analysis job wizards
Objy tools
ORCA
COBRA
OSCAR
FAMOS
Detector/Event Display
CMS tools
Federation wizards
- Consistent User Interface
Coherent basic tools and mechanisms
8 Frameworks Dissected
Specific Frameworks
Grid-Uploadable
Physics modules
Calibration Objects
Generic Application Framework
Configuration Objects
Event Objects
Adapters and Extensions
ODBMS
GEANT 3 / 4
CLHEP
PAW Replacement
C++ Standard Library, Extension Toolkits
Basic Services
9 Framework Design Basis
- Several frameworks provide the environment together
  - Open: No central framework with all functionality
    - Frameworks are designed to be extensible
    - and to collaborate with other software
  - Coherent: User sees final smooth interface
    - Achieved by integrating the frameworks together
    - but the user does not do this work him/herself!
- Design applied at both framework and object design level
  - Successfully applied in many parts of CMS software
    - Applications, persistency sub-frameworks, visualisation
  - No loss of usability, functionality or performance
  - Has made it easy to integrate directly with many existing tools
- This is nothing novel: it is part of the standard risk-mitigation strategy of any modern industrial solution
10 Frameworks: COBRA
11 COBRA Main Components
- Push- and pull-mode execution, and any mixture
  - Reconstruction-on-demand is a key concept in COBRA
  - Detector-centric reconstruction: push data from event
  - Reconstruction-unit-centric reconstruction: pull/create data as needed
- Event data and related structures
  - Basic support for commonly needed objects (hits, digis, containers, ...)
- Application environments
  - Basic application frameworks, various semi-specialised applications
  - Lots of error-handling and recovery code (automatic recovery after crash, ...)
- Meta data: a key component
  - Data chunking, system and user collections, data streams, file management, job concepts, configuration and setup records, redirected navigation after reprocessing, ...
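The pull-mode idea above (create data only when something asks for it) can be illustrated with a small sketch. This is not COBRA code: the `Event` class, the `register`/`get` methods and the two toy producers are hypothetical names chosen for illustration, and the "algorithms" are deliberately trivial.

```python
from typing import Any, Callable, Dict


class Event:
    """Toy event: raw data plus reconstruction products created on demand."""

    def __init__(self, raw: Dict[str, Any]) -> None:
        self.raw = raw
        self._producers: Dict[str, Callable[["Event"], Any]] = {}
        self._products: Dict[str, Any] = {}
        self.calls: Dict[str, int] = {}  # how often each producer actually ran

    def register(self, name: str, producer: Callable[["Event"], Any]) -> None:
        """Declare how a product can be made, without making it yet."""
        self._producers[name] = producer

    def get(self, name: str) -> Any:
        """Pull mode: run the producer on first request, then reuse the result."""
        if name not in self._products:
            self.calls[name] = self.calls.get(name, 0) + 1
            self._products[name] = self._producers[name](self)
        return self._products[name]


def make_clusters(event: Event) -> list:
    # Hypothetical algorithm: keep hits above an arbitrary threshold.
    return [h for h in event.raw["hits"] if h > 1.0]


def make_tracks(event: Event) -> int:
    # Pulls "clusters" as needed; they are reconstructed only if not cached.
    return len(event.get("clusters")) // 2


event = Event({"hits": [0.5, 1.5, 2.0, 3.2, 0.1]})
event.register("clusters", make_clusters)
event.register("tracks", make_tracks)

n_tracks = event.get("tracks")   # triggers clusters, then tracks
event.get("clusters")            # served from the cache, no second run
```

A push-mode variant would instead loop over the registered producers for every event, whether or not anything later reads the result; a mixture of the two is possible with the same bookkeeping.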
12 COBRA Main Strengths
- Algorithms in plug-ins
  - Publish-yourself plug-ins: self-describing data producers
- Strong meta-data facilities
  - Reconstruction-on-demand matches the data product concept very well
  - The Grid virtual data products concept is really just an extension
  - Convenient mapping of data products to chunks: files, containers, ...
- Scatter / gather: decompose jobs, gather data
  - One logical job can be chopped into many physical processes; we still know it is logically the same job no matter which process it is running in
- Adapts automatically to many environments without special configuration: interactive, batch, farm, stand-alone, trigger, ...
  - Through appropriate use of enabling techniques (transactions, locking, refs)
  - No data post-processing required
- Well-matched to production tools (IMPALA)
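The scatter / gather point above can be sketched as follows. This is a minimal illustration under stated assumptions, not the COBRA mechanism: `SubJob`, `scatter`, `run` and `gather` are hypothetical names, and the "physical processes" are plain function calls. What it shows is that every piece carries the logical job identifier, so the results can be recombined as one job.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubJob:
    job_id: str    # logical job this piece belongs to
    part: int      # index of the physical piece
    events: range  # event numbers handled by this piece


def scatter(job_id: str, n_events: int, n_parts: int) -> List[SubJob]:
    """Split one logical job into at most n_parts contiguous event ranges."""
    step = -(-n_events // n_parts)  # ceiling division
    return [
        SubJob(job_id, part, range(start, min(start + step, n_events)))
        for part, start in enumerate(range(0, n_events, step))
    ]


def run(sub: SubJob) -> Dict[str, object]:
    """Stand-in for a physical process; it only counts its events."""
    return {"job_id": sub.job_id, "part": sub.part, "n_processed": len(sub.events)}


def gather(results: List[Dict[str, object]]) -> Dict[str, object]:
    """Recombine partial results, checking they share one logical job."""
    job_ids = {r["job_id"] for r in results}
    if len(job_ids) != 1:
        raise ValueError("results come from different logical jobs")
    return {
        "job_id": job_ids.pop(),
        "n_parts": len(results),
        "n_processed": sum(r["n_processed"] for r in results),
    }


sub_jobs = scatter("prod-2002-001", n_events=10, n_parts=3)
summary = gather([run(s) for s in sub_jobs])
```

In a real farm the `run` calls would execute in separate processes or on separate nodes; the shared `job_id` is what lets the pieces be recognised as the same logical job.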
14 Queries
Refs Navigation
Cache Management
15 Collections
Configurations (Data Sets)
Object Naming
Run Resume Crash Recovery
16 File Size Control
System Management
Farm Management
17 Frameworks: IGUANA
18 User Interface and Visualisation
- IGUANA: a generic toolkit for user interfaces and visualisation
  - Builds on existing high-quality libraries (Qt, OpenInventor, Anaphe, ...)
  - Used to implement specific visualisation applications in other projects
- Main technical focus: provide a platform that makes it easy to integrate GUIs as a coherent whole, to provide application services and to visualise any application object
  - Many categories / layers: GUI gadgets support, application environment, data visualisers, data representation methods, control panels, ...
- Designed to integrate with and into other applications
  - Virtually everything is in plug-ins (can still be statically linked)
[Diagram: Object Factories, Plug-In Caches, a Component Database, and attached and unattached Plug-Ins]
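The plug-in arrangement on this slide (factories that create objects from plug-ins, which are attached only when needed) can be sketched roughly as below. This is an illustrative assumption, not IGUANA's API: `ComponentDatabase`, `declare`, `create` and the two toy view classes are hypothetical, and "attaching" is modelled as calling a loader function rather than loading a shared library.

```python
from typing import Any, Callable, Dict, List


class ComponentDatabase:
    """Knows which plug-ins exist; attaches one only when first needed."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], type]] = {}  # declared, unattached
        self._attached: Dict[str, type] = {}               # loaded classes

    def declare(self, name: str, loader: Callable[[], type]) -> None:
        """Record that a plug-in can provide `name`, without loading it."""
        self._loaders[name] = loader

    def attached(self) -> List[str]:
        return sorted(self._attached)

    def create(self, name: str, *args: Any) -> Any:
        """Object factory: attach the plug-in on demand, then instantiate."""
        if name not in self._attached:
            self._attached[name] = self._loaders[name]()
        return self._attached[name](*args)


def load_histogram_view() -> type:
    # Stand-in for loading a plug-in library that defines this class.
    class HistogramView:
        def __init__(self, title: str) -> None:
            self.title = title

        def render(self) -> str:
            return f"histogram: {self.title}"

    return HistogramView


def load_event_display() -> type:
    class EventDisplay:
        def __init__(self, title: str) -> None:
            self.title = title

        def render(self) -> str:
            return f"3D event display: {self.title}"

    return EventDisplay


db = ComponentDatabase()
db.declare("histogram", load_histogram_view)
db.declare("event-display", load_event_display)

attached_before = db.attached()           # nothing loaded yet
view = db.create("histogram", "jet pT")   # attaches only the histogram plug-in
rendered = view.render()
attached_after = db.attached()
```

The event-display plug-in stays unattached until something asks for it, which is the property that keeps start-up cheap when nearly everything lives in plug-ins.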
19 Illustration: 3D Visualisation
20 IGUANA GUI Integration
Integration
Action
Visualise Results, Modify Objects, Further
Interaction
21 Tomorrow and Beyond
- Leverage the current frameworks on the grid
  - Many native COBRA concepts match well with the grid
    - (Virtual) data products: reconstruction-on-demand
    - Recording and matching configuration and setup information
    - Production interfaces: catalogs, redirection, MSS hooks
    - Scatter/gather job decomposition, production environment
  - COBRA-based applications can be encapsulated for distributed analysis
  - IGUANA already separates application objects, model and viewer
    - Many possibilities for introducing distributed links
  - IGUANA + COBRA provides a platform for a coherent, well-integrated interface no matter where the code runs and data comes and goes
    - Both have loads of knobs and hooks for integration
- Aiming at adapting the existing software where possible
  - Adapt and work within CMS software (COBRA, ORCA, ...) and existing analysis tools (ROOT, Lizard, ...); don't replace them
22 Prototypes: Clarens Web Portals
- Grid-enabling the working environment for physicists' data analysis
- Communication with clients via the commodity XML-RPC protocol, which gives implementation independence
- Server implemented in C++: access to the CMS OO analysis toolkit
- Server provides a remote API to Grid tools
  - The Virtual Data Toolkit: Object collection access
  - Data movement between tier centres using GSI-FTP
  - CMS analysis software (ORCA/COBRA)
- Security services provided by the Grid (GSI)
- No Globus needed on client side, only certificate
[Diagram labels: Service, Clarens, Web Server, http/https, RPC, Client]
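The client-server pattern on this slide (a remote API reached over XML-RPC, with the client independent of the server implementation) can be tried out with Python's standard library alone. The sketch below is not the Clarens API: the method names `list_collections` and `count_events` and the collection contents are made up for illustration, the server here is Python rather than the server described above, and the GSI certificate handling is omitted entirely.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical server-side data: collection name -> number of events.
COLLECTIONS = {"jets_2002": 1200, "muons_2002": 800}


def list_collections(prefix: str) -> list:
    """Remote method: names of collections starting with `prefix`."""
    return sorted(name for name in COLLECTIONS if name.startswith(prefix))


def count_events(name: str) -> int:
    """Remote method: number of events in one collection."""
    return COLLECTIONS[name]


# Port 0 asks the operating system for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(list_collections)
server.register_function(count_events)
port = server.server_address[1]

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

try:
    # The client only needs the URL and the method names, not the server code.
    with xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
        names = proxy.list_collections("jets")
        n_events = proxy.count_events("jets_2002")
finally:
    server.shutdown()
    server.server_close()
```

Because XML-RPC is a plain HTTP-plus-XML protocol, the same two calls could be issued from a client written in any language with an XML-RPC library, which is the implementation independence the slide refers to.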
23 Prototypes: Clarens Web Portals
Tier 0/1/2
Tier 1/2
Production data flow
TAGs/AODs data flow
Tier 3/4/5
Physics Query flow
User
24 Other Prototypes
- Tag database optimisation
  - Fast sample selection is crucial
  - Various models already tried
  - Experimenting with RDBMS
- MOP distributed job submission system
  - Allows CMS production jobs to be submitted from a central location, run at remote locations, and have their results returned
  - Job Specification: IMPALA
  - Replication: GDMP
  - Globus GRAM
  - Job Scheduling: Condor-G and local systems
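Returning to the tag database item above: the idea of fast sample selection in an RDBMS can be illustrated with a small in-memory SQLite table. This is only a sketch; the table layout, the column names (`n_jets`, `missing_et`), the cut values and the rows are invented for illustration and do not describe the CMS tag schema or the database product used in the prototype.

```python
import sqlite3

# Hypothetical per-event tags: (event_id, number of jets, missing E_T in GeV).
rows = [
    (1, 2, 35.0),
    (2, 4, 120.5),
    (3, 1, 10.2),
    (4, 5, 88.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (event_id INTEGER PRIMARY KEY, n_jets INTEGER, missing_et REAL)"
)
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)

# An index on the cut variables lets the database avoid scanning every tag.
conn.execute("CREATE INDEX idx_tags_cut ON tags (n_jets, missing_et)")

# Select a sample using only the small tag records, not the full events.
cursor = conn.execute(
    "SELECT event_id FROM tags WHERE n_jets >= ? AND missing_et > ? ORDER BY event_id",
    (4, 50.0),
)
selected = [event_id for (event_id,) in cursor]
conn.close()
```

The resulting list of event identifiers is what an analysis job would then use to fetch the full event data, so the selection itself never touches the large event store.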