Title: ROOT I/O TTree Queries
1 ROOT I/O TTree Queries
- CHEP 2004
- René Brun / CERN, Philippe Canal / Fermilab, Fons Rademakers / CERN
http://root.cern.ch
2 Contents
- Status
- Overview
- List of other presentations
- ROOT I/O
- Large Files
- Double32_t
- Foreign objects
- New interfaces
- XML back-end
- Historical recap.
- Containers Support
- Mainly for STL containers
- Splitting
- TTree Query
- TTree
- Auto load of TRefed branches
- UserInfo
- CloneTree
- TTree Query
- Calling free standing functions
- Rebinning
- Support for Indexed Friends
- Arbitrary C++ in queries (TTree::MakeProxy)
- Support for SQL back-end
- Future Plans
3 Presentations and Posters
- 328: The Next Generation Root File Server, by Andrew HANUSHEVSKY (Theatersaal, Sept 27, 16:30 - 16:50)
- 412: XML I/O in ROOT, by Sergey LINEV (Brunig 1+2, Sept 29, 15:20 - 15:40)
- 430: Global Distributed Parallel Analysis using PROOF and AliEn, by Fons RADEMAKERS (Theatersaal, Sept 29, 15:20 - 15:40)
- 104: Authentication/Security services in the ROOT framework, by Gerardo GANIS (Brunig 3, Sept 29, 16:50 - 17:10)
- 169: Guidelines for Developing a Good GUI, by Ilka ANTCHEVA (Brunig 1+2, Sept 30, 14:00 - 14:20)
- 287: Super scaling PROOF to very large clusters, by Maarten BALLINTIJN (Ballsaal, Sept 30, 15:00 - 15:20)
- Poster on September 29
- 128: XTNetFile, a fault tolerant extension of ROOT TNetFile client
- Poster on September 30
- 298: The ROOT 3-D graphics and geometry classes
- 170: The User Interface Design in ROOT
- 303: The ROOT Linear Algebra Package
- 98: RDBC, ROOT DataBase Connectivity
- 99: Interactive Data Analysis with Carrot (ROOT Apache Module)
4 Status
- ROOT 4.01/02 just released
- Production release of 4.01 planned for December 2004
- Many improvements since CHEP 2003
- This talk
- I/O and TTree queries
- For other developments, see the other ROOT-related talks
- XROOTD
- A new generation ROOT file server
- Authentication Overhaul
- Object Property Editor
- e.g. TH1Editor, TH2Editor, TGraphEditor
- New classes for GUI
- GUI builder
- Brand new GL viewer
- Math and Stats
- New Matrix package implementation
- New functions in TMath (now a namespace)
- Quadratic programming
5 TFile and TDirectory
- Very Large Files
- Support on all platforms for 64-bit integers via the portable typedefs Long64_t and ULong64_t.
- long long on Unix, __int64 with VC++
- Support for files larger than 2 GB added in ROOT 4.00
- Files smaller than 2 GB are still readable by older versions of ROOT
- Support for TTree with more than 2^31 entries
- Double32_t
- Same as Double_t in memory
- Same as Float_t on disk
- Supports automatic schema evolution to and from float and double
- Warning: too many read/write cycles could result in some loss of precision
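The precision warning can be illustrated without ROOT. The sketch below is plain C++, not ROOT code: it assumes that one write/read cycle of such a value amounts to narrowing the in-memory double to a 32-bit float and widening it back. The names `roundTripThroughFloat` and `roundTripError` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper (not a ROOT function): mimics one write/read cycle of a
// value held as a double in memory but stored as a 32-bit float on disk.
double roundTripThroughFloat(double inMemory)
{
    float onDisk = static_cast<float>(inMemory);   // narrowing on "write"
    return static_cast<double>(onDisk);            // widening on "read"
}

// Absolute error introduced by one simulated cycle.
double roundTripError(double inMemory)
{
    double restored = roundTripThroughFloat(inMemory);
    return restored > inMemory ? restored - inMemory : inMemory - restored;
}
```

In this simplified model a value that a float represents exactly (0.5, for example) survives unchanged, while 0.1 comes back slightly different. A value that is not modified between cycles is stable after the first one, so further loss would come from updating it in double precision and writing it again.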
6 XML output format
- Update to the I/O classes to allow the customization of the backend.
- Implemented for XML
- Will be used for SQL support.
- XML files allow the interchange of data with applications unable to read ROOT files directly
- Example
- Refer to Sergey Linev's presentation for more details
<XmlKey name="c1" cycle="1">
  <Object class="TCanvas">
    <Version v="5"/>
    <TPad version="8">
      <TVirtualPad version="2">
        <TObject fUniqueID="0" fBits="3000008"/>
        <TAttLine version="1">
          <fLineColor v="1"/>
          <fLineStyle v="1"/>
          <fLineWidth v="1"/>
        </TAttLine>
        <TAttFill version="1">
          <fFillColor v="19"/>
          <fFillStyle v="1001"/>
        </TAttFill>
TCanvas c;
h.Draw();
c.SaveAs("c.xml");
c.SaveAs("c.root");
7 ROOT I/O History
- Version 2.25 and older
- Only hand-coded and generated streamer functions; schema evolution done by hand
- I/O requires ClassDef, ClassImp and CINT dictionary
- Version 2.26
- Automatic schema evolution
- Use TStreamerInfo (with info from dictionary) to drive a general I/O routine.
- Version 3.03/05
- Lift need for ClassDef and ClassImp for classes not inheriting from TObject
- Any non-TObject class can be saved inside a TTree or as part of a TObject class
- Version 4.00/00
- Automatic versioning of foreign classes
- Version 4.00/08
- Non-TObject classes can be saved directly in TDirectory
Timeline labels: 2000, 2001, 2002, 2004
8 Foreign Objects
TBuffer layout: ... | Bytecount (4 bytes) | 0 (2 bytes) | checksum (4 bytes) | Object N | Bytecount | 0 | checksum | Object N+1 | ...
- To save non-instrumented classes
- Only the data dictionary is needed
- Default versioning provided by a checksum based on the type and name of the persistent data members
- Checksum stored as an additional 4 bytes
- ClassDef advantages
- The IsA function generated by ClassDef considerably speeds up access to the TClass for a given object.
- The version number (2 bytes maximum) consumes less space on disk than the 0 + checksum
- New interface to store and retrieve objects with type safety
ptrclass *ptr; directory->WriteObject(ptr, "name");
ptrclass *ptr; directory->GetObject("name", ptr);
ptr is 0 if the object is absent or of the wrong type
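Checksum-based versioning can be sketched as follows. This is an illustration only: it uses an FNV-1a hash, which is an assumption and not the algorithm ROOT applies, and `MemberInfo` and `layoutChecksum` are hypothetical names. The point it shows is that the "version" is derived from the type and name of each persistent member, so any change to either yields a different value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one persistent data member.
struct MemberInfo {
    std::string typeName;
    std::string name;
};

// Illustrative checksum (FNV-1a, NOT the algorithm ROOT uses): hashes the
// type and name of every persistent member, in declaration order.
std::uint32_t layoutChecksum(const std::vector<MemberInfo>& members)
{
    std::uint32_t hash = 2166136261u;              // FNV-1a offset basis
    auto mix = [&hash](const std::string& text) {
        for (unsigned char c : text) {
            hash ^= c;
            hash *= 16777619u;                     // FNV-1a prime
        }
        hash ^= 0xFFu;                             // separator between fields
        hash *= 16777619u;
    };
    for (const MemberInfo& m : members) {
        mix(m.typeName);
        mix(m.name);
    }
    return hash;
}
```

Recomputing the checksum for an unchanged member list gives the same value, while renaming a single member (fPy to fPz, say) gives a different one. That is what lets a reader detect that the stored layout differs from the class currently in memory.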
9 TClonesArray
- Optimization of the number of calls to new and delete
- Ability to split the collection of objects in a TTree
- Improves compression and run-time
- Ability to save objects member-wise
- Stores the same data member of all the elements of the collection consecutively
- Improves compression (buffer data more homogeneous)
- Improves run-time (avoids n-1 tests of the data type)
- Ability to use in TTree::Draw as a collection
- Ability to read back without the original compiled code
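The difference between object-wise and member-wise saving is only the order in which values reach the buffer. The sketch below is plain C++ with hypothetical names (`Hit`, `streamObjectWise`, `streamMemberWise`), and the buffer is simplified to a vector of doubles so the ordering is easy to inspect. It is not the TClonesArray streamer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type standing in for a user class in a collection.
struct Hit {
    int   id;
    float energy;
};

// Object-wise: all members of element 0, then all members of element 1, ...
std::vector<double> streamObjectWise(const std::vector<Hit>& hits)
{
    std::vector<double> buffer;
    for (const Hit& h : hits) {
        buffer.push_back(h.id);
        buffer.push_back(h.energy);
    }
    return buffer;
}

// Member-wise: the same member of every element is stored consecutively.
std::vector<double> streamMemberWise(const std::vector<Hit>& hits)
{
    std::vector<double> buffer;
    for (const Hit& h : hits) buffer.push_back(h.id);
    for (const Hit& h : hits) buffer.push_back(h.energy);
    return buffer;
}
```

For two hits (1, 0.5) and (2, 1.5), the object-wise order is 1, 0.5, 2, 1.5 and the member-wise order is 1, 2, 0.5, 1.5. The second layout groups values of the same type and meaning, which is why it tends to compress better.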
10 Old STL Container Support
- For versions older than 4.00/00
- Collection always stored object-wise
- Nesting of STL collections was extremely limited
- No splitting was possible
- STL containers stored using a generated function
- One generated function per actual data member.
- Compiled version of these functions required for writing and also for reading
void R__User_fList1(TBuffer &R__b, void *R__p, int)
{
   if (R__b.IsReading()) {
      vector<THit> &fList1 = *(vector<THit> *)R__p;
      int R__n;
      fList1.clear();
      R__b >> R__n;
      fList1.reserve(R__n);
      for (int R__i = 0; R__i < R__n; R__i++) {
         THit R__t;
         R__t.Streamer(R__b);
         fList1.push_back(R__t);
      }
   } else {
      // writing
   }
}
11 New Container Support
- New Abstract Interface
- TVirtualCollectionProxy
- Can be implemented for almost any collection
- Allows
- Splitting (for collections of homogeneous objects)
- Use in Tree Query (with automatic looping)
- Will allow
- Member-wise streaming (as opposed to object-wise streaming)
- Also
- Arbitrary nesting of STL containers
- Reading of STL containers without original code (emulated mode)
- Note: as of 4.00/08 only std::vector has proxies.
- Early prototype and fundamental concepts by Victor Perevoztchikov
12 STL Support
- Each STL container instance now has an associated TClass object
- Several co-existing streaming implementations
- Generated Streamer
- For object-wise streaming
- Fully respects custom allocators and comparators
- Easier to implement, with a run-time cost similar to a templated solution
- Templated Proxy (e.g. TVectorProxy)
- For splitting and member-wise streaming
- Fully respects custom allocators and comparators
- Emulation Proxy (e.g. TEmulatedVectorProxy)
- For reading without a compiled version
- Allows easy sharing of ALL ROOT files that have no custom streamers.
- Why not rely only on the Emulation Proxy?
- Implementation difficulties
- An emulation proxy acting on a live STL object requires a few tricks and assumptions
- The memory footprint of the STL container object is (usually?) independent of the template parameter
- A list proxy would need a series of lists with content of increasing fixed size (e.g. list<char[1024]>, list<char[2048]>)
- Does not respect allocators and comparators
- A templated proxy can be faster and more memory efficient.
- The emulation layer might actually be implemented using alternative collections (if we assume it does not have to deal with real objects)
13 Container I/O Implementation
- Any container can be summarized by the sequence of the addresses of its contents
- Use TVirtualCollectionProxy::At via TVirtualCollectionProxy::operator[]
- Pros
- I/O code completely independent of the collection
- Reduced code duplication in TStreamerInfo
- No run-time cost for TClonesArray
- Cons
- Implementation for containers with no random access iterator needs to cache the iterator.
- Member-wise implementation
- Member-wise/object-wise choice will be encoded in the version number of the STL collections
- An API will be provided to select member-wise or object-wise streaming for data members that are STL collections
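The idea of reducing a container to the addresses of its contents can be sketched in standard C++. The classes below (`CollectionProxy`, `VectorProxy`, `ListProxy`) are hypothetical, much-reduced analogues written for illustration and are not the ROOT interface. The list variant shows the stated drawback: without random access, the proxy has to cache an iterator to keep sequential access cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical abstract interface: I/O code sees only a size and addresses.
class CollectionProxy {
public:
    virtual ~CollectionProxy() {}
    virtual std::size_t Size() const = 0;
    virtual void* At(std::size_t index) = 0;
    void* operator[](std::size_t index) { return At(index); }
};

// Random-access container: the element address is computed directly.
template <typename T>
class VectorProxy : public CollectionProxy {
public:
    explicit VectorProxy(std::vector<T>& v) : fVector(v) {}
    std::size_t Size() const override { return fVector.size(); }
    void* At(std::size_t index) override
    {
        assert(index < fVector.size());
        return &fVector[index];
    }
private:
    std::vector<T>& fVector;
};

// No random access: remember the last position reached. The cache is only
// valid while the list is not modified.
template <typename T>
class ListProxy : public CollectionProxy {
public:
    explicit ListProxy(std::list<T>& l)
        : fList(l), fCached(l.begin()), fCachedIndex(0) {}
    std::size_t Size() const override { return fList.size(); }
    void* At(std::size_t index) override
    {
        assert(index < fList.size());
        if (index < fCachedIndex) {            // going backwards: restart
            fCached = fList.begin();
            fCachedIndex = 0;
        }
        std::advance(fCached, static_cast<std::ptrdiff_t>(index - fCachedIndex));
        fCachedIndex = index;
        return &*fCached;
    }
private:
    std::list<T>& fList;
    typename std::list<T>::iterator fCached;
    std::size_t fCachedIndex;
};

// Collection-independent routine: it never names the container type.
double sumDoubles(CollectionProxy& proxy)
{
    double total = 0.0;
    for (std::size_t i = 0; i < proxy.Size(); ++i)
        total += *static_cast<double*>(proxy[i]);
    return total;
}
```

`sumDoubles` works unchanged on a vector and on a list of doubles, which is the property the "Pros" above rely on: the streaming code is written once against the proxy.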
14 TTree
- TRef autoload
- Added (optional) support for the auto-loading of branches referenced by a TRef object.
- Generates one table of references to branches per entry
- TRef::GetObject uses this table to find and load the branch containing the referenced object
- To enable it, call
tree->BranchRef();
class Event { TClonesArray *fTracks; TRef fLastTrack; };
branch = tree.GetBranch("fLastTrack");
branch->GetEntry(7);
tlast = event->GetLastTrack();
- TTree::GetUserInfo
- Used to store with the TTree any user-defined object(s) that do not depend on the entry number
- Examples
- Luminosity, calibrations, etc.
tree.GetUserInfo()->Add(myobject);
15 Copying a TTree
- Very flexible, simple copying tools allowing cuts on
- Number of entries
- Number of branches
- Selection of entries based on a formula
- Usable for both TTree and TChain
- Important simplification of the interface
- Removed the requirement of explicitly setting the addresses for ALL the branches.
Diagram: a tree with 3 branches is cloned into a tree with 2 branches
tree->SetBranchStatus(br, kFALSE);
newtree = tree->CloneTree();
Diagram: a tree with 3 branches is copied with a selection
tree->CopyTree("fTracks.fPx<1.2");
16 TTree Queries
- Implemented Boolean expression optimization (&& and ||)
- Rebinning now possible from the TTree data (via new histogram editor)
- Improved TTree::Scan output (customization and array display)
- Call to external functions
- Free-standing function or class static member function
- Compiled or interpreted, with numerical arguments and numerical return type
- Example
tree->Draw("TMath::Prob(var,5)");
17 TTree Queries
- Support for Collections
- TTreeFormula now treats any collection class which has a TVirtualCollectionProxy in exactly the same way as a TClonesArray
- Automatically loops over the elements
- Can access a specific element
- Synchronized with other collections and arrays in the formulas
- Connecting several TTrees
- TChain adds more entries
- TTree Friends add (virtually) more branches
- Prior to ROOT 4.00/08, correlation between Friends was made only by entry number
- This is a problem if the Trees have semantically different sequences of entries
- Can now connect the Friend using an Index
- For example Run Number/Event Number
- Uses the abstract interface TVirtualIndex
- Concrete implementation: TTreeIndex
[Figure: two panels, each pairing entries labeled 1 and 2. First panel: Main Tree and User Tree. Second panel: Indexed Main Tree and User Tree.]
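Connecting a friend through an index rather than by entry number can be sketched as a lookup table. The code below is plain C++ with hypothetical names (`RunEvent`, `EntryIndex`, `EntryFor`). It assumes the key is a (run number, event number) pair and does not reproduce the TTreeIndex implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-entry key: (run number, event number).
typedef std::pair<long, long> RunEvent;

// Hypothetical index: maps a (run, event) pair to the entry number that
// holds it in the friend tree.
class EntryIndex {
public:
    explicit EntryIndex(const std::vector<RunEvent>& friendEntries)
    {
        for (std::size_t entry = 0; entry < friendEntries.size(); ++entry)
            fEntryOf[friendEntries[entry]] = static_cast<long>(entry);
    }

    // Returns the friend entry matching the key, or -1 when there is none.
    long EntryFor(long run, long event) const
    {
        std::map<RunEvent, long>::const_iterator it =
            fEntryOf.find(RunEvent(run, event));
        return it == fEntryOf.end() ? -1 : it->second;
    }

private:
    std::map<RunEvent, long> fEntryOf;
};
```

With a friend tree whose entries are ordered (1,2), (2,1), (1,1), (2,2), the main tree's entry for run 1, event 1 is matched to friend entry 2 instead of friend entry 0, so the two trees no longer need the same entry order.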
18 The MakeClass Revolution
- Current Fast Analysis Frameworks
- TTree::Draw
- Fast histogramming
- Loads branches on demand
- Only simple expressions
- MakeCode
- C-style
- Obsolete
- MakeClass
- Flat representation of the tree
- Difficulties with variable size arrays
- Branches loaded explicitly
- MakeSelector
- PROOF ready
- Flat representation of the tree
- Difficulties with variable size arrays
- Branches loaded explicitly
- Elegant Replacement for MakeClass/MakeSelector
- Currently named MakeProxy
- Creates a C++ context where branch names (including periods) can be used as variables
- On-demand loading of branches
- Respects/recreates the original class structure
- Array bound check
- Uses the user's shared libraries (when available)
19 MakeProxy Examples
- TTree::Draw of a script
- Implemented using MakeProxy
- Enables complex looping
- Allows calls to any C++ functions or member functions!
- Still provides on-demand loading of the branches
- And allows any arbitrary C++
tree->Draw("hsimple.C");
Double_t hsimple() {
   int last = fTracks.GetLast();
   for (int i = 1; i < last-1; ++i) {
      htemp->Fill(fTracks.fPt[i] - fTracks.fPt[i-1]);
   }
   return fTracks.fPt[last] + fTracks.fPt[last-1];
}
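On-demand loading behind a variable-like name can be sketched with a small proxy class. This is an illustration in plain C++, not the code generated by MakeProxy: `LazyBranch` is a hypothetical name, and the `reader` callback is an assumption that stands in for reading one branch from file.

```cpp
#include <cassert>
#include <functional>

// Hypothetical proxy: it is used like a plain value, but the branch is read
// only the first time the value is needed for the current entry.
template <typename T>
class LazyBranch {
public:
    explicit LazyBranch(std::function<T(long)> reader)
        : fReader(reader), fEntry(-1), fLoadedEntry(-1), fValue(), fLoads(0) {}

    // Selecting an entry is cheap: nothing is read here.
    void SetEntry(long entry) { fEntry = entry; }

    // Implicit use as a value triggers the load if it has not happened yet.
    operator const T&()
    {
        if (fLoadedEntry != fEntry) {
            fValue = fReader(fEntry);
            fLoadedEntry = fEntry;
            ++fLoads;
        }
        return fValue;
    }

    // Number of times the reader was actually called.
    int Loads() const { return fLoads; }

private:
    std::function<T(long)> fReader;  // stands in for reading the branch
    long fEntry;                     // entry currently selected
    long fLoadedEntry;               // entry whose value is cached
    T    fValue;
    int  fLoads;
};
```

A script that never touches a given proxy never pays for reading its branch, and one that uses it several times for the same entry reads it once. That is the behaviour the bullets above describe as on-demand loading.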
20 File types Access in 4.01/xx
[Figure: access paths from the user. TFile, TKey/TTree and TStreamerInfo give access to Local File X.root, Local File X.xml, http, rootd/xrootd, Castor, Dcache, RFIO and Chirp. TTreeSQL, TSQLServer, TSQLRow and TSQLResult give access to Oracle, MySQL, PgSQL and SapDb.]
21 New RDBMS interface Goals
- Access any RDBMS tables from TTree::Draw
- Creating a Tree in split mode corresponds to creating an RDBMS table and filling it.
- The table can be processed by SQL directly.
- The interface uses the normal I/O engine, including support for Automatic Schema Evolution.
22 New RDBMS Interface
- Current prototype
- Simple TTree (branch with leaf list)
- Implemented via TSQLxxx for reading and writing
- Implemented via RDBC for reading
- See http://carrot.cern.ch/onuchin/RDBC/
- Should be released in December 2004.
- Should be expanded to support branches of objects
- Need to implement a way to store and retrieve TStreamerInfo(s) and TProcessID(s) in the database
- Will probably use SQL binary blobs to store non-split objects.
23 RDBMS Examples
Connect to an existing db:
TTreeSQL tree(const char *db, const char *uid, ...);
tree.Print();   // also Browse, Scan, etc.
tree.Draw("var1:var2", "varx<0");   // TTree query style converted to SQL
Create the data base on server:
TTreeSQL tree("mysql://localhost/test", "nobody", "new");
Event *event = new Event;
tree.Branch("top", "Event", &event);   // Columns created using the normal split algorithm. Blobs created below split.
tree.Fill();   // A TSQLRow is filled and sent to the server
tree.AutoSave();
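What "TTree query style converted to SQL" could look like is sketched below for the simplest case. It is an assumption-laden illustration, not the TTreeSQL implementation: `drawToSql` is a hypothetical function, the table name is supplied by the caller, and the selection is passed through unchanged, so it only works for cuts that are already valid SQL.

```cpp
#include <cassert>
#include <string>

// Hypothetical translation: turns a Draw-style variable list such as "a:b"
// and an optional selection into a SELECT statement.
std::string drawToSql(const std::string& table,
                      const std::string& varexp,
                      const std::string& selection)
{
    // Each ':' separates two requested variables, i.e. two columns.
    std::string columns;
    for (char c : varexp)
        columns += (c == ':') ? std::string(", ") : std::string(1, c);

    std::string sql = "SELECT " + columns + " FROM " + table;
    if (!selection.empty())
        sql += " WHERE " + selection;   // passed through as-is
    return sql;
}
```

A request for var1 against var2 with the cut varx<0 on a table named events becomes `SELECT var1, var2 FROM events WHERE varx<0`. Anything beyond plain column names, such as function calls or `&&` in the cut, would need real expression parsing.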
24 Future Plans for I/O and TTree
- Implement member-wise storing for std::vector (late 2004)
- Implement TVirtualCollectionProxy for each of the STL containers (late 2004, early 2005)
- Add support for auto loading of TRef branches across trees
- TChain, TTree Friends and Indexing
- Add support for befriending TChain objects using an indexed relation
- TTree Queries
- Allow following (transparently) TRef and TRefArray
25 Summary
- TFile improvement
- Large files and trees, Double32_t, XML output format.
- Support for non-instrumented classes
- Enhancement in I/O and Tree Query for collections
- Split collections
- Fast histogramming of (potentially) any collection
- Lift restrictions on STL I/O
- Nested containers
- Reading without compiled code
- TTree
- Remove stringent requirements on CloneTree
- Add support for auto loading of referenced objects
- Support for RDBMS back-end coming soon.
- TTree Queries
- Can call any function taking numerical arguments
- Can use arbitrary C++ and still use the branch names as variables
- TTree Friends linked by Index