Title: The ATLAS Liquid Argon Calorimeters ReadOut Drivers
1. The ATLAS Liquid Argon Calorimeters ReadOut Drivers
- A design based on 600 MHz TMS320C6414 DSPs
2. The LHC
- The LHC is an accelerator ring in which proton beams are accelerated to an energy of 7 TeV.
- The goal of the LHC is to make the protons of one beam collide with the protons of the other.
- 4 experiments.
LHC: Large Hadron Collider (27 km circumference)
3. The ATLAS experiment
- Goal: explore the fundamental nature of matter and the basic forces that shape our universe.
- About the size of a five-story building.
- Collaboration of 2000 physicists.
- 150 universities and laboratories in 34 countries.
4. The electromagnetic calorimeter
- ATLAS: several sub-detectors.
- Electromagnetic calorimeter:
- Identifies electrons and photons.
- Measures the energy carried by these particles.
- 200 000 cells to be read at 40 MHz.
[Figure: electromagnetic calorimeter]
5. The calorimeter electronic chain
[Diagram: calorimeter readout chain, with the following labels]
- DETECTOR
- FRONT END ELECTRONICS: FEB (amplifier, shaping, analog memory (SCA), 12-bit ADC)
- 1600 optical links (Glink)
- BACK END ELECTRONICS: ROD
- 800 optical links (Slink)
- ROB
- Timing Trigger Control (TTC)
6. The ROD modules
- Calculate the precise energy and timing of calorimeter signals from discrete time samples (Δt = 25 ns).
- Perform monitoring.
- Format data for the following element in the electronics chain.
7. The ROD modules goals
- 200 modules, each receiving data from 1024 calorimeter cells.
- Calculate the energy for these data using optimal filtering weights: $E = \sum_i a_i (S_i - \mathrm{PED})$
- If $E >$ threshold, calculate the timing and the pulse quality factor (< 10% of cells): $E\,\tau = \sum_i b_i (S_i - \mathrm{PED})$ and $\chi^2 = \sum_i (S_i - \mathrm{PED} - E\,g_i)^2$
- Fill histograms of $E$, $\tau$, $\chi^2$, ...
- During calibration runs, perform signal averaging to calculate calibration constants for each channel.
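The three formulas of this slide can be sketched in a few lines of Python. This is only an illustration, not the DSP code: the function name `optimal_filter`, the five-sample weights `a` and `b`, the pulse shape `g` and the sample values are all made-up numbers chosen so that the arithmetic is easy to follow by hand.

```python
def optimal_filter(samples, pedestal, a, b, g, threshold):
    """Return (E, tau, chi2); tau and chi2 are None when E <= threshold."""
    # Pedestal-subtracted samples (S_i - PED)
    d = [s - pedestal for s in samples]
    # E = sum_i a_i (S_i - PED)
    energy = sum(ai * di for ai, di in zip(a, d))
    if energy <= threshold:
        return energy, None, None
    # E * tau = sum_i b_i (S_i - PED)
    tau = sum(bi * di for bi, di in zip(b, d)) / energy
    # chi2 = sum_i (S_i - PED - E * g_i)^2
    chi2 = sum((di - energy * gi) ** 2 for di, gi in zip(d, g))
    return energy, tau, chi2


# Hypothetical inputs (not real calorimeter constants)
g = [0.0, 0.5, 1.0, 0.5, 0.0]      # assumed normalised pulse shape
a = [0.0, 0.0, 1.0, 0.0, 0.0]      # assumed energy weights
b = [0.0, -1.0, 0.0, 1.0, 0.0]     # assumed timing weights
pedestal = 1000

pulse = [1000, 1050, 1100, 1050, 1000]
E, tau, chi2 = optimal_filter(pulse, pedestal, a, b, g, threshold=50)

empty = [1000, 1000, 1000, 1000, 1000]
E0, tau0, chi2_0 = optimal_filter(empty, pedestal, a, b, g, threshold=50)
```

With these inputs the pulse matches the assumed shape exactly, so the quality factor vanishes, and the empty cell stays below threshold so only its energy is computed.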
8. Requirements
- The ROD module must be able to process an event in less than 10 µs, including histograms.
- Use of a commercial programmable processor.
- A natural choice is a Digital Signal Processor.
- Efficient computing power for this kind of algorithm.
- High I/O bandwidth.
- Modular design: basic components should be easy to change or upgrade.
- Low power consumption.
9. The ROD: a 9U VME board
10. The ROD Motherboard
11. The Staging Mode
- At the beginning of LHC operation.
- ROD equipped with half of the PUs.
- Level 1 trigger rate < 50 kHz.
- Data from 4 FEBs are routed to one PU.
- 1 DSP processes 256 channels instead of 128.
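The channel counts of the staging mode follow from figures quoted elsewhere in the talk (1024 cells per ROD, 4 PUs per motherboard, two DSPs per PU). The short check below reproduces them; the value of 128 channels per FEB is an assumption inferred from "4 FEB to one PU" and "256 channels per DSP", and the helper name `channels_per_dsp` is hypothetical.

```python
CELLS_PER_ROD = 1024      # cells received by one ROD module
PUS_PER_ROD = 4           # Processing Units on a fully equipped motherboard
DSPS_PER_PU = 2           # DSPs on one Processing Unit
CHANNELS_PER_FEB = 128    # assumption, inferred from the staging figures


def channels_per_dsp(pus_installed):
    """Channels each DSP must process when the ROD carries pus_installed PUs."""
    return CELLS_PER_ROD // (pus_installed * DSPS_PER_PU)


full = channels_per_dsp(PUS_PER_ROD)           # fully equipped ROD
staged = channels_per_dsp(PUS_PER_ROD // 2)    # staging mode: half of the PUs
febs_per_pu_staged = staged * DSPS_PER_PU // CHANNELS_PER_FEB
```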
12. The DSP Processing Unit
13. The DSP Processing Unit
[Block diagram labels: Input FPGA, DSP, FIFO, Output FPGA]
14. PU Software Summary
[Diagram: data flow through the ROD, from "in" to "out"]
- Input data: serial data in FEB format.
- Input FPGA: parallelized data in DSP format.
- DSP: for 128 channels per event, E calculation or E, t, χ² calculation.
- Output data: 16-bit integer E; or 16-bit integer E plus 32-bit t, χ² and gain; or 32-bit E plus 32-bit t, χ² and gain.
- Output FPGA: TTC data.
- Histograms.
- VME interface.
- Legend: programmable part, fixed part.
15. The TMS320C6414: a latest-generation DSP from TI
- CPU core: C64x
- Central memory: 1 MB
- DMA controller
- Peripherals
16. The DSP code structure
17. DSP Software
- Developed with Code Composer Studio (CCS).
- Whole code written in C, except the physics loops, which are written in linear assembly and then optimized using CCS.
- Code complexity is limited.
- Good legibility and maintainability.
18. Example of Linear Assembly
- Calculation of the cell energy $E = \sum_i a_i (s_i - p)$.
- Let the compiler do all the laborious work of parallelizing, pipelining and register allocation.

    mpy   s1,a1,sa1        ; a1·s1
    dotp2 a23,s23,sa23     ; a2·s2 + a3·s3
    dotp2 s45,a45,sa45     ; a4·s4 + a5·s5
    add   sa23,sa45,sa25   ; Σ ai·si (i = 2..5)
    add   sa1,sa25,sa15    ; Σ ai·si (i = 1..5)
    sub   sa15,px,e        ; E = Σ ai·si − Σ ai·p
19. DSP software results
- Physics calculation of 128 channels: 3.5 µs.
- Includes all the necessary histograms.
- τ, χ² for a fraction of 10% of high-energy cells.
- 30 to 40% of the time is due to stall cycles.
- Cycles lost because data are not in the cache.
20. The Cache Memory
- When a data word or an instruction is not in the cache memory => 6 stall cycles until the data is copied from the central memory to the cache.
- For the E calculation, 6 data words have to be read => 36 wait cycles.
- The cache memory must be understood to improve these numbers.
21. Which improvements?
- L1D mapping
- Take care of which data is loaded, from which address and in what order.
- L1D pipelining
- Use of consecutive loads:
- 1 miss: 6 wait cycles
- 2 misses: 8 wait cycles
- 4 misses: 12 wait cycles
- L1D access optimization
- Samples preloading
- Interleaved histograms
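The benefit of consecutive loads can be estimated with a small model. The three quoted points (1, 2 and 4 misses) happen to lie on the line 4 + 2n wait cycles; using that line for other values of n, for instance the 6 reads of the E calculation, is an extrapolation assumed here and not a figure from the talk. Both function names are hypothetical.

```python
def isolated_stalls(n_misses, stall_per_miss=6):
    """Wait cycles when every miss is served on its own."""
    return n_misses * stall_per_miss


def pipelined_stalls(n_misses):
    """Wait cycles for n consecutive misses.

    Linear fit to the quoted points 1 -> 6, 2 -> 8, 4 -> 12;
    any other value of n is an assumed extrapolation.
    """
    return 4 + 2 * n_misses if n_misses > 0 else 0


reads = 6                                   # data words read for the E calculation
before = isolated_stalls(reads)             # each miss pays the full penalty
after = pipelined_stalls(reads)             # extrapolated, not a measured value
```

Under this assumed fit, grouping the 6 reads into consecutive loads would cut the wait cycles by more than half.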
22. DSP software results
- Physics calculation of 128 channels: 3.5 µs.
- Includes all the necessary histograms.
- τ, χ² for a fraction of 10% of high-energy cells.
- 30 to 40% of the time is due to stall cycles.
- Cycles lost because data are not in the cache.
- The complete code takes about 7 µs (600 MHz DSP).
- Includes the RTX kernel, synchronization and send tasks.
- 30% of margin for further improvements.
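These timings translate directly into clock cycles at 600 MHz. The sketch below does the conversion and recovers the quoted margin, under the assumption that the margin is measured against the 10 µs processing requirement; the constant and function names are hypothetical.

```python
CLOCK_MHZ = 600        # DSP clock frequency
BUDGET_US = 10.0       # required maximum processing time per event
TOTAL_US = 7.0         # complete code
PHYSICS_US = 3.5       # physics calculation of 128 channels
CHANNELS = 128


def cycles(microseconds):
    """Clock cycles elapsed in the given time at CLOCK_MHZ."""
    return microseconds * CLOCK_MHZ


budget_cycles = cycles(BUDGET_US)
total_cycles = cycles(TOTAL_US)
physics_cycles = cycles(PHYSICS_US)
physics_cycles_per_channel = physics_cycles / CHANNELS
margin = (BUDGET_US - TOTAL_US) / BUDGET_US   # fraction of the budget left
```

The physics loop therefore has roughly 16 cycles per channel, which is why a handful of cache misses per cell is a significant fraction of the time.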
23. Agenda
- Mid-March: motherboard and PU assembled.
- May 2003: validation in standalone mode.
- Fall 2003: system test in the experiment environment.
- Spring 2004: production launch.
- Summer 2004: board installation at the LHC.
24. Conclusion: the ROD
- Calculates the precise energy and timing of the calorimeter signals.
- 1 motherboard and 4 Processing Units.
- 1 PU: two 600 MHz TMS320C6414 DSPs.
- 30% of margin for future improvements.
- 200 RODs to be produced in 2004.
25. Thank You