Title: Detector Setup
1ALICE TRD Tracklet Processor TRAP1 → TRAP2
- MCM
- TRAP chip
- ADCs
- Digital filters
- Preprocessor
- MIMD Processor
- Readout
- Configuration
- Power management
- Marcus Gutfleisch
- Falk Lesser
- Rolf Schneider
- Robin Gareus
- Jan de Cuveland
- Christian Reichling
- Venelin Angelov
Kirchhoff Institute for Physics, University of Heidelberg
Chair of Computer Science / Computer Engineering, Prof. Dr. Volker Lindenstruth
URL: www.ti.uni-hd.de
2Multi Chip Module / TRAP
External PASA
Internal ADCs (Kaiserslautern)
Internal ADCs (Kaiserslautern)
External ADCs (ALTRO)
Digital Frontend and Tracklet Preprocessor
MIMD Processor: 4 CPUs, Global Register File, Interrupt controllers, Counter/Timers, Arbiter for the Global I/O Bus
Instruction Memory
Master State Machine
External Pretrigger
Serial Interface slave
Serial Interface
Global I/O-Bus
Quad ported Data Memory
Network Interface
Readout Network
3Tracklet Preprocessor
[Block diagram: each ADC channel feeds a Digital FILter (DFIL) with stages labeled Non-Lin, Tail-canc, Cross-talk, Offs and Gain. The filtered data go to the Event Buffers (64 timebins deep) and, as charge Q, to the hit Condition Check. A Hit Select Unit (max. 4 hits) passes the hits to four Position Calc (COG, LUT) and Parameter Calc blocks, which fill the FIT Register File and tracklet selection read by CPU0 to CPU3. Channel label in the figure: 181 channels.]
For the CPUs, the FIT register file is a read-only register file.
4Pedestal correction (offset)
This filter stage corrects the offset (pedestal) of the PASA and ADC and
adds a programmable offset to the corrected value, so that processing can
continue with unsigned integers without underflow.
The correction can be done automatically, with one of 4 different time
constants, or manually.
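The pedestal stage can be illustrated with a minimal sketch. It assumes the automatic mode is a first-order low-pass estimate whose time constant is a power of two; the function name, the `shift` parameter, the clamping at zero and the default values are assumptions for illustration, not the TRAP implementation.

```python
def pedestal_filter(samples, target_offset=10, shift=4, manual_pedestal=None):
    """Subtract a pedestal estimate and add a programmable offset.

    samples         : non-empty list of unsigned ADC values
    target_offset   : programmable offset added after the correction
    shift           : assumed time constant of 2**shift samples (automatic mode)
    manual_pedestal : if given, use this fixed pedestal instead (manual mode)
    """
    start = samples[0] if manual_pedestal is None else manual_pedestal
    estimate = start << shift          # estimate kept with `shift` fractional bits
    out = []
    for x in samples:
        if manual_pedestal is None:
            # first-order low-pass: the estimate slowly follows the input
            estimate += x - (estimate >> shift)
        corrected = x - (estimate >> shift) + target_offset
        out.append(max(corrected, 0))  # stay in the unsigned range
    return out


print(pedestal_filter([100, 103, 98], manual_pedestal=100))
```

Choosing among several `shift` values would correspond to selecting one of several time constants; which values the chip offers is not stated on the slide.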
5Tail cancellation filter
This filter stage corrects for the gas ion tail.
It is an IIR filter. The tail can be
approximated by a sum of two exponentials. The
parameters are selected such that the
output pulse has a nearly Gaussian shape and no
undershoot.
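As an illustration of how an IIR filter can remove one component of a two-exponential pulse, here is a minimal sketch. It assumes the pulse is h[n] = (1 - alpha) * lam_short**n + alpha * lam_long**n and uses a first-order pole-zero cancellation that leaves only the short exponential. All names and parameter values are assumptions; the chip's actual coefficients and its Gaussian-shape tuning are not reproduced here.

```python
def tail_cancel(x, alpha, lam_short, lam_long):
    """First-order IIR: y[n] = x[n] - lam_long * x[n-1] + c * y[n-1].

    The zero at lam_long cancels the slow pole of the pulse, and the pole at
    c = (1 - alpha) * lam_long + alpha * lam_short cancels the zero that the
    sum of the two exponentials introduces.
    """
    c = (1.0 - alpha) * lam_long + alpha * lam_short
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for xn in x:
        yn = xn - lam_long * x_prev + c * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


# Example pulse with a 30 % long component (illustrative numbers only)
alpha, lam_short, lam_long = 0.3, 0.5, 0.9
pulse = [(1 - alpha) * lam_short**n + alpha * lam_long**n for n in range(20)]
filtered = tail_cancel(pulse, alpha, lam_short, lam_long)
```

With these inputs the filtered pulse follows lam_short**n, so the long tail is gone and the output never goes negative.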
6Crosstalk filter
To a good approximation, the crosstalk behaves
like the first derivative of the signal on the
neighbor channels. It is caused by the
capacitive coupling of the cathode pads to their
neighbors. The crosstalk cancellation is
performed by a two-dimensional filter mask.
At a sampling rate of 10 MHz, the crosstalk
amplitude can be suppressed by approximately a
factor of two.
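A two-dimensional mask of this kind can be sketched as follows, assuming the simplest form: 3 channels wide and 2 timebins deep, subtracting a coupling factor `k` times the time derivative of both neighbors. The mask size, the name `crosstalk_cancel` and the value of `k` are assumptions for illustration.

```python
def crosstalk_cancel(x, k):
    """Subtract k times the time derivative of both neighbor channels.

    x[c][t] is the sample of channel c at timebin t; a new array is returned.
    """
    n_ch = len(x)
    n_t = len(x[0])

    def deriv(c, t):
        # discrete first derivative; channels outside the array contribute 0
        if c < 0 or c >= n_ch:
            return 0.0
        prev = x[c][t - 1] if t > 0 else 0.0
        return x[c][t] - prev

    return [
        [x[c][t] - k * (deriv(c - 1, t) + deriv(c + 1, t)) for t in range(n_t)]
        for c in range(n_ch)
    ]


# A pulse on the middle channel with idealized crosstalk on both neighbors
signal = [0.0, 8.0, 16.0, 8.0, 0.0]
induced = [0.0, 2.0, 2.0, -2.0, -2.0]   # 0.25 * derivative of `signal`
corrected = crosstalk_cancel([induced, signal, induced], k=0.25)
```

In this idealized case the neighbor channels are cleared completely. Real crosstalk only approximately follows the derivative, which fits the partial suppression quoted above.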
7Tracklet Fit Concept
During drift time (in the Preprocessor), with $y_i$ the position and $x_i$ the timebin of hit $i$:
- $N$: hit count
- $\sum x_i$: timebin sum
- $\sum y_i$: position sum
- $\sum x_i y_i$: timebin × position sum
- $\sum y_i^2$: position² sum
- $\sum x_i^2$: timebin² sum
- $\sum Q_i$: charge sum
After drift time (in the MIMD processor):
- $a$: intercept
- $b$: slope
- $\chi^2$: track quality
- merge track segments in a padrow
For all groups of 3 neighboring channels, it is
checked whether the hit conditions are met.
For up to 4 hit candidates per timebin, the
position and the other terms are calculated and
stored in the Fit Register File.
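The split between accumulating sums during the drift time and computing the fit afterwards can be sketched as an ordinary least-squares straight-line fit y = a + b x. The class and function names are hypothetical, and the unweighted residual sum is used here as a stand-in for the track quality; the chip's exact definition is not given on the slide.

```python
from dataclasses import dataclass


@dataclass
class FitSums:
    """Running sums collected per hit (x: timebin, y: position, q: charge)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sq: float = 0.0

    def add_hit(self, x, y, q):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxy += x * y
        self.sxx += x * x
        self.syy += y * y
        self.sq += q


def fit_tracklet(s):
    """Return (intercept a, slope b, residual sum) or None if not solvable."""
    det = s.n * s.sxx - s.sx * s.sx
    if s.n < 2 or det == 0:
        return None
    b = (s.n * s.sxy - s.sx * s.sy) / det
    a = (s.sy - b * s.sx) / s.n
    # For the least-squares solution: sum (y - a - b x)^2 = Syy - a Sy - b Sxy
    chi2 = s.syy - a * s.sy - b * s.sxy
    return a, b, chi2


sums = FitSums()
for timebin in range(10):
    sums.add_hit(timebin, 2.0 + 0.5 * timebin, q=100)
print(fit_tracklet(sums))
```

Only the sums need to be stored, so the per-timebin work is a handful of multiply-accumulate operations and the division happens once per tracklet.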
8Tracklet Preprocessor
[Block diagram: ADC channels with Digital FILter (DFIL) stages labeled Non-Lin, Tail-canc, Cross-talk, Offs and Gain; Event Buffers (64 timebins deep); hit Condition Check on the charge Q; Hit Select Unit (max. 4 hits); four Position Calc (COG, LUT) and Parameter Calc blocks; FIT Register File; CPU0 to CPU3. Channel label in the figure: 181 channels.]
For the CPUs, the FIT register file is a read-only register file.
9The MIMD Architecture
- Four RISC CPUs
- Coupled by Registers (GRF) and Quad ported data Memory
- Register coupling to the Preprocessor
- Global bus for Periphery
- Local buses for Communication, Event Buffer read and direct ADC read
- I-MEM: 4 single ported SRAMs
- Serial Interface for Configuration
- IRQ Controller for each CPU
- Counter/Timer/PsRG for each CPU and one on the global bus
- Low power design, CPU clocks gated individually
[Block diagram: CPU 0 to CPU 3, each with its own Local Bus, connected to the FIT and Const. registers, the GRF, the D-MEM, the I-MEM, the Evt. Buffer, the Interrupt and Cnt/Timer units, the Network Interface and the Config. unit; the bus widths are annotated in the original figure.]
10Local and Global IO
LPA, LPI, SPA, SPI (load/store private, Absolute / Immediate)
LGA, LGI, LGC, SGA, SGI, SGC (load/store global, Absolute / Immediate / addr inCrement)
[Block diagram: each of CPU 0 to CPU 3 has its own Local Bus with ORed read data and sends requests (Req.) to the Arbiter of the global bus. The global bus has a 16-bit Address and 32-bit Data, ORed on the read path, and connects Periphery Device 0, Periphery Device 1 and the Configuration unit.]
- Load/Store Instructions
- No tri-state, the output data are ORed, the non-selected devices respond with 0
- 16 Bits for addressing
- Asynchronous read, synchronous write (Local Bus)
- Synchronous read/write on the Global Bus (Arbiter)
- Read has priority over write, Configuration has priority over CPU 0, 1, 2, 3
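The ORed read path without tri-state drivers can be modeled in a few lines. This is a behavioral sketch only; the `Device` class, the address ranges and the register contents are hypothetical.

```python
class Device:
    """A bus device that answers only inside its own address range."""

    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.regs = [0] * size

    def read(self, addr):
        offset = addr - self.base
        # a non-selected device drives 0 so that the OR is not disturbed
        return self.regs[offset] if 0 <= offset < self.size else 0

    def write(self, addr, data):
        offset = addr - self.base
        if 0 <= offset < self.size:
            self.regs[offset] = data & 0xFFFFFFFF   # 32-bit data


def bus_read(devices, addr):
    """Combine all device outputs with OR instead of a tri-state bus."""
    result = 0
    for dev in devices:
        result |= dev.read(addr & 0xFFFF)           # 16-bit address
    return result


def bus_write(devices, addr, data):
    for dev in devices:
        dev.write(addr & 0xFFFF, data)


devices = [Device(0x0100, 4), Device(0x0200, 4)]
bus_write(devices, 0x0201, 0xCAFE)
print(hex(bus_read(devices, 0x0201)))
```

Because every non-selected device returns 0, reading an unmapped address simply yields 0, and no bus contention can occur.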
11Local and Global IO
- Load/Store Instructions
- No tri-state, the output data are ORed, the non-selected devices respond with 0
- Synchronous read/write on the Global Bus (Arbiter), the access time can be programmed
- Read has priority over write, the configuration unit has priority over CPU 0, 1, 2, 3
[Block diagram: CPU 0, 1, 2, 3 and the Configuration unit are connected to the Arbiter with the signals req, we and busf, a 16-bit r/w addr, 32-bit w data and 32-bit r data; the Arbiter drives the Global Bus Devices.]
- The local bus uses the same r/w address and w_data
signals. The read data register on the global bus
is a read-only device on the local bus. The muxes
on the local bus are not shown.
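The priority rules can be sketched as a small selection function. The slide does not say how the two rules combine, so this sketch assumes one possible reading: the configuration unit always wins; among CPUs, reads go before writes, and ties go to the lowest CPU index. The request encoding is hypothetical.

```python
def grant(requests):
    """Pick the request to serve next.

    requests: list of (requester, is_write) with requester "cfg" or 0..3.
    Returns the winning request, or None if nothing is pending.
    """
    def priority(request):
        requester, is_write = request
        if requester == "cfg":
            return (0, is_write, 0)          # configuration unit first
        return (1, is_write, requester)      # then reads, then lowest CPU

    return min(requests, key=priority) if requests else None


pending = [(2, False), (0, True), ("cfg", True)]
print(grant(pending))
```

A hardware arbiter would evaluate the same ordering combinationally every cycle; the sorted-tuple form is just a compact way to state it.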
12The CPU Architecture
[Block diagram of CPU0: Decoder, PC, pipeline register, Sel. Operand, ALU and write back, with the Const, FIT, PRF and GRF register files, IMEM and DMEM, the Interrupt unit, and the signals clks, rst, power control, external interrupts, local I/O buses, I/O bus arbiter and global I/O bus.]
- CPU
  - Harvard Architecture
  - Two pipeline stages
  - RISC Architecture
  - 32 Bit Data
  - Register architecture
  - Fast ALU
  - 32x32 Multiplication
  - 64/32 Radix-4 Divider
  - Maskable Interrupts
- ALU
  - 16 Instructions for Integer data
  - All Operations in one cycle
  - Radix-4 Division in 18 cycles (64/32)
  - Processing of signed (two's complement) and unsigned data
  - Compact Architecture (0.27 mm² cell area)
- Memory
  - DMEM/IMEM
  - Quad Port Memory: 4 read and 4 write ports
  - Full Custom Design
  - Hamming code for error correction
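To show why a radix-4 divider needs far fewer steps than a bit-serial one, here is a minimal sketch of a restoring radix-4 division of a 64-bit dividend by a 32-bit divisor. It retires two quotient bits per iteration, so a 32-bit quotient takes 16 iterations. How the 18 cycles quoted above are spent is not broken down on the slide, and the digit selection used here is an assumption rather than the chip's circuit.

```python
def divide_radix4(dividend, divisor):
    """Unsigned 64/32 division, two quotient bits per iteration."""
    if not 0 < divisor < (1 << 32):
        raise ValueError("divisor must be a non-zero 32-bit value")
    if not 0 <= dividend < (1 << 64):
        raise ValueError("dividend must be a 64-bit value")
    remainder = dividend >> 32            # upper word starts as the remainder
    low = dividend & 0xFFFFFFFF
    if remainder >= divisor:
        raise OverflowError("quotient does not fit into 32 bits")
    quotient = 0
    for step in range(16):
        # shift in the next two dividend bits (radix 4)
        remainder = (remainder << 2) | ((low >> (30 - 2 * step)) & 0b11)
        # pick the largest digit 0..3 whose multiple still fits
        digit = 0
        for candidate in (3, 2, 1):
            if candidate * divisor <= remainder:
                digit = candidate
                break
        remainder -= digit * divisor
        quotient = (quotient << 2) | digit
    return quotient, remainder


print(divide_radix4(1000, 7))
```

The result can be checked against Python's built-in `divmod` for any operands that pass the range checks.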
13Readout tree specifications
- number of tracklets to be read out
  - 64,224 MCMs
  - with max. 4 tracklets / MCM
  - BUT max. 40 tracklets / chamber is adequate (simulation)
- time for read out
  - 200 ns latency (for first tracklet)
  - 400 ns data transfer
- mechanical and electrical restrictions
  - chip pin count
  - transfer frequency
  - max. length for LVDS transfer on PCB
  - modular layout of readout boards
  - number, speed and cost of detector links
- RESULTING READOUT STRUCTURE
  - 8 Bit data ports, DDR (+ strobe + parity + spare = 11 LVDS Bit)
  - tree structure with a max. tree width of 4
  - → max. tree depth 4 (84 ns latency incl. TM, w/o opt. link del.)
  - → 2 links / chamber (1080 optical detector links, each 2.4 GBd)
14NI Datapath
[Block diagram: on the processor side, four CPUs with DMEM, IMEM, GRF and the global bus arbiter; on the Network Interface side, port0 to port4 (each labeled 10). The two sides are connected through local I/O 0 to 3 (A0 to A3, D0i/D0o to D3i/D3o), the global I/O (Ag, Dgi/Dgo) and config, with connections labeled 16.]
- Network Interface
  - local and global I/O interfaces
  - input port with data resync. and DDR decoding
  - input FIFOs (zero latency)
  - port mux to define the readout order
  - output port with DDR encoding and programmable delay unit
15Readout Board Chamber Layout
- only one type of readout board
- two types of readout scheme
[Figure: layout in z and phi. Each readout board (RB) carries MCM 0 to MCM 15, with connections labeled 11 diff and transmitters. A chamber with 16 padrows has RB 0 to RB 7; a chamber with 12 padrows has RB 0 to RB 5.]
16Slow Control Serial Network (SCSN)
Serial Mode
Bridged Mode
17Frame structure
Including stuff bits, we have roughly 100 bits to
transmit. In TRAP1 the data are sent and received
serially at 1/5 of the system clock. Our effective
bus bandwidth is
18Power management
- Power consumption is very important for this application. Therefore we need a small unit that is permanently clocked to control the clock gating and LVDS enable signals. These are the global and acquisition state machines (GSM, ASM)
- The transitions of the state machines are triggered by the pretrigger signal or by writing to a special command register
- GSM and ASM are coded with redundant bits (Hamming) for maximal safety (no illegal states)
- The only other modules with a permanent clock are the Slow Control Serial Network (SCSN) and the Reset generator
- The Reset generator is responsible for the reset of the chip; it has higher priority than SCSN and GSM
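How redundant Hamming bits protect a state register can be sketched with the classic Hamming(7,4) code, which corrects any single flipped bit in a 4-bit state. The code size and the bit layout are assumptions for illustration; the slide does not say which Hamming code the GSM and ASM use.

```python
def hamming74_encode(state):
    """Encode a 4-bit state into a 7-bit word (bit positions 1..7)."""
    data = [(state >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 unused
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]   # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]   # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]   # parity over positions 4,5,6,7
    return sum(code[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4-bit state."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos             # XOR of the positions of all set bits
    if syndrome:
        word ^= 1 << (syndrome - 1)     # the syndrome names the flipped bit
    bits = [(word >> (pos - 1)) & 1 for pos in (3, 5, 6, 7)]
    return sum(bit << i for i, bit in enumerate(bits))


stored = hamming74_encode(0b1010)
upset = stored ^ (1 << 4)               # a single bit flip in the register
print(bin(hamming74_decode(upset)))
```

Every 7-bit pattern decodes to one of the 16 codewords, so in this sketch a single upset can never leave the machine in an undefined state.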
19Fast clock distribution
Redundancy for the clock?
The two clock inputs can be asynchronous. This is
still not fixed.
20Global state machine
In Low Power mode all internal clocks are off,
the ADCs are disabled. The only active devices
are the serial network and global state
machine. In Test mode all clocks and ADCs are
enabled. In Config mode the global bus and memory
clocks will be switched on for a short time.
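The three modes described above can be written down as a small lookup, which makes the gating rules easy to check. The unit names are hypothetical, only the three modes named on the slide are modeled, and the ADCs are assumed to stay off in Config mode because the slide does not mention them there.

```python
from enum import Enum


class Mode(Enum):
    LOW_POWER = "low_power"
    TEST = "test"
    CONFIG = "config"


# Units that stay active in every mode according to the slide
ALWAYS_ON = frozenset({"serial_network", "global_state_machine"})
# Hypothetical names for the gated units; the real clock domains may differ
GATED = frozenset({
    "adcs", "filter_clk", "preproc_clk", "cpu_clk",
    "global_bus_clk", "memory_clk",
})


def active_units(mode):
    """Return the set of units that are running in the given mode."""
    if mode is Mode.LOW_POWER:
        return ALWAYS_ON
    if mode is Mode.TEST:
        return ALWAYS_ON | GATED
    if mode is Mode.CONFIG:
        # only the global bus and memory clocks are switched on briefly
        return ALWAYS_ON | {"global_bus_clk", "memory_clk"}
    raise ValueError(f"unknown mode: {mode}")


print(sorted(active_units(Mode.CONFIG)))
```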
21Acquisition state machine
In the warm-up state the ADCs and the filter clock
are switched on, but the pretrigger is disabled.
When the acquisition is enabled, a clear is
executed first. If a pretrigger arrives in
wait_pre, the preprocessor clock is enabled and the
data are stored in the sum memories. At the end
of the drift time the preprocessor is disabled and
the MIMD processor is enabled. The CPUs store the
tracklet data in the NI registers, and the data
are sent through the readout tree to the GTU.
In zero suppression the CPUs write the data to
the output FIFOs of the NI; after that only the NI
remains active. There are 3 programmable time
windows in which a confirmation signal is
expected. This generates internal accept/reject
signals; a reject signal brings the machine to the
clear state.
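The sequence described above can be sketched as a transition table. Only warm up, clear and wait_pre are state names taken from the slide; "drift" and "processing", all event names, and the choice of states that react to a reject are assumptions. What happens after the readout is not described and is left out.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("warm_up", "acq_enable"): "clear",      # a clear is executed first
    ("clear", "clear_done"): "wait_pre",
    ("wait_pre", "pretrigger"): "drift",     # preprocessor clock enabled
    ("drift", "drift_end"): "processing",    # MIMD processor enabled
    ("drift", "reject"): "clear",            # missing confirmation
    ("processing", "reject"): "clear",
}


def step(state, event):
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="warm_up"):
    for event in events:
        state = step(state, event)
    return state


print(run(["acq_enable", "clear_done", "pretrigger", "drift_end"]))
```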
22The test board with the spider-MCM
23The test board with the MCM in socket
24Summary of the test results
- What has been tested
  - Serial Configuration, most of the configuration registers in all blocks connected to the Global Bus
  - Clock gating, Global State Machine
  - The large LUTs (non-linearity, position), Event Buffers
  - CPUs with Register Files and Interrupt controllers
  - DFF Instruction and Quad Port Data Memories, Quad Port Full Custom Instruction and Data Memories
  - Local Buses
  - parallel Network outputs with the delay units
  - Acquisition in the event buffers, digital filters
  - ADCs
  - PLL, Clock and Pretrigger distribution outputs
  - parallel Network inputs
- What is still not tested
  - Real acquisition mode
25Summary of the test results (cont)
- There are many functional bugs. None of them makes the chip unusable, but they are not acceptable for the final version
  - simulate !!!
- The CPUs operate at a maximum clock of about 70-80 MHz instead of 120 MHz. Some parts (SCSN, GSM) operate reliably up to 120 MHz
  - timing analysis !!!
- The ADC parameters (noise and some bad effects at large amplitudes) are not influenced by switching the CPUs on
26Tracklet Preproc. Position Calculation
Quantization: about 30 µm
Resolution: about 120 µm
[Plot: position correction; maximum correction about 0.1 pad widths]
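The position calculation (COG followed by a LUT correction) can be sketched with the textbook center of gravity over three neighboring pads. The exact formula used in the chip, the LUT indexing and the LUT contents are not given on the slide, so everything below, including the placeholder table, is an assumption for illustration.

```python
def cog_position(left, center, right, lut=None):
    """Position relative to the center pad, in units of the pad width.

    left, center, right : charges on three neighboring pads
    lut                 : optional correction table indexed by |COG| in [0, 0.5]
    """
    total = left + center + right
    if total <= 0:
        raise ValueError("no charge in the cluster")
    # charge-weighted mean of the pad indices -1, 0, +1
    cog = (right - left) / total
    if lut:
        index = min(int(abs(cog) * 2 * len(lut)), len(lut) - 1)
        correction = lut[index]
        cog += correction if cog >= 0 else -correction
    return cog


# Placeholder table with two entries, not real calibration data
example_lut = [0.0, 0.05]
print(cog_position(10, 70, 20, lut=example_lut))
```

A real table would hold many more entries, with values staying within the maximum correction quoted above.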