Title: Pr
1 Moving NN Triggers to Level-1 at LHC Rates
Jean-Christophe Prévotet
Laboratoire des Instruments et Systèmes d'Ile de France
Triggering Problem in HEP
Adopted neural solutions
Specifications for Level 1 Triggering
Proposed architecture
Hardware Implementation
Results
Conclusion
2 Triggering problem in High Energy Physics
[Diagram: trigger chain]
Incoming data from sub-detectors (Detector)
Level 1 Trigger (1 µs) -> Reject
Level 2 Trigger (20 µs) -> Reject
Level 3 Trigger -> Reject
Level 4 Trigger -> Reject
Offline event reconstruction
Diagram labels: Dedicated Hardware Implementation; Conventional Microprocessors
Trigger decision: Y0 = Background, Y1 = Physics
3 Hardware Adopted Solutions
Current solutions
Level 1 Trigger
Latency of 500 ns => no digital circuits possible, or only straightforward circuits made of RAMs (lack of precision, small networks)
Level 2 Trigger
Latency of 10 µs => possible use of digital circuits
Example: CNAPS in the H1 experiment => 8 µs to execute a 64x64x1 net; DSPs
Future solutions
Technology trends make it possible to transpose the L2 complexity of neural computations into L1
4 Level 1 Trigger Scheme
Data flow (from the diagram):
Analog signals from the calorimeter
Preprocessor: digitization, pre-sums
Demultiplex unit
Neural processing FPGAs (500 ns)
Multiplex unit
Output data to Level 2 (every 25 ns)
Main control module
Timing: specifications of the ATLAS experiment at LHC
Data arrive every BC (bunch crossing, 25 ns) and are processed in a time-multiplexed way
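With new data every 25 ns and a 500 ns processing budget, several events are in flight at the same time. The following is a minimal sketch of that arithmetic, assuming one neural processing unit per in-flight event and a plain round-robin demultiplexer. The function names and the round-robin policy are illustrative assumptions, not taken from the slides.

```python
import math

BC_PERIOD_NS = 25    # new data every bunch crossing (from the slide)
LATENCY_NS = 500     # Level 1 neural processing budget (from the slide)

def units_needed(latency_ns, period_ns):
    """Units required so that one is free at every bunch crossing."""
    return math.ceil(latency_ns / period_ns)

def assign_unit(bc_index, n_units):
    """Assumed round-robin demultiplexing: crossing k goes to unit k mod n."""
    return bc_index % n_units

n_units = units_needed(LATENCY_NS, BC_PERIOD_NS)
# Unit index used for each of the first 2 * n_units bunch crossings.
schedule = [assign_unit(k, n_units) for k in range(2 * n_units)]
```

Under these assumptions each unit receives a new event only after its previous one has had the full 500 ns to complete.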
5 Specifications
Network: 128 inputs, 64 hidden neurons, 4 outputs (electrons, tau, hadrons, jets)
Execution time: 500 ns, with data arriving every BC (25 ns)
Weights coded in 16 bits; states coded in 8 bits
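To give a feel for these numbers, here is a small sketch of the workload per event and of fixed-point coding at the stated bit widths. The slides give only the widths, so the signed two's-complement format, the saturation behaviour, and the number of fractional bits used below are all assumptions made for illustration.

```python
N_IN, N_HID, N_OUT = 128, 64, 4   # network size from the slides
STATE_BITS, WEIGHT_BITS = 8, 16   # coding widths from the slides

def quantize(x, bits, frac_bits):
    """Round x to a signed fixed-point integer of `bits` bits, saturating."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q = int(round(x * (1 << frac_bits)))
    return max(lo, min(hi, q))

# Multiply-accumulate operations needed for one event.
macs_per_event = N_IN * N_HID + N_HID * N_OUT

# Hypothetical scaling: 7 fractional bits for states, 12 for weights.
state_q = quantize(0.5, STATE_BITS, 7)
weight_q = quantize(-1.25, WEIGHT_BITS, 12)
saturated = quantize(3.0, STATE_BITS, 7)   # out of range, clamps to the maximum
```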
6 Neural Processor Architecture
Control unit
I/O module
Matrix of n×m Processing Elements (PEs)
Each matrix row: PEs followed by an accumulator (ACC) and a TanH unit
TanH values are stored in a LUT
1 matrix row computes a neuron
The result is fed back (back-propagated) to calculate the output layer
256 PEs for a 128x64x4 network
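The row-per-neuron organisation can be sketched in software as follows. This is a behavioural illustration only: it assumes 4 PEs per row (inferred from 256 PEs for 64 hidden neurons), an input length divisible by 4, floating-point arithmetic instead of the fixed-point hardware, and a 256-entry tanh table over a hypothetical range of [-4, 4). None of the names below come from the slides.

```python
import math

PES_PER_ROW = 4  # assumption: 256 PEs / 64 hidden neurons

# Tanh lookup table: 256 entries over a hypothetical input range [-4, 4).
LUT_SIZE, LUT_RANGE = 256, 4.0
TANH_LUT = [math.tanh(-LUT_RANGE + 2 * LUT_RANGE * i / LUT_SIZE)
            for i in range(LUT_SIZE)]

def tanh_lut(x):
    """Approximate tanh by table lookup, clamping out-of-range inputs."""
    i = int((x + LUT_RANGE) * LUT_SIZE / (2 * LUT_RANGE))
    return TANH_LUT[max(0, min(LUT_SIZE - 1, i))]

def row_neuron(inputs, weights):
    """One matrix row: each PE sums its slice, ACC adds the partial sums."""
    chunk = len(inputs) // PES_PER_ROW  # assumes len(inputs) divisible by 4
    partial = [sum(x * w for x, w in zip(inputs[p * chunk:(p + 1) * chunk],
                                         weights[p * chunk:(p + 1) * chunk]))
               for p in range(PES_PER_ROW)]
    return tanh_lut(sum(partial))

def layer(inputs, weight_rows):
    """Rows work in parallel in hardware; here they are evaluated in turn."""
    return [row_neuron(inputs, w) for w in weight_rows]

def forward(inputs, w_hidden, w_out):
    """Hidden layer first, then the hidden outputs are fed back as inputs."""
    return layer(layer(inputs, w_hidden), w_out)
```

For a 128x64x4 network, `w_hidden` would hold 64 rows of 128 weights and `w_out` 4 rows of 64 weights.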
7 PE Architecture
Data in, data out
Input data (8 bits) × weight (16 bits)
Multiplier
Accumulator
Weights memory with address generator (Addr gen)
Control module, driven by the cmd bus
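The behaviour of one PE can be modelled roughly as a multiply-accumulate step per clock cycle. The class below is a hypothetical software model: the signed interpretation of the 8-bit inputs and 16-bit weights, the wrap-around address generator, and all names are assumptions, since the slide only shows the block diagram.

```python
class ProcessingElement:
    """Hypothetical model of one PE: weights memory, multiplier, accumulator."""

    def __init__(self, weights):
        self.weights = list(weights)  # weights memory (16-bit signed values)
        self.addr = 0                 # address generator state
        self.acc = 0                  # accumulator

    def reset(self):
        """Clear the accumulator and rewind the address generator."""
        self.addr = 0
        self.acc = 0

    def step(self, data_in):
        """One cycle: multiply the 8-bit input by the next weight, accumulate."""
        assert -128 <= data_in <= 127, "input must fit in 8 signed bits"
        w = self.weights[self.addr]
        assert -32768 <= w <= 32767, "weight must fit in 16 signed bits"
        self.acc += data_in * w
        self.addr = (self.addr + 1) % len(self.weights)
        return self.acc

pe = ProcessingElement([100, -200, 300])
for x in (1, 2, 3):
    pe.step(x)
```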
8 Row Accumulator
Input bus (data coming from other rows)
Din
Register bank
Adder
Truncation unit (Trunc)
Output bus (data going to other rows)
Bus widths shown in the diagram: 29, 32 and 8 bits
Building blocks: registers, multiplexers / demultiplexers, truncation unit
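A rough sketch of the add-then-truncate path is given below. The diagram does not say which bits the truncation unit keeps, so the right shift by a configurable amount and the saturation to 8 signed bits are assumptions, as are the function names.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value

def truncate(value, shift, out_bits=8):
    """Drop `shift` low-order bits, then saturate to `out_bits` signed bits."""
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, value >> shift))

def row_accumulate(partial_sums, shift, acc_bits=32):
    """Add the PE partial sums in a 32-bit register, then truncate to 8 bits."""
    total = 0
    for p in partial_sums:
        total = wrap_signed(total + p, acc_bits)
    return truncate(total, shift)

# Four hypothetical partial sums, one per PE in the row.
result = row_accumulate([1000, -200, 4000, 320], shift=6)
```

The 8-bit result matches the state width given in the specifications, which is what allows it to be sent back on the output bus as an input to other rows.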
9 Hardware Implementation in an FPGA
What is an FPGA?
I/O ports
Block RAMs
DLL (delay-locked loop)
Programmable logic blocks
Programmable connections
10 Results
Timing
Time in clock cycles for the whole neural net: around 60 cycles
Target clock frequency: 120 MHz (8.33 ns clock period) => VIRTEX2 compatible
Processing time: about 60 × 8.33 ns ≈ 500 ns
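The timing figures can be checked against the 500 ns Level 1 budget with a few lines of arithmetic. The cycle count is the approximate value quoted on the slide; the variable names are illustrative.

```python
CLOCK_MHZ = 120    # target clock frequency (from the slide)
CYCLES = 60        # approximate cycle count for the whole net (from the slide)
BUDGET_NS = 500    # Level 1 execution time specification

period_ns = 1000 / CLOCK_MHZ          # clock period in nanoseconds
processing_ns = CYCLES * period_ns    # total processing time
# Small tolerance so floating-point rounding does not flip the comparison.
fits_budget = processing_ns <= BUDGET_NS + 1e-6
```

At 120 MHz the 60 cycles fill the budget almost exactly, so any extra cycles would require a faster clock.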
What is done today
Description of the whole design in VHDL
Functional simulations of the different modules (multipliers, accumulators, control, PE, ...)
Synthesis of individual modules (translated into logic blocks)
What remains to be done
Global synthesis and implementation on the FPGA
Timing and resource optimization
11 Summary
Implementation of a digital neural network is feasible in real time: transposition of Level 2 concepts into Level 1
Proposed architecture
Advantages
Flexibility
Implementation in an FPGA => easily reconfigurable
Coding precision easily changeable: weight precision, activation functions, etc.
Processing time does not really depend on the number of neurons in the hidden layer
1 added neuron = 4 added PEs
Disadvantages
Resource-consuming => many FPGAs required
Lower performance than custom circuits
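The trade-off between resources and processing time in this summary can be illustrated with a short sketch. It assumes that every hidden neuron gets its own row of 4 PEs and that each PE handles one input per clock cycle, so adding neurons adds hardware rather than cycles. The per-cycle model and the function names are assumptions, not figures from the slides.

```python
PES_PER_NEURON = 4  # from the slide: 1 added neuron = 4 added PEs

def pe_count(hidden_neurons):
    """Hardware cost grows linearly with the number of hidden neurons."""
    return hidden_neurons * PES_PER_NEURON

def cycles_per_row(n_inputs):
    """Assumed: each PE in a row handles n_inputs / 4 inputs, one per cycle."""
    return -(-n_inputs // PES_PER_NEURON)  # ceiling division

# Doubling the hidden layer doubles the PEs; the cycle count, which depends
# only on the number of inputs per row, is the same for both sizes.
cost_64, cost_128 = pe_count(64), pe_count(128)
hidden_layer_cycles = cycles_per_row(128)
```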