Title: Fast Trace-Driven HW/SW Co-simulation Using Virtual Synchronization Technique
1 Fast Trace-Driven HW/SW Co-simulation Using Virtual Synchronization Technique
- Dohyung Kim, Youngmin Yi and Soonhoi Ha
- Seoul National University, Seoul, Korea
2 Content
- Motivation
- Related Work
- Virtual Synchronization Technique
- The Proposed Approach
- Experiment Results
- Conclusion
3 Hardware/Software Co-simulation
- HW/SW co-simulation evaluates the performance of system variations, which helps to make design decisions
[Figure: system design, from algorithm to implementation]
4 Synchronization in Co-simulation
- All simulators are synchronized to one global time at every cycle
  - Functions can receive data in legal sequence
  - Resource conflicts in architecture can be resolved properly
[Figure: func_A on simulator 1 and func_B on simulator 2, aligned to one global time]
5 Performance Bottleneck in Co-simulation
- As simulator speed increases, synchronization overhead becomes a major performance bottleneck of co-simulation
- Synchronization overheads from different implementations
  - Remote TCP/IP: 200 us
  - Local TCP/IP: 30 us (18 us using POSIX threads)
  - Function call: 0.5 us
  - Measured on Linux 2.4, dual Pentium 1.8 GHz, 100M LAN
[Figure: synchronization overhead (0 to 100%) versus simulator speed (10 to 10M). Marked points, given as simulator speed (overhead):
  Remote TCP/IP: 555 (10%), 5K (50%), 45K (90%)
  Local TCP/IP: 3.7K (10%), 33K (50%), 303K (90%)
  Function call: 222K (10%), 2M (50%), 18M (90%)]
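The marked points on the plot are consistent with a simple cost model in which every simulated cycle pays its own simulator time plus one fixed synchronization delay. The sketch below assumes that model and assumes simulator speed is measured in cycles per second; neither is stated on the slide, and the function names are illustrative only.

```python
def overhead_fraction(sync_us: float, speed_cps: float) -> float:
    """Share of wall-clock time spent synchronizing, with one sync per cycle."""
    cycle_us = 1e6 / speed_cps  # simulator time per cycle, in microseconds
    return sync_us / (sync_us + cycle_us)


def speed_for_overhead(sync_us: float, fraction: float) -> float:
    """Simulator speed (cycles/s) at which the overhead share equals `fraction`."""
    cycle_us = sync_us * (1.0 - fraction) / fraction
    return 1e6 / cycle_us


if __name__ == "__main__":
    # Synchronization delays taken from the slide, in microseconds.
    for name, sync_us in [("remote TCP/IP", 200.0),
                          ("local TCP/IP", 30.0),
                          ("function call", 0.5)]:
        speeds = [speed_for_overhead(sync_us, f) for f in (0.1, 0.5, 0.9)]
        print(name, [f"{s:,.0f}" for s in speeds])
```

Under these assumptions, the faster the simulator, the larger the share of time lost to synchronization, which is the trend the slide describes.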
6 Formulation of Co-simulation Time
- Co-simulation time is composed of
  - T0: simulator time, to execute simulators
  - T1: data exchange time, to deliver data between simulators
  - T2: context change time, to change context between simulators
- Synchronization time (T1 + T2)
  - data exchange time + context change time
[Figure: simulation-time axis for one clock cycle n (from n to n+1), showing Simulators 1, 2 and 3 each executing cycle n, with segments labeled T0, T1 and T2]
7 Formulation of Co-simulation Time
- Co-simulation time is composed of
  - T0: simulator time, to execute simulators
  - T1: data exchange time, to deliver data between simulators
  - T2: context change time, to change context between simulators
- Synchronization time (T1 + T2)
  - data exchange time + context change time
- Simulation time for each simulator
  - (simulator time + synchronization time) × total simulated cycles
8 Related Work
- Optimized approach [Sung1998]
  - Utilizes the next event time from simulators
  - It is hard to acquire the exact next event time
- Optimistic approach [Yoo1998]
  - Advances clocks without synchronization and supports roll-back
  - Performance enhancement depends on the frequency of data exchanges
- Seamless CVE [Bailey2002]
  - Synchronizes only when simulators exchange shared data
  - It does not handle resource conflicts and is still slow
- Transaction level modeling [Grotker2002]
  - Changes the accuracy level to reduce simulator time
  - Synchronization overhead is still a problem
9 Virtual Synchronization [Kim2002]
- Predict the next synchronization point based on a computation model which defines algorithm behavior precisely
- At those points, take relative times (t1, t2, t3) from simulators
- Then, transform those times to the global times
  - t0, t0+t1, t0+t1+t2, t0+t1+t2+t3
[Figure: local times of simulator 1 and simulator 2 mapped onto one global time axis starting at t0]
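The transformation from relative times to global times is a running sum starting at t0. A minimal sketch, assuming the relative times arrive as a plain list in the order the synchronization points occur (the function name is hypothetical, not from the slides):

```python
from itertools import accumulate


def to_global_times(t0, relative_times):
    """Map relative times (t1, t2, ...) to global times t0, t0+t1, t0+t1+t2, ..."""
    # `initial` requires Python 3.8 or newer.
    return list(accumulate(relative_times, initial=t0))


if __name__ == "__main__":
    # Hypothetical values: t0 = 100, then relative times 5, 12 and 3.
    print(to_global_times(100, [5, 12, 3]))  # [100, 105, 117, 120]
```

The simulators only need to report how far they advanced since the last synchronization point; the global clock is rebuilt outside the simulators.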
10 Limitation in Virtual Synchronization
- Assumed that there is no resource conflict on the communication architecture
  - Otherwise, simulators should be synchronized at every cycle to calculate delays caused by the resource conflicts
- Ex) func_A and func_B are executed concurrently on different processors and access a memory through a shared bus
[Figure: func_A on simulator 1 (proc 1) and func_B on simulator 2 (proc 2) access a shared resource (e.g. memory, MEM); the timeline shows the delay from the resource conflict]
11 The Proposed Approach
- Propose a new technique to reduce synchronization overhead
  - Predict the next data exchange time based on a computation model
  - Reconstruct the resource conflicts later using trace-driven co-simulation
- Over 99.95% of synchronization points are removed
- Synchronization overhead falls to 3.8-6.4%
- Simulation time = (simulator time × total simulated cycles) + (synchronization time × synchronization count)
- Limitation: the approach assumes a computation model
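The two cost expressions in this talk can be compared directly: with lock-step synchronization every cycle pays the synchronization time, while here only the remaining synchronization points do. The sketch below uses hypothetical numbers (1 us of simulator time per cycle, 30 us per synchronization, one million cycles, 99.95% of synchronization points removed); they are illustrative inputs, not measurements from the experiments.

```python
def lockstep_time(sim_time, sync_time, cycles):
    """(simulator time + synchronization time) x total simulated cycles."""
    return (sim_time + sync_time) * cycles


def reduced_sync_time(sim_time, sync_time, cycles, sync_count):
    """(simulator time x cycles) + (synchronization time x synchronization count)."""
    return sim_time * cycles + sync_time * sync_count


if __name__ == "__main__":
    cycles = 1_000_000
    sync_count = cycles * 5 // 10_000  # 0.05% of the points remain: 500
    before = lockstep_time(1, 30, cycles)                  # in microseconds
    after = reduced_sync_time(1, 30, cycles, sync_count)   # in microseconds
    print(before, after)  # 31000000 1015000
```

With these assumed inputs the synchronization term shrinks from 30,000,000 us to 15,000 us, so total time is dominated by simulator time.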
12 First Part: Trace Generation
- Like virtual synchronization, execute simulators assuming no resource conflicts, BUT store all accesses to architecture components (resources) during the execution
- Stored traces have relative times between traces, so that virtual synchronization can be applied in the second part
[Figure: the simulation engine (1) sends input data, (2) executes a simulator, and (3) receives output data and resource access traces; function_A runs on simulator 0 (proc0), function_B on simulator 1 (proc1), function_C on simulator 2 (proc2)]
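One way to picture a stored trace is as a list of resource accesses, each tagged with the time elapsed since the previous access finished. The sketch below is an assumption about a possible trace layout, not the format used by the authors; `Access`, `TraceRecorder` and their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Access:
    gap: int        # conflict-free cycles since the previous access ended
    resource: str   # architecture component, e.g. "MEM"
    duration: int   # cycles the resource is held


@dataclass
class TraceRecorder:
    last_end: int = 0                       # local time the last access ended
    trace: list = field(default_factory=list)

    def record(self, local_start: int, resource: str, duration: int) -> None:
        """Store one access using relative, not absolute, time."""
        self.trace.append(Access(local_start - self.last_end, resource, duration))
        self.last_end = local_start + duration


if __name__ == "__main__":
    rec = TraceRecorder()
    rec.record(10, "MEM", 4)   # local cycles 10..14
    rec.record(20, "MEM", 4)   # local cycles 20..24, 6 cycles after the first
    print([(a.gap, a.resource, a.duration) for a in rec.trace])
```

Keeping only gaps means a later stage can shift accesses in global time when a conflict delays one of them, without re-running the simulator.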
13 Second Part: Trace-driven Co-simulator
- Transform the relative times in the resource access traces to the global times by considering conflicts on architecture resources
  - Operating system model resolves conflicts on a processor
  - Communication architecture model resolves conflicts on a memory
- Request new traces if it consumes all traces or cannot determine the next trace to evaluate
[Figure: the trace-driven co-simulator (4) receives resource access traces from the simulation engine and (5) requests further resource access traces from it]
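The conflict resolution step can be illustrated with a single shared memory and first-come, first-served arbitration. This is a simplified sketch under those assumptions: the arbitration policy, the `(gap, duration)` trace layout and the function name are not taken from the slides, and the requesting of new traces from the simulators is not modeled.

```python
def replay(traces):
    """Replay per-processor traces against one shared memory.

    traces: {proc: [(gap, duration), ...]}, where gap is the conflict-free
    time since that processor's previous access ended.
    Returns {proc: [(start, end), ...]} in global time.
    """
    pos = {p: 0 for p in traces}                      # next trace index
    ready = {p: t[0][0] for p, t in traces.items() if t}
    mem_free = 0                                      # when the memory is idle
    schedule = {p: [] for p in traces}
    while ready:
        # Earliest request first; ties broken by processor name.
        proc = min(ready, key=lambda p: (ready[p], p))
        _, duration = traces[proc][pos[proc]]
        start = max(ready[proc], mem_free)            # wait if memory is busy
        end = start + duration
        schedule[proc].append((start, end))
        mem_free = end
        pos[proc] += 1
        if pos[proc] < len(traces[proc]):
            ready[proc] = end + traces[proc][pos[proc]][0]
        else:
            del ready[proc]
    return schedule


if __name__ == "__main__":
    result = replay({"proc0": [(10, 4), (6, 4)], "proc1": [(12, 4)]})
    print(result)  # proc1 asks at cycle 12 but is served at cycles 14..18
```

Because each trace stores gaps rather than absolute times, a delayed access automatically pushes back the later accesses of the same processor.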
14 Example Scenario
[Figure: example scenario. The simulation engine drives simulators 1, 2 and 3, which execute function_A (proc0), function_B (proc1) and function_C (proc2) over simulation time; the trace-driven co-simulator places the resulting traces on processor 0, processor 1, processor 2, MEM and INTR along the simulated-cycles axis]
15 Experiment Environment
- Example: DIVX Player (H.263 decoder + MP3 decoder)
- Machine: Linux 2.4 kernel, dual Xeon 2.6 GHz CPUs, 1 GB RAM
- Simulators: ADS 1.2 from ARM, ModelSim from Mentor Graphics
- The PeaCE framework automatically generates different co-simulation environments for different architectures
[Figure: simplified view of the DIVX Player, with blocks AVI Reader, H.263 Decoder (Header decoder, DQ, IDCT, MC, Display) and MP3 Decoder]
16 Candidate Architectures
17 Co-simulation Result
18 Performance Comparison with Other Approaches
- Target architecture: hardware IDCT + ARM processor
(result comes from a slower machine)
19 Conclusion and Future Work
- Overcome the synchronization problem in HW/SW co-simulation by combining two different approaches
  - Virtual synchronization predicts when functions can receive data in legal sequence, based on a computation model
  - Trace-driven co-simulation guarantees that resource conflicts in architecture can be resolved properly
- Future work
  - Implement distributed execution of multiple simulators using reduced synchronization overhead
  - Focus on modeling of real systems and compare accuracy
- Please visit the PeaCE demonstration at the University Booth (3-4 PM)