Title: Microarchitecture Simulators: An Overview
1. Microarchitecture Simulators: An Overview
ECE 7970 Advanced Computer Architecture
2. Implementation
Concept
Performance Evaluation
Viability
3. Performance Evaluation
Create a Model
Appropriate Workloads
Measure Performance
Evaluate Performance
4. Model
- Hardware Model: too complex and time-consuming
- Software Model: not very precise but very efficient
5. Hardware Simulator
- "A tool that reproduces the behavior of a computer device." (Todd Austin)
- Inputs: system specification
- Outputs: system metrics
6. What does a Simulator do?
- Simulates the hardware at the required level of abstraction
- Runs appropriate benchmarks
- Performance metrics
- Viability
7. What has it got to offer?
- Flexibility
  - to modify and analyze the impact of various architectural parameters and components.
- Statistical detail
  - more detailed statistics collection than real hardware.
8. Architectural Simulators
Functional
Performance
Trace Driven
Execution Driven
Instruction Schedulers
Cycle Timers
Interpreters
Direct Execution
Source: SimpleScalar Hacker's Guide and SimpleScalar 4.0 Tutorial by Todd Austin
9. Purpose of a Simulator
- Evaluate any microarchitectural modification or innovation
- Maybe, since memory is always a bottleneck, evaluate a new cache hierarchy
- Perhaps power needs to be optimized, and hence assess the power consumption
- Or evaluate a new multiprocessor architecture
10. Classification of Simulators
- Simulators described in this presentation are classified as follows:
  - Single Processor Performance Simulators
  - Full System Simulators
  - Single Processor Power Consumption Simulators
  - Multi Processor Performance Simulators
  - Modular Simulators
11. Single Processor Performance Simulators
- Model a single processing unit
- All onboard components like the pipeline, ALU, cache, etc. are modeled
- Level of detail depends on the simulator
- Sometimes various levels are implemented in the same toolkit
- E.g. SimpleScalar, M-Sim.
12. SimpleScalar
- Most popular simulation tool around.
- Originally released in 1995; current stable version is 3.0d.
- Toolkit consists of simulators and also supporting tools like a gcc compiler, linker and assembler.
- More to come later
13. M-Sim
- An extension to the SimpleScalar 3.0d toolset.
- Accurate models of the pipeline structures, including explicit register renaming.
- Support for the concurrent execution of multiple threads: Simultaneous Multithreading (SMT) model.
14. M-Sim
- Includes the Wattch framework to estimate power consumption as applied to SimpleScalar (accuracy not guaranteed by authors).
- Supports only Alpha AXP binaries, whereas the original SimpleScalar toolkit also supports PISA.
- Lack of support for simultaneous execution of dependent threads.
15. Full System Simulators
- Single processor simulators like SimpleScalar do not model the entire system or run an operating system.
- Absence of an operating system can introduce errors of more than 100%.
- Full system simulators model the entire system in sufficient depth to run the OS.
- E.g. Simics, SimOS, Mambo from IBM and Sim-OS-Alpha from DEC.
16. Simics
- Attempts to strike a balance between accuracy and performance.
- Sufficiently abstract to achieve tolerable performance levels.
- Sufficient functional accuracy to run commercial workloads.
- Sufficient timing accuracy to interface to detailed hardware models.
- Available from Virtutech AB (www.virtutech.com).
17. Simics
- Fastest
  - In-order processor
  - Single cycle execution latencies
- Compromise or optimize?
  - Scaled back version of the out-of-order model
  - No user visible API
  - Number of instruction execution options reduced.
- Slowest
  - Detailed out-of-order model
  - Allows the user to specify a detailed timing model and manner of instruction execution.
18. Simics
- Very detailed: includes device models with enough accuracy to run real firmware and device drivers.
- Simics/x86 can run Windows XP right off the installation disks.
- Hence the highly accurate results.
19. SimOS
- Designed at Stanford for the efficient and accurate study of both uniprocessor and multiprocessor computer systems.
- Capable of running unmodified commercial operating systems.
- Can change levels of detail on-the-fly.
20. SimOS
- Embra (10x slowdown)
- Mipsy (100x slowdown)
- MXS (1000x slowdown)
Source: http://simos.stanford.edu/
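To get a feel for what these slowdown factors mean in practice, here is a minimal sketch that scales a native run time by each mode's factor. The factors are the approximate ones listed on the slide; the 30-second workload and the function name are assumptions chosen only for illustration.

```python
# Approximate slowdown factors from the slide, relative to native execution.
SLOWDOWN = {"Embra": 10, "Mipsy": 100, "MXS": 1000}


def simulated_seconds(native_seconds: float, mode: str) -> float:
    """Estimate wall-clock simulation time for a workload of the given native length."""
    return native_seconds * SLOWDOWN[mode]


native = 30.0  # hypothetical workload: 30 s on real hardware
for mode in SLOWDOWN:
    minutes = simulated_seconds(native, mode) / 60
    print(f"{mode}: {minutes:.1f} min")
```

Under these assumptions the 30-second workload takes about 5 minutes in Embra, 50 minutes in Mipsy and 500 minutes in MXS, which is why switching levels of detail on-the-fly is attractive.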
21. Power Consumption Simulators
- Importance of power
- Power/performance tradeoffs must be made more visible, not only to circuit designers but also to architects and compiler writers.
- Power analysis tools like Wattch and SimplePower are used to estimate consumption.
22. Wattch
- Traditional power analysis tools: power estimates are available only after layout and floor planning are complete.
- Wattch, developed at Princeton, is based on sim-outorder, version 3.0.
- Faster than lower level power estimation tools.
23. Wattch
- Uses the following formula for dynamic power: P_d = C * Vdd^2 * a * f
- C: capacitance along all paths
- Vdd: supply voltage
- a: internal switching activity
- f: clock frequency
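As a worked example of the dynamic power relation P_d = C * Vdd^2 * a * f built from the four quantities above, the sketch below evaluates it for one unit. The function name and the numeric values (1 nF, 1.2 V, activity 0.5, 1 GHz) are made up for illustration and are not taken from Wattch.

```python
def dynamic_power(c_farads: float, vdd_volts: float, activity: float, freq_hz: float) -> float:
    """Dynamic power in watts: P_d = C * Vdd^2 * a * f."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity factor must lie in [0, 1]")
    return c_farads * vdd_volts ** 2 * activity * freq_hz


# Hypothetical unit: 1 nF switched capacitance, 1.2 V supply, a = 0.5, 1 GHz clock.
p = dynamic_power(1e-9, 1.2, 0.5, 1e9)
print(f"{p:.3f} W")

# Because Vdd enters squared, halving the supply voltage quarters the power.
p_half = dynamic_power(1e-9, 0.6, 0.5, 1e9)
print(f"{p_half:.3f} W")
```

The quadratic dependence on Vdd is why voltage scaling is usually the first lever in power/performance tradeoffs.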
24. The Others
- SimplePower
  - Created at Penn State
  - Includes a transition-sensitive, cycle-accurate datapath energy model
  - Integrates itself with the SimpleScalar toolkit
- SimWattch
  - Also from Penn State
  - Integrates Simics and Wattch.
25. Multiprocessor Simulators
- Shared memory multiprocessor simulators prior to 1994
  - Focused on the memory system and ignored the processor model; assumed a simple in-order processor.
- Advent of complex out-of-order processors demanded a different simulator.
- Thus RSIM (Rice Simulator) was released in 1997.
26. RSIM
- RSIM: originally an acronym for Rice Simulator for ILP Multiprocessors
- Features include
  - out-of-order issue
  - register renaming
  - branch prediction
  - nonblocking caches
  - multiple cache-coherence protocols
  - user configurable parameters such as instruction window size, cache sizes and latencies, and flit size and delay.
27. The M5
- Encompasses system-level architecture as well as processor microarchitecture.
- Implemented in Python and C++ and built with the Alpha ISA in mind.
- Distinguishing feature: object-oriented tool
- Can now model a full DEC Tsunami system and run unmodified Linux 2.4/2.6 and FreeBSD.
28. The M5
- Currently it provides three interchangeable CPU objects:
  - a simple, functional, single CPI CPU
  - a detailed model of an out-of-order SMT capable CPU
  - a random memory-system tester.
- Specifically suited for cache architecture simulation.
- Other multiprocessor simulators include Talisman, Wisconsin Wind Tunnel II, Augmint, MINT and GEMS.
29. Modular Simulators
- Simulators are typically written in sequential programming languages.
- But processors and their associated systems are highly parallel.
- Mapping parallel hardware onto sequential code makes a simulator difficult and time-consuming to implement and debug.
Solution? Use modular simulators.
30. Liberty Simulation Environment (LSE)
- Product of Princeton
- Maps each hardware component to a single software function.
- A model is built by instantiating these components and making appropriate connections.
- Enables hierarchically building more complex processor components.
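The idea of one function per component, instantiated and connected into larger components, can be sketched as follows. This is not the LSE API: the names make_adder, make_shifter and connect, and the dictionary used as a signal, are hypothetical and only illustrate hierarchical composition.

```python
from typing import Callable, Dict

Signal = Dict[str, int]
Component = Callable[[Signal], Signal]


def make_adder(amount: int) -> Component:
    """Instantiate a component that adds a constant to the 'value' signal."""
    def adder(sig: Signal) -> Signal:
        return {**sig, "value": sig["value"] + amount}
    return adder


def make_shifter(bits: int) -> Component:
    """Instantiate a component that shifts the 'value' signal left."""
    def shifter(sig: Signal) -> Signal:
        return {**sig, "value": sig["value"] << bits}
    return shifter


def connect(*components: Component) -> Component:
    """Wire components output-to-input; the result is itself a component."""
    def composite(sig: Signal) -> Signal:
        for component in components:
            sig = component(sig)
        return sig
    return composite


# Build a small unit, then reuse it inside a larger one (hierarchical composition).
alu = connect(make_adder(3), make_shifter(1))
datapath = connect(alu, make_adder(1))
result = datapath({"value": 5})
print(result["value"])  # (5 + 3) << 1, then + 1 = 17
```

Because connect returns something with the same shape as a leaf component, composites can be nested to any depth.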
31. Asim
- Developed by the designers of the VAX and Alpha processors.
- Allows model writers to faithfully represent the detailed timing of complex modern machines.
- Other modular simulators include EXPRESSION, LISA and Microlib.
32. Benchmark suites
- Simulators provide a platform for architects to model their processor.
- Benchmark programs are what actually assess the performance of a model.
- This presentation classifies benchmarks into:
  - General Purpose Benchmark Suites
  - Embedded Benchmark Suites
  - Miscellaneous Benchmark Suites
33. General Purpose Benchmark Suites
- The Standard Performance Evaluation Corporation (SPEC) benchmark suites (www.spec.org) are the most widely used.
- Founded in 1988, SPEC is financed by its member organizations, which include all leading computer and software manufacturers.
- Written in common languages like C or FORTRAN.
- The latest release, the SPEC CPU2006 suite, is a package of 12 integer and 17 floating-point benchmarks.
34. Embedded Benchmark Suites
- Reason for classification?
- MiBench from the University of Michigan is one of the most popular ones.
- Six types of benchmarks included in the MiBench toolkit:
  - Automotive and industrial control (six)
  - Consumer devices (eight)
  - Office automation (five)
  - Network (two)
  - Security (seven)
  - Telecommunications (seven)
35. Miscellaneous Benchmark Suites
- Benchmark suites which have been designed for specific purposes/applications.
- Examples
  - Transaction Processing Performance Council benchmarks, intended to measure the rate at which a system can process business related transactions over a network.
  - Other specific benchmarks for scientific research, bioinformatics, office productivity, etc. exist.
36. SimpleScalar Revisited
- Developed as part of the Multiscalar project in 1992 under Gurindar Sohi at the University of Wisconsin
- Was released to the public with the assistance of Doug Burger in 1995.
- Simulator of choice for computer architects: more than one-third of papers use the toolkit.
37. SimpleScalar
- Simplifies implementing hardware models capable of simulating complete applications.
- Dynamic characterization of both hardware and software parameters.
- Uses execution-driven simulation.
- I/O emulation module provides simulated programs with access to external input and output facilities.
38. Interpreters
- Includes instruction interpreters for the ARM, x86, PPC, and Alpha instruction sets.
- Written in a target definition language that provides a comprehensive mechanism for describing how instructions modify registers and memory state.
- Preprocessor uses these machine definitions to synthesize the interpreters, dependence analyzers, and microcode generators that SimpleScalar models need.
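The notion of an interpreter driven by per-instruction definitions of how registers and memory change can be illustrated with a toy table-driven interpreter. This is a rough sketch for an invented four-instruction ISA; it is not SimpleScalar's target definition language, and all opcode and function names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MachineState:
    regs: list = field(default_factory=lambda: [0] * 8)
    mem: dict = field(default_factory=dict)
    pc: int = 0


# Toy "machine definition": each function states how one opcode modifies state.
def op_li(s, rd, imm):
    s.regs[rd] = imm


def op_add(s, rd, ra, rb):
    s.regs[rd] = s.regs[ra] + s.regs[rb]


def op_store(s, rs, addr):
    s.mem[addr] = s.regs[rs]


def op_load(s, rd, addr):
    s.regs[rd] = s.mem.get(addr, 0)


SEMANTICS = {"li": op_li, "add": op_add, "store": op_store, "load": op_load}


def run(program, state=None):
    """Execution-driven functional simulation: fetch, dispatch on opcode, advance pc."""
    if state is None:
        state = MachineState()
    while state.pc < len(program):
        opcode, *operands = program[state.pc]
        SEMANTICS[opcode](state, *operands)
        state.pc += 1
    return state


program = [
    ("li", 1, 20),
    ("li", 2, 22),
    ("add", 3, 1, 2),
    ("store", 3, 100),
    ("load", 4, 100),
]
final = run(program)
print(final.regs[4])  # prints 42
```

Keeping the semantics in one table means other tools, such as a dependence analyzer, could in principle be derived from the same definitions rather than written separately.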
39. Tools available in SimpleScalar
Architectural Simulators
Functional
Performance
Trace Driven
Execution Driven
Instruction Schedulers
Cycle Timers
Interpreters
Direct Execution
Source: SimpleScalar Hacker's Guide and SimpleScalar 4.0 Tutorial by Todd Austin
40. Baseline Simulator Models
41. Sim-fast
- Functional simulation
- Optimized for speed
- Assumes no cache
- Does not support DLite!
- Does not allow command line arguments
Adapted from SimpleScalar Introduction for
toolset release v2.0 by Praveen Bhojwani
42. Sim-safe
- Functional simulation
- Checks for correct alignment and access permissions for each memory reference
- Optimized for speed
- Assumes no cache
- Supports DLite!
- Does not allow command line arguments
43. Sim-cache
- Cache simulation
- Ideal for fast simulation of caches
  - when the effect of cache performance on execution time is not needed
- Accepts command line arguments for
  - Level 1 and 2 instruction and data caches
  - TLB configuration (data and instruction)
  - Flush and compress
- Ideal for performing high-level cache studies that don't take access time of caches into account
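A cache study that counts hits and misses but ignores access time can be sketched in a few lines. The class below is an illustrative set-associative cache with LRU replacement, not sim-cache itself; the class name, its parameters and the address trace are assumptions made for the example.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement; counts hits and misses only."""

    def __init__(self, num_sets: int, block_size: int, assoc: int):
        self.num_sets = num_sets
        self.block_size = block_size
        self.assoc = assoc
        # One ordered dict per set: keys are tags, oldest (least recent) first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = None
        return False

    @property
    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


# Hypothetical byte-address trace against a 4-set, 2-way cache with 16-byte blocks.
cache = Cache(num_sets=4, block_size=16, assoc=2)
for addr in [0, 4, 16, 0, 64, 128, 0]:
    cache.access(addr)
print(cache.hits, cache.misses, f"{cache.miss_rate:.2f}")
```

In this trace, addresses 0, 64 and 128 all map to set 0, so the third distinct block evicts the first and the final access to address 0 misses again.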
44. Sim-cheetah
- Cache simulation
- Originated at UMich
- Simulates fully associative cache efficiently
- Simulates a sometimes-optimal replacement policy (MIN)
  - MIN or OPT uses future knowledge to select a replacement
- Accepts command line arguments
  - Max size of cache
  - Replacement policy: LRU, OPT
  - Fully associative, set associative, or direct mapped cache
- Ideal for performing high-level cache studies that don't take access times of caches into account
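The difference between LRU and a policy that uses future knowledge (MIN/OPT) can be seen on a small fully associative cache. The sketch below is a straightforward, unoptimized illustration and not the Cheetah algorithm; the function names and the block reference trace are assumptions.

```python
def lru_misses(trace, capacity):
    """Count misses for a fully associative cache with LRU replacement."""
    cache = []  # least recently used block first, most recently used last
    misses = 0
    for block in trace:
        if block in cache:
            cache.remove(block)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)
        cache.append(block)
    return misses


def opt_misses(trace, capacity):
    """Count misses when evicting the block whose next use lies furthest ahead."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            future = trace[i + 1:]

            def next_use(b):
                # Blocks never referenced again sort after every reused block.
                return future.index(b) if b in future else len(future)

            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_misses(trace, 3), opt_misses(trace, 3))  # prints 10 7
```

OPT needs the whole trace in advance, which is possible in a simulator but not in real hardware; it serves as a lower bound against which practical policies are compared.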
45. Sim-bpred
- Simulate different branch prediction mechanisms
- Generate prediction hit and miss rate reports
- Does not simulate the effect of branch prediction
on total execution time
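A branch prediction study that reports only hit and miss rates, with no timing, can be sketched with a bimodal predictor built from 2-bit saturating counters. This is an illustrative model, not sim-bpred's code; the table size, the indexing by (pc >> 2) and the branch trace are assumptions.

```python
def bimodal_hit_rate(branches, table_size=16):
    """Return the prediction hit rate of a 2-bit saturating-counter predictor.

    branches: list of (pc, taken) pairs; counters 0-1 predict not-taken, 2-3 taken.
    """
    if not branches:
        return 0.0
    counters = [1] * table_size  # start every entry at weakly not-taken
    hits = 0
    for pc, taken in branches:
        index = (pc >> 2) % table_size  # assumes 4-byte instructions
        predicted_taken = counters[index] >= 2
        if predicted_taken == taken:
            hits += 1
        # Train the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return hits / len(branches)


# Hypothetical loop branch: taken nine times, then falls through once.
loop_branch = [(0x40, True)] * 9 + [(0x40, False)]
print(bimodal_hit_rate(loop_branch))
```

For this trace the predictor misses the first iteration (while the counter warms up) and the loop exit, giving 8 correct predictions out of 10.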
46. Sim-profile
- Program profiler
- Generates detailed profiles, by symbol and by address
- Keeps track of and reports
  - Dynamic instruction counts
  - Instruction class counts
  - Branch class counts
  - Usage of address modes
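The kind of bookkeeping a profiler does for dynamic instruction counts and instruction class counts can be sketched with a counter over an executed-instruction trace. The opcode names, the class mapping and the trace are invented for illustration and do not reflect sim-profile's actual categories.

```python
from collections import Counter

# Hypothetical mapping from opcode to instruction class.
CLASS_OF = {
    "add": "int",
    "sub": "int",
    "lw": "load",
    "sw": "store",
    "beq": "cond_branch",
    "j": "uncond_branch",
}


def profile(trace):
    """Return (dynamic instruction count, per-class counts) for a list of opcodes."""
    class_counts = Counter(CLASS_OF.get(op, "other") for op in trace)
    return len(trace), class_counts


trace = ["lw", "add", "add", "beq", "sw", "j"]
total, classes = profile(trace)
print(total)
for name, count in classes.most_common():
    print(f"{name}: {count} ({100 * count / total:.1f}%)")
```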
47. Sim-outorder
- Most complicated and detailed simulator
- Supports out-of-order issue and execution
- Provides reports
- Branch prediction
- Cache
- External memory
- Various configurations
48. DLite! Debugger
- Lightweight symbolic debugger
- Supported by all simulators except sim-fast
49. Parting Words
- Released with a noble cause in mind.
- Toolkit is well documented.
- Includes baseline models.
- Ideal toolkit for microarchitecture simulation.
50. Conclusions
- Simulation is often the only practical way to test architectural ideas and assess system performance.
- Simulators offer flexibility to modify and analyze the impact of various architectural parameters and components.
- Various types of simulators are available, along with the benchmarks used to evaluate the performance of a design.
- Discussed SimpleScalar.
51. Future Work
- Focus on various simulation methodologies and statistical approaches to the processing of the results.
- Model a new architectural concept in one of the simulators and analyze and compare its performance.
52. References
- J. Yi and D. Lilja, "Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations," IEEE Transactions on Computers, vol. 55, no. 3, March 2006.
- T. Austin, E. Larson, and D. Ernst, "SimpleScalar: An Infrastructure for Computer System Modeling," Computer, vol. 35, no. 2, pp. 59-67, Feb. 2002.
- J. Sharkey, "M-Sim: A Flexible, Multi-threaded Architectural Simulation Environment," Technical Report CS-TR-05-DP01, Department of Computer Science, State University of New York at Binghamton, Binghamton, NY, October 2005.
- P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Halberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, "Simics: A Full System Simulation Platform," Computer, vol. 35, no. 2, pp. 50-58, Feb. 2002.
- http://simos.stanford.edu/
- D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," Proc. Int'l Symp. Computer Architecture, 2000.
- W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, "The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool," Proc. Design Automation Conference (DAC), Los Angeles, June 5-9, 2000.
- C. Hughes, V. Pai, P. Ranganathan, and S. Adve, "Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors," Computer, vol. 35, no. 2, pp. 40-49, Feb. 2002.
- S. Herrod, "Tango Lite: A Multiprocessor Simulation Environment," Technical Report, Computer Systems Laboratory, Stanford University, 1993.
- http://m5.eecs.umich.edu, 2006.
- M. Vachharajani, N. Vachharajani, D. Penry, J. Blome, and D. August, "Microarchitectural Exploration with Liberty," Proc. Int'l Symp. Microarchitecture, 2002.
- J. Emer, P. Ahuja, E. Borch, A. Klauser, C. Luk, S. Manne, S. Mukherjee, H. Patil, S. Wallace, N. Binkert, R. Espasa, and T. Juan, "Asim: A Performance Model Framework," Computer, vol. 35, no. 2, pp. 68-76, Feb. 2002.
53. Thank you!