Microarchitecture Simulators: An Overview


1
Microarchitecture Simulators: An Overview
  • Phaneeth Kumar R J

ECE 7970 Advanced Computer Architecture
2
Implementation
Concept
Performance Evaluation
Viability
3
Performance Evaluation
Create a Model
Appropriate Workloads
Measure Performance
Evaluate Performance
4
Model
  • Hardware model: too complex and time consuming
  • Software model: not very precise but very
    efficient
5
Hardware Simulator
  • "A tool that reproduces the behavior of a computer
    device." - Todd Austin
  • Inputs: system specification
  • Outputs: system metrics

6
What does a Simulator do?
  • Simulates the hardware at the required level of
    abstraction
  • Runs appropriate benchmarks
  • Performance metrics
  • Viability

7
What has it got to offer?
  • Flexibility: modify and analyze the impact of
    various architectural parameters and components.
  • Statistical detail: more detailed statistics
    collection than real hardware.

8
Architectural Simulators
  • Functional: trace driven or execution driven
  • Execution driven: interpreters or direct
    execution
  • Performance: instruction schedulers or cycle
    timers
Source: SimpleScalar Hacker's Guide and
SimpleScalar 4.0 Tutorial by Todd Austin
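
To make the "functional, execution-driven, interpreter" branch of the taxonomy concrete, here is a minimal sketch of an instruction interpreter. The three-instruction ISA (`li`, `add`, `bne`) and the `run` function are hypothetical illustrations, not part of SimpleScalar. The interpreter updates architectural state only and models no timing, which is what separates a functional simulator from a performance simulator.

```python
def run(program, regs=None):
    """Interpret a toy program; return (registers, dynamic instruction count)."""
    regs = dict(regs or {})
    pc = 0          # index of the next instruction to execute
    executed = 0    # dynamic instruction count
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        if op == "li":            # li rd, imm   : rd = imm
            regs[args[0]] = args[1]
        elif op == "add":         # add rd, rs, rt : rd = rs + rt
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "bne":         # bne rs, rt, target : branch if rs != rt
            if regs[args[0]] != regs[args[1]]:
                pc = args[2]
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs, executed


# Toy workload: sum the integers 1..5.
program = [
    ("li", "sum", 0),
    ("li", "i", 0),
    ("li", "one", 1),
    ("li", "n", 5),
    ("add", "i", "i", "one"),     # loop body starts at index 4
    ("add", "sum", "sum", "i"),
    ("bne", "i", "n", 4),         # loop back while i != n
]
regs, count = run(program)
print(regs["sum"], count)
```

A performance simulator would add a timing model (for example an instruction scheduler or cycle timer) on top of the same execution loop.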
9
Purpose of a Simulator
  • Evaluate any microarchitectural modification or
    innovation.
  • Since memory is always a bottleneck, evaluate a
    new cache hierarchy.
  • If power needs to be optimized, assess the power
    consumption.
  • Or evaluate a new multiprocessor architecture.

10
Classification of Simulators
  • Simulators described in this presentation are
    classified as follows:
  • Single-processor performance simulators
  • Full-system simulators
  • Single-processor power consumption simulators
  • Multiprocessor performance simulators
  • Modular simulators

11
Single Processor Performance Simulators
  • Model a single processing unit
  • On-chip components such as the pipeline, ALU,
    and caches are modeled
  • Level of detail depends on the simulator
  • Sometimes various levels are implemented in the
    same toolkit
  • E.g., SimpleScalar and M-Sim.

12
SimpleScalar
  • Most popular simulation tool around.
  • Originally released in 1995; the current stable
    version is 3.0d.
  • Toolkit consists of simulators and also
    supporting tools like a gcc compiler, linker and
    assembler.
  • More to come later

13
M-Sim
  • An extension to the SimpleScalar 3.0d toolset.
  • Accurate models of the pipeline structures,
    including explicit register renaming.
  • Support for the concurrent execution of multiple
    threads via a Simultaneous Multithreading (SMT)
    model.

14
M-Sim
  • Includes the Wattch framework to estimate power
    consumption as applied to SimpleScalar (accuracy
    not guaranteed by authors).
  • Supports only Alpha AXP binaries, whereas the
    original SimpleScalar toolkit also supports
    PISA.
  • Does not support simultaneous execution of
    dependent threads.

15
Full System Simulators
  • Single processor simulators like SimpleScalar do
    not model the entire system or run an operating
    system.
  • Absence of an operating system can introduce
    errors of more than 100%.
  • Full system simulators model entire system in
    sufficient depth to run the OS.
  • E.g. Simics, SimOS, Mambo from IBM and
    Sim-OS-Alpha from DEC.

16
Simics
  • Attempts to strike a balance between accuracy and
    performance.
  • Sufficiently abstract to achieve tolerable
    performance levels.
  • Sufficient functional accuracy to run commercial
    workloads.
  • Sufficient timing accuracy to interface to
    detailed hardware models.
  • Available from Virtutech AB (www.virtutech.com).

17
Simics
  • Three processor models:
  • Fastest: an in-order processor with single-cycle
    execution latencies.
  • Compromise or optimize? A scaled-back version of
    the out-of-order model, with no user-visible API
    and a reduced number of instruction execution
    options.
  • Slowest: a detailed out-of-order model that
    allows the user to specify a detailed timing
    model and the manner of instruction execution.

18
Simics
  • Very detailed: includes device models with
    enough accuracy to run real firmware and device
    drivers.
  • Simics/x86 can run Windows XP right off the
    installation disks.
  • Hence the highly accurate results.

19
SimOS
  • Designed at Stanford for the efficient and
    accurate study of both uniprocessor and
    multiprocessor computer systems.
  • Capable of running commercial unmodified
    operating systems.
  • Can change levels of detail on-the-fly.

20
SimOS
  • Embra (10x slowdown)
  • Mipsy (100x slowdown)
  • MXS (1000x slowdown)

Source: http://simos.stanford.edu/
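
As a rough worked example of what these slowdown factors mean in practice, the sketch below converts a native run time into an estimated simulation time for each SimOS mode. The factors are the order-of-magnitude figures from the slide. The function name and the 60-second workload are hypothetical, and real slowdowns vary with the workload and host machine.

```python
# Order-of-magnitude slowdown factors quoted on the slide.
SLOWDOWN = {"Embra": 10, "Mipsy": 100, "MXS": 1000}


def simulated_wall_time(native_seconds, mode):
    """Estimate host wall-clock seconds needed to simulate a workload."""
    return native_seconds * SLOWDOWN[mode]


# A workload that runs for one minute on real hardware.
for mode in SLOWDOWN:
    hours = simulated_wall_time(60, mode) / 3600
    print(f"{mode}: {hours:.2f} h")
```

This is why changing the level of detail on the fly is useful: a fast mode can skip past boot and initialization, and a detailed mode can be reserved for the region of interest.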
21
Power Consumption Simulators
  • Importance of power:
  • Power/performance tradeoffs need to be made more
    visible not only to circuit designers but also
    to architects and compiler writers.
  • Power analysis tools like Wattch and SimplePower
    are used to estimate consumption.

22
Wattch
  • With traditional power analysis tools, power
    estimates are available only after layout and
    floorplanning are complete.
  • Wattch, developed at Princeton, is based on
    sim-outorder from SimpleScalar version 3.0.
  • Faster than lower level power estimate tools.

23
Wattch
  • Uses the following formula for dynamic power:
    $P_d = C \, V_{dd}^{2} \, a \, f$
  • C - capacitance along all paths
  • V_dd - supply voltage
  • a - internal switching activity
  • f - clock frequency
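
A minimal numeric sketch of the dynamic power relation, P_d = C * Vdd^2 * a * f, built from the four quantities listed above. The function name and the example values are hypothetical and chosen only for easy arithmetic. Wattch itself derives per-structure capacitances from circuit models, which this sketch does not attempt.

```python
def dynamic_power(capacitance_f, vdd_v, activity, freq_hz):
    """Dynamic power in watts: P_d = C * Vdd^2 * a * f."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity factor must lie between 0 and 1")
    return capacitance_f * vdd_v ** 2 * activity * freq_hz


# Hypothetical unit: 1 nF switched capacitance, 2.0 V supply,
# half the nodes switching each cycle, 1 GHz clock.
baseline = dynamic_power(1e-9, 2.0, 0.5, 1e9)

# Halving the supply voltage cuts dynamic power to one quarter.
scaled = dynamic_power(1e-9, 1.0, 0.5, 1e9)
print(baseline, scaled)
```

The quadratic dependence on supply voltage is why voltage scaling is such an effective lever in power/performance tradeoffs.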

24
The Others
  • SimplePower: created at Penn State; includes a
    transition-sensitive, cycle-accurate datapath
    energy model; integrates with the SimpleScalar
    toolkit.
  • SimWattch: also from Penn State; integrates
    Simics and Wattch.

25
Multiprocessor Simulators
  • Shared-memory multiprocessor simulators prior to
    1994 focused on the memory system and ignored
    the processor model, assuming a simple in-order
    processor.
  • The advent of complex out-of-order processors
    demanded a different simulator.
  • Thus RSIM (the Rice Simulator) was released in
    1997.

26
RSIM
  • RSIM was originally an acronym for "Rice
    Simulator for ILP Multiprocessors".
  • Features include:
  • out-of-order issue
  • register renaming
  • branch prediction
  • nonblocking caches
  • multiple cache-coherence protocols
  • user configurable parameters such as instruction
    window size, cache sizes and latencies, and flit
    size and delay.

27
The M5
  • Encompasses system-level architecture as well as
    processor microarchitecture.
  • Implemented in Python and C++ and built with the
    Alpha ISA in mind.
  • Distinguishing feature: an object-oriented
    design.
  • Can now model a full DEC Tsunami system and run
    unmodified Linux 2.4/2.6 and FreeBSD.

28
The M5
  • Currently it provides three interchangeable CPU
    objects:
  • a simple, functional CPU with a CPI of one
  • a detailed model of an out-of-order, SMT-capable
    CPU
  • a random memory-system tester.
  • Specifically suited for cache architecture
    simulation.
  • Other multiprocessor simulators include Talisman,
    Wisconsin Wind Tunnel II, Augmint, MINT and GEMS.

29
Modular Simulators
  • Simulators are typically written in sequential
    programming languages.
  • But, processors and their associated systems are
    highly parallel.
  • Mapping parallel hardware onto sequential code
    makes a simulator difficult and time-consuming
    to implement and debug.

Solution? Use modular simulators.
30
Liberty Simulation Environment (LSE)
  • Developed at Princeton.
  • Maps each hardware component to a single software
    function.
  • A model is built by instantiating these
    components and making the appropriate
    connections.
  • Enables hierarchically building more complex
    processor components.
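
The idea of mapping each hardware component to one software function, then instantiating and connecting components, can be sketched in a few lines. This is an illustration of the modular style only: `make_counter`, `make_adder` and `connect` are hypothetical names, and LSE has its own specification language and APIs that are not reproduced here.

```python
def make_counter(limit):
    """Instantiate a component that emits 0, 1, ..., limit-1 repeatedly."""
    state = {"value": 0}

    def counter(_inp):
        out = state["value"]
        state["value"] = (state["value"] + 1) % limit
        return out

    return counter


def make_adder(constant):
    """Instantiate a component that adds a fixed constant to its input."""
    def adder(inp):
        return inp + constant

    return adder


def connect(*components):
    """Wire components in series; the result is itself a component."""
    def composite(inp=None):
        for component in components:
            inp = component(inp)
        return inp

    return composite


# Build a small block, then reuse it inside a larger one (hierarchy).
stage = connect(make_counter(4), make_adder(10))
system = connect(stage, make_adder(100))
outputs = [system() for _ in range(5)]
print(outputs)
```

Because a connected group is itself a component, larger structures can be assembled hierarchically from smaller ones.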

31
Asim
  • Developed by the designers of the VAX and Alpha
    processors.
  • Allows model writers to faithfully represent the
    detailed timing of complex modern machines.
  • Other modular simulators include EXPRESSION, LISA
    and Microlib.

32
Benchmark suites
  • Simulators provide a platform for architects to
    model their processor.
  • Benchmark programs are what actually assess the
    performance of a model.
  • This presentation classifies benchmarks into:
  • General Purpose Benchmark Suites
  • Embedded Benchmark Suites
  • Miscellaneous Benchmark Suites

33
General Purpose Benchmark Suites
  • The Standard Performance Evaluation Corporation
    (SPEC) benchmark suites (www.spec.org) are the
    most widely used.
  • SPEC was founded in 1988 and is financed by its
    member organizations, which include all leading
    computer and software manufacturers.
  • Benchmarks are written in common languages like
    C or FORTRAN.
  • The latest release, SPEC CPU2006, is a package
    of 12 integer and 17 floating-point benchmarks.

34
Embedded Benchmark Suites
  • Reason for classification?
  • MiBench, from the University of Michigan, is one
    of the most popular.
  • Six types of benchmarks are included in the
    MiBench toolkit:
  • Automotive and industrial control (six)
  • Consumer devices (eight)
  • Office automation (five)
  • Network (two)
  • Security (seven)
  • Telecommunications (seven)

35
Miscellaneous Benchmark Suites
  • Benchmark suites that have been designed for
    specific purposes or applications.
  • Examples:
  • Transaction Processing Performance Council
    benchmarks, intended to measure the rate at
    which a system can process business-related
    transactions over a network.
  • Other specific benchmarks exist for scientific
    research, bioinformatics, office productivity,
    etc.

36
SimpleScalar Revisited
  • Developed as part of the Multiscalar project in
    1992 under Gurindar Sohi at the University of
    Wisconsin.
  • Released to the public with the assistance of
    Doug Burger in 1995.
  • Simulator of choice for computer architects:
    more than one-third of papers published at top
    computer architecture conferences in 2000 used
    the toolkit (Austin et al., 2002).

37
SimpleScalar
  • Simplifies implementing hardware models capable
    of simulating complete applications.
  • Dynamic characterization of both hardware and
    software parameters.
  • Uses execution-driven simulation.
  • I/O emulation module provides simulated programs
    with access to external input and output
    facilities.

38
Interpreters
  • Includes instruction interpreters for the ARM,
    x86, PPC, and Alpha instruction sets.
  • Written in a target definition language that
    provides a comprehensive mechanism for
    describing how instructions modify registers and
    memory state.
  • Preprocessor uses these machine definitions to
    synthesize the interpreters, dependence
    analyzers, and microcode generators that
    SimpleScalar models need.

39
Tools available in SimpleScalar
Architectural Simulators
  • Functional: trace driven or execution driven
  • Execution driven: interpreters or direct
    execution
  • Performance: instruction schedulers or cycle
    timers
Source: SimpleScalar Hacker's Guide and
SimpleScalar 4.0 Tutorial by Todd Austin
40
Baseline Simulator Models
41
Sim-fast
  • Functional simulation
  • Optimized for speed
  • Assumes no cache
  • Does not support DLite!
  • Does not allow command line arguments

Slides 41-47 are adapted from "SimpleScalar
Introduction for toolset release v2.0" by Praveen
Bhojwani
42
Sim-safe
  • Functional simulation
  • Checks for correct alignment and access
    permissions for each memory reference
  • Optimized for speed
  • Assumes no cache
  • Supports DLite!
  • Does not allow command line arguments

43
Sim-cache
  • Cache simulation
  • Ideal for fast simulation of caches when the
    effect of cache performance on execution time is
    not needed
  • Accepts command line arguments for:
  • Level 1 and 2 instruction and data caches
  • TLB configuration (data and instruction)
  • Flush and compress
  • Ideal for performing high-level cache studies
    that don't take the access time of caches into
    account
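
The kind of study described here, counting hits and misses with no timing model, can be sketched as follows. This is a generic set-associative cache with LRU replacement written for illustration; the `Cache` class, its parameters and the address trace are hypothetical and do not reproduce sim-cache's implementation or its command line options.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement; counts hits and misses only."""

    def __init__(self, num_sets, assoc, block_size):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)   # evict the least recently used tag
        lines[tag] = True
        return False


def run_trace(cache, addresses):
    """Feed an address trace to the cache and return (hits, misses)."""
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses


trace = [0, 4, 64, 0, 16, 64]
direct_mapped = run_trace(Cache(num_sets=4, assoc=1, block_size=16), trace)
two_way = run_trace(Cache(num_sets=2, assoc=2, block_size=16), trace)
print(direct_mapped, two_way)
```

Both configurations hold four 16-byte blocks, yet the two-way cache avoids the conflict between addresses 0 and 64. Miss-rate comparisons of this sort need no access-time model.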

44
Sim-cheetah
  • Cache simulation
  • Originated at UMich
  • Simulates fully associative cache efficiently
  • Simulates a sometimes-optimal replacement policy
    (MIN)
  • MIN, also called OPT, uses future knowledge to
    select a replacement
  • Accepts command line arguments for:
  • Max size of cache
  • Replacement policy: LRU or OPT
  • Fully associative, set-associative, or
    direct-mapped cache
  • Ideal for performing high-level cache studies
    that don't take access times of caches into
    account
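
To show what "future knowledge" buys, the sketch below compares LRU with the OPT (MIN) policy on a fully associative cache. It is a direct, unoptimized rendering of the policy definitions; the function names and trace are hypothetical, and Cheetah's efficient single-pass algorithms are not reproduced here.

```python
def lru_misses(trace, capacity):
    """Count misses for a fully associative cache with LRU replacement."""
    cache = []                      # least recently used block first
    misses = 0
    for block in trace:
        if block in cache:
            cache.remove(block)
        else:
            misses += 1
            if len(cache) == capacity:
                cache.pop(0)        # evict the least recently used block
        cache.append(block)
    return misses


def opt_misses(trace, capacity):
    """Count misses when evicting the block reused farthest in the future."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            future = trace[i + 1:]

            def next_use(candidate):
                # Blocks never used again sort as "farthest away".
                if candidate in future:
                    return future.index(candidate)
                return len(future)

            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


# A cyclic trace over four blocks with room for only three.
trace = [1, 2, 3, 4, 1, 2, 3, 4]
lru = lru_misses(trace, capacity=3)
opt = opt_misses(trace, capacity=3)
print(lru, opt)
```

On this cyclic trace LRU always evicts the block that is needed next, while OPT keeps the blocks that will be reused soonest. OPT cannot be built in hardware, but it gives a lower bound against which practical policies can be judged.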

45
Sim-bpred
  • Simulate different branch prediction mechanisms
  • Generate prediction hit and miss rate reports
  • Does not simulate the effect of branch prediction
    on total execution time
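
A prediction hit and miss report of this kind can be sketched with a generic bimodal predictor: a table of 2-bit saturating counters indexed by branch address. The function name, table size, initial counter value and branch trace are all hypothetical choices for illustration; sim-bpred's own predictor types and options are not reproduced here.

```python
def simulate_bimodal(branches, table_size=16):
    """Run (pc, taken) pairs through 2-bit counters; return (hits, misses)."""
    counters = [2] * table_size     # 0-1 predict not taken, 2-3 predict taken
    hits = 0
    misses = 0
    for pc, taken in branches:
        index = pc % table_size
        predicted_taken = counters[index] >= 2
        if predicted_taken == taken:
            hits += 1
        else:
            misses += 1
        # Saturating update toward the actual outcome.
        if taken:
            counters[index] = min(3, counters[index] + 1)
        else:
            counters[index] = max(0, counters[index] - 1)
    return hits, misses


# A loop-closing branch: taken nine times, then falls through; run twice.
loop = [(0x40, True)] * 9 + [(0x40, False)]
hits, misses = simulate_bimodal(loop * 2)
print(f"hit rate: {hits / (hits + misses):.0%}")
```

Like the simulator described above, this reports prediction accuracy only and says nothing about the effect of mispredictions on execution time.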

46
Sim-profile
  • Program profiler
  • Generates detailed profiles, by symbol and by
    address
  • Keeps track of and reports
  • Dynamic instruction counts
  • Instruction class counts
  • Branch class counts
  • Usage of address modes

47
Sim-outorder
  • Most complicated and detailed simulator
  • Supports out-of-order issue and execution
  • Provides reports
  • Branch prediction
  • Cache
  • External memory
  • Various configurations

48
DLite! Debugger
  • Lightweight symbolic debugger
  • Supported by all simulators except sim-fast

49
Parting Words
  • Released with a noble cause in mind.
  • Toolkit is well documented.
  • Includes baseline models.
  • Ideal toolkit for microarchitecture simulation.

50
Conclusions
  • Simulation is often the only practical way to
    test architectural ideas and assess system
    performance.
  • Simulators offer flexibility to modify and
    analyze the impact of various architectural
    parameters and components.
  • Various types of simulators are available, along
    with the benchmarks used to evaluate the
    performance of a design.
  • SimpleScalar was discussed in more detail.

51
Future Work
  • Focus on various simulation methodologies and
    statistical approaches to the processing of the
    results.
  • Model a new architectural concept in one of the
    simulators and analyze and compare its
    performance.

52
References
  • J. Yi and D. Lilja, "Simulation of Computer
    Architectures: Simulators, Benchmarks,
    Methodologies, and Recommendations," IEEE
    Transactions on Computers, vol. 55, no. 3, March
    2006.
  • T. Austin, E. Larson, and D. Ernst,
    "SimpleScalar: An Infrastructure for Computer
    System Modeling," Computer, vol. 35, no. 2, pp.
    59-67, Feb. 2002.
  • J. Sharkey, "M-Sim: A Flexible, Multi-threaded
    Architectural Simulation Environment," Technical
    Report CS-TR-05-DP01, Department of Computer
    Science, State University of New York at
    Binghamton, Binghamton, NY, October 2005.
  • P. Magnusson, M. Christensson, J. Eskilson, D.
    Forsgren, G. Halberg, J. Hogberg, F. Larsson, A.
    Moestedt, and B. Werner, "Simics: A Full System
    Simulation Platform," Computer, vol. 35, no. 2,
    pp. 50-58, Feb. 2002.
  • http://simos.stanford.edu/
  • D. Brooks, V. Tiwari, and M. Martonosi, "Wattch:
    A Framework for Architectural-Level Power
    Analysis and Optimizations," Proc. Int'l Symp.
    Computer Architecture, 2000.
  • W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J.
    Irwin, "The Design and Use of SimplePower: A
    Cycle-Accurate Energy Estimation Tool," Proc.
    Design Automation Conference (DAC), Los Angeles,
    June 5-9, 2000.
  • C. Hughes, V. Pai, P. Ranganathan, and S. Adve,
    "Rsim: Simulating Shared-Memory Multiprocessors
    with ILP Processors," Computer, vol. 35, no. 2,
    pp. 40-49, Feb. 2002.
  • S. Herrod, "Tango Lite: A Multiprocessor
    Simulation Environment," Technical report,
    Computer Systems Laboratory, Stanford University,
    1993.
  • http://m5.eecs.umich.edu, 2006.
  • M. Vachharajani, N. Vachharajani, D. Penry, J.
    Blome, and D. August, "Microarchitectural
    Exploration with Liberty," Proc. Int'l Symp.
    Microarchitecture, 2002.
  • J. Emer, P. Ahuja, E. Borch, A. Klauser, C. Luk,
    S. Manne, S. Mukherjee, H. Patil, S. Wallace, N.
    Binkert, R. Espasa, and T. Juan, "Asim: A
    Performance Model Framework," Computer, vol. 35,
    no. 2, pp. 68-76, Feb. 2002.

53
Thank you!!!
  • Questions
  • ???