Combining Statistical and Symbolic Simulation - PowerPoint PPT Presentation

1
Combining Statistical and Symbolic Simulation
  • Mark Oskin
  • Fred Chong and Matthew Farrens
  • Dept. of Computer Science, University of California at Davis

2
Overview
  • HLS is a hybrid performance simulator
  • Statistical + symbolic
  • Fast
  • Accurate
  • Flexible

3
Motivation
[Figure: design parameters I-cache hit rate, basic block size, dispatch bandwidth, I-cache miss penalty, and branch misprediction penalty]
4
Motivation
  • Fast simulation
  • seconds instead of hours or days
  • Ideally interactive
  • Abstract simulation
  • simulate performance of unknown designs
  • simulate application characteristics, not applications

5
Outline
  • Simulation technologies and HLS
  • From applications to profiles
  • Validation
  • Examples
  • Issues
  • Conclusion

6
Design Flow with HLS
[Diagram: design flow linking Design Issue, Possible Solution, Profile, HLS, Estimate Performance, and Cycle-by-Cycle Simulation]
7
Traditional Simulation Techniques
  • Cycle-by-cycle (SimpleScalar, SimOS, etc.)
  • accurate
  • slow
  • Native emulation/basic block models (Atom, Pixie)
  • fast; handle complex applications
  • useful only to a point (no low-level modifications)

8
Statistical / Symbolic Execution
  • HLS
  • fast (near interactive)
  • accurate (within regions)
  • permits variation of low-level parameters
  • arbitrary design points (use carefully)

9
HLS: A Superscalar Statistical and Symbolic Simulator
[Diagram: statistical and symbolic parts]
10
Workflow
[Diagram: tool flow linking Code, Binary, sim-stat, sim-outorder, R10k, app profile, machine-profile, Stat-binary, machine-configuration, and HLS]
11
Machine Configurations
  • Number of functional units (I, F, L, S, B)
  • Functional unit pipeline depths
  • Fetch, dispatch, and completion bandwidths
  • Memory access latencies
  • Misspeculation penalties
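The parameter categories above map naturally onto a configuration record. This is a minimal sketch, assuming Python; the field names and default values are hypothetical and illustrative only (they are not R10k or SimpleScalar settings), and `validate` performs only basic sanity checks.

```python
from dataclasses import dataclass, field, replace

# Functional-unit classes from the slide: integer, float, load, store, branch
UNIT_CLASSES = ("I", "F", "L", "S", "B")


@dataclass
class MachineConfig:
    # Hypothetical field names; illustrative defaults, not a real machine
    units: dict = field(default_factory=lambda: {c: 1 for c in UNIT_CLASSES})
    # Pipeline depth (cycles) of each functional-unit class
    depths: dict = field(default_factory=lambda: {c: 1 for c in UNIT_CLASSES})
    fetch_width: int = 4
    dispatch_width: int = 4
    completion_width: int = 4
    # Memory access latencies in cycles
    l1_latency: int = 1
    l2_latency: int = 10
    mem_latency: int = 100
    # Cycles lost on a branch misprediction
    mispredict_penalty: int = 3

    def validate(self) -> None:
        """Raise ValueError if a parameter is out of range."""
        for table in (self.units, self.depths):
            if set(table) != set(UNIT_CLASSES):
                raise ValueError("need an entry for each of I, F, L, S, B")
            if any(v < 1 for v in table.values()):
                raise ValueError("unit counts and depths must be >= 1")
        if min(self.fetch_width, self.dispatch_width, self.completion_width) < 1:
            raise ValueError("bandwidths must be >= 1")
        if not (self.l1_latency <= self.l2_latency <= self.mem_latency):
            raise ValueError("latencies should grow down the memory hierarchy")
```

A design-space sweep can then vary one field at a time, e.g. `replace(base, fetch_width=8)`, which returns a new record and leaves the base configuration untouched.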

12
Profiles
  • Machine profile
  • cache hit rates → (?)
  • branch prediction accuracy → (?)
  • Application profile
  • basic block size → (?, ?)
  • instruction mix (% of I, F, L, S, B)
  • dynamic instruction distance (histogram)
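The two profiles can be pictured as small records plus samplers. This is a sketch under stated assumptions: all names are hypothetical, and basic block size is modeled as a normal distribution given by a mean and standard deviation, which is one reading of the two-parameter summary above rather than a confirmed detail of HLS.

```python
import random
from dataclasses import dataclass


@dataclass
class MachineProfile:
    # Rates measured on a reference machine, each in [0, 1]
    l1_icache_hit: float
    l2_icache_hit: float
    l1_dcache_hit: float
    l2_dcache_hit: float
    branch_accuracy: float


@dataclass
class AppProfile:
    # Assumption: basic block size summarized by mean and standard deviation
    bb_mean: float
    bb_std: float
    # Fraction of dynamic instructions in each class I, F, L, S, B
    mix: dict
    # Histogram: dynamic instruction distance -> number of occurrences
    distance_hist: dict

    def sample_bb_size(self, rng: random.Random) -> int:
        # Normal draw, rounded and clamped so every block has >= 1 instruction
        return max(1, round(rng.gauss(self.bb_mean, self.bb_std)))

    def sample_class(self, rng: random.Random) -> str:
        # Draw an instruction class with probability proportional to the mix
        classes = list(self.mix)
        return rng.choices(classes, weights=[self.mix[c] for c in classes])[0]

    def sample_distance(self, rng: random.Random) -> int:
        # Draw a dependence distance with probability proportional to its count
        dists = list(self.distance_hist)
        weights = [self.distance_hist[d] for d in dists]
        return rng.choices(dists, weights=weights)[0]
```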

13
Statistical Binary
  • 100 basic blocks
  • Correlated
  • random instruction mix
  • random assignment of dynamic instruction distance
  • random distribution of cache and branch behaviors
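A statistical binary of this kind might be generated as follows. This is a minimal sketch under assumptions not confirmed by the slides: every block ends in exactly one branch, non-branch classes are drawn from the instruction mix, each dependence is an independent draw from the distance histogram, and the correlation mentioned above is not modeled. The per-class dependence counts follow the slide-14 example (loads list one dependence, the other classes two; the float count is a guess). Cache and branch outcomes are left to statistical models at simulation time. All names are hypothetical.

```python
import random
from dataclasses import dataclass

# Source-operand counts per class, following the slide-14 example
# (loads list one dependence, the other classes two; "F" is a guess).
NUM_DEPS = {"I": 2, "F": 2, "L": 1, "S": 2, "B": 2}


@dataclass
class SymInst:
    cls: str    # instruction class: I, F, L, S, or B
    deps: list  # dynamic instruction distances to producer instructions


@dataclass
class BasicBlock:
    insts: list


def make_stat_binary(mix, dist_hist, bb_mean, bb_std, n_blocks=100, seed=0):
    """Build n_blocks symbolic basic blocks from profile distributions."""
    rng = random.Random(seed)
    # The "B" share of the mix is ignored: branch frequency follows block size.
    body_classes = [c for c in mix if c != "B"]
    body_weights = [mix[c] for c in body_classes]
    dists = list(dist_hist)
    dist_weights = [dist_hist[d] for d in dists]
    blocks = []
    for _ in range(n_blocks):
        size = max(1, round(rng.gauss(bb_mean, bb_std)))
        insts = []
        for i in range(size):
            # Assumption: the last instruction of each block is its branch
            if i == size - 1:
                cls = "B"
            else:
                cls = rng.choices(body_classes, body_weights)[0]
            deps = rng.choices(dists, dist_weights, k=NUM_DEPS[cls])
            insts.append(SymInst(cls, deps))
        blocks.append(BasicBlock(insts))
    return blocks
```

Fixing the seed makes a given statistical binary reproducible, so different machine configurations can be compared on the same symbolic instruction stream.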

14
Statistical Binary
[Figure: an example statistical basic block]
  • load (L1 I-cache, L2 I-cache, L1 D-cache, L2 D-cache, dependence 0)
  • integer (L1 I-cache, L2 I-cache, dependence 0, dependence 1)
  • integer (L1 I-cache, L2 I-cache, dependence 0, dependence 1)
  • branch (L1 I-cache, L2 I-cache, branch predictor accuracy, dependence 0, dependence 1)
  • store (L1 I-cache, L2 I-cache, L1 D-cache, L2 D-cache, dependence 0, dependence 1)
  • load (L1 I-cache, L2 I-cache, L1 D-cache, L2 D-cache, dependence 0)
Annotations: dynamic instruction distance; branch predictor behavior; core functional unit requirements; cache behavior during I-fetch; cache behavior during data access
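Each symbolic instruction carries cache fields rather than addresses. A minimal sketch of how such fields might become latencies, assuming the fields hold hit probabilities, that the L2 value is a local hit rate (probability of an L2 hit given an L1 miss), and illustrative latencies of 1, 10, and 100 cycles; `SymbolicLoad`, `draw_latency`, and `resolve_load` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class SymbolicLoad:
    # Hit probabilities attached to the instruction (hypothetical fields)
    l1_icache: float
    l2_icache: float
    l1_dcache: float
    l2_dcache: float
    dep0: int  # dynamic distance to the instruction producing the address


def draw_latency(rng, l1_hit, l2_hit, l1_lat, l2_lat, mem_lat):
    """Latency of one access to a two-level statistical cache.

    Assumption: l2_hit is the local hit rate, P(L2 hit | L1 miss).
    """
    if rng.random() < l1_hit:
        return l1_lat
    if rng.random() < l2_hit:
        return l2_lat
    return mem_lat


def resolve_load(load, rng, l1_lat=1, l2_lat=10, mem_lat=100):
    """Return (fetch_latency, data_latency) for one dynamic execution."""
    fetch = draw_latency(rng, load.l1_icache, load.l2_icache, l1_lat, l2_lat, mem_lat)
    data = draw_latency(rng, load.l1_dcache, load.l2_dcache, l1_lat, l2_lat, mem_lat)
    return fetch, data
```

Because outcomes are drawn each time an instruction executes, the same symbolic load can hit on one pass and miss on the next, with frequencies matching the profiled rates.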
15
HLS Instruction Fetch Stage
Fetches symbolic instructions and interacts with a statistical memory system and branch predictor model.
Similar to conventional instruction fetch:
  • has a PC
  • has a fetch window
  • interacts with caches
  • uses a branch predictor
  • passes instructions to dispatch
Differences:
  • caches and branch predictor are statistical models
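The fetch behavior described above can be sketched as a per-cycle loop. Assumptions for illustration only: blocks are lists of instruction-class strings whose last element is the branch, the I-cache is sampled once per fetch cycle, fetch stops at each branch, a misprediction is charged as a fixed fetch stall, the next block is chosen uniformly at random, and dispatch is an unbounded list. `fetch_stage` is a hypothetical name, not the HLS source.

```python
import random


def fetch_stage(blocks, cycles, fetch_width, icache_hit, icache_miss_penalty,
                bp_accuracy, mispredict_penalty, seed=0):
    """Simulate only the fetch stage; return the instructions sent to dispatch."""
    rng = random.Random(seed)
    to_dispatch = []
    block, pc, stall = rng.randrange(len(blocks)), 0, 0
    for _ in range(cycles):
        if stall:  # waiting on an I-cache miss or a branch redirect
            stall -= 1
            continue
        if rng.random() >= icache_hit:  # statistical I-cache: one draw per cycle
            stall = icache_miss_penalty
            continue
        for _ in range(fetch_width):  # fill the fetch window
            to_dispatch.append(blocks[block][pc])
            pc += 1
            if pc == len(blocks[block]):
                # Block-ending branch: consult the statistical predictor
                if rng.random() >= bp_accuracy:
                    stall = mispredict_penalty
                block, pc = rng.randrange(len(blocks)), 0
                break  # assumption: fetch stops at a branch
    return to_dispatch
```

With perfect caches and prediction, 4-wide fetch over 4-instruction blocks delivers 4 instructions per cycle; lowering `icache_hit` or `bp_accuracy` shows how statistical stalls erode fetch bandwidth.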
16
Validation - SimpleScalar vs. HLS
17
Validation - R10k vs. HLS
18
HLS Multi-Value Validation with SimpleScalar
Legend: HLS, SimpleScalar
(Perl)
19
HLS Multi-Value Validation with SimpleScalar
Legend: HLS, SimpleScalar
(Xlisp)
20
Example use of HLS
An intuitive result: branch prediction accuracy becomes less important as basic block size increases (fewer iso-IPC contour lines are crossed).
(Perl)
21
Example use of HLS
Another intuitive result: IPC gains from increasing basic block size are front-loaded.
Trade-off between front-end (fetch/dispatch) and back-end (ILP) processor performance
(Perl)
22
Example use of HLS
This space intentionally left blank.
(Perl)
23
Related work
  • R. Carl and J. E. Smith. Modeling superscalar processors via statistical simulation. PAID Workshop, June 1998.
  • N. Jouppi. The non-uniform distribution of instruction-level and machine parallelism and its effect on performance. IEEE Trans., 1989.
  • D. Noonburg and J. Shen. Theoretical modeling of superscalar processor performance. MICRO-27, November 1994.

24
Questions / Future Directions
  • How important are different well-performing benchmarks anyway?
  • easily summarized
  • summaries are not precise, yet precise enough
  • Will the statistical + symbolic technique work for poorly behaved applications?
  • Will it extend to deeper pipelines and more real processors (e.g., Alpha, P6 architecture)?

25
Conclusion
  • HLS = statistical + symbolic execution
  • Intuitive design space exploration
  • Fast
  • Accurate
  • Flexible
  • Validated against cycle-by-cycle simulation and the R10k
  • Future work: deeper pipelines, more hardware validations, additional domains
  • Source code at http://arch.cs.ucdavis.edu/oskin