Title: The Memory Behavior of Data Structures
1The Memory Behavior of Data Structures
- Kartik K. Agaram
- Stephen W. Keckler, Calvin Lin, Kathryn McKinley
- Department of Computer Sciences
- The University of Texas at Austin
2Memory hierarchy trends
- Growing latency to main memory
- Growing cache complexity
- More cache levels
- New mechanisms, optimizations
- Growing application complexity
- Lots of abstraction
Hard to predict how an application will perform on a specific system
3Application understanding is hard
- Observations can generate gigabytes of data
- Aggregation is necessary
- Current metrics are too lossy
- Different application behaviors ⇒ similar miss rate
- New metrics needed, richer but still concise
Our approach: data structure decomposition
4Why decompose by data structure?
- Irregular app = multiple regular data structures
- while (tmp) tmp = tmp->next;
- Data structures are high-level
- Results easy to visualize
- Can be correlated back to application source code
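The pointer-chasing loop on this slide can be written out as a small function. This is only an illustrative sketch: the `struct node` layout and the name `list_length` are assumptions, not taken from the talk. The point is that although the addresses visited are data-dependent, every access belongs to one named data structure, which is what makes per-structure accounting meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the talk does not show the real layout. */
struct node {
    int value;
    struct node *next;
};

/* Walk the list exactly as on the slide: while (tmp) tmp = tmp->next; */
size_t list_length(const struct node *tmp) {
    size_t n = 0;
    while (tmp) {
        n++;
        tmp = tmp->next;   /* next address depends on loaded data */
    }
    return n;
}
```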
5Outline
- Data structure decomposition using DTrack
- Automatic instrumentation + timing simulation
- Methodology
- Tools, configurations simulated, benchmarks studied
- Results
- Data structures causing the most misses
- Different types of access patterns
- Case study: data structure criticality
6Conventional simulation methodology
- Simulated application shares resources with simulator
- disk, file system, network
- ...but is not aware of it
[Diagram: Application, Simulator, Host Processor, Resources]
7A different perspective
[Diagram: Simulator, Application, Resources]
- Application can communicate with simulator
- Leave core application oblivious; automatically add simulator-aware instrumentation
8DTrack
[Diagram: Application Sources → Source Translator → Instrumented Sources → Compiler → Application Executable → Simulator → Detailed Statistics]
- DTrack's protocol for application-simulator communication
9DTrack's protocol
- Application stores mapping at a predetermined shared location
- (start address, end address) → variable name
- Application signals simulator somehow
- We enhance ISA with new opcode
- Other techniques possible
- Simulator detects signal, reads shared location
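The protocol steps above can be sketched from the simulator's side. This is a minimal illustration under stated assumptions, not DTrack's actual code: the names `dtrack_record_t`, `on_signal`, and `lookup`, the fixed-size table, the half-open `[start, end)` range, and the "latest registration wins" rule are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORDS 1024

/* One mapping: (start address, end address) -> variable name. */
typedef struct {
    uintptr_t start;    /* first byte of the variable */
    uintptr_t end;      /* assumed to be one past the last byte */
    const char *name;
} dtrack_record_t;

static dtrack_record_t records[MAX_RECORDS];
static size_t record_count = 0;

/* Called when the simulator detects the signal: copy the record
   that the application left at the shared location. */
int on_signal(const dtrack_record_t *shared) {
    if (record_count == MAX_RECORDS || shared->start >= shared->end)
        return -1;
    records[record_count++] = *shared;
    return 0;
}

/* Attribute a memory access to a variable. Searching newest-first
   lets a later registration shadow an older one (an assumption
   about how reused heap addresses would be handled). */
const char *lookup(uintptr_t addr) {
    for (size_t i = record_count; i-- > 0;) {
        if (addr >= records[i].start && addr < records[i].end)
            return records[i].name;
    }
    return NULL;
}
```

A real simulator would likely replace the linear scan with an interval tree or similar structure, since the lookup runs on every simulated memory access.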
10DTrack instrumentation
- Global variables: just after initialization
Before
int Time;
int main() { ... }
After
int Time;
int main() {
  print(FILE, &Time, "Time", sizeof(Time));
  asm("mop");
  ...
}
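The global-variable instrumentation on this slide might look roughly like the following when written out in full. This is a sketch under assumptions: `dtrack_print` and `dtrack_signal` are hypothetical helpers, the record format is invented for the example, and the signalling opcode is replaced by an empty function so the code runs natively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

int Time;   /* the global variable being tracked */

/* Hypothetical helper: write one "start end name" record. */
int dtrack_print(FILE *out, const void *addr, const char *name, size_t size) {
    const char *start = (const char *)addr;
    return fprintf(out, "%p %p %s\n",
                   (const void *)start, (const void *)(start + size), name);
}

/* Stand-in for the signalling opcode; does nothing natively. */
void dtrack_signal(void) { }

/* Inserted once, at the start of main(): a one-time cost. */
int register_globals(FILE *out) {
    int written = dtrack_print(out, &Time, "Time", sizeof(Time));
    dtrack_signal();
    return written;
}
```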
11DTrack instrumentation
- Heap variables: just after allocation
Before
x = malloc(4);
After
x = malloc(4);
DTRACK_PTR = x;
DTRACK_NAME = "x";
DTRACK_SIZE = 4;
asm("mop");
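The heap instrumentation shown on this slide can be wrapped in a macro so that each allocation site is rewritten the same way. This is a hedged sketch: the variable names `DTRACK_PTR`, `DTRACK_NAME`, and `DTRACK_SIZE` come from the slide, but their types, the `DTRACK_MALLOC` macro, and the counter that stands in for the signalling opcode are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Shared locations named on the slide; the types are assumed. */
void *volatile DTRACK_PTR;
const char *volatile DTRACK_NAME;
volatile size_t DTRACK_SIZE;

/* Hypothetical stand-in for the signalling opcode: natively it just
   counts signals; a simulator would read the three variables above. */
unsigned long dtrack_signals = 0;
void dtrack_signal(void) { dtrack_signals++; }

/* Allocate, publish (pointer, name, size), then signal.
   The size expression is evaluated only once. A failed malloc is
   still published (with a NULL pointer); a real tool might skip it. */
#define DTRACK_MALLOC(var, size)            \
    do {                                    \
        DTRACK_SIZE = (size);               \
        (var) = malloc(DTRACK_SIZE);        \
        DTRACK_PTR = (var);                 \
        DTRACK_NAME = #var;                 \
        dtrack_signal();                    \
    } while (0)
```

Because the three shared variables are written at every allocation, they tend to stay resident in the cache, which limits how much the instrumentation disturbs the application's own miss behavior.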
12Design decisions
- Source-based rather than binary-based translation
- Local variables: no instrumentation
- Instrumenting every call/return is too much overhead
- Don't cause many cache misses anyway
- Dynamic allocation on the stack: handle alloca just like malloc
- Signalling opcode: overload an existing one
- Avoid modifying compiler, allow running natively
13Instrumentation can perturb app behavior
- Minimizing perturbation
- Global variables are easy
- One-time cost
- Heap variables are hard
- DTRACK_PTR, etc. always hit in the cache
- Measuring perturbation
- Communicate specific start and end points in application to simulator
- Compare instruction counts between them with and without instrumentation
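The measurement described above amounts to a simple ratio. The sketch below assumes the simulator reports a running instruction count at the start and end markers; the struct and function names are hypothetical, and the talk does not say how the result was normalized.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction counts sampled at the start and end markers of one run. */
typedef struct {
    uint64_t at_start;
    uint64_t at_end;
} region_counts_t;

/* Instructions executed between the two markers. */
uint64_t region_instructions(region_counts_t r) {
    assert(r.at_end >= r.at_start);
    return r.at_end - r.at_start;
}

/* Relative increase in instruction count caused by instrumentation:
   (instrumented - baseline) / baseline. */
double perturbation(region_counts_t baseline, region_counts_t instrumented) {
    uint64_t base = region_instructions(baseline);
    uint64_t inst = region_instructions(instrumented);
    assert(base > 0);
    return ((double)inst - (double)base) / (double)base;
}
```

For example, 1000 baseline instructions against 1050 instrumented ones gives a perturbation of 0.05, or 5%.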
14Outline
- Data structure decomposition using DTrack
- Automatic instrumentation + timing simulation
- Methodology
- Tools, configurations simulated, benchmarks studied
- Results
- Data structures causing the most misses
- Different types of access patterns
- Case study: data structure criticality
15Methodology
- Source translator: C-Breeze
- Compiler: Alpha GEM cc
- Simulator: sim-alpha
- Validated model of 21264 pipeline
- Simulated machine: Alpha 21264
- 4-way issue, 64 KB 3-cycle DL1
- Benchmarks: 12 C applications from SPEC CPU2000 suite
16Major data structures by DL1 misses
[Chart: DL1 misses by data structure]
17Code, data profile, access pattern
18Most misses ⇒ most pipeline stalls?
- Process
- Detect stall cycles when no instructions were committed
- Assign blame to data structure of oldest instruction in pipeline
- Result
- Stall cycle ranks track miss count ranks
- Exceptions
- tds in 179.art
- search in 186.crafty
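The blame-assignment process described on this slide can be sketched as a pass over a per-cycle trace. This is an illustration only: the `cycle_t` record, the integer data-structure ids, and the `NO_STRUCT` sentinel are assumptions, and a real simulator would do this accounting inline rather than over a stored trace.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STRUCTS 64
#define NO_STRUCT (-1)   /* oldest instruction touches no tracked data */

/* What the sketch assumes the simulator knows about each cycle. */
typedef struct {
    int committed;       /* instructions committed this cycle */
    int oldest_struct;   /* id of the data structure accessed by the
                            oldest in-flight instruction, or NO_STRUCT */
} cycle_t;

/* For every cycle with zero commits, charge one stall cycle to the
   data structure of the oldest instruction in the pipeline. */
void blame_stalls(const cycle_t *cycles, size_t n,
                  unsigned long stalls[MAX_STRUCTS]) {
    for (size_t i = 0; i < n; i++) {
        int id = cycles[i].oldest_struct;
        if (cycles[i].committed == 0 && id >= 0 && id < MAX_STRUCTS)
            stalls[id]++;
    }
}
```

Ranking data structures by the resulting stall counts and comparing that ranking with the miss-count ranking is what exposes the exceptions listed on the slide.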
19Summary
- Toolchain for mapping addresses to high-level data structure
- Communicating information to simulator
- Reveals new patterns about applications
- Applications show wide variety of distributions
- Within an application, data structures have a variety of access patterns
- Misses not correlated to accesses or footprint
- ...but they correlate well with data structure criticality