Title: RTR: 1 Byte/Kilo-Instruction Race Recording
1RTR: 1 Byte/Kilo-Instruction Race Recording
Rastislav Bodik
Mark D. Hill
2Why Do You Need a Recorder?
- gcc sim.c
- a.out
- Segmentation fault
- gdb a.out
- gdb> run
- Program received SIGSEGV. In get() at hash.c:45
- 45 a = bucket->d
- gcc para-sim.c
- a.out
- Segmentation fault
- gdb a.out
- gdb> run
- Program exited normally.
- gcc para-sim.c
- a.out
- Segmentation fault. Race recorded in log.
- gdb a.out log
- gdb> run
- Program received SIGSEGV. In get() at para-hash.c:67
- 67 a = bucket->d
3Ideally
Long recording, small log
Low runtime overhead
Low cost
4Better and Better Recorders
5A New Recorder
1 Byte/Kilo-Instruction (ASPLOS'06)
- This talk covers only RTR
- Regulated Transitive Reduction algorithm
Result: One more step toward practical
6Outline
Race Recording
RTR Algorithm
Compress log during recording → replay more regularly
Results with Commercial Workloads
Conclusion
7Technically, what's race recording?
8Race Recording
Thread I: X = 1; X++; print(X)
Thread J: X = X*5
Original: X = 6 (Thread J's X = X*5 runs between X = 1 and X++)
Replay: X = 10 (Thread J's X = X*5 runs after X++)
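The two outcomes on this slide can be reproduced with a small deterministic simulation. This is only a sketch: the operation labels (`I1`, `J1`, ...) and the `run` helper are hypothetical names introduced here, and each schedule is simply one possible global order of the four operations.

```python
def run(schedule):
    """Execute the operations in the given global order; return what Thread I prints."""
    state = {"X": 0}
    printed = []
    ops = {
        "I1": lambda: state.update(X=1),                # Thread I: X = 1
        "I2": lambda: state.update(X=state["X"] + 1),   # Thread I: X++
        "I3": lambda: printed.append(state["X"]),       # Thread I: print(X)
        "J1": lambda: state.update(X=state["X"] * 5),   # Thread J: X = X*5
    }
    for name in schedule:
        ops[name]()
    return printed[0]

original = run(["I1", "J1", "I2", "I3"])  # multiply lands between X = 1 and X++
replay = run(["I1", "I2", "J1", "I3"])    # multiply lands after X++
print(original, replay)  # 6 10
```

Same program, same input, different interleaving: without a record of how the race resolved, the replay can legitimately print a different value.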
9Terminologies and Assumptions
Dependence (black)
Conflicts (red)
[Figure: instruction streams of Thread I and Thread J (ld A, add, st B, st C, ld B, ld D, st A, sub, st D), shown once for Recording and once for Replay, with the Log passed between them]
Goal: Reproduce the same conflicts with minimum log data
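On the replay side, reproducing the conflicts amounts to delaying each logged destination instruction until its logged source has executed. The sketch below assumes a hypothetical log format, `(src_thread, src_ic, dst_thread, dst_ic)` with `ic` an instruction count per thread; it is an illustration, not the log layout or replayer used by RTR.

```python
def replay(programs, log):
    """programs: {thread: [op, ...]}; log: [(src_thread, src_ic, dst_thread, dst_ic)].
    Returns one global order of (thread, ic) that respects every logged dependence."""
    waits = {}
    for s_t, s_ic, d_t, d_ic in log:
        waits.setdefault((d_t, d_ic), []).append((s_t, s_ic))
    done = {t: 0 for t in programs}   # instructions retired so far, per thread
    order = []
    progress = True
    while progress:
        progress = False
        for t, ops in programs.items():
            nxt = done[t] + 1
            if nxt > len(ops):
                continue              # this thread has finished
            # Retire the next instruction only if all of its logged sources are done.
            if all(done[s_t] >= s_ic for s_t, s_ic in waits.get((t, nxt), [])):
                done[t] = nxt
                order.append((t, nxt))
                progress = True
    if any(done[t] < len(ops) for t, ops in programs.items()):
        raise RuntimeError("deadlock: logged dependences form a cycle")
    return order

programs = {"I": ["X = 1", "X++", "print(X)"], "J": ["X = X*5"]}
order_a = replay(programs, [("I", 1, "J", 1), ("J", 1, "I", 2)])
order_b = replay(programs, [("I", 2, "J", 1)])
```

With the first log, Thread J's multiply is forced between Thread I's first two instructions; with the second, it is forced after `X++`. A log whose dependences form a cycle can never be replayed, which is why the sketch raises instead of spinning.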
10Regulated Transitive Reduction (RTR)
11Log All Conflicts
[Figure: Thread I and Thread J instruction streams with every conflict between them logged for Replay]
But too many conflicts
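To make the baseline concrete, here is a minimal sketch of a recorder that logs every cross-thread conflict (read-after-write, write-after-write, write-after-read). The trace, the assignment of instructions to threads, and the tuple-based log format are all assumptions made for illustration; only memory operations are counted, so `add` and `sub` are omitted.

```python
from collections import defaultdict

def log_all_conflicts(trace):
    """trace: [(thread, op, addr)] in global memory order; op is 'ld' or 'st'.
    Returns dependences as (src_thread, src_ic, dst_thread, dst_ic)."""
    ic = defaultdict(int)            # per-thread count of memory operations
    last_store = {}                  # addr -> (thread, ic) of the most recent store
    loads_since = defaultdict(list)  # addr -> loads issued since that store
    log = []
    for thread, op, addr in trace:
        ic[thread] += 1
        here = (thread, ic[thread])
        prev_st = last_store.get(addr)
        if prev_st and prev_st[0] != thread:
            log.append(prev_st + here)          # read-after-write or write-after-write
        if op == "st":
            for ld in loads_since[addr]:
                if ld[0] != thread:
                    log.append(ld + here)       # write-after-read
            last_store[addr] = here
            loads_since[addr] = []
        else:
            loads_since[addr].append(here)
    return log

# Hypothetical interleaving of two threads.
trace = [("I", "ld", "A"), ("I", "st", "B"), ("J", "st", "C"), ("I", "st", "C"),
         ("J", "ld", "B"), ("J", "st", "A"), ("I", "ld", "D"), ("J", "st", "D")]
all_deps = log_all_conflicts(trace)
print(len(all_deps))  # 4
```

Even on this eight-access trace, half of the accesses produce a log entry, which is the problem the slide points out.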
12Netzer's Transitive Reduction (TR)
[Figure: Thread I and Thread J instruction streams with per-thread instruction counts 1-6; dependences implied by others are marked "TR reduced" and dropped from the log used for Replay]
How to further reduce log size?
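Transitive reduction skips a conflict when an already-logged dependence plus program order implies it. One common way to track this is a vector clock per thread; the sketch below takes that approach as an assumption (the slides do not show the mechanism), and reuses a hypothetical trace and tuple log format.

```python
from collections import defaultdict

def record_with_tr(trace, threads):
    """Log only conflicts not already implied by logged dependences plus program order."""
    # vc[t][u]: latest instruction count of thread u that thread t is ordered after.
    vc = {t: {u: 0 for u in threads} for t in threads}
    last_store = {}                  # addr -> (thread, ic, clock snapshot)
    loads_since = defaultdict(list)  # addr -> [(thread, ic, clock snapshot)]
    log = []

    def depend(src, dst):
        s_thread, s_ic, s_clock = src
        if s_thread == dst or vc[dst][s_thread] >= s_ic:
            return                   # same thread, or already implied: nothing to log
        log.append((s_thread, s_ic, dst, vc[dst][dst]))
        for u in threads:            # inherit everything the source was ordered after
            vc[dst][u] = max(vc[dst][u], s_clock[u])

    for thread, op, addr in trace:
        vc[thread][thread] += 1
        if addr in last_store:
            depend(last_store[addr], thread)    # read-after-write or write-after-write
        if op == "st":
            for ld in loads_since[addr]:
                depend(ld, thread)              # write-after-read
        me = (thread, vc[thread][thread], dict(vc[thread]))
        if op == "st":
            last_store[addr] = me
            loads_since[addr] = []
        else:
            loads_since[addr].append(me)
    return log

# Hypothetical interleaving of two threads (memory operations only).
trace = [("I", "ld", "A"), ("I", "st", "B"), ("J", "st", "C"), ("I", "st", "C"),
         ("J", "ld", "B"), ("J", "st", "A"), ("I", "ld", "D"), ("J", "st", "D")]
tr_log = record_with_tr(trace, ["I", "J"])
print(len(tr_log))  # 3
```

On this trace the write-after-read conflict on A (Thread I's first access before Thread J's third) is dropped, because the logged dependence on B already orders Thread I's second access before Thread J's second.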
13The Intuition of the RTR Algorithm
After Reduction
14Stricter Dependences to Aid Vectorization
[Figure: Thread I and Thread J instruction streams with instruction counts 1-4; stricter dependences replace the original ones for Replay]
Fewer dependencies to log
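What "stricter" means can be checked mechanically. In the sketch below, a dependence between a fixed pair of threads is written as a hypothetical `(src_ic, dst_ic)` pair; one dependence is at least as strict as another when its source is no earlier and its destination is no later, so that enforcing it (together with program order) enforces the other one too. The function names and the example values are illustrative assumptions.

```python
def implies(strict, dep):
    """True if enforcing `strict` plus program order also enforces `dep`.
    Both are (src_ic, dst_ic) pairs between the same two threads."""
    return strict[0] >= dep[0] and strict[1] <= dep[1]

def covers(strict, deps):
    """True if the single dependence `strict` can stand in for all of `deps`."""
    return all(implies(strict, d) for d in deps)

real_deps = [(1, 4), (3, 6)]            # hypothetical real conflicts
one_for_both = covers((3, 4), real_deps)
too_weak = covers((2, 4), real_deps)
print(one_for_both, too_weak)  # True False
```

Here the artificial dependence (3, 4) is not a real conflict, yet it implies both real ones, so only one entry needs logging. It is only safe if source instruction 3 really did execute before destination instruction 4 during recording, which is the concern behind limiting how strict the dependences may be.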
15Compress Vectorized Dependencies
[Figure: Thread I and Thread J instruction streams with instruction counts 1-6; regularly spaced dependences are grouped into a vector for Replay]
TR → RTR: fewer deps, fewer bytes/dep
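The compression idea can be sketched as a run-length encoding of dependences that advance by a common stride. The `(src_start, dst_start, stride, count)` tuple and the greedy grouping below are assumptions chosen for illustration; the actual RTR log encoding is described in the paper.

```python
def vectorize(deps):
    """Greedily group consecutive (src_ic, dst_ic) pairs that advance by one stride.
    Returns a list of (src_start, dst_start, stride, count) vectors."""
    vectors = []
    i = 0
    while i < len(deps):
        src, dst = deps[i]
        stride, count = 0, 1
        if i + 1 < len(deps):
            step = deps[i + 1][0] - src
            if step > 0 and deps[i + 1][1] - dst == step:
                stride, count = step, 2
                # Extend the run while both sides keep advancing by the same stride.
                while (i + count < len(deps)
                       and deps[i + count][0] - deps[i + count - 1][0] == stride
                       and deps[i + count][1] - deps[i + count - 1][1] == stride):
                    count += 1
        vectors.append((src, dst, stride, count))
        i += count
    return vectors

def expand(vectors):
    """Inverse of vectorize: recover the individual dependences."""
    return [(s + k * stride, d + k * stride)
            for s, d, stride, count in vectors for k in range(count)]

deps = [(2, 3), (4, 5), (6, 7), (9, 20)]   # hypothetical dependences
vectors = vectorize(deps)
print(vectors)  # [(2, 3, 2, 3), (9, 20, 0, 1)]
```

Three regularly spaced dependences collapse into one vector, while the irregular one stays a singleton; `expand(vectors)` returns the original list, so nothing is lost for replay.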
16Deadlock Avoidance of RTR
[Figure: Thread I and Thread J instruction streams with instruction counts 1-6 during Recording; an overly strict dependence would conflict with the recorded order]
Limit the strict dependencies (see paper)
17Results with Commercial Workloads
18Full-system Simulation Method
- Commercial server hardware
- GEMS: http://www.cs.wisc.edu/gems
- Full-system (OS + application) executions
- 4-core CMP (sequentially consistent)
- 1-way in-order issue, 2 GHz
- 64 KB I/D L1, 4 MB L2, 64-byte lines, MOSI directory
- Commercial server software
- Apache: static web serving
- SpecJBB: middleware
- OLTP: TPC-C-like
- Zeus: static web serving
19Log Size: 1 byte/KI
Less buffer, longer recording, smaller logs
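As a rough sense of scale, the rate can be turned into bytes per second for the simulated cores. This is back-of-the-envelope arithmetic under stated assumptions: the 1 byte/KI figure is taken to apply per core (the slides do not say), and the instruction rate is the peak of a 1-way in-order core at 2 GHz, so the result is an upper bound rather than a measured number.

```python
def log_bytes(instructions, bytes_per_kilo_instruction=1.0):
    """Log size in bytes for a given number of executed instructions."""
    return instructions / 1000 * bytes_per_kilo_instruction

peak_instructions_per_second = 2e9 * 1   # 2 GHz x at most 1 instruction per cycle
per_core_per_second = log_bytes(peak_instructions_per_second)
print(per_core_per_second)  # 2000000.0
```

Under these assumptions a core produces at most about 2 MB of log per second of execution.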
20RTR vs. Netzer's TR
Log Size
- 28% smaller log
- TR was optimal
TR
RTR
21Why Does RTR Work Well?
- RTR
- Instructions execute at similar speed
- Dependencies are often vectorizable
22A New Recorder
- Less hardware and TSO: not covered
- Equally important
- More details in the paper
Less Hardware (ASPLOS'06)
SC and TSO (ASPLOS'06)
Result: One more step toward practical
23Conclusion
- Race recording → Counter nondeterminism
- RTR → 1 byte/kilo-instruction
- Based on Netzer's transitive reduction
- Create stricter dependencies
- Vectorize dependencies to compress log
- Avoid overly strict dependencies, hence no deadlock
- Future work
- Support snooping, SMT, replayer