Title: Predictable Programming on a Precision Timed Architecture
1. Predictable Programming on a Precision Timed Architecture
- Ben Lickly - UC Berkeley
- Isaac Liu - UC Berkeley
- Sungjun Kim - Columbia University
- Hiren D. Patel - UC Berkeley
- Stephen A. Edwards - Columbia University
- Edward A. Lee - UC Berkeley
2. Edwards and Lee - Case for PRET
- In 2007, Edwards and Lee made the case for precision timed computers (PRET machines)
  - Predictability
  - Repeatability
- S. A. Edwards and E. A. Lee, The case for the
precision timed (PRET) machine. In Proceedings of
the 44th Annual Conference on Design Automation
(San Diego, California, June 04 - 08, 2007). DAC
'07. ACM, New York, NY, 264-265.
3. Edwards and Lee - Case for PRET
- Unpredictability
  - Difficulty in determining timing behavior through analysis
- Non-repeatability
  - Different executions may yield different timing behavior
- Brittleness
  - Small changes have big effects on timing behavior
4. Brittleness
- An expensive affair
- Tight coupling of software and hardware
- Reliance on testing for validation
- Upgrading is difficult
- Solution: stockpile
(Image source: www.skycontrol.net)
5. But wait
- Real-time scheduling
- Worst-case execution time
- Detailed model of hardware
- Large engineering effort
- Valid for particular hardware models
- Interrupts, inter-process communication, locks
- Bench testing
- Brittle
Sebastian Altmeyer, Christian Hümbert, Björn
Lisper, and Reinhard Wilhelm. Parametric Timing
Analysis for Complex Architectures. In
Proceedings of the 14th IEEE International
Conference on Embedded and Real-Time Computing
Systems and Applications (RTCSA'08), pages
367-376, Kaohsiung, Taiwan, August 2008. IEEE
Computer Society.
6. Precise Timing and High Performance
Traditional → Alternative
- Caches → Scratchpads
- Deep pipelines → Thread-interleaved pipelines
- Function-only ISAs → ISAs with timing instructions
- Function-only languages → Languages and programming models with timing
- Best-effort communication → Fixed-latency communication
- Time-sharing → Multiple independent processors
7. Outline
- Introduction
- Related Work
- PRET Machine
- Programming Example
- Future Work
- Conclusion
8. Related Work
- Java Optimized Processor
  - Schoeberl et al. 2003
- Timing instructions
  - Ip and Edwards 2006
- Reactive processors
  - Von Hanxleden et al. 2005
  - Salcic et al. 2005
- Virtual Simple Architecture
  - Mueller et al. 2003
9. Semantics of Timing Instructions
- Deadline instructions (Ip and Edwards 2007)
  - Denote the required execution time of a block
  - When decoded:
    - Stall the instruction until the timer value reaches 0
    - Then set the timer to the new value
    deadi t0, 10    ! Straight Line Block 0 (10-cycle budget)
    ...
    deadi t0, 8     ! Straight Line Block 1 (8-cycle budget)
    ...
    deadi t0, 0     ! end of Straight Line Block 1
    ...
L0:
    deadi t0, 10    ! Loop Block (10 cycles per iteration)
    ...
    b L0
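As a rough illustration of how these instructions might be used from C, here is a minimal sketch assuming a hypothetical DEAD() macro that wraps the deadi instruction via GCC inline assembly; the macro name, the t0 operand spelling, and the cycle budgets are illustrative assumptions, not something the slide specifies.

    /* Hypothetical wrapper for the deadline instruction: stall until
     * timer t0 reaches 0, then reload it with the given cycle count. */
    #define DEAD(cycles) __asm__ volatile ("deadi t0, " #cycles)

    volatile int out;

    void blocks(void)
    {
        DEAD(10);          /* Straight Line Block 0 gets a 10-cycle budget      */
        out = 1;           /* ...block 0 body...                                */

        DEAD(8);           /* stalls until the 10 cycles elapse, then opens an
                              8-cycle budget for Straight Line Block 1          */
        out = 2;           /* ...block 1 body...                                */

        DEAD(0);           /* end of block 1; clear the timer before the loop   */
        for (;;) {
            DEAD(10);      /* each Loop Block iteration takes exactly 10 cycles */
            out += 1;      /* ...loop body...                                   */
        }
    }

The key point is that each deadi both enforces the previous block's budget and opens the next one, which is what makes the loop's timing repeatable.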
10. Tracing a Program Fragment
The trace follows the cycle count and the deadline timer t0 as each instruction executes:
  A  deadi t0, 6
  B  sethi %hi(0x3f800000), %g1
  C  or    %g1, 0x200, %g1
  D  st    %g1, [%fp - 12]
  E  deadi t0, 8
  F  ...
Because E is another deadline instruction, it stalls until the 6 cycles claimed by A have elapsed, even if B-D complete earlier, and only then loads t0 with 8.
11. Precision Timed Architecture
- Scratchpad memories
- Round-robin thread scheduling
- Thread-interleaved pipeline
- Time-triggered main memory access
12. Clocks and Memory Hierarchy
- Clocks
  - Main clock
  - Derived clocks
- Instruction and data scratchpad memories
  - 1-cycle access latency
- Main memory
  - 16 MB size
  - Latency of 50 ns
  - Frequency: 250 MHz
  - 13-cycle latency
(Block diagram: Core, DMA, Main Mem.)
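A quick check of the numbers above, assuming the 13-cycle figure is simply the 50 ns main-memory latency expressed in cycles of the 250 MHz clock:

    \lceil 50\,\mathrm{ns} \times 250\,\mathrm{MHz} \rceil = \lceil 12.5 \rceil = 13\ \mathrm{cycles}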
13. Thread-interleaved Pipeline
- Thread stalls
  - Main memory access
  - Deadline instructions
- Replay mechanism
  - Execute the same PC on the next iteration
(Pipeline diagram: Fetch (decrement deadline timers) → F/D → Decode → D/R → Reg. Access (stall if deadline instruction) → R/E → Execute → E/M → Memory (check main memory access) → M/W → WriteBack (increment PC))
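The C sketch below models the round-robin issue and replay behavior described above; the thread count, per-thread state, and output are illustrative assumptions. Each cycle the pipeline fetches from the next hardware thread in turn, and a stalled thread simply re-issues the same PC when its slot comes around again.

    #include <stdio.h>

    #define NUM_THREADS 6   /* assumed: one hardware thread per pipeline stage */

    struct hw_thread {
        unsigned pc;        /* next instruction to issue                  */
        int stalled;        /* waiting on main memory or a deadline timer */
    };

    int main(void)
    {
        struct hw_thread t[NUM_THREADS] = {{0}};
        t[2].stalled = 1;   /* pretend thread 2 is waiting on main memory */

        for (unsigned cycle = 0; cycle < 12; cycle++) {
            unsigned id = cycle % NUM_THREADS;      /* fixed round-robin slot */
            if (t[id].stalled) {
                /* replay mechanism: the same PC is issued next time around */
                printf("cycle %2u: thread %u replays pc=%u\n", cycle, id, t[id].pc);
            } else {
                printf("cycle %2u: thread %u fetches pc=%u\n", cycle, id, t[id].pc);
                t[id].pc++;                         /* PC advances only on issue */
            }
        }
        return 0;
    }

Because consecutive pipeline stages always hold instructions from different threads, one thread's stall never delays the others.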
14. Time-Triggered Access through the Memory Wheel
- Decouples the threads' access patterns
- Time-triggered access
  - Each thread must start and complete its access within its own window
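One way to picture the wheel's rule, as a hedged C sketch (the slot count and window length are illustrative assumptions, not the simulator's actual parameters): the wheel period is divided into one fixed window per thread, and an access may only start if it also finishes inside the requesting thread's window.

    #include <stdbool.h>

    #define NUM_THREADS   6                           /* assumed wheel slots   */
    #define WINDOW_CYCLES 13                          /* assumed window length */
    #define WHEEL_PERIOD  (NUM_THREADS * WINDOW_CYCLES)

    /* Returns true if thread `id` may start an access of `len` cycles now. */
    bool wheel_grants(unsigned id, unsigned cycle, unsigned len)
    {
        unsigned pos   = cycle % WHEEL_PERIOD;        /* position on the wheel */
        unsigned start = id * WINDOW_CYCLES;          /* this thread's window  */
        return pos >= start && pos + len <= start + WINDOW_CYCLES;
    }

Since the window positions depend only on the cycle count, one thread's memory traffic can never delay another's, which is the decoupling the slide refers to.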
15. Tool Flow
- GCC 3.4.4, SystemC 2.2, Python 2.4
16. Simple Mutual Exclusion Example
- Producer followed by Consumer and Observer
- Consumer and Observer execute together
- Loop rate of two rotations of the memory wheel (see the sketch after this slide)
  - 1st rotation: the Producer writes to the shared data
  - 2nd rotation: the Consumer and Observer read from the shared data and write to their outputs
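A hedged C sketch of this pattern, reusing the hypothetical DEAD() macro from slide 9; the rotation length, thread assignment, and variable names are assumptions for illustration, and the threads are assumed to start aligned with the wheel. Both loops have a period of two rotations, with the readers phase-shifted by one rotation so the write always lands in the first rotation and the reads in the second.

    #define DEAD(cycles) __asm__ volatile ("deadi t0, " #cycles)

    #define ROTATION     78                 /* assumed cycles per wheel rotation */
    #define LOOP_PERIOD  (2 * ROTATION)     /* slide: two rotations per loop     */

    volatile int shared_data;               /* producer writes, others read      */
    volatile int consumer_out, observer_out;

    void producer(void)                     /* hardware thread 0 (assumed)       */
    {
        int v = 0;
        for (;;) {
            DEAD(LOOP_PERIOD);              /* fires at t = 0, 2R, 4R, ...       */
            shared_data = ++v;              /* write during the 1st rotation     */
        }
    }

    void consumer(void)                     /* hardware thread 1 (assumed)       */
    {
        DEAD(ROTATION);                     /* one-rotation phase offset         */
        for (;;) {
            DEAD(LOOP_PERIOD);              /* fires at t = R, 3R, 5R, ...       */
            consumer_out = shared_data;     /* read during the 2nd rotation      */
        }
    }

    void observer(void)                     /* hardware thread 2 (assumed)       */
    {
        DEAD(ROTATION);
        for (;;) {
            DEAD(LOOP_PERIOD);
            observer_out = shared_data;     /* reads alongside the consumer      */
        }
    }

The deadlines alone keep the readers out of the producer's rotation, so no locks are needed; that is the mutual exclusion the slide's title refers to.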
17. Video Game Example
(Diagram: three threads: Main-Control, Graphic, and VGA-Driver. Commands flow from the Main-Control Thread to the Graphic Thread through even/odd command queues, which swap when a sync is requested and the odd queue is empty; "Update Screen" is the sync request and "Sync" is signaled after the queue swap. Pixel data flows from the Graphic Thread to the VGA-Driver Thread through even/odd buffers, which swap when a sync is requested and during vertical blank; "Refresh" is the sync request and "Sync" is signaled after the buffer swap.)
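A small C sketch of the buffer-swap handshake on the pixel-data side of the diagram; the names, types, and the placement of the blank-time check are illustrative assumptions. The VGA-driver thread swaps the even/odd buffers only when the graphic thread has requested a sync and the display is in vertical blank, then signals the sync back.

    #include <stdbool.h>
    #include <stdint.h>

    #define FB_PIXELS (640 * 480)

    static uint8_t even_buffer[FB_PIXELS];
    static uint8_t odd_buffer[FB_PIXELS];

    /* Shared between the graphic thread (writer) and the VGA-driver thread. */
    static volatile bool refresh_requested;           /* "Refresh (sync request)"   */
    static volatile bool buffer_swapped;              /* "Sync (after buffer swap)" */
    static uint8_t *volatile draw_buf = even_buffer;  /* graphic thread writes here */
    static uint8_t *volatile scan_buf = odd_buffer;   /* VGA driver reads from here */

    /* Called by the VGA-driver thread once per frame, during vertical blank. */
    void vga_vertical_blank(void)
    {
        if (refresh_requested) {          /* swap only when a sync was requested */
            uint8_t *tmp = draw_buf;
            draw_buf = scan_buf;
            scan_buf = tmp;
            refresh_requested = false;
            buffer_swapped = true;        /* acknowledge back to the graphic thread */
        }
    }

The command-queue swap between the Main-Control and Graphic threads follows the same shape, with "odd queue empty" in place of the vertical-blank condition.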
18. Timing Requirements

Signal            Timing Requirement   Pixel Cycles
V. Sync           64 µs                1611
V. Back-porch     1.02 ms              25679
Draw 480 lines    15.25 ms
V. Front-porch    350 µs               8811
H. Sync           3.77 µs              96
H. Back-porch     1.89 µs              48
Draw 640 pixels   25.42 µs
H. Front-porch    0.64 µs              16
19. Timing Implementation
- Pixel clock using a derived clock
  - 25.175 MHz
- Drawing 16 pixels
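The pixel-cycle counts in the table on slide 18 follow from multiplying each timing requirement by the 25.175 MHz pixel clock, for example:

    64\,\mu\mathrm{s} \times 25.175\,\mathrm{MHz} \approx 1611 \ \text{(V. Sync)}
    \qquad 1.02\,\mathrm{ms} \times 25.175\,\mathrm{MHz} \approx 25679 \ \text{(V. Back-porch)}

Expressed this way, the requirements can presumably serve directly as deadline values for a timer driven by the derived pixel clock.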
20. Future Work
- Architecture
  - DMA
  - DDR2 main memory model
  - Thread synchronization primitives
  - Shared data between threads
- Real-time benchmarks
  - With timing requirements
- Programming models
  - Memory allocation schemes
  - Synchronization
21. Conclusion
- What we want
  - Time as a first-class citizen of embedded computing
  - Predictability
  - Repeatability
- Where we are
  - PRET cycle-accurate simulator
  - Release: http://chess.eecs.berkeley.edu/pret/
23. Extras
24. More on Brittleness
- Small changes may have big effects on timing behavior
- Theorem (Richard's anomalies)
  - If a task set with fixed priorities, execution times, and precedence constraints is optimally scheduled on a fixed number of processors, then increasing the number of processors, reducing execution times, or weakening precedence constraints can increase the schedule length.
- R. L. Graham, Bounds on the performance of scheduling algorithms, in E. G. Coffman, Jr. (ed.), Computer and Job-Shop Scheduling Theory, John Wiley, New York, 1975.
25. Richard's Anomalies
- 9 tasks, 3 processors, a priority list, precedence order, and execution times.
(Gantt chart of the resulting schedule.)
26. Richard's Anomalies: Reducing Execution Times
(Gantt chart of the schedule with reduced execution times.)
27. Richard's Anomalies: More Processors
(Gantt chart of the schedule with an additional processor.)
28. Richard's Anomalies: Changing the Priority List
- L = (T1, T2, T4, T5, T6, T3, T9, T7, T8)
(Gantt chart of the resulting schedule.)
29. Brittleness Again
- In general, all task scheduling strategies are
brittle