Transcript and Presenter's Notes

Title: Predictable Programming on a Precision Timed Architecture


1
Predictable Programming on a Precision Timed
Architecture
  • Ben Lickly - UC Berkeley
  • Isaac Liu - UC Berkeley
  • Sungjun Kim - Columbia University
  • Hiren D. Patel - UC Berkeley
  • Stephen A. Edwards - Columbia University
  • Edward A. Lee - UC Berkeley

2
Edwards and Lee - Case for PRET
  • In 2007, Edwards and Lee made a case for precision
    timed computers (PRET machines)
  • Predictability
  • Repeatability
  • S. A. Edwards and E. A. Lee, The case for the
    precision timed (PRET) machine. In Proceedings of
    the 44th Annual Conference on Design Automation
    (San Diego, California, June 04 - 08, 2007). DAC
    '07. ACM, New York, NY, 264-265.

3
Edwards and Lee - Case for PRET
  • Unpredictability - difficulty in determining timing
    behavior through analysis
  • Non-repeatability - different executions may yield
    different timing behavior
  • Brittleness - small changes have big effects on
    timing behavior

4
Brittleness
  • Expensive affair
  • Tight coupling of software and hardware
  • Reliance on testing for validation
  • Upgrading difficult
  • Solution: stockpile hardware

Source: www.skycontrol.net
5
But wait
  • Real-time scheduling
  • Worst-case execution time
  • Detailed model of hardware
  • Large engineering effort
  • Valid for particular hardware models
  • Interrupts, inter-process communication, locks
  • Bench testing
  • Brittle

Sebastian Altmeyer, Christian Hümbert, Björn
Lisper, and Reinhard Wilhelm. Parametric Timing
Analysis for Complex Architectures. In
Proceedings of the 14th IEEE International
Conference on Embedded and Real-Time Computing
Systems and Applications (RTCSA'08), pages
367-376, Kaohsiung, Taiwan, August 2008. IEEE
Computer Society.
6
Precise Timing and High Performance
Traditional → Alternative
  • Caches → Scratchpads
  • Deep pipelines → Thread-interleaved pipelines
  • Function-only ISAs → ISAs with timing instructions
  • Function-only languages → Languages and programming models with timing
  • Best-effort communication → Fixed-latency communication
  • Time-sharing → Multiple independent processors
7
Outline
  • Introduction
  • Related Work
  • PRET Machine
  • Programming Example
  • Future Work
  • Conclusion

8
Related Work
  • Java Optimized Processor (Schoeberl et al. 2003)
  • Timing instructions (Ip and Edwards 2006)
  • Reactive processors (von Hanxleden et al. 2005; Salcic et al. 2005)
  • Virtual Simple Architecture (Mueller et al. 2003)

9
Semantics of Timing Instructions
  • Deadline instructions (Ip and Edwards 2007) denote the
    required execution time of a block of code
  • When a deadline instruction is decoded, it stalls until
    the deadline timer reaches 0, then sets the timer to the
    new value (a C-level sketch follows below)

      deadi %t0, 10     ! give Straight Line Block 0 a deadline of 10 ticks
      ...               ! Straight Line Block 0
      deadi %t0, 8      ! wait out block 0, give Straight Line Block 1 8 ticks
      ...               ! Straight Line Block 1
      deadi %t0, 0      ! wait until block 1's deadline has expired
  L0:
      deadi %t0, 10     ! each iteration of the Loop Block takes 10 ticks
      ...               ! Loop Block
      b L0
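For concreteness, the same structure can be written at the C level. This is a minimal sketch only: the DEAD() macro, the timer name t0, and an assembler that accepts the deadi mnemonic are assumptions for illustration, not the presentation's actual toolchain interface.

    /* Hypothetical macro: stall until deadline timer t0 reaches 0, then
       reload it with a new value (the semantics described above). */
    #define DEAD(ticks) __asm__ volatile ("deadi %%t0, %0" : : "i" (ticks))

    void two_blocks_and_a_loop(void)
    {
        DEAD(10);            /* Straight Line Block 0 gets 10 timer ticks  */
        /* ... Straight Line Block 0 ... */
        DEAD(8);             /* wait out block 0, give block 1 eight ticks */
        /* ... Straight Line Block 1 ... */
        DEAD(0);             /* wait until block 1's deadline has expired  */
        for (;;) {
            DEAD(10);        /* every loop iteration takes 10 timer ticks  */
            /* ... Loop Block ... */
        }
    }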
10
Tracing A Program Fragment
(The slide traces this fragment cycle by cycle against the value of deadline
timer t0.)
  • A  deadi %t0, 6
  • B  sethi %hi(0x3f800000), %g1
  • C  or %g1, 0x200, %g1
  • D  st %g1, [%fp - 12]
  • E  deadi %t0, 8
  • F
The first deadi loads t0 with 6; the deadi at E stalls until t0 has counted
down to 0, so the code between A and E always occupies six timer ticks
(provided it finishes in time), after which t0 is reloaded with 8.

11
Precision Timed Architecture
  • Scratchpad memories
  • Round-robin thread scheduling
  • Thread-interleaved pipeline
  • Time-triggered main memory access
12
Clocks and Memory Hierarchy
  • Clocks
  • Main clock
  • Derived clocks
  • Instruction and data scratchpad memories
  • 1-cycle access latency
  • Main memory
  • 16 MB size
  • Latency of 50 ns
  • Frequency: 250 MHz
  • 13-cycle latency (50 ns × 250 MHz = 12.5, rounded up)

(Diagram: Core, Main Memory, and DMA blocks of the memory hierarchy.)
13
Thread-interleaved Pipeline
  • A thread stalls on main memory accesses and on deadline
    instructions
  • Replay mechanism: a stalled thread executes the same PC
    on its next iteration (see the sketch below)

(Pipeline diagram: Fetch → Decode → Reg. Access → Execute → Memory →
WriteBack, with pipeline registers F/D, D/R, R/E, E/M, and M/W. Deadline
timers are decremented at Fetch, deadline instructions are stalled between
Decode and Reg. Access, main memory accesses are checked at the Memory stage,
and the PC is incremented at WriteBack.)
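The round-robin interleaving and the replay mechanism can be summarized in a
few lines of C. This is a sketch under assumptions: the thread count and the
issue() helper are illustrative names, not part of the simulator.

    enum { NUM_THREADS = 6 };                /* assumed thread count           */

    static unsigned pc[NUM_THREADS];

    /* Hypothetical helper: try to issue one instruction for a thread;
       returns 0 if it stalls (main memory busy, deadline not yet expired). */
    int issue(int thread, unsigned pc);

    void pipeline_cycle(unsigned cycle)
    {
        int t = cycle % NUM_THREADS;         /* round-robin thread selection   */
        if (issue(t, pc[t]))
            pc[t] += 4;                      /* advance only on completion     */
        /* otherwise replay: the same PC is issued on this thread's next turn */
    }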
14
Time-Triggered Access through Memory Wheel
  • Decouples the threads' access patterns
  • Time-triggered access
  • Each thread must make and complete its access within its
    own window (see the sketch below)

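The wheel can be pictured as a fixed rotation of per-thread windows. The
sketch below checks whether a thread may still start and complete an access
in the current window; the thread count and window size are placeholders,
not the hardware's actual parameters.

    enum { N_THREADS = 6, WINDOW = 13 };               /* placeholder sizes     */

    /* A thread may start a main-memory access only inside its own window,
       and the access must also complete before that window closes. */
    int may_access(unsigned cycle, unsigned thread, unsigned latency)
    {
        unsigned pos   = cycle % (N_THREADS * WINDOW); /* place in one rotation */
        unsigned owner = pos / WINDOW;                 /* whose window is open  */
        unsigned left  = WINDOW - (pos % WINDOW);      /* cycles left in window */
        return owner == thread && latency <= left;
    }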
15
Tool Flow
  • GCC 3.4.4, SystemC 2.2, Python 2.4

16
Simple Mutual Exclusion Example
  • Producer followed by Consumer and Observer
  • Consumer and Observer execute together
  • Loop rate of two rotations of the memory wheel (a C
    sketch follows below)
  • 1st rotation: the Producer writes
  • 2nd rotation: the Consumer and Observer read

(Diagram arrows: write to shared data, read from shared data, write to
output.)
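A rough C sketch of this structure, reusing the hypothetical DEAD() macro from
the earlier sketch. WHEEL (the cycle count of one memory-wheel rotation) and
write_output() are placeholders, and both threads are assumed to start
together with their deadline timers at zero; the presentation's actual
deadline values are not reproduced here.

    #define DEAD(ticks) __asm__ volatile ("deadi %%t0, %0" : : "i" (ticks))
    enum { WHEEL = 64 };                     /* placeholder: cycles/rotation     */

    extern volatile int shared_data;         /* resides in shared main memory    */
    void write_output(int v);                /* hypothetical output routine      */

    void producer(void)
    {
        int v = 0;
        for (;;) {
            DEAD(2 * WHEEL);                 /* one iteration = two rotations    */
            shared_data = v++;               /* write lands in the 1st rotation  */
        }
    }

    void consumer(void)                      /* the Observer is analogous        */
    {
        DEAD(WHEEL);                         /* offset the loop by one rotation  */
        for (;;) {
            DEAD(2 * WHEEL);                 /* same two-rotation period         */
            write_output(shared_data);       /* read follows the Producer write  */
        }
    }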
17
Video Game Example
(Diagram: a Main-Control Thread, a Graphic Thread, and a VGA-Driver Thread.
Commands flow from the main-control thread through an Even Queue and an Odd
Queue; Pixel Data flows from the graphic thread through an Even Buffer and an
Odd Buffer. The queues are swapped when a sync is requested and the odd queue
is empty; the buffers are swapped when a sync is requested and the display is
in vertical blank. "Update Screen" and "Refresh" carry the sync requests, and
a Sync is signaled back after the queue, respectively the buffer, has been
swapped. A sketch of the buffer swap follows below.)
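A small sketch of the buffer hand-off implied by the swap condition on the VGA
side (sync requested and vertical blank). All names here are illustrative; the
slide does not show the actual data structures.

    typedef struct { unsigned char pixels[640 * 480]; } framebuf_t;

    static framebuf_t bufs[2];                /* the even and odd pixel buffers  */
    static int front = 0;                     /* buffer the VGA driver scans out */

    /* Called by the VGA-driver thread; the graphic thread draws into
       bufs[1 - front] in the meantime. */
    void maybe_swap(int sync_requested, int vertical_blank)
    {
        if (sync_requested && vertical_blank) {
            front ^= 1;                       /* swap even and odd buffers       */
            /* signal Sync back to the graphic thread here                       */
        }
    }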
18
Timing Requirements
Signal            Timing Requirement   Pixel Cycles
V. Sync           64 µs                1611
V. Back-porch     1.02 ms              25679
Draw 480 lines    15.25 ms
V. Front-porch    350 µs               8811
H. Sync           3.77 µs              96
H. Back-porch     1.89 µs              48
Draw 640 pixels   25.42 µs
H. Front-porch    0.64 µs              16
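The pixel-cycle column follows from the 25.175 MHz pixel clock introduced on
the next slide: cycles ≈ time × 25.175 MHz. For example, 64 µs × 25.175 MHz ≈
1611 cycles for V. Sync, and 1.02 ms × 25.175 MHz ≈ 25679 cycles for the
vertical back porch.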
19
Timing Implementation
  • Pixel clock implemented with a derived clock
  • 25.175 MHz
  • Drawing 16 pixels per deadline period (see the sketch below)

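A minimal sketch of what drawing 16 pixels per deadline period might look
like, assuming a deadline timer driven by the derived 25.175 MHz pixel clock.
DEAD_PIX(), the timer name t1, and put_pixel() are illustrative names, not
the actual implementation.

    /* Hypothetical deadline on a timer clocked by the 25.175 MHz pixel clock. */
    #define DEAD_PIX(ticks) __asm__ volatile ("deadi %%t1, %0" : : "i" (ticks))

    void put_pixel(int x, unsigned char value);   /* hypothetical output routine */

    void draw_line(const unsigned char *line)
    {
        for (int x = 0; x < 640; x += 16) {
            DEAD_PIX(16);                         /* pad each chunk to 16 pixel- */
            for (int i = 0; i < 16; i++)          /* clock ticks                 */
                put_pixel(x + i, line[x + i]);
        }
    }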
20
Future Work
  • Architecture
  • DMA
  • DDR2 main memory model
  • Thread synchronization primitives
  • Shared data between threads
  • Real-time Benchmarks
  • With timing requirements
  • Programming models
  • Memory allocation schemes
  • Synchronizations

21
Conclusion
  • What we want
  • Time as a first-class citizen of embedded
    computing
  • Predictability
  • Repeatability
  • Where we are at
  • PRET cycle-accurate simulator
  • Release
  • http://chess.eecs.berkeley.edu/pret/

22
(No Transcript)
23
Extras
24
More on Brittleness
  • Small changes may have big effects on timing
    behavior
  • Theorem (Richard's anomalies)
  • If a task set with fixed priorities, execution
    times, and precedence constraints is optimally
    scheduled on a fixed number of processors, then
    increasing the number of processors, reducing
    execution times, or weakening precedence
    constraints can increase the schedule length.
  • R. L. Graham, Bounds on the performance of scheduling
    algorithms, in E. G. Coffman, Jr. (ed.), Computer and
    Job-Shop Scheduling Theory, John Wiley, New York, 1975.
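Richard's anomalies are classically demonstrated with greedy list scheduling:
whenever a processor is free, it takes the first task in the priority list
whose predecessors have all finished. Below is a minimal C sketch of that
rule; the actual execution times and precedence order appear only in the
slide figures, so they are left as placeholders to be filled in.

    enum { NTASKS = 9, NPROCS = 3 };            /* as on the next slide          */

    /* Placeholders: fill in from the figures. pred[i][j] != 0 means task j
       must finish before task i may start. */
    static int exec_time[NTASKS];
    static int pred[NTASKS][NTASKS];

    /* Greedy list scheduling; prio is a permutation of 0..NTASKS-1 and
       nprocs is at most 32. Returns the schedule length (makespan).
       Example: int prio[NTASKS] = {0,1,2,3,4,5,6,7,8};
                int len = list_schedule(prio, NPROCS);                          */
    int list_schedule(const int *prio, int nprocs)
    {
        int finish[NTASKS], started[NTASKS] = {0};
        int busy_until[32] = {0}, running[32];
        int done = 0, time = 0, makespan = 0;

        for (int i = 0; i < NTASKS; i++) finish[i] = -1;   /* -1 = not finished */
        for (int p = 0; p < nprocs; p++) running[p] = -1;  /* -1 = idle         */

        while (done < NTASKS) {
            for (int p = 0; p < nprocs; p++)          /* retire finished tasks   */
                if (running[p] >= 0 && busy_until[p] <= time) {
                    finish[running[p]] = busy_until[p];
                    running[p] = -1;
                    done++;
                }
            for (int p = 0; p < nprocs; p++) {        /* fill idle processors    */
                if (running[p] >= 0) continue;
                for (int k = 0; k < NTASKS; k++) {
                    int t = prio[k];
                    int ready = !started[t];
                    for (int j = 0; ready && j < NTASKS; j++)
                        if (pred[t][j] && finish[j] < 0) ready = 0;
                    if (!ready) continue;
                    started[t] = 1;                    /* start t on processor p */
                    running[p] = t;
                    busy_until[p] = time + exec_time[t];
                    if (busy_until[p] > makespan) makespan = busy_until[p];
                    break;
                }
            }
            time++;                                    /* advance one time unit  */
        }
        return makespan;
    }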

25
Richard's Anomalies
  • 9 tasks, 3 processors, priority list, precedence
    order, execution times.

(Gantt chart of the schedule; time-axis ticks at 0, 3, and 12.)
26
Richard's Anomalies: Reducing Execution Times
  • eTime = eTime - 1 (each execution time reduced by one)

(Gantt chart of the resulting schedule; time-axis ticks at 0, 3, and 12.)
27
Richard's Anomalies: More Processors
  • 4 processors

(Gantt chart of the resulting schedule; time-axis ticks at 0, 3, 12, and 15.)
28
Richard's Anomalies: Changing the Priority List
  • L = (T1, T2, T4, T5, T6, T3, T9, T7, T8)

(Gantt chart of the resulting schedule; time-axis ticks at 0, 3, and 12.)
29
Brittleness Again
  • In general, all task scheduling strategies are
    brittle