System-level Exploration for Pareto-optimal Configurations in Parameterized Systems-on-a-chip Architectures

Slides: 46
Provided by: csu71
Learn more at: http://www.cs.ucr.edu

1
System-level Exploration for Pareto-optimal
Configurations in Parameterized Systems-on-a-chip
Architectures
  • Tony Givargis (Frank Vahid, Jörg Henkel)
  • Center for Embedded Computer Systems
  • University of California
  • Irvine, CA 92697
  • givargis_at_ics.uci.edu

2
Overview
  • Given
  • Parameterized SOC architecture
  • Fixed application
  • Automatically explore the design space
  • Find optimal points with respect to power and
    performance

3
Motivation
  • Design trends
  • Growing demand for portable devices
  • Growing demand for low power design
  • Increased application complexity
  • Shrinking time-to-market windows
  • Technology trends
  • Increased chip capacity
  • Increased I/O pins
  • Improved on-chip integration techniques (storage,
    digital, analog, ...)
  • SOC era

Need for greater designer productivity!
4
Motivation
  • One approach: reuse of existing IP
  • IP selection ?
  • IP integration ?
  • SOC verification ?
  • Multi-source IP licensing
  • More

5
Motivation
  • Alternate approach: reuse of SOC
  • Designed, integrated, tested
  • Domain specific
  • Parameterized
  • Designed by firms specializing in SOC
  • Users map the application, then configure-and-execute
  • (successors to microcontrollers!)

6
Motivation
  • Composed of 100s of cores
  • Cores are configurable
  • Configurations impact power/performance
  • Large number of total configurations!

Architecture is otherwise fixed!
7
Motivation
  • ATI Technologies XILLEON 220 SOC for Digital
    Set-top Box Market
  • Tensilica Xtensa 1040 configurable processor
    cores
  • Philips Semiconductors Velocity RSP9 SOC
    platforms
  • Adelante Technologies offers complete SOC
    customizable platforms for DSP domains
  • More

8
Outline
  • Previous work
  • Target architecture
  • Power/performance estimation
  • Parameter space exploration
  • Experiments
  • Conclusion

10
Previous Work
  • Parameterized SOC design
  • Malik00, Veidenbaum99, Vahid99, Stan95
  • Power/performance evaluation
  • Brandolese00, Simunic99, Li98, Tiwari94
  • Design space exploration (manual)
  • Givargis99, Lieverse99
  • Design space exploration (automatic)
  • Focus of this work

11
Previous Work
Y-chart Lieverse99
12
Outline
  • Previous work
  • Target architecture
  • Power/performance estimation
  • Parameter space exploration
  • Experiments
  • Conclusion

13
Target Architecture
14
Target Architecture
  • Voltage scale
  • Cache size, line size, associativity
  • Bus width, encoding (gray, invert)
  • UART tx/rx buffer size
  • DCT resolution

19
Target Architecture
  • 26 parameters
  • 10^14 configurations
  • What are the optimal configurations (given a fixed
    application)?

20
Problem Summary
  • What are the possible power/performance
    tradeoffs? (100 trillion configurations)
  • Need to efficiently evaluate power/performance
    (1/sec → 150,000 years)
  • Need to explore the configuration space

21
Outline
  • Previous work
  • Target architecture
  • Power/performance estimation
  • Parameter space exploration
  • Experiments
  • Conclusion

22
Power Evaluation
  • Exploration works with
  • Chip instrumentation (real-time)
  • System-level simulation
  • RTL simulation
  • Gate-level simulation
  • Circuit-level simulation
  • Relative accuracy required!

Figure: chart comparing the evaluation methods above, with plotted
values 180000, 28800, 5400, 440 and 1. Caption: Digital camera
application mapped on our SOC, capturing 1 image.
24
Power Evaluation - Processor
  • Tiwari94/00's instruction-level model
  • Measure watt/inst
  • Account for stalls and dependencies
  • Apply traces
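
The instruction-level idea on this slide can be illustrated with a minimal Python sketch. It assumes a per-instruction energy table and a fixed cost per stall cycle; the instruction names, the numeric costs and the function `estimate_energy_nj` are hypothetical and are not taken from the presentation or from Tiwari's measurements.

```python
from collections import Counter

# Hypothetical per-instruction energy costs in nanojoules (illustrative only).
BASE_ENERGY_NJ = {"add": 1.0, "mul": 3.0, "load": 2.5, "store": 2.0, "branch": 1.5}
STALL_ENERGY_NJ = 0.8  # assumed cost of one stall cycle


def estimate_energy_nj(trace, stall_cycles):
    """Sum the measured cost of every executed instruction, then add stall overhead."""
    counts = Counter(trace)
    base = sum(BASE_ENERGY_NJ[op] * n for op, n in counts.items())
    return base + STALL_ENERGY_NJ * stall_cycles


# Apply a (tiny, made-up) instruction trace to the model.
trace = ["load", "mul", "add", "store", "branch"]
total = estimate_energy_nj(trace, stall_cycles=2)
print(f"estimated energy: {total:.1f} nJ")
```

Dividing the total by the execution time of the trace would give an average power figure for the run.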

25
Power Evaluation - Cache/Mem.
  • Evans95
  • Capacitance model of sub-components
  • Switching obtained via simulation (parameter
    dependent)

26
Power Evaluation - Buses
  • Chern92
  • Model bus capacitance
  • Switching derived from I/O traffic (parameter
    dependent)
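
As a rough illustration of deriving switching activity from bus traffic, the sketch below counts bit transitions on a bus trace, with and without bus-invert encoding, and converts a transition count to energy with the standard 0.5 · C · V² per-transition estimate. The function names, the trace and the capacitance and voltage values are assumptions made for this example; the capacitance model of Chern92 is not reproduced here.

```python
def count_transitions(words, width):
    """Number of bit flips on a `width`-bit bus that starts at all zeros."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for w in words:
        w &= mask
        total += bin(prev ^ w).count("1")
        prev = w
    return total


def bus_invert_transitions(words, width):
    """Bit flips with bus-invert encoding, including the extra invert line."""
    mask = (1 << width) - 1
    prev, prev_inv, total = 0, 0, 0
    for w in words:
        w &= mask
        inv = 0
        # Send the complement when more than half of the lines would flip.
        if bin(prev ^ w).count("1") > width // 2:
            w ^= mask
            inv = 1
        total += bin(prev ^ w).count("1") + (prev_inv ^ inv)
        prev, prev_inv = w, inv
    return total


def switching_energy_joules(transitions, cap_farads, vdd_volts):
    """Dynamic energy estimate: 0.5 * C * V^2 per line transition."""
    return 0.5 * cap_farads * vdd_volts ** 2 * transitions


trace = [0x00, 0xFF, 0x00, 0xFF]        # made-up worst-case traffic
plain = count_transitions(trace, 8)
encoded = bus_invert_transitions(trace, 8)
print(plain, encoded)
print(switching_energy_joules(plain, cap_farads=1e-12, vdd_volts=3.3))
```

Because the transition count depends on bus width and encoding, the same traffic yields different energy for different parameter settings, which is what makes the estimate parameter dependent.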

27
Power Evaluation - Peripherals
  • Observation: cores execute instructions!
  • Apply a technique similar to that used for
    processors!

28
Power Evaluation - Summary
50-100K instructions/second! (Platune)
29
Outline
  • Previous work
  • Target architecture
  • Power/performance estimation
  • Parameter space exploration
  • Experiments
  • Conclusion

30
Exploration
  • Problem formulation
  • Parameters P1, P2, ..., Pn
  • A configuration (point) is an assignment of
    values to all parameters
  • How to efficiently generate all Pareto-optimal
    configurations?
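
The formulation above can be made concrete with a small Python sketch: a configuration is one assignment of a value to every parameter, and one point dominates another when it is no worse in both execution time and power and differs in at least one. The parameter names and value ranges below are hypothetical stand-ins, not the 26 parameters of the target architecture.

```python
from itertools import product

# Hypothetical parameters and their allowed values (illustrative only).
PARAMETERS = {
    "cache_size_kb": [1, 2, 4],
    "cache_assoc": [1, 2],
    "bus_width": [8, 16, 32],
}


def configurations(parameters):
    """Yield every configuration: one value assigned to each parameter."""
    names = list(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))


def dominates(a, b):
    """True if point a = (time, power) is no worse than b in both and not equal."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


all_configs = list(configurations(PARAMETERS))
print(len(all_configs))                 # 3 * 2 * 3 = 18 configurations
print(dominates((10, 5), (12, 5)))      # faster at equal power
```

A configuration is Pareto-optimal when no other configuration dominates it.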

31
Exploration
  • Algorithm Idea

320 points
42 points
  • Directed graph

 
32
Exploration
  • A → B: Pareto-optimal configurations of B
    calculated after Pareto-optimal configurations of
    nodes along the path A → B
  • A → B → A (cycle): Pareto-optimal
    configurations of all the parameters on the cycle
    calculated simultaneously
  • A: Pareto-optimal configurations calculated in
    isolation
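
One way to turn these three rules into an evaluation order is to group parameters that lie on a common cycle and then order the groups so that each comes after the groups it depends on. The sketch below does this with Kosaraju's strongly-connected-components algorithm; that choice, the function name and the four-node example graph are assumptions for illustration and are not claimed to be the presenters' implementation.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps a node to the list of its successors.

    Returns the components in an order where a component appears before
    every component that depends on it.
    """
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                visit(nxt)
        order.append(node)  # record finishing order

    for node in graph:
        if node not in seen:
            visit(node)

    # Build the reversed graph for the second pass.
    reverse = {node: [] for node in graph}
    for node, succs in graph.items():
        for nxt in succs:
            reverse.setdefault(nxt, []).append(node)

    components, assigned = [], set()
    for node in reversed(order):
        if node in assigned:
            continue
        stack, comp = [node], []
        assigned.add(node)
        while stack:
            cur = stack.pop()
            comp.append(cur)
            for prev in reverse.get(cur, []):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(comp))
    return components


# A -> B, B <-> C (a cycle), D independent of everything else.
graph = {"A": ["B"], "B": ["C"], "C": ["B"], "D": []}
clusters = strongly_connected_components(graph)
print(clusters)
```

In this example B and C land in one cluster and are explored together, A is explored before that cluster, and D is explored in isolation.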

33
Exploration
Dependency Graph
 
34
Exploration
  • Dependency graph
  • Based on designer knowledge
  • Computed by simulating all pairs of nodes
    (quadratic time complexity, approx.)
  • One time effort

 
35
Exploration Algorithm
  • Step 1: Clustering followed by simulation

36
Exploration Algorithm
  • Step 2: Pair-wise merge followed by simulation
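
A possible reading of the pair-wise merge step, sketched in Python: take the Pareto-optimal partial configurations of two clusters, simulate every combination, and keep only the combinations that remain Pareto-optimal. The helper names and the cost model `toy_simulate` are invented for this example; in the presentation the evaluation is done by the power/performance estimator, not by a closed-form formula.

```python
from itertools import product


def pareto_filter(evaluated):
    """Keep (config, time, power) entries that no other entry dominates."""
    kept = []
    for cfg, t, p in evaluated:
        dominated = any(
            t2 <= t and p2 <= p and (t2 < t or p2 < p)
            for _, t2, p2 in evaluated
        )
        if not dominated:
            kept.append((cfg, t, p))
    return kept


def merge_clusters(front_a, front_b, simulate):
    """Combine two clusters' Pareto configurations and re-evaluate the result."""
    merged = []
    for cfg_a, cfg_b in product(front_a, front_b):
        cfg = {**cfg_a, **cfg_b}
        t, p = simulate(cfg)
        merged.append((cfg, t, p))
    return pareto_filter(merged)


def toy_simulate(cfg):
    """Made-up cost model standing in for a real simulation run."""
    time = 100 / cfg["cache_kb"] + 64 / cfg["bus_width"]
    power = 2 * cfg["cache_kb"] + 0.5 * cfg["bus_width"]
    return time, power


cache_front = [{"cache_kb": 1}, {"cache_kb": 4}]
bus_front = [{"bus_width": 8}, {"bus_width": 32}]
merged_front = merge_clusters(cache_front, bus_front, toy_simulate)
for cfg, t, p in merged_front:
    print(cfg, t, p)
```

Only the product of the two local Pareto sets is simulated, rather than the product of the two clusters' full configuration spaces, which is where the pruning comes from.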

37
Exploration
  • Exhaustive solution
  • Evaluate all points
  • Sort by increasing execution time
  • Walk through the space, eliminate points with
    power ≥ minimum seen so far!
  • Substitute heuristics

(only works for 1-4 parameters!)
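
The sort-and-walk elimination can be sketched in a few lines of Python. This version sorts by increasing execution time, so every point already visited is at least as fast as the current one, and a point survives only if its power is strictly below the minimum seen so far. The function name and the sample points are made up for illustration.

```python
def pareto_sweep(points):
    """Return the Pareto-optimal (exec_time, power) points of an evaluated set."""
    best_power = float("inf")
    front = []
    # Tuples sort by time first, then power, so ties on time keep the lowest power.
    for t, p in sorted(points):
        if p < best_power:
            front.append((t, p))
            best_power = p
    return front


points = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
front = pareto_sweep(points)
print(front)   # [(8, 7), (10, 5), (12, 4)]
```

The sweep itself costs only a sort, but it needs every point to be evaluated first, which is why the exhaustive step is applied to small clusters of parameters.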
38
Exploration
  • Complexity: O((K log K) · 2^(N/K))
  • K is the number of clusters
  • N is the number of parameters
  • 2^(N/K) bounds the exhaustive computation
  • (K log K) bounds the number of iterations
  • Worst case K = 1, best case K = N
  • 2^(N/K) decreases rapidly as K increases (e.g.,
    2^(26/2) + 2^(26/2) is much smaller than 2^26!)
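
To make the closing example concrete, the arithmetic can be written out as follows. This reads the example as two clusters of 13 parameters each, explored one after the other, and assumes two values per parameter as the 2^N term implies; both readings are assumptions about the slide.

```latex
\[
  T(N, K) = O\!\left( (K \log K) \cdot 2^{N/K} \right)
\]
\[
  K = 1:\quad 2^{26} = 67{,}108{,}864
  \qquad\qquad
  K = 2:\quad 2^{26/2} + 2^{26/2} = 2 \cdot 8192 = 16{,}384
\]
```

Splitting the 26 parameters into two independent clusters therefore cuts the exhaustive part by a factor of 4096 in this example.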

39
Outline
  • Previous work
  • Target architecture
  • Power/performance estimation
  • Parameter space exploration
  • Experiments
  • Conclusion

40
Exploration Results
  • JPEG
  • Exploration time 29.1 min
  • Config. visited 12352 (141)
  • 5.10x exe. time
  • 7.51x power
  • 2.73x energy
  • Pruning ratio > 0.99997

41
Exploration Results
  • CKEY
  • Exploration time 108 min
  • Config. visited 15890 (223)
  • 8.31x exe. time
  • 6.08x power
  • 2.57x energy
  • Pruning ratio > 0.99993

42
Exploration Results
  • IMAGE
  • Exploration time 50.2 min
  • Config. visited 10135 (80)
  • 8.29x exe. time
  • 8.57x power
  • 1.81x energy
  • Pruning ratio > 0.99998

43
Exploration Results
  • MATRIX
  • Exploration time 73.6 min
  • Config. visited 12623 (84)
  • 10.7x exe. time
  • 8.16x power
  • 3.18x energy
  • Pruning ratio > 0.99997

44
Exploration Results
Figure: three result plots, each titled JPEG.
45
Conclusion
  • Gave a system-level algorithm for exploring the
    solution space of an application mapped to a
    parameterized SOC architecture
  • Given a dependency graph, we extensively prune the
    solution space
  • Pruning ratio > 0.99997 in experiments
  • Future work
  • Automatically compute the dependency model
  • Replace the exhaustive sub-algorithm with a
    heuristic (e.g., gradient search, GA)