Software Estimation for Application Specific Multiprocessor SoCs


Satish Parvataneni (2001MCS017)
Department of Computer Science & Engineering


1
Software Estimation for Application Specific Multiprocessor SoCs

Under the Supervision of
Prof. M. Balakrishnan

2
Presentation Outline

Introduction and Motivation
Objectives
Implementation
Results
Conclusions and Future Work
References

3
Introduction and Motivation

Application specific processors
Multiprocessor SoCs
SRIJAN flow

4
Why Application Specific Multiprocessors?

Application: compute-intensive part and control part
General Purpose Multiprocessor: no customization, average performance
Application Specific Multiprocessor: customization, higher performance
5
Role of Processor Customization

Allows effective utilization of resources
Makes solution cheaper

6
SRIJAN System Level Design Methodology

8
Objectives

Defining a multiprocessor architecture description
Developing a tool to generate a task graph and annotate it with
computation estimates
communication overheads
Input
Application IR
Profiled data
Architecture description
Output
Annotated task graph
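
As a rough illustration of the tool's output, an annotated task graph can be held as tasks with computation estimates and edges with communication overheads. The sketch below is an assumption made for illustration only: the class name, the task names, the cycle counts, and the critical-path helper are hypothetical and are not taken from the SRIJAN tool.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class TaskGraph:
    """Task graph annotated with cycle estimates (all names are illustrative)."""
    compute_cycles: dict = field(default_factory=dict)  # task -> computation estimate
    comm_cycles: dict = field(default_factory=dict)     # (src, dst) -> communication overhead

    def add_task(self, name, cycles):
        self.compute_cycles[name] = cycles

    def add_edge(self, src, dst, overhead):
        self.comm_cycles[(src, dst)] = overhead

    def critical_path(self):
        """Longest path through the graph, counting node and edge annotations."""
        preds = {task: set() for task in self.compute_cycles}
        for src, dst in self.comm_cycles:
            preds[dst].add(src)
        finish = {}
        # static_order() yields every task after all of its predecessors.
        for task in TopologicalSorter(preds).static_order():
            ready = max(
                (finish[p] + self.comm_cycles[(p, task)] for p in preds[task]),
                default=0,
            )
            finish[task] = ready + self.compute_cycles[task]
        return max(finish.values(), default=0)


# Hypothetical four-task example: one parent thread, two workers, one join.
graph = TaskGraph()
graph.add_task("main", 100)
graph.add_task("worker1", 400)
graph.add_task("worker2", 250)
graph.add_task("join", 50)
graph.add_edge("main", "worker1", 10)
graph.add_edge("main", "worker2", 10)
graph.add_edge("worker1", "join", 20)
graph.add_edge("worker2", "join", 5)
print(graph.critical_path())
```

The critical path is only one possible use of the annotations; a mapping or scheduling step could consume the same two dictionaries.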

10
Implementation

Defined sections to describe multiprocessor
architecture
Task graph generation
Modified MACHSUIF library for estimating
execution times

11
Architecture Description

The architecture is described in HMDES and the information is extracted using mQS.
The description has three sections
Memory section
Processor section
Bus section

12
Architecture Description (contd.)

Memory section
Memory type
Number of ports
Memory size
Bus name
Processor section
Register file information
Cache information
Instruction set information
Pipeline information

13
Architecture Description (contd.)

Bus section
Protocol information
Connectivity information
Bit width information
BCU information
Main section
Integrate all the above three sections
Extracting details with mQS
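
The three sections and the main section that ties them together can be pictured as plain records. The sketch below is not HMDES syntax and does not use any Trimaran call; every class, field, and value in it is a hypothetical stand-in chosen to mirror the bullet items above, with a small check that the cross-references between sections resolve.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    name: str
    memory_type: str
    ports: int
    size_bytes: int
    bus_name: str  # bus this memory is attached to


@dataclass
class Processor:
    name: str
    registers: int
    cache_line_bytes: int
    instruction_set: str
    pipeline_stages: int


@dataclass
class Bus:
    name: str
    protocol: str
    bit_width: int
    connects: tuple  # names of the processors attached to this bus


@dataclass
class Architecture:
    """Main section: integrates the memory, processor, and bus sections."""
    memories: list
    processors: list
    buses: list

    def check(self):
        """Return a list of dangling references between sections."""
        bus_names = {b.name for b in self.buses}
        proc_names = {p.name for p in self.processors}
        problems = [
            f"memory {m.name}: unknown bus {m.bus_name}"
            for m in self.memories
            if m.bus_name not in bus_names
        ]
        problems += [
            f"bus {b.name}: unknown processor {p}"
            for b in self.buses
            for p in b.connects
            if p not in proc_names
        ]
        return problems


# Hypothetical two-processor system sharing one bus and one memory.
arch = Architecture(
    memories=[Memory("sram0", "SRAM", 1, 1 << 20, "bus0")],
    processors=[
        Processor("cpu0", 32, 32, "sparc-v8", 5),
        Processor("cpu1", 32, 32, "sparc-v8", 5),
    ],
    buses=[Bus("bus0", "shared", 32, ("cpu0", "cpu1"))],
)
print(arch.check())
```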

14
Task Graph

The application model is pthreads
A task is defined as a piece of sequential code

15
Task Graph (contd.)

Problems encountered
Thread creation in loops
Thread creation in if-else statements
Solutions
Unrolling loops
Pruning the less frequently executed part with the help of profiling information
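
The two solutions can be sketched on a toy representation of thread-creation sites. The tuple-based tree, the `expand_spawns` function, and the 0.5 pruning threshold below are assumptions made for illustration; the slides do not say how the tool represents these constructs or where it draws the line for "less frequently executed".

```python
def expand_spawns(node):
    """Flatten thread-creation sites into a static list of tasks.

    Node shapes (all hypothetical):
      ("spawn", name)                       one thread creation
      ("seq", [children])                   statements in sequence
      ("loop", trip_count, body)            loop with a known trip count
      ("if", taken_freq, then_body, else_body)
    """
    kind = node[0]
    if kind == "spawn":
        return [node[1]]
    if kind == "seq":
        return [task for child in node[1] for task in expand_spawns(child)]
    if kind == "loop":
        _, trip_count, body = node
        tasks = []
        for k in range(trip_count):  # unroll: one copy of the body per iteration
            tasks += [f"{task}#{k}" for task in expand_spawns(body)]
        return tasks
    if kind == "if":
        _, taken_freq, then_body, else_body = node
        # Prune the arm that the profile says is executed less often.
        return expand_spawns(then_body if taken_freq >= 0.5 else else_body)
    raise ValueError(f"unknown node kind: {kind}")


program = ("seq", [
    ("spawn", "producer"),
    ("loop", 3, ("spawn", "worker")),
    ("if", 0.9, ("spawn", "fast_path"), ("spawn", "slow_path")),
])
print(expand_spawns(program))
```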

16
Execution Time Estimation

Machine SUIF library
Extract the DDG at basic block level
Supply the resource model to the scheduler
Generate the estimates using the scheduler
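
A minimal sketch of how a list scheduler can turn a basic block's DDG and a resource model into a cycle estimate is given below. It is not the Machine SUIF scheduler: the function name, the program-order priority, the fully pipelined functional units, and the example latencies are all assumptions chosen to keep the illustration small.

```python
def list_schedule(instrs, deps, units):
    """Estimate the cycle count of one basic block.

    instrs: {name: (unit_class, latency)} in program order
    deps:   [(producer, consumer)] edges of the data dependence graph
    units:  {unit_class: number of instructions it can start per cycle}
    Returns (schedule_length, {name: start_cycle}).
    """
    if not instrs:
        return 0, {}
    for name, (unit, _) in instrs.items():
        if units.get(unit, 0) < 1:
            raise ValueError(f"no unit of class {unit!r} for {name}")
    preds = {name: [] for name in instrs}
    for producer, consumer in deps:
        preds[consumer].append(producer)

    start = {}
    remaining = list(instrs)  # program order doubles as the priority order
    cycle = 0
    while remaining:
        free = dict(units)
        for name in list(remaining):
            unit, _ = instrs[name]
            # Ready once every producer has been issued and its latency has elapsed.
            ready = all(
                p in start and start[p] + instrs[p][1] <= cycle
                for p in preds[name]
            )
            if ready and free[unit] > 0:
                start[name] = cycle
                free[unit] -= 1
                remaining.remove(name)
        cycle += 1
    length = max(start[n] + instrs[n][1] for n in instrs)
    return length, start


# Hypothetical block: two loads feed an add, then a multiply, then a store.
block = {
    "ld1": ("mem", 2),
    "ld2": ("mem", 2),
    "add": ("alu", 1),
    "mul": ("alu", 3),
    "st": ("mem", 1),
}
ddg = [("ld1", "add"), ("ld2", "add"), ("add", "mul"), ("mul", "st")]
length, start = list_schedule(block, ddg, {"mem": 1, "alu": 1})
print(length, start)
```

Multiplying the per-block length by the block's execution count from the profile would then give that block's contribution to the total estimate; that weighting step is an assumption about how the pieces combine, not something the slide spells out.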

17
MACHSUIF Flow
[Flow diagram] Application in C -> Lower level SUIF -> SUIF virtual machine -> Target instructions
Target machine description, resource model -> target dependent stages
Target dependent stages: Control flow graph, Profiling, Register allocation, Scheduler -> Estimates
18
Resource Model
Resources a, b, c Vectors a1 i1 b
i2 ac, c i3
19
Collision matrices for instruction classes
20
Generated Automata
[Figure: automaton with states F0 to F5; each state is annotated with a six-entry row of 0/x marks, and the transitions are labeled a, b, c]
F0 and F4 are cycle-advancing states
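
For readers unfamiliar with automaton-based hazard detection, the sketch below builds a small automaton from collision matrices, in the spirit of the finite-state-automata approach listed in the references. It does not reproduce the automaton on this slide: the reservation tables, the class names `alu`, `ld`, `mul`, and the explicit `advance` transition are hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical reservation tables: for each instruction class, the set of
# resources it occupies in each cycle after issue.
TABLES = {
    "alu": [{"a"}],
    "ld": [{"a"}, {"c"}],
    "mul": [{"b"}, {"b"}],
}


def collision_matrix(tables, issued, depth):
    """One row per class j, one column per delay t.

    Entry (j, t) is True when issuing class j exactly t cycles after
    class `issued` would need a resource that `issued` still holds.
    """
    rows = []
    for j in sorted(tables):
        row = []
        for t in range(depth):
            clash = any(
                tables[issued][c + t] & tables[j][c]
                for c in range(len(tables[j]))
                if c + t < len(tables[issued])
            )
            row.append(clash)
        rows.append(tuple(row))
    return tuple(rows)


def build_automaton(tables):
    """Return (start_state, {state: {label: next_state}})."""
    classes = sorted(tables)
    depth = max(len(t) for t in tables.values())
    matrices = {c: collision_matrix(tables, c, depth) for c in classes}
    start = tuple((False,) * depth for _ in classes)
    transitions = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in transitions:
            continue
        out = {}
        for row, cls in zip(state, classes):
            if not row[0]:  # no hazard for this class in the current cycle
                out[cls] = tuple(
                    tuple(x or y for x, y in zip(r, m))
                    for r, m in zip(state, matrices[cls])
                )
        # Advancing the cycle shifts every row left by one column.
        out["advance"] = tuple(r[1:] + (False,) for r in state)
        transitions[state] = out
        queue.extend(out.values())
    return start, transitions


start, transitions = build_automaton(TABLES)
print(len(transitions), "states")
```

Once the automaton is built, the scheduler only needs a table lookup per candidate instruction instead of re-checking reservation tables, which is the appeal of the approach.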
21
Modified Flow
22
Branch Delays

Unconditional branches
delay = uncond_delay * cur_block_profile_info
Conditional branches
taken_delay = frequency of branch taken * taken_delay
not_taken_delay = frequency of branch not taken * not_taken_delay
delay = taken_delay + not_taken_delay
Delay information is extracted from the processor pipeline
Branch frequency information is obtained from the gcov profiler
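
The branch-delay bookkeeping amounts to weighting each per-branch penalty by a profile count. The sketch below reads the slide's lines as products and a sum (the operators are not legible in the transcript, so that reading is an assumption); the function names and the example numbers are hypothetical.

```python
def unconditional_branch_delay(uncond_delay, block_count):
    """Penalty cycles of an unconditional branch over the whole run.

    block_count is how often the enclosing block executed (from the profile).
    """
    return uncond_delay * block_count


def conditional_branch_delay(taken_delay, taken_count, not_taken_delay, not_taken_count):
    """Penalty cycles of a conditional branch, weighted by both outcomes."""
    return taken_delay * taken_count + not_taken_delay * not_taken_count


# Hypothetical pipeline penalties and profile counts.
print(unconditional_branch_delay(uncond_delay=2, block_count=1000))
print(conditional_branch_delay(taken_delay=3, taken_count=900,
                               not_taken_delay=1, not_taken_count=100))
```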

23
Memory References

Classifying loads and stores
Loads and stores involving scalar variables
Loads and stores involving array references
Scalar References
All the scalar variables are stored in
consecutive memory locations
There is only one cache miss for every cache line containing scalars

24
Scalar References

N: number of scalar variables involved in the memory accesses
M: number of memory accesses to the N scalar variables
K: number of processor cycles to fetch one line into the cache
L: cache line size

Processor cycles = (K + L) * ceil(N/L) + M - ceil(N/L)
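
A worked instance may help with the scalar-reference formula. The operators are not legible in the transcript, so the sketch below assumes the reading (K + L) * ceil(N/L) + M - ceil(N/L): ceil(N/L) accesses miss, one per cache line holding scalars, and the remaining M - ceil(N/L) accesses each cost one cycle. The function name and the example values are hypothetical.

```python
import math


def scalar_reference_cycles(n_scalars, n_accesses, line_fetch_cycles, line_size):
    """Cycle estimate for scalar loads and stores under the assumed formula.

    n_scalars          N: scalar variables involved in the memory accesses
    n_accesses         M: memory accesses to those N variables
    line_fetch_cycles  K: processor cycles to fetch one line into the cache
    line_size          L: cache line size
    """
    misses = math.ceil(n_scalars / line_size)  # one miss per line of scalars
    # Assumed reading of the slide: (K + L) per miss, 1 cycle per remaining access.
    return (line_fetch_cycles + line_size) * misses + n_accesses - misses


# N = 10 scalars, M = 100 accesses, K = 20 cycles, L = 4 -> 3 lines of scalars.
print(scalar_reference_cycles(10, 100, 20, 4))
```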

25
Array References

Self-spatial reuse
A reference accesses the same cache line in different iterations
Self-temporal reuse
A reference accesses the same data location in different iterations
Group-spatial reuse
Different references access the same cache line in different iterations
Group-temporal reuse
Different references access the same data location in different iterations

26
Array References (contd.)

Self-temporal reuse references are moved outside the loop
Group the remaining references into equivalence classes
Each class exhibits self-spatial and group-spatial reuse
Calculate the effective accesses per iteration

27
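Example
for (i = 0; i < M; i++) for (j = 0; j < M; j++)
a[i][j] = a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1] + b[i] + c[j][i];

b[i] exhibits self-temporal reuse
a[i][j+1] exhibits self-spatial reuse
a[i][j] and a[i][j+1] exhibit group-temporal reuse
a[i][j] and a[i][j-1] exhibit group-spatial reuse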

28
Example (contd.)

Equivalence classes for a
{a[i][j], a[i][j-1], a[i][j+1]}
{a[i-1][j]}
{a[i+1][j]}

For a: 3 classes access memory, i.e. 3/L memory accesses per j iteration
For c: 1 memory access in each iteration, i.e. 1 per j iteration
For b: 1/L off-chip accesses per i iteration
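
The per-iteration figures can be sanity-checked by counting the distinct cache lines that one sweep of the inner j loop touches. The sketch below is a simplification made for illustration, not the tool's analysis: it assumes row-major M x M arrays that start on a line boundary, L elements per line, a cache big enough that a line is fetched at most once per sweep, and interior columns only. The names `A_OFFSETS` and `lines_touched` are hypothetical.

```python
from collections import Counter

# (di, dj) offsets of the five references to array a in the loop body.
A_OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def lines_touched(M, L, i):
    """Count distinct cache lines per array during one inner-loop sweep."""
    touched = set()
    for j in range(1, M - 1):  # interior columns, so j-1 and j+1 stay in the row
        for di, dj in A_OFFSETS:
            touched.add(("a", ((i + di) * M + (j + dj)) // L))
        touched.add(("b", i // L))            # b[i] does not depend on j
        touched.add(("c", (j * M + i) // L))  # c[j][i] walks down a column
    return Counter(name for name, _ in touched)


M, L = 64, 8
counts = lines_touched(M, L, i=10)
iterations = M - 2
for name in "abc":
    print(name, counts[name], counts[name] / iterations)
```

With M = 64 and L = 8 the sweep touches 24 lines of a over 62 iterations, close to 3/L = 0.375 per iteration, one new line of c in every iteration, and a single line of b for the whole sweep, which becomes 1/L per i iteration once consecutive values of i share a line.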

30
TSIM vs. Our Estimates
Percentage of error: 16, 12, 18

32
Conclusions and Contributions

Facilitated system level architecture description
for SRIJAN
Task graph formulation
Execution time estimates
List scheduler
Branch delays
Memory
Leon target library

Architectural exploration
Instruction latencies
Number of FUs
Memory organizations
Register file organizations

33
Future Work

Task Graph Formulation
Synchronization overheads
Improving Leon Library
Extracting latency information from HMDES

35
References

SRIJAN
Trimaran mQS functions in md.h (in trimaran/impact dir)
SUIF2 documentation
MACHSUIF documentation
Gang Chen and Cliff Young, "Instruction scheduling library for SUIF", Harvard University
Vasanth Bala and Norman Rubin, "Efficient instruction scheduling using finite state automata"
P. R. Panda, Nikil D. Dutt, Alexandru Nicolau, "Local memory exploration and optimization in embedded systems"
M. J. Flynn, "Computer Architecture: Pipelined and Parallel Processor Design", Narosa Publishing House, 1996.