Title: A Decompression Architecture for Low Power Embedded Systems
1 A Decompression Architecture for Low Power Embedded Systems
Yi-hsin Tseng, 11/06/2007
- Lekatsas, H., Henkel, J., Wolf, W.
- Proceedings of the 2000 International Conference on Computer Design (ICCD 2000), IEEE
2 Outline
- Introduction / Motivation
- Code Compression Architecture
- Decompression Engine Design
- Experimental results
- Conclusion / Contributions of the paper
- Our project
- Relation to CSE520
- Q & A
3 Introduction / Motivation
4 For Embedded Systems
- Embedded system architectures are more complicated nowadays.
- Available memory space is smaller.
- A reduced executable program can also indirectly affect the chip's
- Size
- Weight
- Power consumption
5 Why Code Compression/Decompression?
- Compress the instruction segment of the executable running on the embedded system
- Reduces the memory requirements and bus transaction overheads
- Compression (offline) -> Decompression (at runtime)
6 Related Work on Compressed Instructions
- A logarithmic-based compression scheme where 32-bit instructions map to fixed but smaller-width compressed instructions
- (that system considers the memory area only)
- Frequently appearing instructions are compressed to 8 bits
- (fixed length: 8 or 32 bits)
7 The Compression Method in This Paper
- Gives comprehensive results for the whole system, including
- buses
- memories (main memory and cache)
- decompression unit
- CPU
8 Code Compression Architecture
9 Architecture in This System (Post-cache)
- Reasons:
- Increases the effective cache size
- Improves instruction bandwidth
10 Code Compression Architecture
- Uses SAMC (Semiadaptive Markov Compression) to compress instructions
- Divides instructions into 4 groups
- based on the SPARC architecture
- A short 3-bit code is prepended to the beginning of each compressed instruction (see the sketch below)
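The slides state only that a short 3-bit code marks the group of each compressed instruction. As a hedged illustration (not the paper's actual encoder), the C sketch below shows one way such a tagged, variable-length bit stream could be written out; bitstream_t, put_bits, and emit_compressed are hypothetical names, and the SAMC payload encoding itself is not shown.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical bit-stream writer: each compressed instruction is framed
 * as a 3-bit group tag followed by a variable-length compressed payload. */
typedef struct {
    uint8_t *buf;     /* output buffer                 */
    size_t   bitpos;  /* next free bit position in buf */
} bitstream_t;

/* Append the 'nbits' low-order bits of 'value', most significant bit first. */
static void put_bits(bitstream_t *bs, uint32_t value, unsigned nbits)
{
    for (int i = (int)nbits - 1; i >= 0; i--) {
        size_t   byte = bs->bitpos >> 3;
        unsigned off  = 7u - (bs->bitpos & 7u);
        if ((value >> i) & 1u)
            bs->buf[byte] |= (uint8_t)(1u << off);
        else
            bs->buf[byte] &= (uint8_t)~(1u << off);
        bs->bitpos++;
    }
}

/* Emit one compressed instruction: the 3-bit group tag, then its payload. */
static void emit_compressed(bitstream_t *bs, unsigned group_tag,
                            uint32_t payload, unsigned payload_bits)
{
    put_bits(bs, group_tag & 0x7u, 3);    /* which group (tag values assumed)  */
    put_bits(bs, payload, payload_bits);  /* the SAMC-coded instruction bits   */
}
```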
11 4 Groups of Instructions
- Group 1
- instructions with immediates
- Ex: sub %i1, 2, %g3 / set 5000, %g2
- Group 2
- branch instructions
- Ex: be, bne, bl, bg, ...
- Group 3
- instructions with no immediates
- Ex: add %o1, %o2, %g3 / st %g1, [%o2]
- Group 4
- instructions that are left uncompressed (a classification sketch follows below)
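For illustration only, here is a rough C sketch of how a 32-bit SPARC V8 instruction word might be sorted into these groups using the standard V8 format fields (op in bits 31:30, op2 in bits 24:22, the i bit in bit 13). The exact mapping below, including treating call as a branch and using group 4 as a compressor-chosen fallback, is an assumption rather than the paper's stated rule.

```c
#include <stdint.h>

enum group { GROUP_IMMEDIATE = 1, GROUP_BRANCH = 2,
             GROUP_NO_IMMEDIATE = 3, GROUP_UNCOMPRESSED = 4 };

/* Rough classification of a SPARC V8 instruction word into the four
 * groups named on this slide. Group 4 is really a compressor decision
 * (instructions it chooses to leave uncompressed), not an encoding class. */
static enum group classify(uint32_t insn)
{
    unsigned op   = (insn >> 30) & 0x3;   /* major format field          */
    unsigned op2  = (insn >> 22) & 0x7;   /* format-2 sub-opcode         */
    unsigned ibit = (insn >> 13) & 0x1;   /* immediate selector, format 3 */

    if (op == 0) {                        /* format 2                    */
        if (op2 == 2 || op2 == 6 || op2 == 7)
            return GROUP_BRANCH;          /* Bicc, FBfcc, CBccc          */
        return GROUP_IMMEDIATE;           /* SETHI etc. carry an immediate */
    }
    if (op == 1)                          /* CALL: PC-relative target    */
        return GROUP_BRANCH;
    /* op == 2 or 3: arithmetic/logical or load/store (format 3)         */
    return ibit ? GROUP_IMMEDIATE : GROUP_NO_IMMEDIATE;
}
```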
12 Decompression Engine Design (Approach)
13 The Key Idea
- Present an architecture for embedded systems that decompresses offline-compressed instructions during runtime
- to reduce power consumption
- with a performance improvement (in most cases)
14 Pipelined Design
15 Pipelined Design (cont.)
16 Pipelined Design: group 1 (stage 1)
- Index the decoding table
- Input compressed instructions
- Forward instructions
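The slide names three stage-1 activities (input compressed bits, index the decoding table, forward the result) but gives no table layout, so the C sketch below is only a guess at what that indexing step could look like. DEC_INDEX_BITS, dec_entry_t, and the fixed-width index are assumptions; real SAMC decoding walks a Markov model bit by bit.

```c
#include <stdint.h>

#define DEC_INDEX_BITS 8   /* width of the table index (assumed) */

/* One decoding-table entry handed from stage 1 to the later stages. */
typedef struct {
    uint32_t template_word;  /* partially decoded instruction bits       */
    uint8_t  consumed_bits;  /* compressed bits this entry accounts for  */
} dec_entry_t;

static dec_entry_t dec_table[1u << DEC_INDEX_BITS];  /* built offline per application */

/* Read 'n' bits of the compressed stream starting at bit position 'pos'. */
static uint32_t peek_bits(const uint8_t *stream, uint32_t pos, unsigned n)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        uint32_t p = pos + i;
        v = (v << 1) | ((stream[p >> 3] >> (7u - (p & 7u))) & 1u);
    }
    return v;
}

/* Stage 1: skip the 3-bit group tag, index the decoding table with the
 * following compressed bits, and forward the selected entry.             */
static dec_entry_t stage1_index(const uint8_t *stream, uint32_t *bitpos)
{
    *bitpos += 3;  /* the group tag already identified this as group 1 */
    uint32_t idx = peek_bits(stream, *bitpos, DEC_INDEX_BITS);
    return dec_table[idx];
}
```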
17 Pipelined Design: group 1 (stage 2)
18 Pipelined Design: group 1 (stage 3)
19 Pipelined Design: group 1 (stage 4)
20 Pipelined Design: group 2, branch instructions (stage 1)
21 Pipelined Design: group 2, branch instructions (stage 2)
22 Pipelined Design: group 2, branch instructions (stage 3)
23 Pipelined Design: group 2, branch instructions (stage 4)
24 Pipelined Design: group 3, instructions with no immediates (stage 1)
- No-immediate instructions may appear in pairs -> a pair is compressed into one byte (one byte <-> 64 bits)
- 256-entry table
- 8 bits used as the index to address the table
(a lookup sketch follows below)
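Based on what this slide states (a pair of no-immediate instructions compressed into one byte and expanded through a 256-entry table back to 64 bits), a minimal lookup sketch in C could look like the following. fast_dictionary and decompress_pair are illustrative names, and the ordering of the two instructions inside the 64-bit entry is an assumption.

```c
#include <stdint.h>

/* Group 3 fast dictionary: one compressed byte indexes a 256-entry table
 * whose entry holds the two original 32-bit no-immediate instructions.
 * Table contents are built offline for each application.                 */
static uint64_t fast_dictionary[256];   /* filled by the compressor */

/* Expand one compressed byte into the two original SPARC instructions. */
static void decompress_pair(uint8_t index, uint32_t *first, uint32_t *second)
{
    uint64_t pair = fast_dictionary[index];
    *first  = (uint32_t)(pair >> 32);          /* earlier instruction (assumed order) */
    *second = (uint32_t)(pair & 0xFFFFFFFFu);  /* later instruction                   */
}
```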
25 Pipelined Design: group 3, instructions with no immediates (stage 2)
26 Pipelined Design: group 3, instructions with no immediates (stage 3)
27 Pipelined Design: group 3, instructions with no immediates (stage 4)
28 Pipelined Design: group 4, uncompressed instructions
29 Experimental Results
30 Experimental Results
- Different applications are used:
- an algorithm for computing 3D vectors for a motion picture ("i3d")
- a complete MPEG-II encoder ("mpeg")
- a smoothing algorithm for digital images ("smo")
- a trick animation algorithm ("trick")
- A simulation tool written in C is used for obtaining performance data for the decompression engine
31 Experimental Results (cont.)
- The decompression engine is application-specific.
- For each application, a decoding table and a fast dictionary table are built that will decompress that particular application only (a table-construction sketch follows below).
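The slides say these tables are built offline for each application, but not how. A minimal sketch of one plausible way to fill the 256-entry group 3 fast dictionary, by keeping the most frequent adjacent no-immediate instruction pairs, is shown below; the selection policy, function name, and data layout are assumptions, not the authors' tool.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint64_t pair; uint32_t count; } pair_count_t;

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

static int cmp_count_desc(const void *a, const void *b)
{
    const pair_count_t *pa = a, *pb = b;
    return (pb->count > pa->count) - (pb->count < pa->count);
}

/* pairs[]   : every adjacent no-immediate instruction pair found in the program
 * npairs    : number of such pairs
 * dict[256] : output fast dictionary (most frequent pairs get the indices)      */
void build_fast_dictionary(uint64_t *pairs, size_t npairs, uint64_t dict[256])
{
    pair_count_t *pc = malloc(npairs * sizeof *pc);
    size_t nunique = 0;
    if (pc == NULL || npairs == 0) { free(pc); return; }

    /* Sort so identical pairs are adjacent, then count each run. */
    qsort(pairs, npairs, sizeof *pairs, cmp_u64);
    for (size_t i = 0; i < npairs; ) {
        size_t j = i;
        while (j < npairs && pairs[j] == pairs[i]) j++;
        pc[nunique].pair  = pairs[i];
        pc[nunique].count = (uint32_t)(j - i);
        nunique++;
        i = j;
    }

    /* Keep the 256 most frequent pairs; each dictionary index decodes to one pair. */
    qsort(pc, nunique, sizeof *pc, cmp_count_desc);
    memset(dict, 0, 256 * sizeof dict[0]);
    for (size_t k = 0; k < 256 && k < nunique; k++)
        dict[k] = pc[k].pair;

    free(pc);
}
```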
32 Experimental Results for Energy and Performance
33 Why Worse Performance on "smo" with a 512-byte Instruction Cache?
- It does not require large memory (it executes in tight loops).
- It generates very few misses at this cache size.
- (The compressed architecture therefore does not help an already almost perfect hit ratio, and the slowdown caused by the decompression engine prevails.)
34 Conclusion / Contributions of the Paper
- This paper designed an instruction decompression engine as a soft IP core for low-power embedded systems.
- Applications run faster compared to systems with no code compression (due to improved cache performance).
- Lower power consumption (due to smaller memory requirements for the executable program and a smaller number of memory accesses).
35 Relation to CSE520
- Improves system performance and power consumption by using a pipelined architecture in the decompression engine.
- A different architecture design for lower power consumption in embedded systems.
- A smaller cache performs better on the compressed architecture; a larger cache performs better on the uncompressed architecture.
- Cache hit ratio
36 Our Project
- Goal
- How to improve the efficiency of power management in embedded multicore systems
- Idea
- Use different power modes within a given power budget, with a global power management policy (in Jun Shen's presentation)
- Use the SAMC algorithm and this decompression architecture as another factor to simulate (this paper)
- How?
- SimpleScalar tool set
- Try a simple function at first, then try the different power modes
37 Thank You! Q & A
38 Backup Slides
39 Critique
- The decompression engine will slow down the system if the cache generates very few misses at a given cache size.
40 Post-cache vs. Pre-cache
- Pre-cache: the instructions stored in the I-cache are already decompressed.
- Post-cache: the instructions stored in the I-cache are still compressed; decompression happens between the I-cache and the CPU.
41 Problems for the Post-cache Architecture
- Memory relocation
- The compression changes the instruction locations in memory.
- In the pre-cache architecture
- Decompression is done before instructions are fetched into the I-cache, so the addresses in the I-cache need not be fixed (an address-translation sketch follows below).
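The slides only name the relocation problem; they do not show how it is resolved in the post-cache case. One common approach in compressed-code systems (not necessarily the one used in this paper) is a line address table that maps each original cache-line address to the offset of its compressed form, consulted only on I-cache misses. The C sketch below, with assumed names and sizes, illustrates that idea.

```c
#include <stdint.h>

#define LINE_BYTES   32u              /* uncompressed I-cache line size (assumed) */
#define NUM_LINES    4096u            /* lines covered by the table (assumed)     */

/* line_addr_table[i] holds the byte offset of the compressed form of
 * uncompressed cache line i, built offline together with the code image. */
static uint32_t line_addr_table[NUM_LINES];

/* Translate an original instruction address (as seen by the CPU and the
 * I-cache) into the address of the compressed data in main memory.        */
static uint32_t to_compressed_addr(uint32_t original_addr,
                                   uint32_t text_base,
                                   uint32_t compressed_base)
{
    uint32_t line = (original_addr - text_base) / LINE_BYTES;
    return compressed_base + line_addr_table[line];
}
```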
42 SPARC Instruction Set
- Instruction groups
- load/store (ld, st, ...)
- Move data from memory to a register / move data from a register to memory
- integer arithmetic (add, sub, ...)
- Arithmetic operations on data in registers
- bit-wise logical (and, or, xor, ...)
- Logical operations on data in registers
- bit-wise shift (sll, srl, ...)
- Shift bits of data in registers
- integer branch (be, bne, bl, bg, ...)
- Trap (ta, te, ...)
- control transfer (call, save, ...)
- floating point (ldf, stf, fadds, fsubs, ...)
- floating point branch (fbe, fbne, fbl, fbg, ...)
43 SPARC Instruction Example