Title: Multimedia Processor: Architectures and Applications
1Multimedia Processor Architectures and
Applications
- By
- Surin Kittitornkun
- 14 March, 1998
2Contents
- Programmable Multimedia Processor: Why?
- TMS320C8x: C80 and C82
- MPACT-R-2
- Targeted Applications
- Application: H.324 on TMS320C82
- Current Multimedia Processors
- References
3Programmable Multimedia Processor: Why?
- More flexibility: changes can be made in software
- Less complexity: shorter time to market
- More cost-efficient than ASIC design
- Requires software development
- May consume more power
- Markets: PC and consumer products
4TMS320C8x Overview
- RISC master processor (MP) at 50 and 60 MHz
- Parallel processors (PP): 2 on the C82, 4 on the C80
- Transfer controller: DMA and memory controller
- Video controller (C80 only)
5TMS320C8x Master Processor
- 32-bit RISC instructions / 64-bit data
- Scoreboarded register file: 31 general-purpose registers and a zero register
- IEEE 754 floating point
- Supports vector FP operations
- Performs a single-precision FP MAC in 1 cycle: 100 MFLOPS (at 50 MHz)
- Suitable for control protocols and FP-intensive algorithms
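The 100 MFLOPS figure follows if one multiply-accumulate is counted as two floating-point operations. A minimal sketch of that arithmetic; the function name and the two-operations-per-MAC convention are assumptions, not taken from the slides:

```python
def peak_mflops(clock_mhz, flops_per_mac=2, macs_per_cycle=1):
    """Peak MFLOPS when a multiply-accumulate completes every cycle.

    Assumption: each MAC counts as one multiply plus one add.
    """
    return clock_mhz * macs_per_cycle * flops_per_mac


print(peak_mflops(50))  # 100
```

This is a peak rate only; sustained throughput depends on how well the code keeps the FP unit fed.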
6TMS320C8x Processor communication
- Shared-memory multiprocessor
- The MP sends commands through command buffers located in shared memory
7TMS320C8x Parallel Processor
- Data unit: 32-bit datapath, ALU, multiplier, etc.
- 2 independent address units: global and local
- Single-cycle on-chip memory access (when there is no conflict)
- Single-cycle load/store of byte, halfword, and word
- Internal adder can offload data unit computation
- Program flow control unit
- 3-stage pipeline: instruction fetch, address generation, and operation execution
- Supports conditional execution of data unit operations, moves, loads from memory, and branches
- PC is mapped into the register file
- To minimize overhead, the loop controller supports 3 levels of nested loops
8TMS320C8x Parallel Processor Data Unit
- Split 32-bit 3-input ALU: Boolean and arithmetic operations
- Split and rounded multiplier: dual 8x8 -> 16, 16x16 -> 32
- Flexible datapath: barrel rotator, mask generator
- Supports signed, unsigned, and saturating arithmetic
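To illustrate what a split ALU with saturation does, here is a small software model that adds four unsigned 8-bit lanes packed into a 32-bit word. It is a sketch of the idea only; the function name and the choice of unsigned 8-bit lanes are assumptions, and it does not reproduce the actual PP instruction encoding.

```python
def split_add_u8(a, b, saturate=False):
    """Add four unsigned 8-bit lanes packed in 32-bit words, lane by lane."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        s = x + y
        if saturate:
            s = min(s, 0xFF)   # clamp at the lane maximum instead of wrapping
        else:
            s &= 0xFF          # wrap; the carry never crosses into the next lane
        result |= s << shift
    return result


print(hex(split_add_u8(0x01FF10F0, 0x01010120)))                 # 0x2001110
print(hex(split_add_u8(0x01FF10F0, 0x01010120, saturate=True)))  # 0x2ff11ff
```

Saturation matters for pixel data: a bright pixel that overflows should stay white rather than wrap around to black.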
9TMS320C8x Parallel Processor Data Unit: 3-input ALU
- Supports 512 operations in total: 256 Boolean and 256 arithmetic
- Boolean: the OR of any subset of the eight minterms of A, B, and C, selected by bits F0 to F7
- Arithmetic: A & f1(B,C) + f2(B,C) [+1]
- Examples
- A + B + 1
- (A & C) + (B & C): mask A and B by C, then add
- A + ((B & C) | (-B & ~C)): multiple-byte A+B
- A - ((B & C) | (-B & ~C)): multiple-byte A-B
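The 256 Boolean operations correspond to the 2^8 possible truth tables of three inputs. The sketch below models that, plus a per-byte add/subtract in the spirit of the last two examples, read here as A plus (B where the mask C is set, -B where it is clear). The minterm bit numbering, the function names, and that reading of the examples are assumptions rather than the documented encoding.

```python
MASK32 = 0xFFFFFFFF


def bool3(func, a, b, c):
    """Apply a 3-input Boolean function, given as an 8-bit truth table, bitwise."""
    out = 0
    for minterm in range(8):
        if not (func >> minterm) & 1:
            continue
        # Assumed numbering: index bit 2 selects A, bit 1 selects B, bit 0 selects C.
        ta = a if minterm & 4 else ~a
        tb = b if minterm & 2 else ~b
        tc = c if minterm & 1 else ~c
        out |= ta & tb & tc
    return out & MASK32


def multibyte_addsub(a, b, c):
    """Per 8-bit lane: a + b where the lane of c is 0xFF, a - b where it is 0x00."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        m = (c >> shift) & 0xFF
        operand = (y & m) | (-y & ~m & 0xFF)   # (B & C) | (-B & ~C) within the lane
        result |= ((x + operand) & 0xFF) << shift
    return result


# Truth table 0x96 is a 3-way XOR under the assumed numbering.
print(hex(bool3(0x96, 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA)))      # 0x96969696
print(hex(multibyte_addsub(0x10203040, 0x01020304, 0xFF00FF00)))  # 0x111e333c
```

With one 8-bit function code, the same hardware covers AND, OR, XOR, multiplexing (0xE4 selects A where C is set, else B), and every other three-input combination.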
10TMS320C8x Parallel Processor Instruction Set
- 64-bit opcode contains multiple subinstructions, for the
- Data unit
- Global address unit, and
- Local address unit
- Ex: d4d5d6>>d0 a8d7 d0(a0x1)
11TMS320C8x Transfer Controller
- Prioritizes, schedules, and transfers data between on- and off-chip memories
- Services data cache (on-chip RAM) and instruction cache misses
- Supports multidimensional data transfers, from a simple contiguous linear sequence up to a 3D region
- Memory interface supports a wide range of memory systems: DRAM, SDRAM, video RAM, and SRAM
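A multidimensional transfer can be pictured as nested address generation: contiguous runs of bytes, repeated at a line pitch, repeated again at a plane pitch. The generator below is a sketch of that idea only; all parameter names are hypothetical and do not correspond to the transfer controller's actual parameter table.

```python
from typing import Iterator


def region_addresses(base, line_bytes, lines=1, line_pitch=0,
                     planes=1, plane_pitch=0) -> Iterator[int]:
    """Yield byte addresses for a 1D, 2D, or 3D region, innermost dimension first."""
    for plane in range(planes):
        for line in range(lines):
            start = base + plane * plane_pitch + line * line_pitch
            for offset in range(line_bytes):
                yield start + offset


# A 4-byte-wide, 2-line block out of an image with a 16-byte line pitch.
print([hex(addr) for addr in region_addresses(0x1000, 4, lines=2, line_pitch=16)])
```

With `lines=1` and `planes=1` this degenerates to the simple contiguous linear sequence; setting a line pitch extracts a rectangular block from a larger image without processor involvement.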
12TMS320C8x Video Controller (C80 only)
- Provides simultaneous control over two independent capture or display systems and frame grabber or frame buffer image storage
- Dual frame timers
- Programmable timing and control registers
- Programmable line interrupt to the MP
13TMS320C8x Development Tools
- C-like compilers and assemblers for both the master and parallel processors
- Register allocator
- identifies live and free registers
- allows variable names to be used in assembly code, and
- assigns a specific register to each variable
- Code compactor: converts straight-line assembly code into parallel code
- Time-critical parallel code can be optimized by hand
14TMS320C8x Execution Time for 256-Point FFT
- "C" indicates performance with the cache pre-loaded
- Benchmark results for the TMS320C80 are for one of the on-chip DSP processors
15MPACT-R-2 Overview
- Hardware/software relationship
- Variety of high-speed I/O interfaces
16MPACT-R-2 CPU Datapath
- Data sizes are multiples of 9 bits
- Register file of 512 72-bit entries with 4 read and 4 write ports
- ALU1: shift and align
- ALU2: add and logic
- ALU3: arithmetic and logic
- ALU4: stage 1 of multiplication
- ALU5: motion estimation
- Full crossbar between ALU outputs, ALU inputs, and register read and write ports
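Since data sizes are multiples of 9 bits and registers are 72 bits wide, one register holds eight 9-bit elements. The sketch below packs and unpacks such a word in software; the field ordering (element 0 in the least significant bits) and the function names are assumptions made for illustration.

```python
FIELD_BITS = 9
WORD_BITS = 72
FIELDS_PER_WORD = WORD_BITS // FIELD_BITS   # 8 elements per register


def pack72(fields):
    """Pack eight unsigned 9-bit values into one 72-bit integer."""
    if len(fields) != FIELDS_PER_WORD:
        raise ValueError("expected exactly 8 fields")
    word = 0
    for i, value in enumerate(fields):
        if not 0 <= value < (1 << FIELD_BITS):
            raise ValueError("field does not fit in 9 bits")
        word |= value << (FIELD_BITS * i)
    return word


def unpack72(word):
    """Split a 72-bit integer back into eight 9-bit values."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (FIELD_BITS * i)) & mask for i in range(FIELDS_PER_WORD)]


pixels = [0, 255, 256, 511, 1, 2, 3, 4]
print(unpack72(pack72(pixels)) == pixels)  # True
```

The ninth bit gives each 8-bit pixel or sample one bit of headroom, so intermediate sums such as 255 + 1 remain representable within the element.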
17MPACT-R-2 CPU Datapath
18MPACT-R-2 Multimedia ISA
- Issues a two-instruction pack of 72 bits every cycle
- Data forwarding from one ALU to another
- Vector instructions (length up to 255)
- Multimedia data elements of 9, 18, 27, and 36 bits
- Supports signed, unsigned, and saturating arithmetic
- MPACT 2 includes single-precision FP for 3D graphics
- Flow control: branches, jumps, and calls
- Special-purpose instructions
- Motion estimation
- IDCT
- FFT butterfly, etc.
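As an illustration of the kind of work a motion estimation instruction accelerates, the sketch below performs a full-search block match using the sum of absolute differences (SAD). SAD is a common matching criterion but is an assumption here; the slides do not state which metric the hardware uses, and all names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))


def best_match(current, reference, top, left, size, search):
    """Full search: return (cost, dy, dx) of the best match in the reference frame."""
    target = [row[left:left + size] for row in current[top:top + size]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(reference) or x + size > len(reference[0]):
                continue   # candidate block would fall outside the frame
            candidate = [row[x:x + size] for row in reference[y:y + size]]
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Each candidate position costs size x size absolute differences, and a full search visits (2 x search + 1)^2 positions per block, which is why encoders benefit from dedicated hardware for this step.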
19MPACT-R-2 Hardware/Software Relationship
- Requires a host x86 CPU
- Mediaware: uses standard APIs
- RM: Resource Manager, running under Windows
- MRK: MPACT real-time kernel
- Nearest-deadline scheduling algorithm
- Interrupt-driven kernel with a worst-case context switch time of 4 µs
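Nearest-deadline scheduling always runs the ready task whose deadline comes soonest. The sketch below shows only that selection rule using a priority queue; it is not the MRK implementation, it ignores preemption and interrupts, and the task names and deadlines are made up for illustration.

```python
import heapq


def nearest_deadline_order(tasks):
    """Return task names in the order a nearest-deadline scheduler would run them.

    tasks: iterable of (name, deadline_us) pairs, all assumed ready at time zero.
    """
    # The index breaks ties so that equal deadlines keep their submission order.
    heap = [(deadline, index, name) for index, (name, deadline) in enumerate(tasks)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


print(nearest_deadline_order([("video", 33000), ("audio", 10000), ("modem", 5000)]))
# ['modem', 'audio', 'video']
```

In a real kernel the queue is re-evaluated whenever an interrupt makes a new task ready, which is where a short context switch time matters.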
20MPACT-R-2 Hardware/Software Relationship
21MPACT-R-2 High speed I/O interface
- PCI bus or AGP (Accelerated Graphics Port)
- x86 host CPU bus
- 66 MHz -> 264 Mbytes/s
- Rambus memory interface
- 300 MHz bus (9 bits wide), both clock edges -> 600 Mbytes/s
- Requires 2-4 Mbytes
- Display controller
- 24-bit RAMDAC
- High resolution: up to 1280x1024 at 24 bits or 1600x1200 at 16 bits
- Video interface
- Accepts NTSC and PAL format video, or
- DVD input through PCI or AGP
- Programmable peripheral I/O interface
- Supports connection to several devices
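Both bandwidth figures can be reproduced as clock rate times bytes per transfer times transfers per clock. The sketch assumes a 32-bit (4-byte) path for the 66 MHz bus and one 9-bit byte per clock edge on the Rambus channel; the bus width is not stated on the slide, and the function name is hypothetical.

```python
def bus_mbytes_per_s(clock_mhz, bytes_per_transfer, transfers_per_clock=1):
    """Peak bus bandwidth in Mbytes/s."""
    return clock_mhz * bytes_per_transfer * transfers_per_clock


host_bus = bus_mbytes_per_s(66, 4)       # assumed 32-bit wide bus
rambus = bus_mbytes_per_s(300, 1, 2)     # one byte on each clock edge

print(host_bus, rambus)  # 264 600
```

These are peak numbers; protocol overhead and arbitration reduce what an application actually sees.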
22MPACT-R-2 Architecture trade-offs
- High-speed I/O to move data in and out
- No data cache, but a large register file
- Multimedia data has poor locality
- Based on the standard APIs (Application Program Interfaces) of Microsoft Windows; no proprietary API
- Pin count vs. high memory bandwidth and low latency
- RDRAM is chosen
- PC and consumer markets
23Targeted Applications
- Video: DVD, MPEG-1 and MPEG-2 decoding, H.261/H.263
- Audio: Dolby AC-3, 3D audio, MPEG decode, wavetable synthesis
- Graphics: 2D and 3D acceleration
- Communication
- Fax/modem: V.34, 56k
- Desktop videoconferencing
- H.320 on ISDN
- H.324 on POTS (Plain Old Telephone Service)
24H.324 on TMS320C82 Overview
- ITU-T H.324: low-bit-rate multimedia teleconferencing on circuit-switched networks; it includes
- G.723: audio coding at 5.3-6.4 kbps, requires 18-20 fixed-point MIPS
- H.263: video coding based on H.261, with some enhancements
- H.223: MUX/DEMUX control
- H.245: control protocol
- V.34: modem, up to 33.6 kbps
- Other related standards: H.320 (ISDN), H.323 (LAN), and H.310 (ATM/B-ISDN)
25H.324 on TMS320C82 Overview
26H.324 on TMS320C82 Task Partitioning
- Video processing (H.263)
- Encoding
- Pre-processing: MP
- Motion estimation: PP0
- DCT: PP0
- Decoding
- Huffman or arithmetic decode, IDCT, etc.: PP0
- Post-processing: PP0
- Audio processing and AEC (Acoustic Echo Cancellation): PP1
- G.723
- Encoding: 22 MIPS
- Decoding: 3 MIPS
- AEC: LMS algorithm, up to 64-ms echo, 10 MIPS
- Modem (V.34): 20 MIPS, PP1
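The echo canceller above is described only as an LMS algorithm covering up to 64 ms of echo. A minimal floating-point sketch of a plain LMS adaptive filter follows, assuming 8 kHz sampled speech so that 64 ms corresponds to 512 taps. The sample rate, step size, and all names are assumptions, and a production canceller would use fixed-point arithmetic and typically a normalized step with double-talk handling.

```python
def taps_for_echo(echo_ms, sample_rate_hz):
    """Number of filter taps needed to span an echo tail of echo_ms milliseconds."""
    return echo_ms * sample_rate_hz // 1000


def lms_echo_canceller(far_end, mic, taps, mu):
    """Adapt an FIR estimate of the echo path; return (weights, residual errors)."""
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate         # residual after removing the estimated echo
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors


print(taps_for_echo(64, 8000))  # 512
```

Each sample needs one multiply-accumulate per tap for filtering and one for the weight update, so 512 taps at 8 kHz is roughly 8 million multiply-accumulates per second under these assumptions, the same order of magnitude as the 10 MIPS budget quoted.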
27H.324 on TMS320C82 Task Partitioning
28Current Multimedia Processors
29References
- J. Golston, "Single-chip H.324 video conferencing," IEEE Micro, August 1996, pp. 42-50
- Texas Instruments, TMS320C80 Data Sheet, 1997
- http://www.ti.com/../sprs023b.pdf
- P. Lapsley and G. Blalock, "How to estimate DSP processor performance," IEEE Spectrum, July 1996, pp. 74-78
- HTML file: http://www.bdti.com/../wpeval.html
- P. Kalapathy, "Hardware-software interfacing on Mpact," IEEE Micro, March 1997, pp. 20-26
- Presentation file: http://infopad.eecs.berkeley.edu/HotChips8/
- Chromatic Research, MPACT2 Preliminary Data Sheet, Feb. 1998
- http://www.mpact.com/../mpact2.pdf
- Toshiba, "TOSHIBA ANNOUNCES ITS NEXT-GENERATION MPACT MEDIA PROCESSOR," September 22, 1997
- http://www.toshiba.com/taec/../to-628.htm
30References
- G. A. Slavenburg, "The Trimedia TM-1 PCI VLIW Mediaprocessor," IEEE Hot Chips 8 Symposium on High-Performance Chips, Aug. 1996
- http://infopad.eecs.berkeley.edu/HotChips8/
- L. T. Nguyen, M. Mohamed, H. Park, Y. Pal, R. Wong, A. Qureshi, P. Psong, F. Valesco, H. D. Truong, C. Reader, "Multi-media Signal Processor (MSP) Summary," IEEE Hot Chips 8 Symposium on High-Performance Chips, Aug. 1996
- http://infopad.eecs.berkeley.edu/HotChips8/
- D. Lindbergh, "The H.324 multimedia communication standard," IEEE Communications Magazine, December 1996, pp. 46-51
- K. Rijkse, "H.263: Video coding for low-bit-rate communication," IEEE Communications Magazine, December 1996, pp. 42-45
31Useful links
- CPU Information Center
- http://infopad.eecs.berkeley.edu/CIC/
- Microprocessor Report
- http://www.chipanalyst.com/q/
- Berkeley Design Technology Inc.
- http://www.bdti.com/
- Peter Pirsch's research group
- http://www.mst.uni-hannover.de/