Title: Multimedia Processor: Architectures and Applications


1
Multimedia Processor: Architectures and Applications
  • By Surin Kittitornkun
  • 14 March, 1998

2
Contents
  • Programmable Multimedia Processor: Why?
  • TMS320C8x: C80 and C82
  • MPACT-R-2
  • Targeted Applications
  • Application: H.324 on TMS320C82
  • Current Multimedia Processors
  • References

3
Programmable Multimedia Processor: Why?
  • More flexibility: changes can be made in software
  • Less complexity: shorter time to market
  • More cost-efficient than ASIC design
  • Requires software development
  • May consume more power
  • Markets: PC and consumer products

4
TMS320C8x Overview
  • RISC master processor (MP) @ 50 and 60 MHz
  • Parallel processors (PPs): 2 (4 for C80)
  • Transfer controller: DMA and memory controller
  • Video controller (C80 only)

5
TMS320C8x Master Processor
  • 32-bit RISC instruction/64-bit data
  • 31 scoreboarded GP registers and a zero register
  • IEEE 754 floating point
  • Supports vector FP operations
  • Performs a single-precision FP MAC in 1 cycle:
    100 MFLOPS (@ 50 MHz)
  • Suitable for control protocols and FP intensive
    algorithms

6
TMS320C8x Processor Communication
  • Shared-memory multiprocessor
  • The master processor (MP) sends commands through
    command buffers located in shared memory

7
TMS320C8x Parallel Processor
  • Data unit: 32-bit datapath, ALU, multiplier,
    etc.
  • 2 independent address units: global and local
  • Single-cycle on-chip memory access (when there
    is no conflict)
  • Single-cycle load/store of byte, halfword, and
    word
  • Internal adder can offload data unit computation
  • Program flow control unit
  • 3-stage pipeline: instruction fetch, address
    generation, and operation execution
  • Supports conditional execution of data unit
    operations, moves, loads from memory, and
    branches
  • PC is mapped into the register file
  • Loop controller supports 3 levels of nested
    loops to minimize overhead

8
TMS320C8x Parallel Processor Data Unit
  • Split 32-bit 3-input ALU: Boolean and arithmetic
    operations
  • Split and rounded multiplier: dual 8x8 (16-bit
    results) or 16x16 (32-bit result)
  • Flexible datapath: barrel rotator, mask
    generator
  • Supports signed, unsigned, and saturating
    arithmetic
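
As a rough illustration of what split, saturating arithmetic means, the sketch below treats a 32-bit word as four independent unsigned 8-bit lanes and adds lane by lane, clamping each lane at 255 instead of letting a carry spill into its neighbour. The function name split_add_sat_u8 and the four-lane layout are assumptions made for this example; it is not a model of the actual C8x instruction semantics.

```python
def split_add_sat_u8(a: int, b: int) -> int:
    """Add two 32-bit words as four unsigned 8-bit lanes, saturating each lane at 255."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Clamp instead of wrapping, so no carry crosses into the next lane.
        lane_sum = min(x + y, 0xFF)
        result |= lane_sum << shift
    return result
```

Saturation is what keeps an overexposed pixel at full white rather than wrapping around to black, which is why media datapaths usually offer it alongside ordinary modular arithmetic.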

9
TMS320C8x Parallel Processor Data Unit: 3-input ALU
  • Supports 512 operations in total: 256 Boolean
    and 256 arithmetic
  • Boolean: any function of the three inputs A, B,
    and C, formed by ORing the eight minterms of
    (A, B, C), each enabled by one of the select
    bits F0 to F7
  • Arithmetic A f1(B,C) f2(B,C) 1
  • Example
  • AB1
  • (AB)(BC) Mask A and B by C and then add
  • A((BC) (-BC)) Multiple-byte AB
  • A-((BC) (-BC)) Multiple-byte A-B
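
The Boolean half of the 3-input ALU can be sketched as a bitwise lookup: an 8-bit function code selects which of the eight minterms of (A, B, C) are ORed together, which gives 2^8 = 256 possible functions. The bit ordering used here (bit i of the code enables the minterm whose index i has A as its least significant bit) is an assumption for illustration and may not match the hardware encoding; boolean3 is a hypothetical name.

```python
def boolean3(func_code: int, a: int, b: int, c: int, width: int = 32) -> int:
    """Apply a 3-input Boolean function, chosen by an 8-bit code, bitwise to a, b, c."""
    mask = (1 << width) - 1
    result = 0
    for i in range(8):
        if (func_code >> i) & 1:
            # Minterm i uses each input true or complemented (assumed bit order: A, B, C).
            term_a = a if i & 1 else ~a
            term_b = b if i & 2 else ~b
            term_c = c if i & 4 else ~c
            result |= term_a & term_b & term_c
    return result & mask
```

Under this assumed ordering, code 0x96 yields A xor B xor C and code 0xAC yields a bitwise multiplexer that picks A where C is 1 and B where C is 0.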

10
TMS320C8x Parallel Processor Instruction Set
  • 64-bit opcode contains multiple subinstructions
    for the data unit, the global address unit, and
    the local address unit
  • Ex: d4d5d6>>d0 a8d7 d0(a0x1)

11
TMS320C8x Transfer Controller
  • Prioritizes, schedules, and performs data and
    cache transfers between on- and off-chip
    memories
  • Handles data cache (on-chip RAM) misses and
    instruction cache misses
  • Supports multidimensional data transfers: from
    a simple contiguous linear sequence up to a 3D
    region
  • Memory interface supports a wide range of memory
    systems: DRAM, SDRAM, Video RAM, and SRAM
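
A multidimensional transfer can be pictured as three nested address loops. The sketch below generates the byte addresses touched by a region described by a contiguous run length plus a count and pitch for each of two outer dimensions. The parameter names (a_count, b_count, b_pitch, c_count, c_pitch) are hypothetical and are not taken from the transfer controller's actual parameter format.

```python
from typing import Iterator


def region_addresses(base: int, a_count: int,
                     b_count: int, b_pitch: int,
                     c_count: int, c_pitch: int) -> Iterator[int]:
    """Yield byte addresses of a 3D region: c_count planes of b_count lines of a_count bytes."""
    for c in range(c_count):
        for b in range(b_count):
            line_start = base + c * c_pitch + b * b_pitch
            # Innermost dimension is a contiguous run of bytes.
            for a in range(a_count):
                yield line_start + a
```

With b_count and c_count set to 1 the same routine degenerates to a plain linear copy, which is the "simple contiguous linear sequence" end of the range.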

12
TMS320C8x Video Controller (C80 only)
  • Provides simultaneous control over two
    independent capture or display systems and frame
    grabber or frame buffer image storage
  • Dual-frame timers
  • Programmable timing and control registers
  • Programmable line interrupt to MP

13
TMS320C8x Development Tools
  • C-like compilers and assemblers for both the
    master and parallel processors
  • Register allocator: identifies live and free
    registers, allows variable names to be used in
    assembly code, and assigns a specific register
    to each variable
  • Code compactor: converts straight-line assembly
    code into parallel code
  • Time-critical parallel code can be optimized by
    hand
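
Identifying live and free registers is commonly done with a backward liveness pass. The sketch below is a generic textbook version over straight-line code, not the algorithm used by the TI tools; the instruction format (a destination name plus a tuple of source names) and the function name live_before_each are assumptions for illustration.

```python
def live_before_each(instrs, live_out=frozenset()):
    """Return, for each (dest, sources) instruction, the set of variables live just before it."""
    live = set(live_out)
    result = [None] * len(instrs)
    # Walk backwards: a definition ends a live range, a use starts one.
    for i in range(len(instrs) - 1, -1, -1):
        dest, sources = instrs[i]
        live.discard(dest)
        live.update(sources)
        result[i] = set(live)
    return result
```

The largest set in the result is the number of registers the fragment needs at once; any register not holding a live variable at a given point is free for reuse there.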

14
TMS320C8x Execution Time for 256-Point FFT
  • "-C" indicates performance with the cache
    pre-loaded
  • Benchmark results for the TMS320C80 are for one
    of the on-chip DSP processors

15
MPACT-R-2 Overview
  • VLIW CPU
  • Multimedia ISA
  • Hardware/Software relationship
  • Variety of high-speed I/O interfaces

16
MPACT-R-2 CPU Datapath
  • Data size: multiples of 9 bits
  • Register file of 512 72-bit entries with 4 read
    and 4 write ports
  • ALU1 - shift and align
  • ALU2 - add and logic
  • ALU3 - arithmetic and logic
  • ALU4 - stage 1 of multiplication
  • ALU5 - motion estimation
  • Full crossbar between ALU outputs, inputs,
    register read and write ports

17
MPACT-R-2 CPU Datapath

18
MPACT-R-2 Multimedia ISA
  • Issues a two-instruction pack of 72 bits every
    cycle
  • Data forwarding from one ALU to another
  • Vector instructions (length up to 255)
  • Multimedia data: 9-bit bytes, in sizes of 9, 18,
    27, and 36 bits
  • Supports signed, unsigned, and saturating
    arithmetic
  • MPACT 2 includes single-precision FP for 3D
    graphics
  • Flow control: branch, jump, and call
  • Special-purpose instructions: motion estimation,
    IDCT, butterfly (FFT), etc.
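
To show the kind of inner loop a motion-estimation instruction is meant to accelerate, here is a minimal software sketch of block matching using the sum of absolute differences (SAD). SAD is a widely used matching cost, but the slides do not say which cost the MPACT instruction computes, so treat that choice, and the names sad and full_search, as assumptions.

```python
def sad(cur, ref, ry, rx, size):
    """Sum of absolute differences between `cur` and the size x size block of `ref` at (ry, rx)."""
    return sum(abs(cur[y][x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, size):
    """Return (best_sad, (row, col)) over every position where the block fits inside `ref`."""
    best = None
    for ry in range(len(ref) - size + 1):
        for rx in range(len(ref[0]) - size + 1):
            cost = sad(cur, ref, ry, rx, size)
            # Keep the first position with the lowest cost seen so far.
            if best is None or cost < best[0]:
                best = (cost, (ry, rx))
    return best
```

An exhaustive search like this costs size x size absolute differences per candidate position, which is why encoders either restrict the search window or rely on dedicated hardware.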

19
MPACT-R-2 Hardware/Software Relationship
  • Requires a host x86 CPU
  • Mediaware: uses standard APIs
  • RM: Resource Manager running under Windows
  • MRK: MPACT real-time kernel
  • Nearest-deadline scheduling algorithm
  • Interrupt-driven kernel with a worst-case
    context switch time of 4 µs
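
Nearest-deadline scheduling always runs the ready task whose deadline comes soonest. The sketch below is a simplified, non-preemptive version built on a priority queue; the real MRK is interrupt-driven, so this only illustrates the ordering rule. The task tuple layout (name, deadline, duration), the shared time unit, and the function name are assumptions for the example.

```python
import heapq


def run_nearest_deadline(tasks):
    """Run (name, deadline, duration) tasks from t=0, nearest deadline first.

    Returns (execution_order, names_that_missed_their_deadline).
    """
    heap = [(deadline, name, duration) for name, deadline, duration in tasks]
    heapq.heapify(heap)
    now = 0
    order, missed = [], []
    while heap:
        deadline, name, duration = heapq.heappop(heap)
        now += duration  # the task runs to completion in this simplified model
        order.append(name)
        if now > deadline:
            missed.append(name)
    return order, missed
```

For example, with a video task due at 33, an audio task due at 5, and a modem task due at 12, the audio task runs first even if it was submitted last.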

20
MPACT-R-2 Hardware/Software Relationship

21
MPACT-R-2 High-Speed I/O Interface
  • PCI bus or AGP (Accelerated Graphics Port)
  • x86 host CPU bus
  • 66 MHz: 264 Mbytes/s
  • Rambus memory interface
  • 300 MHz bus (9 bits wide), transferring on both
    edges: 600 Mbytes/s
  • Requires 2-4 Mbytes
  • Display controller
  • 24-bit RAMDAC
  • High resolution: up to 1280x1024 at 24-bit or
    1600x1200 at 16-bit
  • Video Interface
  • Accepts NTSC and PAL format video or
  • DVD input through PCI or AGP
  • Programmable Peripheral I/O Interface
  • Supports connection to several devices
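
The two bandwidth figures follow from clock rate times bytes per transfer times transfers per clock. The short calculation below reproduces them under two stated assumptions: that the 66 MHz bus is 32 bits (4 bytes) wide, and that the 9-bit Rambus channel carries one byte per transfer on each clock edge. The helper name peak_mbytes_per_s is hypothetical.

```python
def peak_mbytes_per_s(clock_mhz, bytes_per_transfer, transfers_per_clock=1):
    """Peak bandwidth in Mbytes/s = clock (MHz) x bytes per transfer x transfers per clock."""
    return clock_mhz * bytes_per_transfer * transfers_per_clock


# Assumption: the 66 MHz host-side bus is 32 bits (4 bytes) wide.
pci_agp = peak_mbytes_per_s(66, 4)

# Assumption: one byte per transfer, with a transfer on both clock edges.
rambus = peak_mbytes_per_s(300, 1, transfers_per_clock=2)
```

These are peak figures; sustained throughput on either bus would be lower once arbitration and protocol overhead are included.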

22
MPACT-R-2 Architecture trade-offs
  • High-speed I/O to move data in and out
  • No data cache, but a large register file:
    multimedia data has poor locality
  • Based on standard Microsoft Windows APIs
    (Application Programming Interfaces); no
    proprietary API
  • Pin count vs. high memory bandwidth/low latency:
    RDRAM is chosen
  • PC and consumer market

23
Targeted Applications
  • Video: DVD, MPEG-1 and MPEG-2 decoding,
    H.261/263
  • Audio: Dolby AC-3, 3D audio, MPEG decode,
    wavetable synthesis
  • Graphics: 2D and 3D acceleration
  • Communication
  • Fax/modem: V.34, 56k
  • Desktop videoconferencing
  • H.320 on ISDN
  • H.324 on POTS (Plain Old Telephone System)

24
H.324 on TMS320C82 Overview
  • ITU-T H.324: low-bit-rate multimedia
    teleconferencing on circuit-switched networks;
    includes:
  • G.723: audio coding at 5.3-6.4 kbps, requires
    18-20 fixed-point MIPS
  • H.263: video coding based on H.261, with some
    enhancements
  • H.223: MUX/DEMUX control
  • H.245: control protocol
  • V.34: modem, up to 33.6 kbps
  • Other related standards: H.320 (ISDN), H.323
    (LAN), and H.310 (ATM/B-ISDN)

25
H.324 on TMS320C82 Overview

26
H.324 on TMS320C82 Task Partitioning
  • Video processing (H.263)
  • Encoding: pre-processing on MP; motion
    estimation and DCT on PP0
  • Decoding: Huffman or arithmetic decode, IDCT,
    etc., and post-processing on PP0
  • Audio processing and AEC (Acoustic Echo
    Cancellation) on PP1
  • G.723 encoding: 22 MIPS
  • G.723 decoding: 3 MIPS
  • AEC: LMS algorithm, up to 64-ms echo, 10 MIPS
  • Modem (V.34) on PP1: 20 MIPS
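
The LMS algorithm named for echo cancellation adapts an FIR filter so that its output tracks the echo of the far-end signal, then subtracts that estimate from the microphone signal. Below is a minimal floating-point sketch; the function name, the step size mu, and the list-based layout are assumptions for illustration, and a fixed-point DSP version would look quite different. At an assumed 8 kHz sampling rate, a 64-ms echo tail would correspond to 512 taps.

```python
def lms_echo_cancel(far_end, mic, taps, mu):
    """Adapt an FIR echo estimate with LMS; return (residual samples, final weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after removing the estimated echo
        # LMS update: move each weight along its input, scaled by the error.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

Each sample costs about two multiply-accumulates per tap (one for the filter, one for the update), which is the main driver of the processing budget for an echo canceller of this kind.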

27
H.324 on TMS320C82 Task Partitioning

28
Current Multimedia Processors

29
References
  • J. Golston, Single-chip H.324 video
    conferencing, IEEE Micro, August 1996, pp. 42-50
  • Texas Instruments, TMS320C80 Data Sheet, 1997
  • http://www.ti.com/../sprs023b.pdf
  • P. Lapsley and G. Blalock, How to estimate DSP
    processor performance, IEEE Spectrum, July 1996,
    pp. 74-78
  • HTML file: http://www.bdti.com/../wpeval.html
  • P. Kalapathy, Hardware-software interfacing on
    Mpact, IEEE Micro, March 1997, pp. 20-26
  • Presentation file:
    http://infopad.eecs.berkeley.edu/HotChips8/
  • Chromatic Research, MPACT2 Preliminary Data
    Sheet, Feb. 1998
  • http://www.mpact.com/../mpact2.pdf
  • Toshiba, Toshiba Announces Its Next-Generation
    MPACT Media Processor, September 22, 1997
  • http://www.toshiba.com/taec/../to-628.htm

30
References
  • G. A. Slavenburg, The Trimedia TM-1 PCI VLIW
    Mediaprocessor, IEEE Hot Chips 8 Symposium on
    High-Performance Chips, Aug. 1996
  • http://infopad.eecs.berkeley.edu/HotChips8/
  • L. T. Nguyen, M. Mohamed, H. Park, Y. Pal, R.
    Wong, A. Qureshi, P. Psong, F. Valesco, H. D.
    Truong, C. Reader, Multi-media Signal Processor
    (MSP) Summary, IEEE Hot Chips 8 Symposium on
    High-Performance Chips, Aug. 1996
  • http://infopad.eecs.berkeley.edu/HotChips8/
  • D. Lindbergh, The H.324 multimedia communication
    standard, IEEE Communications Magazine, December
    1996, pp. 46-51
  • K. Rijkse, H.263: Video coding for low-bit-rate
    communication, IEEE Communications Magazine,
    December 1996, pp. 42-45
31
Useful links
  • CPU Information Center
  • http://infopad.eecs.berkeley.edu/CIC/
  • Microprocessor Report
  • http://www.chipanalyst.com/q/
  • Berkeley Design Technology Inc.
  • http://www.bdti.com/
  • Peter Pirsch's research group
  • http://www.mst.uni-hannover.de/