Title: Giga-Scale System-On-A-Chip International Center on System-on-a-Chip (ICSOC)
1Giga-Scale System-On-A-Chip: International Center on System-on-a-Chip (ICSOC)
- Jason Cong
- University of California, Los Angeles
- Tel: 310-206-2775, Email: cong_at_cs.ucla.edu
- (Other participants are listed inside)
2Project Summary
- Develop a new design methodology to enable efficient giga-scale integration for system-on-a-chip (SOC) designs
- The project includes three major components:
  - SOC synthesis tools and methodologies
  - SOC verification, test, and diagnosis
  - SOC design driver: network processor
3Research Team by Institutions
- US
  - UCLA: Jason Cong
  - UC Santa Barbara: Tim Cheng
- Taiwan
  - NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  - NCTU: Jing-Yang Jou
- China
  - Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  - Peking Univ.: Xu Cheng
  - Zhejiang Univ.: Xiaolang Yan
4Current Research Team
- US
  - UCLA: Jason Cong
  - UC Santa Barbara: Tim Cheng
- Taiwan
  - NTHU: Shi-Yu Huang, Tingting Hwang, J. K. Lee, Youn-Long Lin, C. L. Liu, Cheng-Wen Wu, Allen Wu
  - NCTU: Jing-Yang Jou
- China
  - Tsinghua Univ.: Jinian Bian, Xianlong Hong, Zeyi Wang, Hongxi Xue
  - Peking Univ.: Xu Cheng
  - Zhejiang Univ.: Xiaolang Yan
- Several new faculty members in the 7 institutions
- Guest members from National University of Singapore, Purdue Univ., and UCLA (EE Dept)
5Thrust 1 -- SOC Synthesis Environment/Methodology
(Led by Jason Cong)
- Design Spec (VHDL/C)
- VHDL/C Co-Simulation
- Design Partitioning
- Code Generation for Retargetable Compiler and Assembler Generator
- DSP Synthesis and Optimization
- FPGA Synthesis and Technology Mapping
- Embedded Processors
- DSPs
- Embedded FPGAs
- Customized Logic
6Interconnect Bottleneck in Nanometer Designs
- 2nd challenge: single-cycle full-chip synchronization is no longer possible
  - Not supported by the current CAD toolset
  - About to happen soon
- ITRS'01 0.07 µm technology
  - 5.63 GHz across-chip clock
  - 800 mm² die (28.3 mm x 28.3 mm)
- IPEM BIWS estimations
  - Buffer size: 100x
  - Driver/receiver size: 100x
  - On semi-global layer (tier 3)
  - Can travel up to 11.4 mm in one cycle
  - Need 5 clock cycles from corner to corner
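The corner-to-corner figure on this slide can be checked with simple arithmetic. The sketch below assumes that a corner-to-corner wire follows a Manhattan (horizontal plus vertical) route, which the slide does not state; the variable names are illustrative only.

```python
import math

# Figures quoted on the slide (ITRS'01 0.07 um node, IPEM BIWS estimate).
DIE_SIDE_MM = 28.3          # die is 28.3 mm x 28.3 mm
CLOCK_GHZ = 5.63            # across-chip clock
REACH_MM_PER_CYCLE = 11.4   # distance a signal can travel in one cycle

# Assumption: the corner-to-corner wire is routed in Manhattan fashion.
corner_to_corner_mm = 2 * DIE_SIDE_MM
clock_period_ps = 1000.0 / CLOCK_GHZ
cycles_needed = math.ceil(corner_to_corner_mm / REACH_MM_PER_CYCLE)

print(f"clock period: {clock_period_ps:.1f} ps")
print(f"corner-to-corner distance: {corner_to_corner_mm:.1f} mm")
print(f"cycles needed: {cycles_needed}")
```

Under the Manhattan assumption the result agrees with the 5 cycles quoted on the slide.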
7Regular Distributed Register Architecture (2)
- Use register banks
  - Registers in each island are partitioned into k banks for 1-cycle, 2-cycle, ..., k-cycle interconnect communication in each island
- Highly regular
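To make the bank idea concrete, here is a minimal sketch of how a transfer between two islands might be mapped to one of the k banks. It assumes, hypothetically, that islands sit on a uniform grid and that the cycle count grows with the Manhattan distance between them; `cycles_between`, `bank_index`, and all parameter values are illustrative and not taken from the slides.

```python
import math

def cycles_between(src, dst, island_pitch_mm, reach_mm_per_cycle):
    """Cycles for a transfer between islands at grid positions src and dst."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    distance_mm = (dx + dy) * island_pitch_mm
    # Assumption: communication inside one island always fits in one cycle.
    return max(1, math.ceil(distance_mm / reach_mm_per_cycle))

def bank_index(src, dst, island_pitch_mm, reach_mm_per_cycle, k):
    """Pick the register bank (1..k) whose latency matches the transfer."""
    cycles = cycles_between(src, dst, island_pitch_mm, reach_mm_per_cycle)
    if cycles > k:
        raise ValueError("transfer needs more cycles than there are banks")
    return cycles

# Hypothetical 5 mm island pitch and 11.4 mm reach per cycle, k = 3 banks.
for dst in [(0, 0), (1, 1), (2, 1), (3, 3)]:
    print(dst, "-> bank", bank_index((0, 0), dst, 5.0, 11.4, k=3))
```

The point of the sketch is only that a value's bank is determined by how many cycles its consumer is away, which is what keeps the structure regular.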
8MCAS Placement-Driven Architectural Synthesis
Using RDR Architecture
9Experimental Results (3)
- MCAS basic flow vs. Synopsys Behavioral Compiler (BC) on Virtex-II
- Synopsys Behavioral Compiler setting: default (optimizing latency)
- Average latency ratio of MCAS vs. BC: 69%
- (Charts: latency and resource comparison)
10Optimality Study of Large-Scale Circuit Placement
- Construction of Placement Examples with Known Optimal (PEKO) [C. Chang et al., 2003]
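One way to picture how a placement example with a known optimum can be built: fix the cell positions first, then generate every net so that it already achieves its smallest possible half-perimeter wirelength (HPWL). The sketch below is a simplified illustration under that assumption, with unit-size cells on a grid. It is not the published PEKO generator, and `min_net_box`, `build_example`, and `hpwl` are hypothetical names.

```python
import math
import random

def min_net_box(degree):
    """Smallest-semi-perimeter w x h box (in cells) holding `degree` unit cells."""
    w = math.ceil(math.sqrt(degree))
    h = math.ceil(degree / w)
    return w, h

def build_example(grid, degrees, seed=0):
    """Return (nets, optimal_hpwl); each net is a list of (x, y) cell sites."""
    rng = random.Random(seed)
    nets, optimal = [], 0
    for d in degrees:
        w, h = min_net_box(d)
        x0 = rng.randrange(grid - w + 1)
        y0 = rng.randrange(grid - h + 1)
        # Fill the box row by row so the net spans exactly w x h cells.
        nets.append([(x0 + i % w, y0 + i // w) for i in range(d)])
        optimal += (w - 1) + (h - 1)
    return nets, optimal

def hpwl(net):
    """Half-perimeter wirelength of one net, measured between cell sites."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

nets, optimal = build_example(grid=16, degrees=[2, 3, 4, 5, 7], seed=1)
print("constructed total HPWL:", sum(hpwl(n) for n in nets))
print("per-net lower-bound sum:", optimal)
```

Because no legal placement can make any single net shorter than its lower bound, the constructed placement is optimal for this netlist, and the gap between a placer's result and `optimal` measures its suboptimality.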
11High Interest in the Community
- Coverage in two EE Times articles
  - "Placement tools criticized for hampering IC designs" (Feb. 2003)
  - "IC placement benchmarks needed, researchers say" (April 2003)
- More than 60 downloads from our website
  - Cadence, IBM, Intel, Magma, Mentor Graphics, Synopsys, etc.
  - CMU, SUNY, UCB, UCSB, UCSD, UIC, UMichigan, UWaterloo, etc.
- Used in every placement since its publication
- http://ballade.cs.ucla.edu/pubbench
12 1. Synthesis and Verification
- Hardware/Software Partition
  - Proposed an SSS-based H/S partition algorithm (ASICON2003)
  - Better solutions than simulated annealing (SA) and less runtime than Tabu search
- High-level Synthesis
  - Re-synthesis algorithm after floorplanning for timing optimization (ASICON2003)
  - Floorplanning is done based on the initial scheduling
  - After floorplanning, re-scheduling and re-allocation are done by a force-balance method
- Controller Synthesis
  - A heuristic state minimization algorithm for incompletely specified finite state machines (ASICON2003, JCST)
13 2. Floorplanning and Interconnect Planning
- Building on the previously proposed Corner Block List (CBL) representation, proposed several extended Corner Block List representations (ECBL, CCBL, and SUB-CBL) to speed up floorplanning and handle more complicated L/T-shaped and rectilinear blocks
- Proposed floorplanning algorithms with geometric constraints such as boundary, abutment, and L/T-shaped blocks
- Proposed integrated floorplanning and buffer planning algorithms that take congestion into account
  - Using research results from UCLA on interconnect planning
- About 30 papers published in DAC, ICCAD, ISPD, ASPDAC, ISCAS, and Transactions
14 3. P/G Network Analysis and Optimization
- Proposed area minimization of power distribution networks using efficient nonlinear programming techniques (ICCAD2001, accepted by IEEE Trans. on CAD)
- Proposed a decoupling capacitance optimization algorithm for robust on-chip power delivery (ASPDAC2004, ASICON2003)
4. Global Routing and Special Routing
- Proposed several global routing algorithms that optimize congestion, timing, or both
- Papers were published in ASPDAC, ISCAS, and IEEE Transactions
15 5. Parasitic R/L/C Extraction
- 3-D R/C extraction using the Boundary Element Method (BEM)
  - Quasi-Multiple Medium (QMM) BEM algorithms
  - Hierarchical Block BEM (HBBEM) technique
- Fast 3-D Inductance Extraction (FIE)
- Papers were published in ASPDAC, ASICON, and IEEE Transactions on MTT
16Thrust 2 -- SOC Verification, Test, and Diagnosis (Led by Tim Cheng)
Verification and Testing
- Enabling techniques for semi-formal functional verification
  - Automatic/semi-automatic functional vector generation from HDL code
  - Scalable constraint-solving techniques
  - Integrated framework for simulation, vector generation, and model checking
- Testing and diagnosis for heterogeneous SOC
  - Self-testing using on-chip programmable components
  - Self-testing for on-chip analog/mixed-signal components
  - New test techniques for deep-submicron embedded memories
17Key Results - Verification
- Developed and released ATPG-based SAT solvers for circuits (Univ. of California, Santa Barbara)
  - Integrating structural ATPG and SAT techniques with new conflict learning
- CSAT: fast combinational solver (released in March 2003)
  - Demonstrated 10-100X speedup over state-of-the-art SAT solvers on industrial test cases (reported by Intel and Calypto)
  - Has been integrated into Intel's FV verification system and a startup's verification engine
  - Publications: DATE2003 and DAC2003
- Satori2: fast sequential solver (released in Dec. 2003)
  - Demonstrated 10X-200X speedup over a commercial sequential ATPG engine on public benchmark circuits
  - Publications: ICCAD2003, HLDVT2003, and ASPDAC2004
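For readers unfamiliar with SAT on circuits, the sketch below shows the generic textbook formulation: each gate is encoded as CNF clauses, two circuits are joined in a "miter" (an XOR of their outputs), and the miter is unsatisfiable exactly when the circuits are equivalent. This is only an illustration of the problem being solved. It uses a tiny DPLL search and does not reproduce the ATPG-based techniques or conflict learning of CSAT or Satori2; all function names are hypothetical.

```python
def and_gate(out, a, b):
    """CNF clauses for out <-> (a AND b); variables are positive integers."""
    return [[-out, a], [-out, b], [out, -a, -b]]

def or_gate(out, a, b):
    """CNF clauses for out <-> (a OR b)."""
    return [[out, -a], [out, -b], [-out, a, b]]

def xor_gate(out, a, b):
    """CNF clauses for out <-> (a XOR b)."""
    return [[-out, a, b], [-out, -a, -b], [out, -a, b], [out, a, -b]]

def solve(clauses, assignment=None):
    """Tiny DPLL: unit propagation plus branching. Returns a model or None."""
    assignment = dict(assignment or {})
    while True:
        remaining, unit = [], None
        for clause in clauses:
            open_lits, satisfied = [], False
            for lit in clause:
                value = assignment.get(abs(lit))
                if value is None:
                    open_lits.append(lit)
                elif value == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not open_lits:
                return None               # clause falsified: conflict
            if len(open_lits) == 1 and unit is None:
                unit = open_lits[0]
            remaining.append(open_lits)
        clauses = remaining
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0  # propagate one unit literal
    if not clauses:
        return assignment
    var = abs(clauses[0][0])
    for value in (True, False):
        model = solve(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None

A, B, C = 1, 2, 3
# Miter for a AND (b OR c) versus (a AND b) OR (a AND c): equivalent circuits.
lhs = or_gate(4, B, C) + and_gate(5, A, 4)
rhs = and_gate(6, A, B) + and_gate(7, A, C) + or_gate(8, 6, 7)
equivalent_model = solve(lhs + rhs + xor_gate(9, 5, 8) + [[9]])
# Miter for (a AND b) versus (a OR b): not equivalent, so a witness exists.
different_model = solve(and_gate(4, A, B) + or_gate(5, A, B)
                        + xor_gate(6, 4, 5) + [[6]])
print("distributive law holds:", equivalent_model is None)
print("AND vs OR counterexample found:", different_model is not None)
```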
18Key Results - Testing
A new statistical delay testing and diagnosis framework consisting of five major components (UCSB):
- Statistical timing analysis
- Statistical critical path selection [DAC02, ICCAD02]
  - Selecting statistical long true paths whose tests maximize detection of parametric failures
- Path coverage metric [ASPDAC03]
  - Estimating the quality of a path set
- Selection/generation of high-quality tests for target paths [ITC01, DATE 2004]
  - Identifying tests that activate longer delay along the target path
- Delay fault diagnosis based on statistical timing model [DATE03, VTS03, DAC03]
- Ref: Krstic, Wang, Cheng, Abadir, DATE03 (Best Paper Award in Test)
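The difference between nominal and statistical critical path selection can be shown with a small calculation. The sketch below assumes independent, normally distributed gate delays, which is a simplification and not necessarily the timing model used in the cited papers; the path data, clock period, and function names are made up for illustration.

```python
import math

def path_distribution(gate_delays):
    """gate_delays: list of (mean_ps, sigma_ps). Returns path (mean, sigma).

    Assumes independent Gaussian gate delays, so variances add.
    """
    mean = sum(m for m, _ in gate_delays)
    sigma = math.sqrt(sum(s * s for _, s in gate_delays))
    return mean, sigma

def prob_exceeds(mean, sigma, clock_ps):
    """Probability that a Gaussian path delay exceeds the clock period."""
    z = (clock_ps - mean) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Hypothetical paths: A is nominally longer, B has much larger variation.
paths = {
    "A": [(225.0, 5.0)] * 4,    # mean 900 ps, sigma 10 ps
    "B": [(220.0, 25.0)] * 4,   # mean 880 ps, sigma 50 ps
}
CLOCK_PS = 1000.0

def criticality(name):
    return prob_exceeds(*path_distribution(paths[name]), CLOCK_PS)

ranking = sorted(paths, key=criticality, reverse=True)
for name in ranking:
    print(name, f"P(delay > clock) = {criticality(name):.3g}")
```

Here path B is ranked first even though its nominal delay is shorter, which is the kind of case a purely nominal path selection would miss.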
19Key Results - Testing
- On-chip jitter extraction for bit-error-rate (BER) testing of multi-GHz signals (UCSB)
  - Using an on-chip, single-shot measurement unit to sample signal periods for spectral analysis
  - Demonstrated, through simulation, accurate extraction of multiple sinusoidal and random jitter components for a 3 GHz signal
  - Publications: ASPDAC2004 and DATE2004
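The spectral-analysis step can be illustrated in software: given a sequence of measured signal periods, a discrete Fourier transform separates a sinusoidal jitter tone from the random background. The sketch below works on synthetic data only; the jitter amplitude, tone frequency, noise level, and function names are assumptions for illustration and do not model the on-chip measurement unit.

```python
import cmath
import math
import random

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 of the mean-removed samples."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered)))
            for k in range(n // 2 + 1)]

def dominant_jitter(samples, nominal_period_s):
    """Return (frequency_hz, amplitude_s) of the strongest jitter tone."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k / (n * nominal_period_s), 2.0 * mags[k] / n

# Synthetic 3 GHz signal: one sinusoidal jitter tone plus Gaussian jitter.
rng = random.Random(0)
T0 = 1.0 / 3e9      # nominal period
N = 256             # number of sampled periods
TONE_BIN = 16       # tone placed exactly on a DFT bin for simplicity
TONE_AMP = 2e-12    # 2 ps sinusoidal jitter (assumed)
periods = [T0 + TONE_AMP * math.sin(2 * math.pi * TONE_BIN * i / N)
           + rng.gauss(0.0, 0.3e-12) for i in range(N)]

freq_hz, amp_s = dominant_jitter(periods, T0)
print(f"jitter tone: {freq_hz / 1e6:.1f} MHz, amplitude {amp_s * 1e12:.2f} ps")
```

A real measurement would also need windowing and handling of tones that fall between bins; the sketch leaves those out.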
20Thrust 3 -- Design Driver: Network Security Processor (Led by Prof. C. W. Wu)
- Applications: IPSec, SSL, VPN, etc.
- Functionalities
  - Public key: RSA, ECC
  - Secret key: AES
  - Hashing (message authentication): HMAC (SHA-1/MD5)
  - Truly random number generator (FIPS 140-1, 140-2 compliant)
- Target technology: 0.18 µm or below
- Clock rate: 200 MHz or higher (internal)
- 32-bit data and instruction word
- 10 Gbps (OC192)
- Power: 1 to 10 mW/MHz at 3 V (LP to HP)
- Die size: 50 mm²
- On-chip bus: AMBA (Advanced Microcontroller Bus Architecture)
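As a software reference for the HMAC (SHA-1/MD5) function listed above, the sketch below spells out the standard HMAC construction from RFC 2104 and checks it against Python's built-in `hmac` module. It illustrates the function the processor is meant to accelerate, not the hardware design itself; `hmac_manual` is a hypothetical helper name.

```python
import hashlib
import hmac

def hmac_manual(key, msg, hash_name="sha1"):
    """HMAC per RFC 2104: H((K ^ opad) || H((K ^ ipad) || msg))."""
    def digest(data):
        return hashlib.new(hash_name, data).digest()

    block_size = hashlib.new(hash_name).block_size  # 64 bytes for SHA-1, MD5
    if len(key) > block_size:
        key = digest(key)               # long keys are hashed first
    key = key.ljust(block_size, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    return digest(opad + digest(ipad + msg))

key = b"\x0b" * 20
msg = b"Hi There"
for name in ("sha1", "md5"):
    ours = hmac_manual(key, msg, name)
    reference = hmac.new(key, msg, name).digest()
    print(name, ours.hex(), "matches library:", ours == reference)
```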
21Encryption Modules (PKEM)
- Public key encryption module
- Operations
  - 32-bit word-based modular multiplication
  - Multiplication over GF(p) and GF(2^m)
- An RSA cryptography engine with small area overhead and high speed
  - Scalable word width
  - TSMC 0.35 µm
  - 34K gates (1.7 x 1.8 mm²)
  - 100 MHz clock
  - Scalable key length
- Throughput
  - 512-bit key: 1.79 Kbps/MHz
  - 1024-bit key: 470 bps/MHz
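A common way to realize 32-bit word-based modular multiplication is word-serial Montgomery multiplication, which consumes one 32-bit word of an operand per iteration. Whether the PKEM datapath is organized exactly this way is an assumption; the sketch below only illustrates the technique, and `mont_mul` and `mod_mul` are hypothetical names. It requires Python 3.8 or later for `pow(n, -1, m)`.

```python
def mont_mul(a, b, n, w=32):
    """Return a * b * R^-1 mod n, with R = 2^(w*k) and k the word count of n.

    Requires n odd and 0 <= a, b < n. One w-bit word of `a` is consumed per
    iteration, as a word-serial datapath would do.
    """
    k = -(-n.bit_length() // w)                   # words needed to hold n
    mask = (1 << w) - 1
    n_prime = (-pow(n, -1, 1 << w)) & mask        # -n^-1 mod 2^w
    t = 0
    for i in range(k):
        a_i = (a >> (w * i)) & mask
        t += a_i * b
        m = ((t & mask) * n_prime) & mask         # makes t + m*n divisible by 2^w
        t = (t + m * n) >> w
    return t - n if t >= n else t

def mod_mul(a, b, n, w=32):
    """Plain a * b mod n built from two Montgomery multiplications."""
    k = -(-n.bit_length() // w)
    r2 = pow(1 << (w * k), 2, n)                  # R^2 mod n
    a_mont = mont_mul(a, r2, n, w)                # a * R mod n
    return mont_mul(a_mont, b, n, w)              # (a*R) * b * R^-1 = a*b mod n

n = (1 << 127) - 1                                # an odd modulus for the demo
a = 0x1234567890ABCDEF1234567890ABCDEF % n
b = 0x0FEDCBA9876543210FEDCBA987654321 % n
print(mod_mul(a, b, n) == (a * b) % n)
```

The final conditional subtraction keeps the result below n, so outputs can be fed straight back in during modular exponentiation.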
22Encryption Modules (SKEM)
- Secret key encryption module
- Operations
  - Matrix operations, manipulation
  - AES cryptography
- 32-bit external interface
- 58K gates
- Over 200 MHz clock
- Throughput: 2 Gbps
- Supports key lengths of 128/192/256 bits

Technology: TSMC 0.25 µm CMOS
Package: 128CQFP
Core size: 1,279 x 1,271 µm²
Gate count: 63.4K
Max. frequency: 250 MHz
Throughput: 2.977 Gbps (128-bit key), 2.510 Gbps (192-bit key), 2.169 Gbps (256-bit key)
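The throughput figures can be related to the clock rate with a short calculation. The sketch below assumes that throughput equals the 128-bit AES block size times the clock frequency divided by the cycles spent per block, and that the figures were obtained at the 250 MHz maximum frequency; neither assumption is stated on the slide.

```python
BLOCK_BITS = 128          # AES block size
FREQ_HZ = 250e6           # maximum frequency quoted for the chip

# Throughput figures as quoted, indexed by key length in bits.
reported_bps = {128: 2.977e9, 192: 2.510e9, 256: 2.169e9}

# Assumption: throughput = BLOCK_BITS * FREQ_HZ / cycles_per_block.
cycles_per_block = {
    key_bits: BLOCK_BITS * FREQ_HZ / bps
    for key_bits, bps in reported_bps.items()
}

for key_bits, cycles in cycles_per_block.items():
    print(f"{key_bits}-bit key: about {cycles:.2f} cycles per block")
```

Under these assumptions the implied cycle counts sit just above 10, 12, and 14, the numbers of AES rounds for the three key lengths, which would be consistent with roughly one round per clock cycle.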
23Journal Publications
- C.-T. Huang and C.-W. Wu, "High-speed easily testable Galois-field inverter", IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 909-918, Sept. 2000.
- S.-A. Hwang and C.-W. Wu, "Unified VLSI systolic array design for LZ data compression", IEEE Trans. VLSI Systems, vol. 9, no. 4, pp. 489-499, Aug. 2001.
- C.-H. Wu, J.-H. Hong, and C.-W. Wu, "VLSI design of RSA cryptosystem based on the Chinese Remainder Theorem", J. Inform. Science and Engineering, vol. 17, no. 6, pp. 967-979, Nov. 2001.
- J.-H. Hong and C.-W. Wu, "Cellular array modular multiplier for the RSA public-key cryptosystem based on modified Booth's algorithm", IEEE Trans. VLSI Systems, vol. 11, no. 3, pp. 474-484, June 2003.
- C.-P. Su, T.-F. Lin, C.-T. Huang, and C.-W. Wu, "A high-throughput low-cost AES processor", IEEE Communications Magazine, vol. 41, no. 12, pp. 86-91, Dec. 2003.
24Conference Publications
- J.-H. Hong and C.-W. Wu, "Radix-4 modular multiplication and exponentiation algorithms for the RSA public-key cryptosystem", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2000, pp. 565-570.
- J.-H. Hong, P.-Y. Tsai, and C.-W. Wu, "Interleaving schemes for a systolic RSA public-key cryptosystem based on an improved Montgomery's algorithm", in Proc. 11th VLSI Design/CAD Symp., Pingtung, Aug. 2000, pp. 163-166.
- C.-H. Wu, J.-H. Hong, and C.-W. Wu, "An RSA cryptosystem based on the Chinese Remainder Theorem", in Proc. 11th VLSI Design/CAD Symp., Pingtung, Aug. 2000, pp. 167-170.
- C.-H. Wu, J.-H. Hong, and C.-W. Wu, "RSA cryptosystem design based on the Chinese Remainder Theorem", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2001, pp. 391-395.
- Y.-C. Lin, C.-P. Su, C.-W. Wang, and C.-W. Wu, "A word-based RSA public-key crypto-processor core", in Proc. 12th VLSI Design/CAD Symp., Hsinchu, Aug. 2001.
- T.-F. Lin, C.-P. Su, C.-T. Huang, and C.-W. Wu, "A high-throughput low-cost AES cipher chip", in Proc. 3rd IEEE Asia-Pacific Conf. ASIC, Taipei, Aug. 2002, pp. 85-88.
- Y.-T. Lin, C.-P. Su, C.-T. Huang, C.-W. Wu, S.-Y. Huang, and T.-Y. Chang, "Low-power embedded memory architecture design for SOC", in Proc. 13th VLSI Design/CAD Symp., Taitung, Aug. 2002, pp. 306-309.
- M.-C. Sun, C.-P. Su, C.-T. Huang, and C.-W. Wu, "Design of a scalable RSA and ECC crypto-processor", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Kitakyushu, Jan. 2003, pp. 495-498 (Best Paper Award).
- C.-P. Su, T.-F. Lin, C.-T. Huang, and C.-W. Wu, "A highly efficient AES cipher chip", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Kitakyushu, Jan. 2003, pp. 561-562 (Design Contest Special Feature Award).
- J.-H. Hong, C.-L. Liu, B.-Y. Tsai, and C.-W. Wu, "A radix-4 modular multiplier for fast RSA public-key cryptosystem", in Proc. 14th VLSI Design/CAD Symp., Hualien, Aug. 2003, pp. 553-556.
- M.-Y. Wang, C.-P. Su, C.-T. Huang, and C.-W. Wu, "An HMAC processor with integrated SHA-1 and MD5 algorithms", in Proc. Asia and South Pacific Design Automation Conf. (ASP-DAC), Yokohama, Jan. 2004 (to appear).