Vector computers - PowerPoint PPT Presentation

Provided by: Lapt3193

Transcript and Presenter's Notes



1
Vector computers
2
Supercomputer
  • Definition of a supercomputer:
    • Fastest machine in the world at a given task
    • Any machine costing $30 million+
    • A device to turn a compute-bound problem into an
      I/O-bound problem
    • Any machine designed by Seymour Cray
  • In the 70s and 80s, supercomputer ≡ vector machine

3
First Vector Computers / Processors
  • CDC STAR-100, TI ASC (1972)
    • Memory-memory vector processors
    • High start-up overhead
    • Relatively slow scalar units (underestimation of
      Amdahl's Law)
  • Cray-1 (1976)
    • Vector-register vector processor (lower start-up
      overhead, reduced bandwidth requirements)
    • Fastest scalar processor in the world at that
      time
    • Vector chaining support
  • Vector chaining support

4
Vector Computers: Memory-memory vector computers
  • CDC CYBER 205 (1981)
    • Memory-memory architecture
    • Four lanes with multiple functional units
    • Wide load-store pipeline
    • Support for non-unit-stride memory accesses and
      sparse vectors
  • ETA-10 (CDC, late 80s)
    • 10 processors
    • Each supporting the memory-memory architecture
    • Last significant memory-memory design

5
Vector Computers: Vector-register vector processors
K. Asanovic, "Vector Processors," Appendix G in
Computer Architecture: A Quantitative Approach.
6
Vector Computers: Vector-register vector processors
K. Asanovic, "Vector Processors," Appendix G in
Computer Architecture: A Quantitative Approach.
7
Vector Computers: Memory-memory vs. vector-register
  • Memory-memory vector computers
    • Operands are fetched directly from main memory
    • Results are written directly back to memory
  • Vector-register vector computers
    • Vector elements are read from memory into a vector
      register by a LOAD VECTOR operation
    • All arithmetic and logic operations are
      register-register operations
    • Results of vector operations are put into vector
      registers and may be stored back in memory by a
      STORE VECTOR operation

8
Vector Computers: Memory-memory vs. vector-register
  • Memory-memory architecture
    • Requires greater memory bandwidth
    • Prevents easy reuse of intermediate results
    • Makes it difficult to overlap multiple vector
      operations
    • Start-up time is significantly increased due to
      the cost of memory accesses
    • Becomes more efficient for very long vectors,
      over which the start-up cost is amortized
  • Vector-register architecture
    • Free of the disadvantages of memory-memory machines
    • Experience has shown that shorter vectors are
      more commonly used
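To make the bandwidth argument concrete, the sketch below counts element-level memory accesses for a hypothetical expression d = (a + b) * c on n-element vectors. It assumes each vector operand read or result write costs n accesses and that the vector-register machine keeps the intermediate a + b in a register; the function names are illustrative, not taken from any real machine.

```python
def memory_memory_accesses(n: int) -> int:
    """Element accesses to memory for d = (a + b) * c on a memory-memory machine.

    Every vector instruction reads both source operands from memory and writes
    its result back to memory, so the intermediate t = a + b makes a round
    trip through memory.
    """
    add_traffic = 3 * n  # read a and b, write t
    mul_traffic = 3 * n  # read t and c, write d
    return add_traffic + mul_traffic


def vector_register_accesses(n: int) -> int:
    """Element accesses to memory for the same expression on a vector-register machine.

    Only LOAD VECTOR and STORE VECTOR touch memory; the intermediate a + b stays
    in a vector register (assumes all n elements fit in one register).
    """
    loads = 3 * n  # LOAD VECTOR a, b and c
    stores = n     # STORE VECTOR d
    return loads + stores


n = 64
print(memory_memory_accesses(n))    # 384
print(vector_register_accesses(n))  # 256
```

Under these assumptions the memory-memory machine moves 6n elements against 4n, and the gap widens with every additional chained operation whose intermediate result must round-trip through memory.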

9
Vector computers: Memory bandwidth and latency
  • Memory access latency adds to the start-up cost
    of fetching a vector from memory
  • Sustaining sufficient bandwidth requires organizing
    memory into multiple independent banks
  • Additional problems arise when memory is
    accessed in an irregular pattern (very typical
    of many matrix-based computations)

10
Vector Computers: Simplified general structure of
a vector-register vector computer
[Figure: block diagram. An instruction processor sends scalar instructions to the scalar processor and vector instructions to the vector processor, which contains vector operation control, pipelined functional units, and vector registers (local memory). A vector transfer control and address generator moves data (vectors) between the vector registers and main (external) memory. Other labeled flows: data (scalars), address parameters, functions/status.]
11
Vector Computers: Cray-1
  • Main features of a classical vector-register
    vector computer:
    • Load/store architecture
    • Vector registers
    • Vector instructions
    • Hardwired control
    • Highly pipelined functional units
    • Interleaved memory system
      (16 banks, 4-cycle bank busy time, 12-cycle latency)
    • No data caches
    • No virtual memory
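The interleaved memory parameters above (16 banks, 4-cycle bank busy time) determine which strides can stream at full rate. A minimal sketch, assuming one request issues per cycle in order, word address k·stride maps to bank (k·stride) mod 16, and the 12-cycle latency is ignored; `issue_cycles` is a hypothetical model, not a description of the real Cray-1 memory controller:

```python
def issue_cycles(n: int, stride: int, banks: int = 16, busy: int = 4) -> int:
    """Cycles needed to issue n strided word requests to an interleaved memory.

    Simplified model: one request may issue per cycle, in order; word address
    k*stride lives in bank (k*stride) % banks; a bank cannot accept a new
    request until `busy` cycles after its previous one. Latency is ignored.
    """
    free_at = [0] * banks  # first cycle at which each bank can accept a request
    cycle = 0
    for k in range(n):
        bank = (k * stride) % banks
        cycle = max(cycle, free_at[bank])  # stall while the target bank is busy
        free_at[bank] = cycle + busy
        cycle += 1  # the next request can issue on the following cycle
    return cycle


for stride in (1, 4, 8, 16):
    print(stride, issue_cycles(64, stride))
# 1 64
# 4 64
# 8 126
# 16 253
```

Strides 1 and 4 visit at least four distinct banks before returning to one, so the 4-cycle busy time never causes a stall; stride 16 hits a single bank and is limited to one word every 4 cycles.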

12
Basic Cray-1 architecture
13
Vector computers: Vector instructions
  • $a_i = f_1(b_i)$
    • sine, cosine, square root, …
  • $s = f_2(A)$, where $s$ is a scalar
    • sum, maximum, …
  • $a_i = f_3(b_i, c_i)$
    • add, subtract, …
  • $a_i = f_4(s, c_i)$
    • multiply vector by scalar, …
  • It is possible to combine the above operations
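The four instruction forms can be mirrored by four element-level Python functions. This is a sketch of the semantics only (one call stands for one vector instruction); the function names are illustrative:

```python
import math
import operator
from typing import Callable, List

Vector = List[float]


def elementwise_unary(f1: Callable[[float], float], b: Vector) -> Vector:
    """a_i = f1(b_i), e.g. sine, cosine, square root."""
    return [f1(b_i) for b_i in b]


def reduction(f2: Callable[[Vector], float], a: Vector) -> float:
    """s = f2(A): the whole vector collapses to one scalar, e.g. sum, maximum."""
    return f2(a)


def elementwise_binary(f3: Callable[[float, float], float], b: Vector, c: Vector) -> Vector:
    """a_i = f3(b_i, c_i), e.g. add, subtract."""
    assert len(b) == len(c), "vector operands must have the same length"
    return [f3(b_i, c_i) for b_i, c_i in zip(b, c)]


def scalar_vector(f4: Callable[[float, float], float], s: float, c: Vector) -> Vector:
    """a_i = f4(s, c_i), e.g. multiply vector by scalar."""
    return [f4(s, c_i) for c_i in c]


# Combining operations: 2 * x + y, then a sum reduction.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
axpy = elementwise_binary(operator.add, scalar_vector(operator.mul, 2.0, x), y)
print(axpy)                                      # [12.0, 24.0, 36.0]
print(reduction(sum, axpy))                      # 72.0
print(elementwise_unary(math.sqrt, [4.0, 9.0]))  # [2.0, 3.0]
```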

14
Vector computers: Vector instruction set advantages
  • Compact
    • One short instruction encodes N operations (it may
      be equivalent to an entire loop)
  • Expressive
    • Each instruction tells the hardware that its N
      operations:
      • are independent
      • use the same functional unit
      • access disjoint registers
      • access registers in the same pattern as previous
        instructions
      • access a contiguous block of memory (unit-stride
        load/store)
      • access memory in a known pattern (strided
        load/store)
  • Scalable
    • The same object code can be run on more parallel
      pipelines, or lanes
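The "Scalable" point can be made concrete with a minimal sketch, assuming the common arrangement in which element i of a vector is handled by lane i mod L and each lane completes one element per cycle. The helper names are hypothetical; the point is that the vector instruction and its length stay the same while only the lane count changes:

```python
from typing import List


def lane_assignment(n: int, lanes: int) -> List[List[int]]:
    """Element indices handled by each lane, assuming element i goes to lane i % lanes."""
    return [list(range(lane, n, lanes)) for lane in range(lanes)]


def element_cycles(n: int, lanes: int) -> int:
    """Cycles to stream n elements when each lane completes one element per cycle.

    Pipeline start-up is ignored; the result is ceil(n / lanes).
    """
    return -(-n // lanes)


# The same 64-element vector instruction on 1-, 2- and 4-lane implementations.
for lanes in (1, 2, 4):
    print(lanes, element_cycles(64, lanes))
# 1 64
# 2 32
# 4 16
print(lane_assignment(10, 4))  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```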

15
Vector computers: Stripmining
Theoretical throughput as a function of vector
length. What happens when the vector length exceeds
the size of the vector registers?
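Strip-mining answers that question: a loop over n elements is split into strips no longer than the maximum vector length (MVL), and each strip becomes one pass of vector instructions. A minimal sketch, assuming MVL = 64 and placing the odd-sized remainder strip first (one common convention); the names are illustrative:

```python
from typing import List

MVL = 64  # assumed maximum vector length (elements per vector register)


def strip_lengths(n: int, mvl: int = MVL) -> List[int]:
    """Vector lengths of successive strips covering n elements.

    The first strip takes the odd-sized remainder n % mvl; every later strip
    uses the full register length mvl.
    """
    lengths = []
    remainder = n % mvl
    if remainder:
        lengths.append(remainder)
    lengths.extend([mvl] * (n // mvl))
    return lengths


def daxpy_stripmined(a: float, x: List[float], y: List[float], mvl: int = MVL) -> List[float]:
    """Compute y_i = a * x_i + y_i for any n, at most mvl elements at a time."""
    assert len(x) == len(y)
    result = list(y)
    low = 0
    for vl in strip_lengths(len(x), mvl):
        # One strip: a vector machine would set the vector length to vl, load
        # x and y, multiply by a, add, and store, each as one vector instruction.
        for i in range(low, low + vl):
            result[i] = a * x[i] + result[i]
        low += vl
    return result


print(strip_lengths(130))  # [2, 64, 64]
```

Doing the short strip first means every later strip runs at the full register length, so the loop body needs only one vector-length change.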
16
Vector computers: Stripmining
Performance of the Spert-II system on a dot product
with unit-stride operands. K. Asanovic, Vector
Microprocessors. (32 vector registers)
17
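Vector computers: Vector chaining
  • Example: $y_i = a \cdot x_i + y_i$

[Figure: chained multiply and add pipelines. The multiplier stages hold $a \cdot x_5$ … $a \cdot x_{11}$, with $x_{12}$ and $x_{13}$ next in line; each product is passed directly to the adder, whose stages hold $a \cdot x_1 + y_1$ … $a \cdot x_3 + y_3$, with $y_4$ and $y_5$ next in line.]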
The performance of the Cray-1 was almost doubled by
vector chaining, from 80 Mflops to 153 Mflops.
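Why chaining can roughly double throughput follows from a simple timing model. A minimal sketch, assuming each vector operation on n elements takes its start-up latency plus n cycles; the latencies used below are hypothetical and were chosen only to illustrate the model:

```python
def unchained_cycles(n: int, mul_startup: int, add_startup: int) -> int:
    """y = a*x + y without chaining: the add cannot start until the multiply finishes.

    Simplified model: a vector operation on n elements with start-up latency L
    takes L + n cycles.
    """
    return (mul_startup + n) + (add_startup + n)


def chained_cycles(n: int, mul_startup: int, add_startup: int) -> int:
    """With chaining, each a*x_i enters the adder as soon as the multiplier produces it,
    so the two start-up latencies add but the n elements stream through only once."""
    return mul_startup + add_startup + n


# Hypothetical start-up latencies (not Cray-1 figures): 7-cycle multiply, 6-cycle add.
n = 64
print(unchained_cycles(n, 7, 6))  # 141
print(chained_cycles(n, 7, 6))    # 77
```

As n grows, the ratio of the two times approaches 2, which is consistent with chaining nearly doubling the throughput of a dependent multiply-add pair.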
18
Vector computers: Scatter and gather
  • Sometimes only certain elements of a vector are
    needed in a computation
  • If the elements to be used are in a
    regularly-spaced pattern, the spacing between the
    elements to be gathered is called the stride
  • Example: the elements
    $x_1, x_5, x_9, x_{13}, \ldots, x_{4\lfloor (n-1)/4 \rfloor + 1}$
    are extracted from the vector
    $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, \ldots, x_n$
    with a stride equal to 4
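The index formula can be checked with a short sketch. It assumes 1-based element numbering, as on the slide, and lists the positions a strided load would visit; the helper names are hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def strided_indices(n: int, stride: int) -> List[int]:
    """1-based indices 1, 1 + stride, 1 + 2*stride, ... that do not exceed n."""
    if n <= 0:
        return []
    count = (n - 1) // stride + 1
    return [1 + k * stride for k in range(count)]


def strided_gather(x: Sequence[T], stride: int) -> List[T]:
    """Gather x_1, x_{1+stride}, ... from x (stored 0-based, so x_i is x[i - 1])."""
    return [x[i - 1] for i in strided_indices(len(x), stride)]


x = [f"x{i}" for i in range(1, 14)]  # x1 .. x13
print(strided_gather(x, 4))          # ['x1', 'x5', 'x9', 'x13']
```

For any n, the last index produced with stride 4 is 4⌊(n−1)/4⌋ + 1, matching the final element in the example.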

19
Vector computers: Scatter and gather
  • Scatter and gather operations may also be used
    with irregularly-spaced data
  • Example: gather operation
    Index vector:   1, 3, 4, 7
    Source vector:  a1, a2, a3, a4, a5, a6, a7, a8
    Result vector:  a1, a3, a4, a7
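The gather example and its inverse, scatter, can be written as two short functions. This is a sketch of the element-level semantics only, assuming 1-based index values as in the example; on a real machine the index vector would itself sit in a vector register, and the names here are illustrative:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gather(source: Sequence[T], index: Sequence[int]) -> List[T]:
    """result[k] = source[index[k]], with 1-based indices."""
    return [source[i - 1] for i in index]


def scatter(dest: Sequence[T], index: Sequence[int], values: Sequence[T]) -> List[T]:
    """Return a copy of dest with dest[index[k]] = values[k] (1-based indices)."""
    assert len(index) == len(values)
    out = list(dest)
    for i, v in zip(index, values):
        out[i - 1] = v
    return out


a = [f"a{i}" for i in range(1, 9)]  # a1 .. a8
packed = gather(a, [1, 3, 4, 7])
print(packed)                                    # ['a1', 'a3', 'a4', 'a7']
print(scatter(["-"] * 8, [1, 3, 4, 7], packed))  # ['a1', '-', 'a3', 'a4', '-', '-', 'a7', '-']
```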
20
Vector computers: Compress and expand
  • Compress and expand operations, controlled by a mask
    vector, may also be used with irregularly-spaced data
  • Example: compress operation
    Mask vector:    1, 0, 1, 1, 0, 0, 1, 0
    Source vector:  a1, a2, a3, a4, a5, a6, a7, a8
    Result vector:  a1, a3, a4, a7
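Compress and expand differ from gather and scatter in that the selected positions come from a bit mask rather than an index vector. A sketch of the element-level semantics; letting expand leave masked-off positions unchanged is an assumption of this sketch, and the names are illustrative:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compress(source: Sequence[T], mask: Sequence[int]) -> List[T]:
    """Pack the elements whose mask bit is 1 into consecutive positions."""
    assert len(source) == len(mask)
    return [v for v, m in zip(source, mask) if m]


def expand(packed: Sequence[T], mask: Sequence[int], dest: Sequence[T]) -> List[T]:
    """Inverse of compress: consecutive packed elements go to positions whose mask bit
    is 1; positions whose mask bit is 0 keep the value already in dest."""
    assert len(mask) == len(dest) and sum(mask) == len(packed)
    it = iter(packed)
    return [next(it) if m else d for m, d in zip(mask, dest)]


mask = [1, 0, 1, 1, 0, 0, 1, 0]
a = [f"a{i}" for i in range(1, 9)]
packed = compress(a, mask)
print(packed)                           # ['a1', 'a3', 'a4', 'a7']
print(expand(packed, mask, ["-"] * 8))  # ['a1', '-', 'a3', 'a4', '-', '-', 'a7', '-']
```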
21
Vector computers: Vector conditional execution
  • Vectorization of a loop with conditional code:

    for (i = 0; i < N; i++)
        if (A[i] > 0) A[i] = B[i];
        else A[i] = C[i];

  • Use of a vector mask register (1 bit per element):

    lv   vA, rA        # Load A vector
    mgtz m0, vA        # Set bits in mask register m0 where A > 0
    lv.m vA, rB, m0    # Load B vector into A under mask
    fnot m1, m0        # Invert mask register
    lv.m vA, rC, m1    # Load C vector into A under mask
    sv   vA, rA        # Store A back to memory (no mask)
22
Vector computers: Vector conditional execution
  lv vA, rA; mgtz m0, vA; lv.m vA, rB, m0; fnot m1, m0; lv.m vA, rC, m1; sv vA, rA

  Element     1    2    3    4    5    6    7    8
  Source A    5    0    1    0    0    2    3    4
  m0          1    0    1    0    0    1    1    1
  B           B1   B2   B3   B4   B5   B6   B7   B8
  m1          0    1    0    1    1    0    0    0
  C           C1   C2   C3   C4   C5   C6   C7   C8
  Result A    B1   C2   B3   C4   C5   B6   B7   B8
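The table can be reproduced by emulating the instruction sequence element by element. The Python functions below are hypothetical stand-ins for the slide's lv, mgtz, lv.m, fnot and sv instructions; a real machine performs each step as a single vector instruction:

```python
from typing import List, Sequence


def masked_load(dest: Sequence, src: Sequence, mask: Sequence[int]) -> List:
    """Element-level effect of lv.m: take src[i] where mask[i] is 1, keep dest[i] otherwise."""
    return [s if m else d for d, s, m in zip(dest, src, mask)]


def vector_if_else(A: Sequence[int], B: Sequence, C: Sequence) -> List:
    """A[i] = B[i] if A[i] > 0 else C[i], following the slide's instruction sequence."""
    vA = list(A)                          # lv   vA, rA
    m0 = [1 if a > 0 else 0 for a in vA]  # mgtz m0, vA
    vA = masked_load(vA, B, m0)           # lv.m vA, rB, m0
    m1 = [1 - m for m in m0]              # fnot m1, m0
    vA = masked_load(vA, C, m1)           # lv.m vA, rC, m1
    return vA                             # sv   vA, rA (the caller stores the result)


A = [5, 0, 1, 0, 0, 2, 3, 4]
B = [f"B{i}" for i in range(1, 9)]
C = [f"C{i}" for i in range(1, 9)]
print(vector_if_else(A, B, C))
# ['B1', 'C2', 'B3', 'C4', 'C5', 'B6', 'B7', 'B8']
```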
23
Vector computers: Programming vector computers
  • Assembly language programming
  • Libraries
  • Data-parallel languages
    • Support for data-parallel operations as an
      inherent part of the language (intrinsic
      operators and functions)
    • Fortran 90, High Performance Fortran
  • Vectorizing compilers
    • Extensive loop-dependence analysis
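Why vectorizing compilers need dependence analysis can be shown with a loop whose iteration i reads the value written by iteration i − 1. A minimal sketch comparing sequential execution with the read-all-then-write-all semantics of a single vector instruction; the function names are illustrative:

```python
from typing import List


def run_sequential(a: List[int], b: List[int]) -> List[int]:
    """for i in 1..n-1: a[i] = a[i-1] + b[i], one iteration at a time."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def run_as_vector_op(a: List[int], b: List[int]) -> List[int]:
    """Same loop executed like one vector instruction:
    every a[i-1] operand is read before any a[i] result is written."""
    old = list(a)
    if not old:
        return []
    return [old[0]] + [old[i - 1] + b[i] for i in range(1, len(old))]


a = [1, 1, 1, 1]
b = [0, 1, 1, 1]
print(run_sequential(a, b))    # [1, 2, 3, 4]
print(run_as_vector_op(a, b))  # [1, 2, 2, 2]  (differs: loop-carried dependence)
```

Without such a dependence, for example a[i] = b[i] + c[i], both execution orders give the same result; establishing that is what the compiler's dependence analysis must do before it emits vector instructions.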

24
Vector computers: Vector processing applications
  • Problems that can be efficiently formulated in
    terms of vectors:
    • Long-range weather forecasting
    • Petroleum exploration
    • Seismic data analysis
    • Medical diagnosis
    • Aerodynamics and space-flight simulations
    • Artificial intelligence and expert systems
    • Mapping the human genome
    • Image processing