Title: Optical Microprocessor Chips
1 Optical Microprocessor Chips
2 Understanding an Optical Processor
- 1. Optical surfaces can be connected with strong bonds using specific types of optical solutions; the rigidity is 100 times higher than that of super glue.
- 2. Once the surfaces are glued together, the surface is treated with water via a flow pump in order to verify the connections.
- 3. Then the substrate is placed on the opaline hetero photonic crystal, and the layers are transferred into the layer transfer bin.
3 Why Optical Connections?
- Optical transmission in and between microchips has been considered superior to traditional electrical transmission technology.
- The reason is that electrical signaling can be slower and has a power cost that rises with the distance traveled, whereas optical transmission can be faster and has a roughly fixed power cost to connect to any point within the system.
4 Optical Microprocessors
- Studies are being done on how optical signals inside the processor can be used to send information around at much higher speed.
- Sun Microsystems, which won an $8.1 million, five-year contract from the military, believes it has the know-how necessary to make this type of chip, assembled from highly sophisticated optical components.
5 Optical Microprocessors (continued)
- The idea is to use electrical chip-to-chip input/output technology to construct arrays of low-cost chips into a single virtual macrochip.
- This assemblage would perform as one very large chip and would eliminate the conventional soldered-chip interconnections.
- Long connections across the macrochip would leverage the low latency, high bandwidth, and low power of optics fabricated in silicon.
6 Advantages of Optical Transmission over Electrical Ones
- Lower material cost
- Lower cost of transmitters and receivers
- Capability to carry electrical power as well as
signals (in specially-designed cables)
7 Disadvantages of Optical Transmission over Electrical Ones
- Optical fibers are more difficult and expensive to splice.
- At higher optical powers, optical fibers and other optics are susceptible to "fiber fuse," in which slightly too much light meeting an imperfection can destroy several meters of fiber per second.
- Installing a fiber fuse detection circuit at the transmitter can break the circuit and halt the failure to minimize damage.
8 Final Notes
- IBM has built an optical switch for multi-core chips.
- The new nanotech switch, which is 100 times smaller than a human hair, is designed to enable researchers to build future chips that will have greater performance but use less energy.
- They are also working on a project to shrink supercomputers down to the size of a laptop by replacing the electrical wiring that now connects multiple cores inside a microprocessor with pulses of light.
- The company said the laptop supercomputers should be ready by 2020, calling the work a "breakthrough" in chip design.
9 Final Notes
- They also said optical communication between the cores would dramatically cut a processor's energy needs and the heat it emits.
- The new chips would require only the energy needed to power a light bulb, while today's supercomputers need enough energy to power hundreds of homes, the company noted.
- http://www.linuxworld.com.au/index.php/id19819404
- http://www.photonics.com/Content/ReadArticle.aspx?ArticleID=33505
10 Multiprocessors
11 Multiprocessor motivation
- Many scientific applications take too long to run on a single processor.
- These are parallel applications consisting of loops that operate on independent data; they need a multiprocessor machine with each loop iteration running on a different processor, operating on independent data (see the sketch after this slide).
- Many multi-user environments require more compute power than is available from a single-processor machine (airline reservation systems, inventory systems, file servers). These consist largely of parallel transactions that operate on independent data.
- Multiprocessor machines should not be confused with multi-core processors, although some functionality is similar.
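To make the loop-level parallelism above concrete, here is a minimal sketch (an illustrative example, not from the original slides) that splits a loop over independent data across POSIX threads, one chunk of iterations per thread; the array names and thread count are assumptions.

/* Minimal sketch: splitting a data-parallel loop across POSIX threads,
 * one chunk of independent iterations per thread. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double a[N], b[N], c[N];

struct range { int lo, hi; };

static void *worker(void *arg)
{
    struct range *r = arg;
    for (int i = r->lo; i < r->hi; i++)
        c[i] = a[i] + b[i];          /* each iteration touches independent data */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct range r[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        r[t].lo = t * (N / NTHREADS);
        r[t].hi = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, worker, &r[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("c[0] = %f\n", c[0]);
    return 0;
}

Because no iteration reads data written by another, the threads never need to coordinate inside the loop, which is exactly why such applications scale well on multiprocessors.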
12 Multiprocessor performance
- They assure high throughput for independent tasks.
- Alternatively, a single program can run on several processors (parallel processing), but programming is more difficult and the code is not portable.
- The processors can be on a single bus or can be connected over a LAN (up to 256 processors).
- Which is better?
13 Multiprocessor examples
- Multiprocessors can be found in low-end PCs such as dual-processor Xeons or Macs.
14 Multiprocessor history
15 Multiprocessor history
Sun Fire x4150 1U server
16 Multiprocessor history
Sun Fire x4150 1U server
4 cores per CPU
16 x 4GB = 64GB DRAM
17 I/O System Design Example
- Given a Sun Fire x4150 system with:
- Workload: 64KB disk reads
- Each I/O op requires 200,000 user-code instructions and 100,000 OS instructions
- Each CPU: 10^9 instructions/sec
- FSB: 10.6 GB/sec peak
- DRAM: DDR2-667, 5.336 GB/sec
- PCI-E x8 bus: 8 x 250MB/sec = 2GB/sec
- Disks: 15,000 rpm, 2.9ms avg. seek time, 112MB/sec transfer rate
- What I/O rate can be sustained?
- For random reads, and for sequential reads
18 Design Example (cont.)
- I/O rate for CPUs
- Per core: 10^9 / (100,000 + 200,000) = 3,333 ops/sec
- 8 cores: 26,667 ops/sec
- Random reads, I/O rate for disks
- Assume actual seek time is average/4
- Time/op = seek + rotational latency + transfer = 2.9ms/4 + 4ms/2 + 64KB/(112MB/s) ≈ 3.3ms
- 303 ops/sec per disk, 2424 ops/sec for 8 disks
- Sequential reads
- 112MB/s / 64KB = 1750 ops/sec per disk
- 14,000 ops/sec for 8 disks
19 Design Example (cont.)
- PCI-E I/O rate
- 2GB/sec / 64KB = 31,250 ops/sec
- DRAM I/O rate
- 5.336 GB/sec / 64KB = 83,375 ops/sec
- FSB I/O rate
- Assume we can sustain half the peak rate
- 5.3 GB/sec / 64KB = 81,540 ops/sec per FSB
- 163,080 ops/sec for 2 FSBs
- Weakest link: disks
- 2424 ops/sec random, 14,000 ops/sec sequential
- Other components have ample headroom to accommodate these rates (a short program reproducing this arithmetic follows this slide)
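As a cross-check of the arithmetic above, the short program below (an illustrative sketch; the unit conventions, such as 64 KB = 64,000 bytes and GB = 10^9 bytes, are assumptions) recomputes each component's sustainable I/O rate and confirms that the disks are the weakest link. The results agree with the slides' figures up to rounding.

/* Back-of-the-envelope check of the Sun Fire x4150 I/O example. */
#include <stdio.h>

int main(void)
{
    double io_size      = 64e3;          /* 64 KB per read                     */
    double instr_per_io = 100e3 + 200e3; /* OS + user instructions per I/O op  */
    double cpu_rate     = 1e9;           /* instructions/sec per core          */
    int    cores        = 8, disks = 8;

    double cpu_ops = cores * cpu_rate / instr_per_io;

    /* Random read: seek (avg/4) + half-rotation latency + transfer time */
    double seek      = 2.9e-3 / 4.0;
    double latency   = (60.0 / 15000.0) / 2.0;    /* 15,000 rpm -> 2 ms        */
    double transfer  = io_size / 112e6;
    double disk_rand = disks / (seek + latency + transfer);
    double disk_seq  = disks * 112e6 / io_size;

    double pcie = 2e9 / io_size;
    double dram = 5.336e9 / io_size;
    double fsb  = 2 * (10.6e9 / 2.0) / io_size;   /* two FSBs at half of peak  */

    printf("CPU %.0f, disk rand %.0f, disk seq %.0f, PCI-E %.0f, DRAM %.0f, FSB %.0f ops/sec\n",
           cpu_ops, disk_rand, disk_seq, pcie, dram, fsb);
    /* The disks are the bottleneck: ~2,400 ops/sec random, 14,000 sequential. */
    return 0;
}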
20 Questions
- How do parallel processors share data?
- Single address space: communication through lw and sw (loads and stores), which needs synchronization
- Uniform memory access (UMA): all memory takes the same time to access, vs.
- Non-uniform memory access (NUMA), which scales to larger sizes, up to 256 processors
- Private memory: communication through message passing, up to 256 processors
- How do parallel processors coordinate? Synchronization (locks, semaphores), synchronization built into send/receive primitives, or operating-system protocols (a small lock sketch follows this slide)
- How are they implemented? Connected by a single bus, or connected by a network
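As a small illustration of lock-based coordination in a single address space (an assumed example using C11 atomics and POSIX threads, not taken from the slides), the sketch below protects a shared counter with a spinlock built from an atomic test-and-set.

/* Two threads increment a shared counter under a spinlock. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter = 0;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;                                   /* spin until the lock is free */
        shared_counter++;                       /* critical section            */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", shared_counter);  /* 200000 with the lock held */
    return 0;
}

Without the lock, the two increments could interleave and updates would be lost; the lock is what makes the shared-memory communication safe.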
21 Multiprocessors on a single bus
- Up to 32 processors can share a single bus.
- Each processor has its own cache, but all share the same memory space.
- Each cache can hold copies of the same data; caching reduces latency and reduces bus traffic.
- They communicate through shared memory and have UMA.
- A single copy of the OS is used.
- But it is difficult to scale to a large number of processors.
22 Shared memory multiprocessors
- The major design issue is cache coherency: ensuring that stores to cached data are seen by other processors.
- Coherent reading: if a cache misses, another cache can supply the data.
- Coherent writing: when one processor writes data into a shared block, all other copies of that block located in other caches must either be invalidated or updated (depending on the protocol).
- Synchronization: the coordination among processors accessing shared data.
- Memory consistency: the definition of when a processor must observe a write from another processor (see the sketch after this slide).
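The memory-consistency point can be illustrated with a small sketch (an assumed example using C11 atomics, not from the slides): the producer writes the data and then sets a flag with release ordering, so a consumer that observes the flag with acquire ordering is guaranteed to also observe the write to the data.

/* Release/acquire ordering: the consumer may read 'data' only after it
 * observes the producer's write to 'ready'. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int data = 0;
static atomic_int ready = 0;

static void *producer(void *arg)
{
    (void)arg;
    data = 42;                                            /* write shared data */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                 /* wait for the flag */
    printf("data = %d\n", data);                          /* guaranteed 42     */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}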
23 Cache coherency problem
- Two write-back caches becoming incoherent:
(1) CPU 0 reads block A
(2) CPU 1 reads block A
(3) CPU 0 writes block A
24 Snooping
- Cache controllers need to monitor, or "snoop" on, the bus to see whether their cache has a copy of the block being written to by another CPU; the duplicated tag used for this is called a snoop tag.
- Each cache has a duplicate copy of the address tag bits and a second read port for snooping the bus.
25 Two Snooping Protocols
- Write-invalidate protocol
- The processor obtains exclusive access to the data before writing to a shared block.
- Before the write, the CPU sends an invalidation signal to all other caches, so they will miss on the next read.
- The most common protocol; it reduces bus traffic, which allows more processors on the bus.
- Write-update (write-broadcast) protocol
- The processor continuously sends updated copies of its writes to all other caches.
- Has the advantage of reduced latency.
- Used very infrequently: it has high bandwidth requirements due to large bus traffic.
26 Write-Invalidate Protocol
- Simultaneous writes: the bus arbiter decides which processor is allowed to proceed.
- The first CPU to obtain the bus invalidates the line in the cache of the other one.
- Then the second CPU does the same to the first.
- A write does not complete until bus access is obtained.
- How do we locate the data on a cache miss?
- In write-through caches: in memory.
- In write-back caches it is trickier, so we will deal with this in more detail (MESI protocol).
- A write with no interleaved activity by other CPUs is very efficient (no bus activity). A simplified state-machine sketch follows this slide.
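The sketch below (an illustrative simplification in the MSI style, not the full MESI protocol referred to above) shows how one cache block's state changes under local CPU accesses and snooped bus events in a write-invalidate scheme.

/* Simplified write-invalidate (MSI-style) state machine for one cache block. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } BlockState;
typedef enum { CPU_READ, CPU_WRITE, BUS_READ, BUS_INVALIDATE } Event;

static BlockState next_state(BlockState s, Event e)
{
    switch (e) {
    case CPU_READ:       /* a read miss fetches the block; a hit keeps its state */
        return (s == INVALID) ? SHARED : s;
    case CPU_WRITE:      /* the writer broadcasts an invalidate, then owns the block */
        return MODIFIED;
    case BUS_READ:       /* another CPU reads: a MODIFIED copy is written back and shared */
        return (s == MODIFIED) ? SHARED : s;
    case BUS_INVALIDATE: /* another CPU writes: our copy becomes stale */
        return INVALID;
    }
    return s;
}

int main(void)
{
    BlockState s = INVALID;
    s = next_state(s, CPU_READ);        /* this CPU reads block A    -> SHARED  */
    s = next_state(s, BUS_READ);        /* another CPU reads block A -> SHARED  */
    s = next_state(s, BUS_INVALIDATE);  /* another CPU writes block A-> INVALID */
    printf("final state: %d (0 = INVALID)\n", s);
    return 0;
}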
27 Cache coherence problem revisited
(1) CPU 0 reads block A
(2) CPU 1 reads block A
(3) CPU 0 sends an invalidate for block A
(4) CPU 0 writes block A
28 Multiprocessors on a network
- A single-bus multiprocessor architecture has limits on the number of processors due to limited bus and memory bandwidth.
- The solution is to have more than one bus, i.e., a network.
- The network can connect to memory that is physically distributed.
29 Multiprocessors on a network
- The network can also connect above the memory (Sun E10000).
- Shared memory machines connected together over a network operate as distributed shared memory (a DSM machine).
30 Distributed memory
- Distributed memory is the opposite of centralized memory.
- It can have a single address space (called shared memory), or each processor can have its own memory address space (called private memory).
- In the case of shared memory, communication is done through loads and stores.
- In the case of private memory, communication is done through message passing (send and receive), used to access another processor's memory.
31 Shared Memory
- Non-uniform memory access (NUMA) shared memory multiprocessors
- All memory can be addressed by all processors, but access to a processor's own local memory is faster than access to another processor's remote memory.
- Looks like a distributed machine, but the interconnection network is usually custom-designed switches and/or buses.
- Uses the commodity hardware of a distributed memory multiprocessor, but all processors have the illusion of shared memory.
- The operating system handles accesses to remote memory transparently on behalf of the application.
- This relieves the application developer of the burden of memory management across the network.
32 Characteristics of multiprocessor computers

Name             Number of processors   Memory size   Communication BW/link   Topology
Cray T3E         2048                   524 GB        1200 MB/sec             3-D torus
HP/Convex        64                     65 GB         980 MB/sec              8-way crossbar
SGI Origin       128                    131 GB        800 MB/sec              ring
SUN Enterprise   64                     65 GB         1600 MB/sec             16-way crossbar
33 Cache coherency for single address space
- Since there are multiple buses, snooping will not work; we need an alternative: the use of directories.
- The directory keeps the state of every block in memory, including the sharing status of that block (a sketch of a directory entry follows this slide).
- The directory sends explicit messages over the network to every processor whose cache has that data.
- There are two levels of coherence. At the cache level, the original data is in memory and replicated in the caches that need it.
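A directory entry might look like the following sketch (an assumed data-structure example, not from the slides): one state field per memory block plus a bit vector of sharers, which is exactly what is needed to send targeted invalidation messages on a write.

/* Sketch of a directory entry for one memory block. */
#include <stdint.h>
#include <stdio.h>

#define MAX_PROCS 64

typedef enum { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY } DirState;

typedef struct {
    DirState state;
    uint64_t sharers;                /* bit i set => processor i caches the block */
} DirEntry;

/* On a write by 'writer', send invalidations to every other sharer. */
static void handle_write(DirEntry *e, int writer)
{
    for (int p = 0; p < MAX_PROCS; p++)
        if (((e->sharers >> p) & 1) && p != writer)
            printf("send invalidate to processor %d\n", p);
    e->sharers = 1ULL << writer;     /* the writer is now the only holder */
    e->state = EXCLUSIVE_DIRTY;
}

int main(void)
{
    DirEntry block = { SHARED_CLEAN, (1ULL << 0) | (1ULL << 3) };
    handle_write(&block, 0);         /* processor 0 writes; processor 3 is invalidated */
    return 0;
}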
34 Cache coherency for single address space
- The second level of coherence is at the memory level.
- It requires more hardware, and the OS takes care of moving data at the page level.
- Miss penalties are very large, since data needs to be brought over the network.
- However, by moving pages, the miss rate is reduced (due to co-located data).
35 Message Passing
- For machines with private memories (each CPU has its own memory and cache).
- Message passing over a network is used in clusters (discussed next).
- Good for parallel programming techniques.
- Uses MPI (Message Passing Interface).
- Visible to the programmer.
- Example: sum 100,000 numbers on a network-connected multiprocessor with 100 processors using multiple private memories (see the sketch after this slide).
- Steps:
- Distribute 100 subsets for partial sums
- Compute the partial sums on each processor
- Split the CPUs in half; one side sends, the other side receives and adds; repeat until one sum remains
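The steps above map naturally onto MPI send/receive calls. The sketch below is an illustrative example: it assumes a power-of-two number of processes (the slide's 100-processor case needs one extra step for odd-sized halves), and the summed values are stand-ins for real data. Each process computes a local partial sum, then the set of processes is halved repeatedly, with the upper half sending and the lower half receiving and adding.

/* Reduction by repeated halving, in the spirit of the slide's example. */
#include <mpi.h>
#include <stdio.h>

#define N 100000

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Steps 1 and 2: each process sums its own subset of the data. */
    long long local = 0;
    for (int i = rank; i < N; i += nprocs)
        local += i + 1;                       /* stand-in for the real data */

    /* Step 3: split the processes in half; the upper half sends its partial
     * sum, the lower half receives and adds; repeat until rank 0 has the total. */
    for (int half = nprocs / 2; half >= 1; half /= 2) {
        if (rank >= half && rank < 2 * half) {
            MPI_Send(&local, 1, MPI_LONG_LONG, rank - half, 0, MPI_COMM_WORLD);
        } else if (rank < half) {
            long long partial;
            MPI_Recv(&partial, 1, MPI_LONG_LONG, rank + half, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            local += partial;
        }
    }
    if (rank == 0)
        printf("total = %lld\n", local);      /* sum of 1..100000 */
    MPI_Finalize();
    return 0;
}

In practice this whole reduction is usually expressed with a single MPI_Reduce call, which implements the same tree-shaped combining internally.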
36 Clusters
- Connect several (or several hundred) off-the-shelf computers over a network.
- Strengths: cheaper, available all the time, expandable.
- Can achieve very good performance; most of the time it is good enough.
- Since each machine has its own copy of the OS, it is much easier to isolate a machine in case of failure.
- Weaknesses compared to bus-based multiprocessors:
- System administration costs are higher since there are n independent machines.
- The bus is slower (an I/O bus is slower than a backplane bus).
- Smaller memory.
- Applications where cost/performance is important use hybrid clusters of multiprocessors.
37 Characteristics of clusters vs. multiprocessors

Multiprocessor   Number of processors   Memory size   Communication BW/link   Topology
Cray T3E         2048                   524 GB        1200 MB/sec             3-D torus
HP/Convex        64                     65 GB         980 MB/sec              8-way crossbar
SGI Origin       128                    131 GB        800 MB/sec              ring
SUN Enterprise   64                     65 GB         1600 MB/sec             16-way crossbar

Cluster                  Number of processors   Memory size   Communication BW/link   Node type and number
Tandem NonStop           4096                   1,048 GB      40 MB/sec               16-way SMP, 256
IBM RS6000 SP2           512                    1,048 GB      150 MB/sec              16-way node, 32
IBM RS6000 R40           16                     4 GB          12 MB/sec               8-way SMP, 2
SUN Enterprise Cluster   60                     61 GB         100 MB/sec              30-way SMP, 2