Title: PCI Bus Introduction
1PCI Bus Introduction
- Intel released PCI Bus Specification Ver 1.0 in 1992 as an alternative to VL-Bus
- PCI Bus now at Version 3.0 (April 2004)
- PCI Bus now controlled by the PCI Special Interest Group (PCI-SIG)
- hundreds of members (860 as of 04/2004)
- The steering committee includes Intel and AMD, Compaq and Gateway, and peripheral manufacturers like 3Com.
2PCI Background
- Why not attach peripherals directly to the CPU Bus, i.e. a true local bus?
- Next Generation Processor would require an updated local bus
- Bus Loading would limit the local bus to ONE device (could be improved upon by using buffers).
- A CPU local bus is not a separate bus, so it could not support device-to-device transfers independently of the CPU.
3PCI Ver 2.1 Features (1)
- Processor Independence
- Decoupling of processor (Host Bus) and PCI Expansion Bus by means of a Bus Bridge
- 10 loads approx but 256 functional devices
- Speed
- Rev. 2.0: 0 MHz -> 33 MHz
- Rev. 2.1: 0 MHz -> 66 MHz
- 32-bit transfers: 133 MB/s at 33 MHz
- 64-bit data transfers: 266 MB/s at 33 MHz
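The peak figures above follow from bus width multiplied by clock rate. A minimal sketch of that arithmetic, assuming the nominal "33 MHz" clock is really 33.33 MHz (100/3) and that MB means 10^6 bytes; the function and constant names are illustrative only:

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak rate in MB/s when one full-width transfer completes per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# Assumption: the nominal "33 MHz" PCI clock is 33.33 MHz.
PCI_CLOCK_MHZ = 100 / 3

for width in (32, 64):
    rate = peak_bandwidth_mb_s(width, PCI_CLOCK_MHZ)
    print(f"{width}-bit at 33 MHz: {rate:.1f} MB/s")
```

The slide figures of 133 MB/s and 266 MB/s are these values with the fraction dropped.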
4PCI Ver 2.1 Features (2)
- Supports multiprocessor systems
- Burst transfers of arbitrary length supported
- Low Pin count (shared address/data lines)
- PCI Initiator 49 Pins
- PCI Target 47 Pins
- Bus Mastering with Arbitration
- Software Configuration (PnP) with config registers
- Synchronous Bus
- Unique ID Numbers (like EISA/MCA)
5PCI ver 2.2 Additions
- PCI Hot-Plug
- Enables removal or replacement of adapter cards without having to shut down the main system
- PCI Power Management
- Allows OS to manage power usage of PCI cards
- Supports ACPI (Advanced Configuration and Power Interface) and Microsoft's OnNow
6PCI Bus Block Diagram
7PC Architecture Bus structure
8PCI Bus Structure (1)
- Strict Decoupling of the Processor and Memory Subsystem (Host) Bus from the PCI Bus
- Host/PCI Bridge (North Bridge) between Host Bus and PCI Bus, invisible to user
- PCI units or devices are called PCI Agents
- All PCI Agents are connected to the PCI Bus
- PCI Agents should be on the Motherboard where
possible but can also be on an adapter
9PCI Bus Structure (2)
- Max of 3 Slots provided for PCI units
- Originally 2 Slots for Audio and Motion Video
- Interface to EISA/ISA/MCA Bus is a type of Bus Agent sometimes called South Bridge
- The EISA/ISA Expansion bus can be considered as a type of PCI Agent
- PCI Bus can connect up to 10 PCI agents
- PCI Bus-to-PCI Bus Bridges are also available
10PCI Bridge Block Diagram
11PCI Bus Data Transfers (1)
- Prefetch Buffers are for reads
- Posting Buffers store writes to post out on the addressed bus later
- PCI Initiator: the current Bus Master, which initiates a transfer
- PCI Target: the Slave, which is addressed by the initiator
- PCI Agents: Initiator and Target are both referred to as Agents
12PCI Bus Data Transfers (2)
- Multiplexed PCI address/data bus reduces pin count
- Penalty is more clock cycles per transfer (2 cycles for a write or 3 cycles for a read)
- First Cycle: Address Phase
- Second Cycle: Write Data
- Third Cycle: Read Data
- At 33 MHz clock and 32-bit data bus, maximum single write data rate is 66 MB/s, maximum single read data rate is 44 MB/s
13PCI Bus Burst Mode (1)
- Speeds up data rate further because the address is sent out only once
- Sender and receiver both increment the addresses internally, so they only transfer data; there is no further address phase.
- Any number of transfer cycles can be carried out.
- Max Data Rate (Burst Mode): 133 MB/s (32-bit bus, 33 MHz), 266 MB/s (64-bit bus, 33 MHz)
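The single-transfer rates (66 MB/s write, 44 MB/s read) and the burst limit (133 MB/s) can be tied together with one cycle-counting model. This is a rough sketch under stated assumptions: a 32-bit bus at 33.33 MHz, no wait states, one overhead cycle per write transaction (the address phase) and two per read (the address phase plus the extra cycle before read data). The names are illustrative, not taken from the PCI specification.

```python
PCI_CLOCK_MHZ = 100 / 3       # assumed nominal "33 MHz" clock
BYTES_PER_DATA_PHASE = 4      # 32-bit bus

WRITE_OVERHEAD_CYCLES = 1     # address phase only
READ_OVERHEAD_CYCLES = 2      # address phase plus one cycle before read data


def transfer_rate_mb_s(data_phases: int, overhead_cycles: int) -> float:
    """Effective MB/s for one transaction with the given number of data phases."""
    if data_phases < 1:
        raise ValueError("a transaction needs at least one data phase")
    total_cycles = overhead_cycles + data_phases
    return data_phases * BYTES_PER_DATA_PHASE * PCI_CLOCK_MHZ / total_cycles


for n in (1, 4, 16, 256):
    w = transfer_rate_mb_s(n, WRITE_OVERHEAD_CYCLES)
    r = transfer_rate_mb_s(n, READ_OVERHEAD_CYCLES)
    print(f"{n:4d} data phases: write {w:6.1f} MB/s, read {r:6.1f} MB/s")
```

With one data phase the model reproduces the single-transfer figures; as the burst grows, the fixed overhead is amortised and both rates approach the 133 MB/s peak.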
14PCI Burst Mode (2)
- Unique Feature: PCI Bridge independently forms burst accesses
- The PCI Bridge joins together single transfer reads and writes to form burst accesses if the addresses of individual accesses are sequential
- Can make bursts even if an address is left out of a possible sequence
- e.g. DW0-DW1-DW3-DW4-DW5: PCI Bridge performs DW0-DW1-DW2-DW3-DW4-DW5 but deasserts all BEx signals for DW2 to make sure that no data are actually transferred
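The burst-merging idea above can be sketched in software. This is only an illustration of the principle, not how any particular bridge is built: the function name, the `max_gap` parameter, and the choice to bridge gaps of at most one doubleword are all assumptions.

```python
def merge_into_bursts(dwords, max_gap=1):
    """Group doubleword indices into bursts.

    Each burst is a list of (dword_index, byte_enables_asserted) pairs.
    Skipped doublewords inside a burst are still clocked through, but with
    byte enables deasserted so that no data is actually transferred.
    """
    bursts = []
    current = []
    for dw in sorted(set(dwords)):
        if current and dw - current[-1][0] - 1 <= max_gap:
            # Fill any hole with placeholder data phases (byte enables off).
            for filler in range(current[-1][0] + 1, dw):
                current.append((filler, False))
            current.append((dw, True))
        else:
            if current:
                bursts.append(current)
            current = [(dw, True)]
    if current:
        bursts.append(current)
    return bursts


# The example from the slide: DW2 is missing from the sequence.
for burst in merge_into_bursts([0, 1, 3, 4, 5]):
    print(burst)
```

For the slide's example this yields a single six-phase burst in which only DW2 has its byte enables deasserted.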
15PCI Burst Mode (3)
16Some PCI Bus Signals (1)
- FRAME
- Driven by current Initiator, indicates start and duration of a transaction
- AD Bus (I/O)
- Multiplexed Address and Data Bus signals; can be 32 or 64 bits wide.
- C/BE3 - C/BE0 (32-bit bus) or C/BE7 - C/BE0 (64-bit bus)
- Bus Command and Byte Enable signals are transferred on these pins. During the address phase, C/BE3 - C/BE0 indicate the type of bus cycle.
17Some PCI Bus Signals (2)
- IRDY: Initiator Ready signal
- Low: the Initiator (Bus Master) is ready and can complete the data transfer
- Writes: data is now valid. Reads: initiator is ready to read the data.
- TRDY: Target Ready signal
- Low: the addressed PCI unit (target) is ready so current data phase can be completed
- Only if IRDY and TRDY are BOTH active (low) at the same time can the data transfer be completed.
- DEVSEL: Device Select
- Low: indicates that the decode unit has identified a PCI unit as the target of the bus operation
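The IRDY/TRDY rule can be checked clock by clock. The sketch below is a simplified model that assumes the signal levels are sampled once per clock and ignores FRAME, DEVSEL and turnaround cycles; the names are illustrative.

```python
ASSERTED = 0     # these PCI control signals are active low
DEASSERTED = 1


def completed_data_phases(irdy, trdy):
    """Return the clock indices on which a data phase completes.

    irdy and trdy are per-clock signal levels (0 = asserted, 1 = deasserted).
    A data phase completes only on clocks where both are asserted.
    """
    return [
        clk
        for clk, (i, t) in enumerate(zip(irdy, trdy))
        if i == ASSERTED and t == ASSERTED
    ]


# Clock 0: target inserts a wait state; clock 2: initiator inserts one.
irdy = [0, 0, 1, 0]
trdy = [1, 0, 0, 0]
print(completed_data_phases(irdy, trdy))
```

Any clock on which only one side is ready counts as a wait state, so either agent can slow the transfer down.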
18PCI Bus Cycles
19Bus Arbitration (1)
- PCI Bus Arbitration is performed separately for each access.
- One Bus Master cannot hold up the bus between two accesses, unlike ISA/EISA/MCA
- A PCI Burst represents a single arbitration but can extend over many cycles
- PCI Bus uses Hidden Arbitration to reduce time taken to arbitrate
- Arbitration happens in the background while data transfers are going on, so it is hidden
20Bus Arbitration (2)
- Arbitration signals: REQ and GNT
- Each Bus Master has its own REQ and GNT signals (e.g. REQ0, GNT0 etc.); they go to the Central Arbitration Logic
- BM asserts REQ (-> 0): bus master requests the bus
- Arbitration Logic asserts GNT (-> 0) when it grants the bus
- Bus master then has 16 CLK cycles to start a transfer, otherwise a timeout will occur
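The REQ/GNT handshake can be illustrated with a toy central arbiter. The slides do not say which arbitration policy is used, so round-robin here is purely an assumption chosen for simplicity, and the 16-clock timeout is not modelled. Class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Toy central arbiter with one REQ/GNT pair per bus master (active low)."""

    def __init__(self, n_masters: int):
        self.n_masters = n_masters
        self.last_granted = n_masters - 1  # so that master 0 is checked first

    def arbitrate(self, req):
        """req: list of REQ levels (0 = requesting). Returns list of GNT levels."""
        gnt = [1] * self.n_masters  # all grants deasserted by default
        for offset in range(1, self.n_masters + 1):
            master = (self.last_granted + offset) % self.n_masters
            if req[master] == 0:
                gnt[master] = 0       # assert GNT for exactly one master
                self.last_granted = master
                break
        return gnt


arbiter = RoundRobinArbiter(3)
print(arbiter.arbitrate([0, 1, 0]))  # masters 0 and 2 request
print(arbiter.arbitrate([0, 1, 0]))  # same requests, next master in turn wins
```

Because the search starts just after the previous winner, two masters that request continuously take turns instead of one starving the other.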
21PCI DMA?
- DMA not implemented as such: no DREQ or DACK signals
- Bus Mastering and Bus Arbitration make DMA unnecessary
- PC/AT: DMA Controller was on the Motherboard and controlled DMA Access from an ISA slot to memory. DMA Controller produced necessary bus control signals.
- PCI-based system: the bus master on a PCI unit can produce all bus control signals itself and can interface to memory and I/O directly.
22PCI DMA? (2)
- DMA
- only arbitration is between CPU and DMA
- PCI system
- More flexible arbitration between a number of bus masters
- more complicated scheme
23Interrupts (1)
- PCI interrupts are optional, i.e. not essential for each unit
- PCI interrupts must be active low and level triggered
- INTA is assigned to each PCI Unit
- Only multifunction PCI units can use INTB, INTC
and INTD
24Interrupts (2)
- PCI interrupts are formed in the PCI Bridge
- In PC systems each INTA interrupt source needs to be associated with IRQ0-IRQ15 for legacy reasons
- For the PCI bus to function in a PC, the PCI interrupts must be mapped to ISA interrupts
- But the IRQx to INTA association is done through software, not hardwired as on PC/AT.
- PnP OS (Windows 95 and later) support PCI IRQ steering (multiple PCI devices share same IRQ)
25Interrupts (3)
- ISA Bus needs 11 contacts for interrupt support
- PCI Bus needs 4 contacts for same functionality
26I/O Address Space and the PCI Bus
- I/O address space is used for PCI registers
- PC I/O addresses previously left for the Motherboard used for PCI configuration registers.
- Configuration registers are 32 bits wide and form a gateway and an enable path to the Configuration Area of PCI devices.
- CONFIG_ADDRESS: 0CF8h
- CONFIG_DATA: 0CFCh
- These addresses do NOT conflict with any
EISA/ISA/MCA I/O addresses
27Configuring a PCI Unit
- Three ways of accessing PCI Config area
- Use I/O addresses at 0CF8h and 0CFCh as CONFIG_ADDRESS and CONFIG_DATA
- Use I/O addresses at 0CF8h and 0CFAh to map I/O address range 0C000h-0CFFFh directly onto the PCI Config Area
- Use BIOS Interrupt INT 1Ah
28PCI CONFIG_ADDRESS
- The Bus field: there can be several PCI buses in a hierarchy; Bus 0 is closest to the CPU.
- The Unit field selects one of 32 possible PCI agents.
- The Function field selects one of 8 functions if the PCI Unit is a multifunction device.
- The Register field selects one of 64 possible doubleword registers (256 bytes) in the Configuration Area.
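A sketch of how the four fields pack into the 32-bit CONFIG_ADDRESS value. The field widths for unit, function and register come from the slide; the bit positions (enable bit 31, bus in bits 23-16, unit in bits 15-11, function in bits 10-8, register in bits 7-2) and the 8-bit bus field are the standard layout and are stated here as background knowledge rather than taken from the slide. The function names are illustrative, and the sketch only builds the value; it performs no port I/O.

```python
CONFIG_ADDRESS_PORT = 0x0CF8
CONFIG_DATA_PORT = 0x0CFC

ENABLE_BIT = 1 << 31


def make_config_address(bus: int, unit: int, function: int, register: int) -> int:
    """Build the value to write to CONFIG_ADDRESS.

    register is the doubleword register index (0..63), not a byte offset.
    """
    if not 0 <= bus < 256:
        raise ValueError("bus must be 0..255")
    if not 0 <= unit < 32:
        raise ValueError("unit must be 0..31")
    if not 0 <= function < 8:
        raise ValueError("function must be 0..7")
    if not 0 <= register < 64:
        raise ValueError("register must be 0..63")
    return ENABLE_BIT | (bus << 16) | (unit << 11) | (function << 8) | (register << 2)


def split_config_address(value: int):
    """Inverse of make_config_address: returns (bus, unit, function, register)."""
    return (
        (value >> 16) & 0xFF,
        (value >> 11) & 0x1F,
        (value >> 8) & 0x07,
        (value >> 2) & 0x3F,
    )


address = make_config_address(bus=1, unit=2, function=3, register=4)
print(f"write {address:#010x} to port {CONFIG_ADDRESS_PORT:#06x}, "
      f"then access port {CONFIG_DATA_PORT:#06x}")
```

On real hardware the value would be written to port 0CF8h and the selected register then read or written through port 0CFCh.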
29PCI Configuration Address Space
- Every PCI Unit and separate function in a multifunction unit has a 256-byte configuration area
- Corresponds to 64 x 32-bit registers
- First 64 bytes are a fixed Header Area. Remaining 192 bytes are unit specific.
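A short sketch of reading the identification fields at the start of the fixed header from a 256-byte configuration area. The offsets used (Vendor ID at 00h, Device ID at 02h, Revision ID at 08h, class code bytes at 09h-0Bh, all little-endian) are the standard header layout and are supplied here as background; the sample contents are made up for the demonstration, with an arbitrary device ID.

```python
import struct

CONFIG_AREA_SIZE = 256
HEADER_SIZE = 64


def parse_header_ids(config_area: bytes) -> dict:
    """Extract the ID and class-code fields from a 256-byte configuration area."""
    if len(config_area) != CONFIG_AREA_SIZE:
        raise ValueError("configuration area must be exactly 256 bytes")
    vendor_id, device_id = struct.unpack_from("<HH", config_area, 0x00)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", config_area, 0x08)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "revision": revision,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
    }


# Made-up sample: vendor 8086h, arbitrary device ID, base class 06h / subclass 00h.
sample = bytearray(CONFIG_AREA_SIZE)
struct.pack_into("<HH", sample, 0x00, 0x8086, 0x1234)
sample[0x08:0x0C] = bytes([0x02, 0x00, 0x00, 0x06])

print(parse_header_ids(bytes(sample)))
```

The same 256 bytes could equally be assembled from 64 doubleword reads through CONFIG_DATA, one per register index.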
30PCI Configuration Area
31Basic and Subclass Codes
32PCI Connector Physical Bus Slot Configurations
- 5Volt PCI Cards 32 bits
- 5Volt PCI Cards 64 bits
- 3.3Volt PCI Cards 32 bits
- 3.3Volt PCI Cards 64 bits
- Universal PCI Cards 32 bits (5V and 3.3V)
- Universal PCI Cards 64 bits (5V and 3.3V)
33 32-bit PCI (5V)
34 64-bit PCI
35PCI Bus in non-Intel environments
36Elimination of ISA Bus
- ISA Bus is slow, hard to use and bulky
- Microsoft/Intel PC 99 and PC 2001 promote the elimination of ISA bus slots from new PC designs
- ISA plug-in cards to be replaced by either PCI plug-in cards or USB add-on peripherals
37Mini PCI and PCI-X
- Mini (or Small) PCI: small form factor PCI implementation for Laptops and Notebooks
- Uses keyed PC-Card or CardBus Connectors
- Similar Performance to PCI
- PCI-X: improved data transfer rate
- PCI-X: 64-bit bus at frequencies up to 133 MHz
- Theoretically capable of 1Gbyte/s transfer rate
- Easier to design than 66MHz PCI
38Future I/O and NGIO
- Future I/O: builds on PCI Standard
- Promoted by Computer Manufacturers
- Available 2001?
- Next Generation I/O (NGIO): new serial type standard proposed by Intel and Sun
- Aims to make Server design more modular
- Available 2001/2?
39Bus Performance
40Accelerated Graphics Port (AGP)
- Even PCI Bus can become a bottleneck for 3D Graphics
- Problem is made worse because many devices can compete for the bus
- The Accelerated Graphics Port (AGP) interface is a bus specification that enables high performance graphics capabilities, especially 3D, for PCs
41AGP (2)
- The AGP Port is independent of the PCI bus
- AGP is an additional connection point in the system
- AGP is intended exclusively for visual display devices; all other I/O devices will remain on the PCI bus.
- AGP uses a new connector body which is not compatible with the PCI connector
- PCI and AGP boards are not mechanically interchangeable.
42AGP and PCI
43AGP in Multiprocessor Environment
44AGP In System
- AGP provides a high speed port to allow movement of data between the PC's graphics controller and system memory
- The AGP interface is positioned between the PC's chipset and graphics controller
- Significantly increases the bandwidth available to a graphics accelerator (current peak bandwidth is 528 MB/s)
- Future AGP systems should support a peak bandwidth over 1 GB/s
45AGP Fast Lane for Graphics data
- AGP provides a high memory bandwidth "fast lane" for graphics data
- AGP enables the hardware-accelerated graphics controller to execute texture maps directly from system memory
- instead of caching them in the relatively limited local video memory
- also helps speed the flow of decoded video from the CPU to the graphics controller
46AGP Data Movement Diagram
47Data Movement without AGP
48Data Movement with AGP
49AGP System Benefits
- Graphics Controller can use smaller local video memory
- Reduces costs because video memory is dearer than normal system memory.
- The PCI Bus is relieved of a lot of high-speed graphics data, allowing other devices to achieve greater throughput.
50AGP Data Transfer
- AGP max data transfer rate: 533 Mbyte/s at 66 MHz
- Data transferred on both the rising and falling edges of the 66 MHz clock
- AGP also uses more efficient data transfer modes.
- 1 Gbyte/s transfers allowed for in the AGP Spec.
- Allows four data transfers per 66 MHz clock cycle.
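The AGP rates follow from the 32-bit data path, the 66 MHz clock, and the number of transfers per clock. A minimal sketch, assuming the nominal "66 MHz" clock is 66.67 MHz (200/3) and MB means 10^6 bytes; the names are illustrative:

```python
AGP_CLOCK_MHZ = 200 / 3   # assumed nominal "66 MHz" clock
AGP_BUS_BYTES = 4         # 32 data/address lines


def agp_peak_mb_s(transfers_per_clock: int) -> float:
    """Peak AGP rate in MB/s for 1, 2 or 4 transfers per clock cycle."""
    if transfers_per_clock not in (1, 2, 4):
        raise ValueError("transfers_per_clock must be 1, 2 or 4")
    return AGP_BUS_BYTES * AGP_CLOCK_MHZ * transfers_per_clock


for mode in (1, 2, 4):
    print(f"{mode} transfer(s) per clock: {agp_peak_mb_s(mode):.0f} MB/s")
```

Two transfers per clock (both clock edges) gives the 533 Mbyte/s figure, and four per clock gives just over 1 Gbyte/s. The 528 MB/s quoted elsewhere in these slides is the same two-per-clock calculation using exactly 66 MHz.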
51AGP Pipelining
- AGP overlaps the memory or bus access times for a request ("n") with the issuing of following requests ("n+1"..."n+2"... etc.)
- In the PCI bus, request "n+1" does not begin until the data transfer of request "n" finishes
- Both AGP and PCI can "burst" (transfer multiple data items continuously in response to a single request)
- Bursting only partly alleviates the non-pipelined nature of PCI
- Depth of AGP pipelining is implementation dependent and remains transparent to application software
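The benefit of pipelining can be seen with a very simple timing model. It assumes every request has the same access latency and the same data transfer time, and that the pipeline is deep enough for the data phases of consecutive requests to run back to back. These are simplifying assumptions for illustration, not figures from the AGP specification.

```python
def non_pipelined_cycles(requests: int, latency: int, transfer: int) -> int:
    """PCI-style: request n+1 starts only after request n's data has finished."""
    return requests * (latency + transfer)


def pipelined_cycles(requests: int, latency: int, transfer: int) -> int:
    """AGP-style: later requests are issued while earlier ones are in flight,
    so only the first request's latency is exposed."""
    if requests == 0:
        return 0
    return latency + requests * transfer


# Hypothetical numbers: 4 requests, 10 cycles of latency, 2 cycles of data each.
print(non_pipelined_cycles(4, latency=10, transfer=2))
print(pipelined_cycles(4, latency=10, transfer=2))
```

In this model the non-pipelined bus pays the latency once per request, while the pipelined bus pays it only once overall.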
52Memory Latency AGP vs PCI
53AGP Sideband Addressing
- AGP uses 8 extra "sideband" address lines which allow the graphics controller to issue new addresses and requests simultaneously
- This happens even while data continues to move from previous requests on the main 32 data/address lines.
54AGP Memory Mapping
- AGP memory is really dynamically allocated areas of system memory that the graphics controller can access quickly
- Access speed comes from built-in hardware in the Chipset, which translates addresses
- Allows the graphics controller and its software to see a contiguous space in main memory even though the pages are disjointed
- Graphics controller can access large data structures (1 Kbyte to 128 Kbyte) as a single entity
55AGP Memory Mapping GART
- Built-in Chipset hardware is called the GART (Graphics Address Remapping Table)
- For accesses to AGP memory, the graphics controller and CPU use a contiguous aperture of several megabytes
- The GART translates these to various, possibly disjointed, 4 KByte page addresses in system memory
- PCI devices that access the AGP memory aperture (for example, for live video capture) also go through the GART.
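The GART's job can be modelled as a page-table lookup. The sketch below is a software illustration only: the class name, the aperture base address and the physical page addresses are all made up, and a real GART is chipset hardware programmed by a driver.

```python
PAGE_SIZE = 4096  # 4 KByte pages, as on the slide


class Gart:
    """Toy remapping table: contiguous aperture -> scattered physical pages."""

    def __init__(self, aperture_base: int, page_frames):
        self.aperture_base = aperture_base
        self.page_frames = list(page_frames)  # physical base address per page

    def translate(self, address: int) -> int:
        """Map an address inside the aperture to a system memory address."""
        offset = address - self.aperture_base
        if not 0 <= offset < len(self.page_frames) * PAGE_SIZE:
            raise ValueError("address is outside the AGP aperture")
        page_index, offset_in_page = divmod(offset, PAGE_SIZE)
        return self.page_frames[page_index] + offset_in_page


# Hypothetical aperture of three pages backed by non-adjacent physical pages.
gart = Gart(0xE0000000, [0x00345000, 0x00012000, 0x00789000])
for addr in (0xE0000000, 0xE0001004, 0xE0002FFF):
    print(f"{addr:#010x} -> {gart.translate(addr):#010x}")
```

The graphics controller sees one contiguous 12 KByte range, while the data actually lives in three unrelated physical pages.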
56Summary Key benefits of AGP (1)
- Peak bandwidth four times higher than the PCI bus due to pipelining, sideband addressing, and data transfers that occur on both rising and falling edges of the clock
- Direct execution of texture maps from system memory. AGP enables high-speed direct access to system memory by the graphics controller, rather than forcing it to pre-load the texture data into local video memory.
57Summary Key benefits of AGP (2)
- Less PCI bus congestion
- The PCI bus attaches a wide variety of I/O devices
- AGP operates concurrently with, and independent from, most transactions on PCI
- CPU accesses to system memory can proceed concurrently with AGP memory reads by the graphics controller.
- Improved system concurrency: Pentium II or III processor can perform other activities while the graphics chip is accessing texture data in system memory.
58Chipsets
- Central to PCs
- Chipset is the motherboard
- Two boards with the same chipset are functionally identical
- Chipset determines
- which type of processor can be used
- how fast it will run
- how fast the buses will run
- speed/type/amount of memory
- etc
59Chipset Evolution
- Original PC/XT/ATs contained
- Clock generator 8284
- Bus controller 8288
- System timer 8253
- Interrupt controller(s) 8259
- DMA controller(s) 8237
- CMOS RAM/RTC
- Keyboard controller
- Lots of discrete glue logic (TTL) to complete the motherboard circuit
- TOTAL: >100 individual chips
60Chipset Evolution
- 1986: Chips and Technologies Inc. introduces revolutionary 82C206
- 82C206
- single chip integrating all functions of the main motherboard chips for AT-compatible systems
- Processor + 82C206 + four other chips (buffers) = Complete motherboard circuit
- based on it: NEAT (New Enhanced AT) chipset
- later followed by SCAT (Single Chip AT) 82C836
61Chipset Evolution
- Chipset idea rapidly copied by other chip manufacturers
- Acer, Erso, Opti, Suntac, Symphony, UMC, VLSI
- After 1994: INTEL dominates market for
- processors
- chipsets
- motherboards
- eliminating the delay between introduction of new processors and systems using them
- Today: Niche markets for Acer, VIA, SiS
62Intel Chipsets
- Intel Chipset Model Numbers
63Chipset Architectures
- Two distinct chipset architectures
- North/South Bridge Architecture (Intel's earlier chipsets)
- Hub Architecture
- More recent 800 series chipsets use the hub
architecture
64North/South Bridge Architecture
- North Bridge: Bridge between
- Processor bus (66-400 MHz) and AGP (66-533 MHz) / PCI (33-66 MHz)
- South Bridge: Bridge between
- PCI (33-66 MHz) and ISA (8 MHz)
- normally also contains IDE hard disk controller and USB interfaces
- Super I/O
- attached to the ISA bus. Contains commonly used peripheral items combined in a single chip
- may also contain CMOS RAM/Clock, IDE controllers, etc.
65Example North/South Bridge Architecture
66Hub Architectures
- Hub Architecture Blocks
- Memory Controller Hub (MCH) or Graphics Memory Controller Hub (GMCH)
- I/O Controller Hub (ICH)
- Firmware Hub (FWH)
67Advantages Hub Architecture
- Faster than North/South Bridge Architecture
- Hub interface is quad-clocked: 266 MB/s
- Reduced PCI Loading
- Hub interface (AHA bus) is independent of PCI and does not consume PCI bandwidth for chipset or Super I/O traffic. This improves PCI performance.
- Reduced board wiring
- Although twice as fast as PCI, the hub interface is only 8 bits wide and requires only 15 signals to be routed on the motherboard. More economical. Less noise.
- Faster ATA/IDE and USB interfaces
- ATA/IDE and USB traffic bypasses PCI
68LPC Bus
- ICH provides a low-pin-count (LPC) bus
- 4 bits wide (only 13 signals total) with a maximum bandwidth of 6.67 MB/s
- drastically reduces number of traces on motherboard (compared to 96 traces for ISA)
- mainly supports FWH and LPC I/O Controller
69Intel 815 Chipset
- Introduced June 2000
- Mainstream PC chipsets
- Integral video upgradable via an AGP 4x slot
- Support Celeron, Pentium III, etc.
- Support PC133 SDRAM (more affordable than RDRAM)
70Intel 815 Chipset
- Intel 815 Chipset Features
- 66/100/133 MHz system bus
- 266MB/s hub interface (AHA bus)
- ATA-100 or ATA-66 (100MB/s drive performance)
- PC100 or PC133 SDRAM
- Supports up to 512MB RAM
- Integrated Audio-Codec 97
- Low-power sleep modes
- 2..4 USB ports
- LPC Bus
- Elimination of ISA Bus
- Integrated AGP 2x 3D graphics
- Integrated Ethernet controller 10/100Mb/s
71Intel D815 Desktop Board
72Motherboard with 440LX Chipset