InfiniBand Queue Pair and Its Application - PowerPoint PPT Presentation

Slides: 44
Provided by: yingp

Transcript and Presenter's Notes



1
InfiniBand Queue Pair and Its Application
(Based on Intel Developer Forum Spring 2001)
  • Outline
    • Queue Pair
    • Memory Registration
    • Host to Ethernet
    • Host to Storage I/O Device

2
InfiniBand Architecture Layers
3
Transport Services
  • InfiniBand Transport
    • Reliable Connected (RC)
    • Unreliable Connected (UC)
    • Reliable Datagram (RD)
    • Unreliable Datagram (UD)
    • Raw Datagram
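As a quick reference, the toy Python table below records which of these transport services are connected or reliable and which operations each supports, as the services are commonly summarized for the InfiniBand Architecture 1.x specification. The `SERVICES` table and the `candidates` helper are hypothetical illustrations, not part of any verbs API.

```python
# Toy summary of the five transport services (hypothetical data structure).
SERVICES = {
    "RC":  {"connected": True,  "reliable": True,  "ops": {"SEND", "RDMA_WRITE", "RDMA_READ", "ATOMIC"}},
    "UC":  {"connected": True,  "reliable": False, "ops": {"SEND", "RDMA_WRITE"}},
    "RD":  {"connected": False, "reliable": True,  "ops": {"SEND", "RDMA_WRITE", "RDMA_READ", "ATOMIC"}},
    "UD":  {"connected": False, "reliable": False, "ops": {"SEND"}},
    "RAW": {"connected": False, "reliable": False, "ops": {"SEND"}},
}

def candidates(needed_ops, need_reliable=True):
    """Names of services that support every needed op (and are reliable, if required)."""
    return sorted(
        name for name, props in SERVICES.items()
        if needed_ops <= props["ops"] and (props["reliable"] or not need_reliable)
    )

print(candidates({"SEND", "RDMA_READ", "RDMA_WRITE"}))  # ['RC', 'RD']
```

The storage example at the end of the deck needs the controller to both push (RDMA write) and pull (RDMA read) data reliably, which in this table narrows the choice to RC or RD; the deck uses RC.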

4
Transport Interface
5
Verbs
  • Verbs Concept
    • Queue Pair
      • A dual-simplex communication endpoint (a send
        queue plus a receive queue)
    • Work Request
      • Requests a communication operation
    • Completion Queue
      • Reports the completion status of work requests
    • Memory Region
      • Source/sink memory area for communication
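The verbs concepts can be illustrated with a toy Python model: a queue pair owns a send queue and a receive queue, the consumer posts work requests to them, and completions land in a completion queue. Everything here (`QueuePair`, `deliver`, the string opcodes) is a hypothetical simulation; in a real stack the channel adapter does the matching, and libibverbs exposes the operations as calls such as `ibv_post_send`, `ibv_post_recv` and `ibv_poll_cq`. Memory regions are left out of this sketch.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class WorkRequest:
    wr_id: int
    data: bytes = b""        # send: bytes to transmit; receive: filled in on arrival

@dataclass
class WorkCompletion:
    wr_id: int
    opcode: str              # "SEND" or "RECV"
    status: str = "SUCCESS"

class QueuePair:
    """Toy QP: a send queue and a receive queue, each reporting to a CQ (a deque)."""
    def __init__(self, send_cq, recv_cq):
        self.send_queue, self.recv_queue = deque(), deque()
        self.send_cq, self.recv_cq = send_cq, recv_cq

    def post_send(self, wr):
        self.send_queue.append(wr)

    def post_recv(self, wr):
        self.recv_queue.append(wr)

def deliver(sender, receiver):
    """Stand-in for the adapters: match the oldest send with the oldest posted receive."""
    if not receiver.recv_queue:
        raise RuntimeError("no receive posted (a real RC QP would get an RNR NAK)")
    send_wr = sender.send_queue.popleft()
    recv_wr = receiver.recv_queue.popleft()
    recv_wr.data = send_wr.data
    sender.send_cq.append(WorkCompletion(send_wr.wr_id, "SEND"))
    receiver.recv_cq.append(WorkCompletion(recv_wr.wr_id, "RECV"))
    return recv_wr

cq_a, cq_b = deque(), deque()
a, b = QueuePair(cq_a, cq_a), QueuePair(cq_b, cq_b)
b.post_recv(WorkRequest(wr_id=7))
a.post_send(WorkRequest(wr_id=1, data=b"hello"))
print(deliver(a, b).data)    # b'hello'
```

Note that the receiver must post its receive before the data arrives; that requirement is why the driver designs later in the deck pre-post receives for all buffers.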

6
Structure of Queue Pair
7
Consumer Queuing Model
8
Work Request
9
Scatter/Gather List
10
Send and Receive Operation
11
RDMA Operations
12
Work Completion
13
Selectable Signaling
  • Controls whether a send-queue work request
    generates a completion queue (CQ) entry
  • A signaled work request always generates a CQ
    entry
  • An unsignaled work request generates a CQ entry
    only when it completes in error
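Selectable signaling can be sketched as a function that decides which send work requests produce CQ entries. In libibverbs the choice is the `IBV_SEND_SIGNALED` flag in a send work request's `send_flags`, which takes effect when the QP is created with `sq_sig_all` set to 0. The `SendWR` record and `completions_for` helper below are hypothetical, and the model simplifies error handling.

```python
from dataclasses import dataclass

@dataclass
class SendWR:
    wr_id: int
    signaled: bool

def completions_for(wrs, failing_id=None):
    """Return the CQ entries a batch of send WRs would produce (toy model).

    Signaled WRs always produce an entry; unsignaled WRs produce one only if
    they complete in error. The toy stops at the first error; a real QP would
    then enter the error state and flush the remaining WRs.
    """
    cq = []
    for wr in wrs:
        if wr.wr_id == failing_id:
            cq.append((wr.wr_id, "ERROR"))
            break
        if wr.signaled:
            cq.append((wr.wr_id, "SUCCESS"))
    return cq

batch = [SendWR(i, signaled=(i % 4 == 3)) for i in range(8)]  # signal every 4th WR
print(completions_for(batch))                # [(3, 'SUCCESS'), (7, 'SUCCESS')]
print(completions_for(batch, failing_id=5))  # [(3, 'SUCCESS'), (5, 'ERROR')]
```

Signaling only every Nth work request is a common way to cut CQ traffic: because work requests on a send queue complete in order, one signaled completion also tells the consumer that the unsignaled requests posted before it have finished.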

14
Completion Retrieval
  • Polling
    • Poll a CQ to retrieve completions
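A minimal polling loop, sketched in Python against a toy `CompletionQueue` class (hypothetical): each `poll` call removes up to `max_entries` completions, mirroring the way `ibv_poll_cq` in libibverbs reports how many work completions it returned, and the consumer keeps polling until a call comes back empty.

```python
from collections import deque

class CompletionQueue:
    """Toy CQ holding work completions in arrival order (hypothetical class)."""
    def __init__(self):
        self._entries = deque()

    def push(self, wc):
        """Called by the (simulated) adapter when a work request finishes."""
        self._entries.append(wc)

    def poll(self, max_entries):
        """Remove and return up to max_entries completions, oldest first; [] if empty."""
        batch = []
        while self._entries and len(batch) < max_entries:
            batch.append(self._entries.popleft())
        return batch

def drain(cq, batch_size=4):
    """Poll repeatedly until the CQ comes back empty; return everything retrieved."""
    retrieved = []
    while True:
        batch = cq.poll(batch_size)
        if not batch:
            return retrieved
        retrieved.extend(batch)

cq = CompletionQueue()
for wr_id in range(10):
    cq.push(wr_id)           # stand-in work completions: just the wr_id
completed = drain(cq)
print(completed)             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```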

15
Completion Notification
  • Event driven
    • A completion event triggers a callback on a
      per-CQ basis
    • Events take the place of interrupts

16
Completion Notification
  • Two levels of callback filtering
    • Call back when the next WC is added to the CQ
    • Call back when the next WC meeting either of the
      following criteria is added to the CQ
      • The WR completed in error
      • A WR on the receive queue completed by
        receiving a message with the solicited event
        flag set
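The two filtering levels can be modeled as an arm mode on the CQ, as in the toy Python sketch below (class and method names are hypothetical). Arming is one-shot: once the callback fires, the CQ is disarmed until the consumer asks again. libibverbs follows the same pattern with `ibv_req_notify_cq(cq, solicited_only)`, where a non-zero `solicited_only` selects the second level.

```python
from dataclasses import dataclass

@dataclass
class WC:
    wr_id: int
    queue: str               # "send" or "recv"
    error: bool = False
    solicited: bool = False  # receive matched a message with the solicited event flag set

class NotifyingCQ:
    """Toy CQ with one-shot, filtered completion callbacks (hypothetical names)."""
    def __init__(self, callback):
        self.entries = []
        self.callback = callback
        self.armed = None    # None (disarmed), "next", or "solicited"

    def request_notification(self, mode):
        self.armed = mode

    def _matches(self, wc):
        if self.armed == "next":        # level 1: any new WC
            return True
        if self.armed == "solicited":   # level 2: error or solicited receive
            return wc.error or (wc.queue == "recv" and wc.solicited)
        return False

    def add(self, wc):
        self.entries.append(wc)
        if self._matches(wc):
            self.armed = None           # one-shot: consumer must re-arm
            self.callback(self)

fired = []
cq = NotifyingCQ(lambda q: fired.append(len(q.entries)))
cq.request_notification("solicited")
cq.add(WC(1, "recv"))                   # ordinary receive: no callback
cq.add(WC(2, "recv", solicited=True))   # solicited receive: callback fires
cq.add(WC(3, "send", error=True))       # would match, but CQ is no longer armed
print(fired)                            # [2]
```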

17
Memory Access
  • Data buffers must be registered with the channel
    interface before use
  • Registration installs the address mapping into
    the HCA
    • The HCA performs the virtual-to-physical address
      translation
  • Registration assigns access rights and permissions
    • Which queue pairs can access the buffer
    • Which operations can be performed
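The access checks that registration sets up can be sketched in Python as below. The `MemoryRegion` record, the `Access` flags and the `check_access` helper are hypothetical; in libibverbs the rights are passed to `ibv_reg_mr` as `IBV_ACCESS_*` flags, and the queue pairs that may use a region are those in the same protection domain.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Access(Flag):
    LOCAL_WRITE = auto()
    REMOTE_READ = auto()
    REMOTE_WRITE = auto()

@dataclass(frozen=True)
class MemoryRegion:
    addr: int        # start of the registered range (virtual address)
    length: int      # bytes
    access: Access   # rights granted at registration
    pd: int          # protection domain the region was registered in

def check_access(mr, qp_pd, addr, length, needed):
    """True if a QP in protection domain qp_pd may perform `needed` on [addr, addr + length)."""
    same_domain = mr.pd == qp_pd
    in_bounds = mr.addr <= addr and addr + length <= mr.addr + mr.length
    has_rights = (needed & mr.access) == needed
    return same_domain and in_bounds and has_rights

mr = MemoryRegion(addr=0x1000, length=4096, access=Access.LOCAL_WRITE | Access.REMOTE_READ, pd=1)
print(check_access(mr, 1, 0x1000, 512, Access.REMOTE_READ))   # True
print(check_access(mr, 1, 0x1000, 512, Access.REMOTE_WRITE))  # False: right not granted
print(check_access(mr, 2, 0x1000, 512, Access.REMOTE_READ))   # False: other protection domain
print(check_access(mr, 1, 0x1F00, 512, Access.REMOTE_READ))   # False: runs past the region
```

A request that falls partly outside the region, or comes from a QP in another protection domain, is refused even when the operation itself is permitted.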

18
Memory Model
19
Three Main Memory Components
  • Memory region: a set of contiguous memory locations
    that has been registered
    • The consumer explicitly registers a region with
      the OS
    • Virtual-to-physical mapping: QP consumers use
      virtual addresses and the HCA performs the
      translation
    • Sets local and remote access rights
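The division of labor here, where the consumer keeps using virtual addresses and the HCA translates them, can be illustrated with a toy page table in Python. The region is contiguous in virtual memory, but its pages need not be physically contiguous. The 4 KiB page size, the frame numbers and the `register`/`translate` helpers are assumptions made for the sketch; a real OS also pins the pages so the mapping cannot change while the HCA holds it.

```python
PAGE_SIZE = 4096   # assumed page size for the sketch

def register(virt_base, length, frames):
    """Build the per-region translation table an HCA might hold (toy model).

    `frames` lists the physical page frame backing each virtual page the
    region touches; in reality the OS pins those pages and supplies them.
    """
    first = virt_base // PAGE_SIZE
    last = (virt_base + length - 1) // PAGE_SIZE
    if len(frames) != last - first + 1:
        raise ValueError("need exactly one physical frame per virtual page")
    table = {first + i: frame for i, frame in enumerate(frames)}
    return {"base": virt_base, "length": length, "table": table}

def translate(region, vaddr):
    """What the HCA does on each access: virtual address -> physical address."""
    if not region["base"] <= vaddr < region["base"] + region["length"]:
        raise ValueError("address outside the registered region")
    frame = region["table"][vaddr // PAGE_SIZE]
    return frame * PAGE_SIZE + vaddr % PAGE_SIZE

mr = register(0x7000_0100, 6000, frames=[0x42, 0x17])   # spans two virtual pages
print(hex(translate(mr, 0x7000_0100)))   # 0x42100
print(hex(translate(mr, 0x7000_1010)))   # 0x17010
```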

20
Three Main Memory Components (continued)
  • Memory windows
    • Flexible and efficient RDMA access control to a
      memory region
    • The consumer binds a pre-allocated window to a
      specified portion of a registered region
  • Protection domain
    • Associates QPs with memory regions and memory
      windows
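A toy Python model of windows and protection domains (all names hypothetical): a window can only be bound inside a region from the same protection domain, and a remote access is checked against the window's own bounds and rights. libibverbs exposes the real operations as `ibv_alloc_mw` and `ibv_bind_mw`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Region:
    pd: int
    addr: int
    length: int

@dataclass
class Window:
    pd: int                          # protection domain the window was allocated in
    region: Optional[Region] = None  # None until bound
    addr: int = 0
    length: int = 0
    remote_write: bool = False

def bind(window, region, addr, length, remote_write):
    """Bind a pre-allocated window to [addr, addr + length) inside a registered region."""
    if window.pd != region.pd:
        raise PermissionError("window and region belong to different protection domains")
    if not (region.addr <= addr and addr + length <= region.addr + region.length):
        raise ValueError("window must lie inside the registered region")
    window.region, window.addr, window.length = region, addr, length
    window.remote_write = remote_write

def remote_write_allowed(window, qp_pd, addr, length):
    """Check a remote RDMA write that targets the window."""
    return (window.region is not None and window.pd == qp_pd and window.remote_write
            and window.addr <= addr and addr + length <= window.addr + window.length)

region = Region(pd=1, addr=0x10000, length=64 * 1024)
win = Window(pd=1)
bind(win, region, 0x12000, 4096, remote_write=True)
print(remote_write_allowed(win, 1, 0x12000, 4096))  # True
print(remote_write_allowed(win, 1, 0x12FF0, 32))    # False: crosses the window's end
print(remote_write_allowed(win, 2, 0x12000, 16))    # False: QP in another protection domain
```

Rebinding the window to a different slice changes what a remote peer can reach without re-registering the underlying region, which is what makes windows the lighter-weight tool for per-transfer access control.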

21
Network I/O Example
  • Overview
    • The example focuses on a single host
    • The design accommodates multiple hosts and
      controllers

22
Operational View
23
Operational View
24
Design Choice
  • How many queue pairs?
    • Data and control messages share one QP, because
      control messages are infrequent

25
Design Choice
  • How many completion queues?
    • One for send queues
    • One for receive queues

26
Design Choice
  • Message format
    • The Ethernet header and payload go into the send
      payload
    • The immediate data field carries the message
      type (data vs. control)
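The message format can be sketched in Python: each send carries one Ethernet frame as its payload, and the receiver dispatches on the immediate data (a 32-bit field in InfiniBand, posted in libibverbs with the `IBV_WR_SEND_WITH_IMM` opcode). The type codes `MSG_DATA`/`MSG_CONTROL` and the dict-shaped work request are hypothetical, and the toy reuses the send request as the receive completion.

```python
MSG_DATA, MSG_CONTROL = 0, 1   # hypothetical type codes carried in the immediate field

def make_send(frame: bytes, is_control: bool) -> dict:
    """Toy send WR: the whole Ethernet frame (header + payload) is the send payload."""
    return {"payload": frame, "imm_data": MSG_CONTROL if is_control else MSG_DATA}

def dispatch(wc: dict, network_stack: list, control_handler: list) -> None:
    """Route a completed receive by the message type in its immediate data."""
    if wc["imm_data"] == MSG_DATA:
        network_stack.append(wc["payload"])
    elif wc["imm_data"] == MSG_CONTROL:
        control_handler.append(wc["payload"])
    else:
        raise ValueError(f"unknown message type {wc['imm_data']}")

# Broadcast destination, locally administered source MAC, EtherType 0x0800.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
stack, control = [], []
dispatch(make_send(frame, is_control=False), stack, control)
dispatch(make_send(b"link-up", is_control=True), stack, control)
print(len(stack), control)   # 1 [b'link-up']
```

Keeping the type in the immediate data means the receiver can classify a message from the completion alone, without parsing the payload.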

27
Receive Side Design
  • At initialization
    • Pre-register all data buffers
    • Post a receive work request for every data buffer
    • Request a completion notification

28
Receive Side Design
  • When a completion callback occurs
    • Process all completed receives
      • Poll the incoming CQ for completed receives
      • Pass received Ethernet frames to the network
        stack
    • Recycle any data buffers returned by the network
      stack and post them to the receive work queue
    • Request a completion notification after all
      available receives have been processed
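The two receive-side slides can be put together as a toy Python model (class and method names hypothetical): buffers are posted once at startup, and each callback drains the CQ, re-posts the buffers and only then re-arms notification. It assumes the network stack hands each buffer back immediately.

```python
from collections import deque

class ReceivePath:
    """Toy receive side of the host Ethernet driver; all names are hypothetical."""
    def __init__(self, n_buffers):
        # Startup: buffers 0..n-1 are assumed pre-registered; post a receive for
        # each one and arm the CQ for the next completion.
        self.posted = deque(range(n_buffers))
        self.cq = deque()
        self.armed = True

    def adapter_receive(self, frame):
        """Simulated HCA: fill the oldest posted buffer; return True if a callback fires."""
        buf = self.posted.popleft()
        self.cq.append((buf, frame))
        fire, self.armed = self.armed, False   # notification is one-shot
        return fire

    def on_callback(self, network_stack):
        while self.cq:                         # process all completed receives
            buf, frame = self.cq.popleft()
            network_stack.append(frame)        # pass the frame up
            self.posted.append(buf)            # toy: buffer comes back at once; re-post it
        self.armed = True                      # re-arm only after the CQ is drained

rx = ReceivePath(4)
stack = []
print(rx.adapter_receive(b"f1"), rx.adapter_receive(b"f2"))  # True False
rx.on_callback(stack)
print(stack, len(rx.posted), rx.armed)  # [b'f1', b'f2'] 4 True
```

Real drivers usually poll the CQ once more after re-arming, because a completion that arrives between the final poll and the re-arm would otherwise not raise a new event; the single-threaded toy has no such window.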

29
Send Side Design
  • At initialization
    • Pre-register all data buffers
    • Pass the data buffers up to the network stack

30
Send Side Design
  • When the network stack passes down a group of
    packets
    • Form and post a send work request
    • Poll the associated CQ for completed sends
    • Recycle the data buffers to the upper layer
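A toy Python model of the send side (names hypothetical): pre-registered buffers are lent to the network stack, each packet passed down becomes a posted send, and polling the send CQ returns finished buffers to the free pool. Posting one send work request per packet is an assumption; the slide does not say how a group of packets maps onto work requests.

```python
from collections import deque

class SendPath:
    """Toy send side of the host Ethernet driver; all names are hypothetical."""
    def __init__(self, n_buffers):
        self.free = deque(range(n_buffers))   # pre-registered buffers lent to the stack
        self.send_queue = deque()             # posted send WRs: (buffer id, frame)
        self.send_cq = deque()                # completed sends: buffer ids

    def get_buffer(self):
        """Network stack borrows a pre-registered buffer (None if all are in use)."""
        return self.free.popleft() if self.free else None

    def transmit(self, packets):
        """Stack passes down a group of (buffer id, frame) pairs."""
        for buf, frame in packets:
            self.send_queue.append((buf, frame))   # assumption: one send WR per packet
        self.reap()

    def adapter_complete(self):
        """Simulated HCA finishing the oldest posted send."""
        buf, _frame = self.send_queue.popleft()
        self.send_cq.append(buf)

    def reap(self):
        """Poll the send CQ and recycle finished buffers to the upper layer."""
        while self.send_cq:
            self.free.append(self.send_cq.popleft())

tx = SendPath(3)
b0, b1 = tx.get_buffer(), tx.get_buffer()
tx.transmit([(b0, b"frame-a"), (b1, b"frame-b")])
print(len(tx.free))        # 1  (both sends still in flight)
tx.adapter_complete()
tx.adapter_complete()
tx.reap()
print(sorted(tx.free))     # [0, 1, 2]
```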

31
Network Summary
  • Hosts push packets to the Ethernet controller
  • The Ethernet controller pushes packets to hosts
  • One reliable connected (RC) QP per controller
  • Control and data messages share the same QP
  • Two completion queues (CQs)
    • One for the send queue
    • The other for the receive queue

32
Host to Storage I/O
  • I/O unit with a SCSI I/O controller
  • The example focuses on a single host

33
Architecture of Host-Storage
34
Operational View
35
Disk Access Scenario
36
Design Choice
  • How many queue pairs?
    • Command and status messages on one QP
    • Data RDMA on another QP

37
Completion Queues
  • Two completion queues (CQs)
    • One for send queues
    • One for receive queues

38
Design Choice
  • Command message
    • Transaction ID
    • Description of the host data buffers
    • Operation request
    • Device command
  • Status message
    • Transaction ID
    • Status from the operation request
    • Status from the device
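One possible wire encoding of these two messages, using Python's `struct` module. The field widths, byte order and padding are assumptions (the slide lists the fields but not their sizes), and the host-buffer description is assumed to be an address, a length and a remote key, which is what a controller would need to target the buffer with RDMA.

```python
import struct

# Hypothetical little-endian layouts; the slide names the fields, not their sizes.
# Command: txn_id, operation, 3 pad bytes, buffer address, buffer length, remote key, 16-byte CDB
COMMAND = struct.Struct("<QB3xQII16s")
# Status: txn_id, status of the operation request, status from the device
STATUS = struct.Struct("<QBB")

def pack_command(txn_id, op, buf_addr, buf_len, rkey, cdb):
    if len(cdb) > 16:
        raise ValueError("CDB longer than 16 bytes")
    return COMMAND.pack(txn_id, op, buf_addr, buf_len, rkey, cdb)  # '16s' zero-pads short CDBs

def unpack_status(raw):
    txn_id, op_status, dev_status = STATUS.unpack(raw)
    return {"txn_id": txn_id, "op_status": op_status, "dev_status": dev_status}

read10 = bytes([0x28]) + bytes(9)          # placeholder CDB: READ(10) opcode, other fields zero
cmd = pack_command(42, 1, 0x7F00_0000_1000, 64 * 1024, 0x1234, read10)
print(COMMAND.size, len(cmd))              # 44 44
print(unpack_status(STATUS.pack(42, 0, 0)))  # {'txn_id': 42, 'op_status': 0, 'dev_status': 0}
```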

39
Receiver Side Design
  • At driver startup
    • Pre-register all status buffers
    • Post receive work requests for all status buffers
    • Request a completion notification

40
Receiver Side Design
  • When a completion callback occurs
    • Process all completed receives
      • Poll the incoming CQ for completed receives
      • Use the transaction ID to locate the storage
        request
      • Deregister the data buffer
      • Return the completed storage request to the
        storage stack
      • Recycle the status buffer and post it to the
        receive work queue
    • Request a completion notification after all
      available receives have been processed
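The transaction ID is what lets the driver match a status message to its request even when the controller completes requests out of order. A toy Python model of this path follows (all names hypothetical; data buffers are plain strings and registration is tracked in a set).

```python
from collections import deque

class StorageReceivePath:
    """Toy status-receive side of the storage driver; all names are hypothetical."""
    def __init__(self, n_status_buffers):
        self.posted = deque(range(n_status_buffers))  # status buffers with receives posted
        self.cq = deque()
        self.armed = True
        self.outstanding = {}      # txn_id -> (storage request, data buffer)
        self.registered = set()    # data buffers currently registered

    def issue(self, txn_id, request, data_buffer):
        """Send-side bookkeeping: register the data buffer and remember the request."""
        self.registered.add(data_buffer)
        self.outstanding[txn_id] = (request, data_buffer)

    def adapter_status(self, status):
        """Simulated HCA: a status message lands in the oldest posted status buffer."""
        self.cq.append((self.posted.popleft(), status))

    def on_callback(self, storage_stack):
        while self.cq:
            status_buf, status = self.cq.popleft()
            request, data_buffer = self.outstanding.pop(status["txn_id"])  # locate by txn ID
            self.registered.discard(data_buffer)                 # deregister the data buffer
            storage_stack.append((request, status["dev_status"]))  # complete the request
            self.posted.append(status_buf)                       # recycle and re-post
        self.armed = True                                        # request notification again

rx = StorageReceivePath(2)
rx.issue(7, "read blk 100", data_buffer="buf-A")
rx.issue(8, "write blk 5", data_buffer="buf-B")
rx.adapter_status({"txn_id": 8, "dev_status": 0})   # request 8 finishes first
done = []
rx.on_callback(done)
print(done, rx.registered)   # [('write blk 5', 0)] {'buf-A'}
```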

41
Send Side Design
  • At driver startup
    • Pre-register all send (command) buffers
  • When a storage request is passed down to the
    driver
    • Register the request's data buffer
    • Form the command and post a send work request
    • Poll the associated CQ for completed sends
    • Recycle send buffers to the driver's free list

42
Data Movement Design
  • Data movement is performed by the controller
    • One RDMA can transfer an entire data buffer of up
      to 2^31 bytes (2 GiB)
    • The controller may use multiple RDMA operations
      to transfer a single data buffer
    • Strive to use a single RDMA when data buffers are
      16 KB or smaller
    • Transfers larger than 64 KB bring little further
      improvement
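The arithmetic behind splitting a data buffer can be sketched in Python: the buffer is carved into RDMA operations of at most `chunk` bytes, where `chunk` can be anything up to the 2^31-byte maximum message size. The `plan_rdma` helper is hypothetical; a controller following this slide would use one operation for buffers of 16 KB or less and gain little from chunks above 64 KB.

```python
MAX_RDMA = 2**31   # largest single InfiniBand message, in bytes

def plan_rdma(addr, length, chunk=MAX_RDMA):
    """Split [addr, addr + length) into (address, size) RDMA operations of at most chunk bytes."""
    if not 0 < chunk <= MAX_RDMA:
        raise ValueError("chunk must be in (0, 2**31]")
    ops = []
    offset = 0
    while offset < length:
        size = min(chunk, length - offset)
        ops.append((addr + offset, size))
        offset += size
    return ops

print(len(plan_rdma(0, 16 * 1024)))               # 1
print(len(plan_rdma(0, 1024 * 1024, 64 * 1024)))  # 16
print(plan_rdma(0, 3 * 2**30))                    # [(0, 2147483648), (2147483648, 1073741824)]
```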

43
Storage Summary
  • The host pushes commands to the controller
  • The controller pushes status to the host
  • The controller pushes and pulls data
  • Two reliable connected (RC) QPs
  • Two completion queues per driver