Title: Architectural Characterization of TCP/IP Processing on the Intel Pentium M Processor
1. Architectural Characterization of TCP/IP Processing on the Intel Pentium M Processor
- Srihari Makineni, Ravi Iyer
- Communications Technology Lab
- Intel Corp.
- srihari.makineni@intel.com, ravishankar.iyer@intel.com
HPCA-10
2. Outline
- Motivation
- Overview of TCP/IP
- Setup and Configuration
- TCP/IP Performance Characteristics
- Throughput and CPU Utilization
- Architectural Characterization
- TCP/IP in server workloads
- Ongoing work
3. Motivation
- Why TCP/IP?
- TCP/IP is the protocol of choice for data communications
- What is the problem?
- So far, system capabilities have allowed TCP/IP to process data at Ethernet speeds
- But Ethernet speeds are jumping rapidly (1 to 10 Gbps)
- Requires efficient processing to scale to these speeds
- Why architectural characterization?
- Analyze performance characteristics and identify processor architectural features that impact TCP/IP processing
4. TCP/IP Overview
[Figure: transmit path. The application buffer is handed through the sockets interface to the TCP/IP stack (TCB lookup, TCP/IP/ETH header processing for segments Tx 1-3); the driver fills descriptors (Desc 1, Desc 2) and the network hardware DMAs the resulting Ethernet packets onto the wire.]
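To make the transmit path concrete, here is a minimal user-space sketch in C (generic BSD sockets, not the Windows/NTttcp code used in the paper): everything below the send() call, namely segmentation, descriptor setup, and DMA, happens inside the kernel stack and driver as in the figure above.

```c
/* Transmit-side sketch: the application's view of the Tx path.
 * Illustrative only; error handling is minimal. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);       /* TCP socket */
    struct sockaddr_in dst = {0};
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(5001);                  /* example port */
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);

    if (connect(s, (struct sockaddr *)&dst, sizeof dst) != 0) {
        perror("connect");
        return 1;
    }

    static char buf[64 * 1024];                    /* 64KB application buffer */
    memset(buf, 'x', sizeof buf);

    /* send() moves the user buffer into kernel socket buffers; the stack
     * then cuts it into MSS-sized segments (or hands larger chunks to the
     * NIC when Large Segment Offload is enabled). */
    ssize_t sent = send(s, buf, sizeof buf, 0);
    printf("queued %zd bytes for transmission\n", sent);
    close(s);
    return 0;
}
```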
5. TCP/IP Overview
[Figure: receive path. The network hardware DMAs the incoming Ethernet packet's payload into a driver buffer described by a descriptor; the TCP/IP stack processes the segments (Rx 1-3), and the payload is copied, with a signal to the application, from the kernel buffer to the application buffer through the sockets interface.]
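The receive side looks like this from user space (again a generic BSD-sockets sketch, not the paper's Windows code): each recv() call performs the kernel-to-user copy labelled in the figure, which is a large part of the Rx cost analyzed later.

```c
/* Receive-side sketch: each recv() copies data that the NIC has already
 * DMAed into kernel buffers out to the application buffer. */
#include <sys/types.h>
#include <sys/socket.h>

/* 'conn' is an already-connected TCP socket (see the Tx sketch above). */
long drain_connection(int conn) {
    static char buf[64 * 1024];       /* application receive buffer */
    long total = 0;
    ssize_t n;
    /* Loop until the peer closes; every iteration is one kernel->user copy. */
    while ((n = recv(conn, buf, sizeof buf, 0)) > 0)
        total += n;
    return total;
}
```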
6. Setup and Configuration
- Test setup
- System Under Test (SUT)
- Intel Pentium M processor @ 1600MHz, 1MB L2 cache (64B line)
- 2 Clients
- Four-way Itanium 2 processor @ 1GHz, 3MB L3 cache (128B line)
- Operating System
- Microsoft Windows 2003 Enterprise Edition
- Network
- SUT: 4Gbps total (2 dual-port Gigabit NICs)
- Clients: 2Gbps per client (1 dual-port Gigabit NIC)
7. Setup and Configuration
- Tools
- NTttcp: Microsoft application to measure TCP/IP performance
- Tool to extract CPU performance counters
- Settings
- 16 connections (4 per NIC port)
- Overlapped I/O (see the sketch after this list)
- Large Segment Offload (LSO)
- Regular Ethernet frames (1518 bytes)
- Checksum offload to NIC
- Interrupt coalescing
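For reference, overlapped (asynchronous) socket I/O on Windows is posted roughly as below. This is a minimal illustrative Winsock sketch, not the NTttcp source; socket setup, completion handling, and error paths are omitted.

```c
/* Posting an overlapped receive with Winsock. Assumes 's' is a connected
 * SOCKET, WSAStartup() has been called, and 'ov' carries a completion
 * event or is associated with an I/O completion port. */
#include <winsock2.h>

void post_overlapped_recv(SOCKET s, char *buf, unsigned long len,
                          WSAOVERLAPPED *ov) {
    WSABUF wsabuf;
    DWORD  flags = 0;

    wsabuf.buf = buf;
    wsabuf.len = len;

    /* WSARecv returns immediately; the data lands in 'buf' later and the
     * completion is reported through 'ov'. */
    if (WSARecv(s, &wsabuf, 1, NULL, &flags, ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        /* a real error: the receive could not be posted */
    }
}
```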
8. Throughput and CPU Utilization
- Lower Rx performance for >512 byte buffer sizes
- Rx and Tx (no LSO) CPU utilization is 100%
- Benefit of LSO is significant (~250% for the 64KB buffer size)
- Lower throughput for <1KB buffers is due to buffer locking
TCP/IP processing @ 1Gbps with 1460-byte buffers requires >1 CPU
9. Processing Efficiency
- 64 byte buffer
- Tx (LSO): 17.13, Rx: 13.7 cycles per bit
- 64 KB buffer
- Tx (LSO): 0.212, Tx (no LSO): 0.53, Rx: 1.12 cycles per bit
Several cycles are needed to move a bit, especially for Rx
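As a rough sanity check on what these figures imply (assuming, as the takeaway suggests, that they are CPU cycles per payload bit), the sketch below converts them into the line rate one 1.6GHz core could sustain at 100% utilization:

```c
/* Back-of-the-envelope: achievable throughput ~= core frequency / cycles-per-bit.
 * Figures are taken from the slide above; illustrative only. */
#include <stdio.h>

int main(void) {
    double freq_hz = 1.6e9;    /* Pentium M at 1600 MHz */
    double rx_64b  = 13.7;     /* Rx cycles/bit, 64-byte buffers */
    double rx_64k  = 1.12;     /* Rx cycles/bit, 64KB buffers    */

    printf("Rx, 64B buffers : ~%.0f Mbps per core\n", freq_hz / rx_64b / 1e6);
    printf("Rx, 64KB buffers: ~%.0f Mbps per core\n", freq_hz / rx_64k / 1e6);
    /* ~117 Mbps vs ~1429 Mbps: small buffers fall far short of multi-Gbps rates. */
    return 0;
}
```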
10. Architectural Characterization
- Rx CPI is higher than Tx for >512 byte buffers
- Tx (LSO) CPI is higher than Tx (no LSO)!
CPI needs to come down to achieve TCP/IP scaling
11. Architectural Characterization
- Rx pathlength increase is significant beyond 1460 byte buffer sizes
- For 64KB, the TCP/IP stack has to receive and process 45 packets
- Lower CPI for Tx (no LSO) over Tx (LSO) is due to higher pathlength (PL)
High PL shows that there is room for stack optimizations
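To connect these quantities (a hedged back-of-the-envelope, not the paper's exact accounting): total cycles per operation are pathlength x CPI, and a 64KB receive at the standard 1460-byte MSS spans 45 packets, each of which pays per-packet stack and driver costs:

```c
/* Why a 64KB Rx buffer means ~45 packets' worth of stack work. */
#include <stdio.h>

int main(void) {
    int buffer_bytes = 64 * 1024;
    int mss = 1460;                                /* standard Ethernet MSS */
    int packets = (buffer_bytes + mss - 1) / mss;  /* ceiling division -> 45 */
    printf("%d packets per 64KB receive\n", packets);
    return 0;
}
```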
12. Architectural Characterization
- Last Level Cache Performance
- Rx has higher misses
- Primary reason for the higher CPI
- Many compulsory misses
- Source buffer, descriptors, and possibly the destination buffer
- Tx (no LSO) has slightly higher misses per bit
Rx performance does not scale with cache size (many compulsory misses)
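A rough illustration of why the Rx compulsory misses are unavoidable (assumptions: 64B lines as on the SUT's caches, and one cache line per NIC descriptor; real descriptor sizes vary by NIC):

```c
/* Rough count of cache lines a received packet touches for the first time
 * (compulsory misses): the freshly DMAed payload plus its descriptor. */
#include <stdio.h>

int main(void) {
    int line = 64;                                    /* cache line size (bytes) */
    int payload = 1460;                               /* one full-sized segment  */
    int payload_lines = (payload + line - 1) / line;  /* = 23 lines              */
    int descriptor_lines = 1;                         /* assumed: 1 line/descriptor */

    printf("~%d new cache lines per received packet\n",
           payload_lines + descriptor_lines);
    return 0;
}
```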
13. Architectural Characterization
- L1 Data Cache Performance
- 32KB of data cache in the Pentium M processor
- As expected, L1 data cache misses are higher for Rx
- For Rx, 68% to 88% of L1 misses resulted in L2 hits
Larger L1 data cache has limited impact on TCP/IP
14. Architectural Characterization
- L1 Instruction Cache Performance
- 32KB instruction cache in the Pentium M processor
- Tx (no LSO) MPI is lower because of code temporal locality
- Rx code path generates L1 instruction capacity misses
Larger L1 instruction cache helps Rx processing
15. Architectural Characterization
- TLB Performance
- Size: 128 instruction and 128 data TLB entries
- iTLB misses increase faster than dTLB misses
16. Architectural Characterization
- 19-21% of instructions are branches
- Misprediction rate is higher in Tx than Rx for <512 byte buffer sizes
>98% accuracy in branch prediction
17. Architectural Characterization
- CPI Contributors
- Rx is more memory intensive than Tx
- Frequency Scaling
- Poor frequency scaling due to memory latency overhead
Frequency scaling alone will not deliver a 10x gain
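A simple way to see why (a hedged toy model, not the paper's measured data): raising the clock shrinks only the core-execution component of each operation, while memory stall time, fixed in nanoseconds, stays put, so overall speedup saturates.

```c
/* Toy model of frequency scaling with a fixed memory-stall component.
 * The 50/50 split between core time and stall time is assumed for illustration. */
#include <stdio.h>

int main(void) {
    double core  = 0.5;    /* time spent executing (scales with frequency) */
    double stall = 0.5;    /* time stalled on memory (does not scale)      */
    double base  = core + stall;

    for (double f = 1.0; f <= 8.0; f *= 2.0) {
        double t = core / f + stall;               /* only the core part shrinks */
        printf("%.0fx frequency -> %.2fx speedup\n", f, base / t);
    }
    return 0;
}
```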
18. TCP/IP in Server Workloads
- Web server
- TCP/IP data path overhead is 28%
- Back-end (database server with iSCSI)
- TCP/IP data path overhead is 35%
- Front-end (e-commerce server)
- TCP/IP data path overhead is 29%
TCP/IP processing is significant in commercial server workloads
19. Conclusions
- Major Observations
- TCP/IP processing @ 1Gbps with 1460-byte buffers requires >1 CPU
- CPI needs to come down to achieve TCP/IP scaling
- High PL shows that there is room for stack optimizations
- Rx performance does not scale with cache size (many compulsory misses)
- Larger L1 data cache has limited impact on TCP/IP
- Larger L1 instruction cache helps Rx processing
- >98% accuracy in branch prediction
- Frequency scaling alone will not deliver a 10x gain
- TCP/IP processing is significant in commercial server workloads
- Key Issues
- Memory stall time overhead
- Pathlength (OS overhead, etc.)
20. Ongoing Work
- Investigating solutions to the memory latency overhead
- Copy Acceleration (see the sketch after this list)
- Low-cost synchronous/asynchronous copy engine
- DCA (Direct Cache Access)
- Incoming data is pushed into the processor's cache instead of memory
- Lightweight threads to hide memory access latency
- Switch-on-event threads: small context, low switching overhead
- Smart Caching
- Cache structures and policies for networking
- Partitioning
- Optimized TCP/IP stack running on dedicated processor(s) or core(s)
- Other Studies
- Connection processing, bi-directional data
- Application interference
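To illustrate what a low-cost asynchronous copy engine could look like to software (a hypothetical sketch; the names and interface are invented for illustration and stubbed with memcpy, not an Intel API): the CPU posts the payload copy and keeps doing protocol work instead of stalling on the memory-to-memory copy.

```c
/* Hypothetical asynchronous copy-engine interface (illustrative only;
 * names are invented, and the "engine" below is stubbed with memcpy). */
#include <string.h>

typedef struct { int done; } copy_req_t;

/* In real hardware this would program a DMA-style copy engine and return
 * immediately; here it is stubbed synchronously so the sketch compiles. */
static copy_req_t copy_submit(void *dst, const void *src, size_t len) {
    copy_req_t req;
    memcpy(dst, src, len);
    req.done = 1;
    return req;
}

static void copy_wait(copy_req_t *req) {
    while (!req->done) { /* poll engine completion status */ }
}

/* Receive-path usage sketch: overlap the payload copy with protocol work. */
void rx_deliver(void *user_buf, const void *kernel_buf, size_t len) {
    copy_req_t req = copy_submit(user_buf, kernel_buf, len);

    /* ... TCP processing (ACK generation, TCB updates) would proceed here,
     *     hiding the memory latency of the copy ... */

    copy_wait(&req);   /* ensure the payload has landed before signalling
                          the application */
}
```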
21. Q&A