Title: Software Performance Engineering, Spring 1999
1 SE 767-NT Software Performance Engineering
Robert Oshana
Lecture 19
For more information, please contact NTU Tape Orders, NTU Media Services, (970) 495-6455
oshana_at_airmail.net
tapeorders_at_ntu.edu
2 Performance-Oriented Design
3 Where we are
- Introduction
- SPE Quick View
- SPE and the UML
- Software Execution Models
- Web applications and other distributed systems
- System execution models
- SPE data collection
- Software measurement and instrumentation
- Performance oriented design
- Performance patterns
- Performance anti-patterns
- Implementation solutions
- Web applications
- Embedded and real-time systems
- The SPE process
- Implementing SPE
4 Overview
- This chapter presents a set of general principles for creating responsive systems
- These principles help identify design alternatives that are likely to meet performance objectives
- They generalize and abstract the knowledge of performance specialists
5 Performance control principle
- Performance objectives principle: define specific, quantitative, measurable performance objectives for performance scenarios
- Avoid vague or qualitative performance objectives
6 Instrumenting principle
- Instrumenting principle: instrument systems as you build them to enable measurement and analysis of workload scenarios, resource requirements, and performance objective compliance
- Instrument distributed systems that interact with other applications to measure the time required for interactions
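One simple way to apply the instrumenting principle in C is to wrap workload scenarios in timing probes as the system is built. The sketch below uses only the standard `clock()` call; the macro name `TIMED_CALL` and the `sum_to` workload are hypothetical stand-ins, not part of the lecture's material:

```c
#include <stdio.h>
#include <time.h>

/* Instrumentation probe: measure and report elapsed CPU time of a
   scenario. A real system would record the measurement for later
   analysis rather than just printing it. */
#define TIMED_CALL(label, call)                                      \
    do {                                                             \
        clock_t start_ = clock();                                    \
        call;                                                        \
        clock_t end_ = clock();                                      \
        printf("%s: %.3f ms\n", (label),                             \
               1000.0 * (double)(end_ - start_) / CLOCKS_PER_SEC);   \
    } while (0)

/* Example workload scenario (stand-in for a real dominant function) */
static long sum_to(long n) {
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i;
    return s;
}
```

Because the probe is a macro around an expression, it can be left in place during development and compiled out for production builds if the overhead matters.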
7 Independent principles
8 Centering principle
- Identify the dominant workload functions and minimize their processing
- Leverages performance by focusing attention on parts of the system with the greatest impact
  - 80/20 rule
- Frequently used functions are called dominant workload functions
- Design the dominant workload functions first
9 Fixing-Point principle
- For responsiveness, fixing should establish connections at the earliest feasible point in time, such that retaining the connection is cost-effective
- A fixing point is a point in time
- Must know enough about the anticipated usage patterns
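As an illustration of the fixing-point principle, a connection can be fixed once, at first use, and then retained instead of being re-established on every request. This sketch is hypothetical: `connect_backend()` and the handle value stand in for whatever expensive setup the real system performs:

```c
/* Fixing-point sketch: resolve ("fix") the connection once and retain
   it, rather than paying the setup cost on every request.
   connect_backend() is a hypothetical expensive setup call. */
static int connection = -1;     /* -1 means not yet fixed */
static int connect_count = 0;   /* counts expensive setups, for checking */

static int connect_backend(void) {
    connect_count++;
    return 42;                  /* hypothetical connection handle */
}

int get_connection(void) {
    if (connection < 0)
        connection = connect_backend();  /* the fixing point */
    return connection;
}
```

Retaining the connection is only cost-effective if the anticipated usage pattern includes enough later requests to amortize the setup, which is exactly the judgment the principle calls for.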
10 Locality principle
- Create actions, functions, and results that are close to physical resources
- Locality is the closeness of desired functions and results to the physical resources that produce them
- Types of locality:
  - Spatial
  - Temporal
  - Effectual (purpose or intent)
  - Degree
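Spatial locality, for example, is why traversal order matters in C: two-dimensional arrays are stored row-major, so iterating row by row touches adjacent memory and makes full use of each fetched cache line. A small sketch (the array sizes are arbitrary):

```c
#define ROWS 256
#define COLS 256

/* Row-major traversal: consecutive accesses hit adjacent addresses,
   so each cache line fetched is fully used (good spatial locality). */
long sum_row_major(int m[ROWS][COLS]) {
    long s = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            s += m[r][c];
    return s;
}

/* Column-major traversal of the same data strides COLS*sizeof(int)
   bytes between accesses, so each touch may miss the cache. */
long sum_col_major(int m[ROWS][COLS]) {
    long s = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            s += m[r][c];
    return s;
}
```

Both functions compute the same sum; only the memory access pattern, and therefore the cache behavior, differs.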
11 Embedded system example: Direct Memory Access (DMA)
12 Setting up and enabling a DMA operation

/* Addresses of some of the important DMA registers */
#define DMA_CONTROL_REG (*(volatile unsigned *)0x40000404)
#define DMA_STATUS_REG  (*(volatile unsigned *)0x40000408)
#define DMA_CHAIN_REG   (*(volatile unsigned *)0x40000414)

/* Macro to wait for the DMA to complete and signal the status register */
#define DMA_WAIT while (DMA_STATUS_REG & 1);

/* prebuilt TCB (transfer control block) setup structure */
typedef struct TCB {
    /* ... transfer setup fields, e.g. destination_address, word_count ... */
} DMA_TCB;
13 What is DMA?
- DMA allows data transfers to proceed with little or no intervention from the rest of the processor
- A problem with fast DSPs is keeping them fed with data
- External RAM must run with wait states that severely impact performance
- Using the DMA to stage data on and off chip increases performance
14 Use the DMA instead of the CPU
[Diagram comparing two flows. CPU only: access data from memory, process data, write data back to memory, then do something else. With DMA: set up the DMA, let it move data on chip while the CPU does something else, process the data, set up the DMA again, and let it move the data off chip while the CPU starts something else.]
15 Setting up and enabling a DMA operation

extern DMA_TCB tcb;
/* setup the remaining fields of the TCB structure:
   destination of data, number of words */
tcb.destination_address = dest_address;
tcb.word_count = word_count;
/* writing to the chain register kicks off the DMA operation */
DMA_CHAIN_REG = (unsigned)&tcb;
/* ... allow the CPU to do other meaningful work ... */
/* wait for the DMA operation to complete */
DMA_WAIT;
16 Template for staging data on and off chip
[Diagram: the CPU alternates between processing and moving; the DMA transfers data between internal and external memory while the CPU processes data already in internal memory.]
17 Template for staging data on and off chip

INITIALIZE TCBS
DMA SOURCE DATA 0 INTO SOURCE BUFFER 0
WAIT FOR DMA TO COMPLETE
DMA SOURCE DATA 1 INTO SOURCE BUFFER 1
PERFORM CALCULATION AND STORE IN RESULT BUFFER
FOR I = 1 TO N-1
    WAIT FOR DMA TO COMPLETE
    DMA SOURCE DATA I+1 INTO SOURCE BUFFER (I+1) MOD 2
    DMA RESULT BUFFER (I-1) MOD 2 TO DESTINATION DATA
    PERFORM CALCULATION AND STORE IN RESULT BUFFER
END FOR
WAIT FOR DMA TO COMPLETE
DMA RESULT BUFFER (I-1) MOD 2 TO DESTINATION DATA
PERFORM CALCULATION AND STORE IN RESULT BUFFER
WAIT FOR DMA TO COMPLETE
DMA LAST RESULT BUFFER TO DESTINATION DATA
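The staging template above can be sketched in portable C by standing in for the DMA engine with `memcpy`. On real hardware each `dma_copy` would program a TCB and return immediately so the transfer overlaps the calculation; here the copies are synchronous, and the buffer names, block size, and the doubling calculation are all illustrative:

```c
#include <string.h>

#define BLOCK 64  /* words per staged block (illustrative) */

/* Stand-in for a DMA transfer; on hardware this would kick off the
   DMA and return while the engine moves the data. */
static void dma_copy(int *dst, const int *src, int words) {
    memcpy(dst, src, (size_t)words * sizeof(int));
}

/* Example calculation on one staged block */
static void calculate(int *result, const int *src, int words) {
    for (int i = 0; i < words; i++)
        result[i] = src[i] * 2;
}

/* Double-buffered staging: while block i is processed out of one
   on-chip buffer, block i+1 is staged into the other, and finished
   results are written back out. */
void process_blocks(int *dst, const int *src, int nblocks) {
    int src_buf[2][BLOCK], res_buf[2][BLOCK];
    dma_copy(src_buf[0], src, BLOCK);               /* stage block 0 */
    for (int i = 0; i < nblocks; i++) {
        if (i + 1 < nblocks)                        /* stage next block */
            dma_copy(src_buf[(i + 1) % 2], src + (i + 1) * BLOCK, BLOCK);
        calculate(res_buf[i % 2], src_buf[i % 2], BLOCK);
        dma_copy(dst + i * BLOCK, res_buf[i % 2], BLOCK);  /* write back */
    }
}
```

The `(i + 1) % 2` indexing is the same ping-pong scheme as the `(I+1) MOD 2` buffers in the template: two buffers alternate between "being filled" and "being processed".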
18 On-Chip RAM Organization
[Diagram: successive snapshots of on-chip RAM over time, each partitioned into weights, variables, work buffers, field buffers (Field 1 through Field 6), processed fields, and unused space.]
Scheduling to achieve efficient on-chip access
19 Processing versus frequency principle
- Minimize the product of processing time and frequency
- Look to make a tradeoff between the amount of work done per request and the number of requests received
- For example, accessing a DB
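The database example can be made concrete with a simple cost model: if each request pays a fixed overhead plus a per-row cost, batching rows into fewer requests shrinks the processing-times-frequency product. The constants below are invented for illustration, not measurements from any real system:

```c
/* Hypothetical cost model for the processing x frequency tradeoff:
   each request pays a fixed overhead plus a per-row cost, so total
   cost depends on how many rows are batched per request. */
#define OVERHEAD_US 500.0   /* fixed cost per request, microseconds */
#define PER_ROW_US    5.0   /* incremental cost per row */

double total_cost_us(int total_rows, int rows_per_request) {
    int requests = (total_rows + rows_per_request - 1) / rows_per_request;
    return requests * (OVERHEAD_US + PER_ROW_US * rows_per_request);
}
```

With these numbers, fetching 1000 rows one at a time costs 1000 x 505 us, while batches of 100 cost 10 x 1000 us, a fiftyfold difference driven entirely by request frequency.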
20 Synergistic principles
21 Shared resources principle
- Share resources when possible; when exclusive access is required, minimize the sum of the holding time and the scheduling time
- Resources in a computer are limited and processes compete for their use
- Sharing is possible but it must be managed
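One common way to minimize holding time in code is to shrink the critical section: do per-item work outside the lock so the lock covers only the shared update. A sketch with POSIX threads; the squaring "preparation" and the counter are illustrative stand-ins for real work and real shared state:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total = 0;

/* Expensive per-item work that touches no shared state */
static long prepare(long x) {
    return x * x;
}

/* Hold the lock only for the shared update, not for prepare():
   this minimizes holding time and therefore contention. */
void add_item(long x) {
    long v = prepare(x);        /* outside the critical section */
    pthread_mutex_lock(&lock);
    shared_total += v;          /* minimal critical section */
    pthread_mutex_unlock(&lock);
}

long get_total(void) {
    long v;
    pthread_mutex_lock(&lock);
    v = shared_total;
    pthread_mutex_unlock(&lock);
    return v;
}
```

Moving `prepare()` inside the lock would be functionally identical but would make every other thread wait through work that needs no exclusivity, which is exactly the holding-time cost the principle warns about.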
22 Parallel processing principle
- Execute processing in parallel only when the processing speedup offsets communication overhead and resource contention delays
- Real concurrency
- Apparent concurrency
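The break-even test this principle describes can be written down directly: parallel execution wins only when the compute time saved exceeds the added communication and contention delay. A small sketch under a deliberately idealized assumption of linear speedup on n processors:

```c
/* Idealized parallel time: linear speedup on n processors plus a
   fixed communication/contention overhead per run. Real systems
   rarely achieve linear speedup, so this is an optimistic bound. */
double parallel_time(double serial_time, int n, double overhead) {
    return serial_time / n + overhead;
}

/* The principle as a predicate: parallelize only when it is faster. */
int worth_parallelizing(double serial_time, int n, double overhead) {
    return parallel_time(serial_time, n, overhead) < serial_time;
}
```

For example, 100 ms of work on 4 processors with 10 ms of overhead yields 35 ms, a clear win; 10 ms of work with 20 ms of overhead yields 22.5 ms, worse than staying serial.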
23 SE 767-NT Software Performance Engineering
Robert Oshana
End of lecture