Title: Priority Scheduler Overview
1. Priority Scheduler Overview
- Teradata Development Division
- August 2002
2. The Tip of the Iceberg
- What Priority Scheduler Is Not
- What Priority Scheduler Is
- Architecture
- Implementation
3. What Priority Scheduler Is Not
- Not a hardware segmentation tool
  - Priority Scheduler performs logical segmentation, not physical segmentation
- Not a group management tool
  - Priority Scheduler provides relative priority
- Not a CPU optimizer
  - Priority Scheduler is not a band-aid for poorly written queries; it does not give more CPU seconds per day
4. What Priority Scheduler Is
- Priority Scheduler is
  - A workload manager
  - Based on relative workloads determined at the time of execution
- What Priority Scheduler can do for you
  - Instituting better service for your more important work
  - Controlling resource sharing among different applications
  - Preventing aggressive queries from over-consuming at the expense of other work
  - Placing a ceiling on CPU usage for some applications
5. Priority Scheduler is Prioritized Queues, not Percentages of CPU
[Diagram: an airline-cabin analogy for the prioritized queues: First Class, Business Class, Coach.]
6. Priority Scheduler Components
[Diagram: Resource Partitions (RP0 Default through RP4) each contain Performance Groups (L, M, H, R; for example L1/M1/H1/R1 in RP1 and L4/M4/H4/R4 in RP4). Performance Periods map times of day to Allocation Groups (6am-8am → AG22, 8am-11pm → AG20, 11pm-6am → AG21, 8am-8am → AG1). The Allocation Groups carry the weights, and the whole setup is managed with the schmon tool.]
Allocation Group weights shown: AG1 = 5, AG20 = 20, AG21 = 40, AG22 = 5, AG30 = 5, AG31 = 10, AG32 = 20
7. Summary of Components (for your reference)
- Resource Partition (RP)
  - A high-level resource and priority grouping
  - May define up to 5; Partition 0 is provided
- Performance Group (PG)
  - Must define 8 (with values 0-7) within each Partition
  - Only PGs with values 0, 2, 4, 6 are for users (1, 3, 5, 7 are used internally)
  - PG name matches the acctid string on the logon statement and must be unique system-wide
- Performance Period (PP)
  - 1 to 5 per PG
  - Links a PG to an Allocation Group's weight and policy
  - Makes possible changes in priority weight/policy by time or resource usage
- Allocation Group (AG)
  - Carries the weight
  - Determines the policy and set division type
  - PGs may share the same Allocation Group
(A data-model sketch of this hierarchy follows.)
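To make the hierarchy concrete, here is a minimal sketch of the four components as Python dataclasses. The class and field names are illustrative only; they model the relationships described on this slide, not PSF's internal structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AllocationGroup:
    name: str                 # e.g. "AG20"
    weight: int               # the AG carries the weight
    policy: str = "DEFAULT"   # the AG also determines policy / set division type

@dataclass
class PerformancePeriod:
    value: str                # milestone: an end time or a CPU-seconds limit
    group: AllocationGroup    # links the PG to an AG's weight and policy

@dataclass
class PerformanceGroup:
    name: str                 # matches the acctid string at logon
    value: int                # 0, 2, 4, 6 are for users
    periods: List[PerformancePeriod] = field(default_factory=list)  # 1 to 5 per PG

@dataclass
class ResourcePartition:
    number: int               # partition 0 is provided
    weight: int
    groups: List[PerformanceGroup] = field(default_factory=list)

# Example wiring, mirroring slide 6:
ag20 = AllocationGroup("AG20", weight=20)
m1 = PerformanceGroup("M1", value=2,
                      periods=[PerformancePeriod("2300", ag20)])
rp1 = ResourcePartition(1, weight=75, groups=[m1])
```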
8. Relative Weighting and Allocating Resources
- Weights are distributed among Resource Partitions:
  - RP0 = 100, RP1 = 75, RP2 = 25
- Weights are then allocated among Allocation Groups within each partition
- Example for RP1/M1 (M1's AG has weight 10; the other AGs in RP1 have weights 2, 50, and 100):
  - RP1 relative weight = 75 / (100 + 75 + 25) = 37%
  - AG relative weight = 10 / (2 + 10 + 50 + 100) = 6%
  - Combined share = 37% × 6% ≈ 2%
(The sketch below reruns this arithmetic.)
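A small sketch rerunning the RP1/M1 example above. The partition and AG weights are the slide's; the names of the other AGs in RP1 are placeholders.

```python
rp_weights = {"RP0": 100, "RP1": 75, "RP2": 25}
rp1_ag_weights = {"ag_a": 2, "M1": 10, "ag_b": 50, "ag_c": 100}

rp1_relative = rp_weights["RP1"] / sum(rp_weights.values())        # 75/200
m1_relative = rp1_ag_weights["M1"] / sum(rp1_ag_weights.values())  # 10/162

print(f"RP1 relative weight:       {rp1_relative:.1%}")   # 37.5% (slide rounds to 37)
print(f"M1 relative weight in RP1: {m1_relative:.1%}")    # 6.2%  (slide rounds to 6)
print(f"M1 share of the system:    {rp1_relative * m1_relative:.1%}")  # 2.3% (about 2)
```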
9. What Is Relative Weighting?
[Diagram: side-by-side bars for the L, M, H, and R groups of RP1 and RP2, comparing CPU ASSIGNED, CPU TARGETED, and CPU CONSUMED.]
- CPU Consumed: the CPU actually allocated to the active partitions and active groups
- Reflects targeted but unused CPU spilling over to other partitions and groups
10. Performance Groups
[Diagram: Resource Partition RP1 with its Performance Groups L1, M1, H1, R1.]
- Users are assigned to a PG
- At logon, users are associated to an AG via a Performance Period (6am-8am → AG22, 8am-11pm → AG20, 11pm-6am → AG21)
- Allocation Group weights: AG20 = 20, AG21 = 40, AG22 = 5
- Users execute at the weight of the assigned AG
11. You Assign Weights to Groups of Users
[Diagram: user sessions flow through a Performance Group and Performance Period to an Allocation Group: Perf. Grp LDEV → Allocation Group 1 (weight 5, relative 9%); Perf. Grp MDEV → Allocation Group 2 (weight 10, relative 18%); Perf. Grp RDEV → Allocation Group 4 (weight 40, relative 73%).]
- Sum the weights: 5 + 10 + 40 = 55
- Calculate relative weights:
  - L: 5/55 = 9%
  - M: 10/55 = 18%
  - R: 40/55 = 73%
- Each user session is assigned to an Allocation Group at logon time
- Each Allocation Group has a targeted share of the system resources, called a Wgt
12. Wgt Combines Partition and Allocation Group Weight

                                      RP0                  RP1
RP assigned weight                    100                  50
Relative partition Wgt                67%                  33%
Performance Group                     L     M     R        X
AG assigned weight                    5     10    40       10
Relative AG Wgt within partition      9%    18%   73%      100%
Wgt (partition × AG)                  6     12    49       33

Example: Wgt for M = 0.67 × 0.18 = 0.12, i.e. 12. (The sketch below recomputes the whole table.)
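A short sketch recomputing the table above; the dictionary layout is illustrative, the weights are the slide's.

```python
# Final Wgt = relative partition weight × relative AG weight within the partition.
partitions = {
    "RP0": {"weight": 100, "ags": {"L": 5, "M": 10, "R": 40}},
    "RP1": {"weight": 50,  "ags": {"X": 10}},
}

total_rp_weight = sum(p["weight"] for p in partitions.values())   # 150

for rp_name, rp in partitions.items():
    rp_rel = rp["weight"] / total_rp_weight                       # 0.67, 0.33
    total_ag_weight = sum(rp["ags"].values())
    for ag_name, ag_weight in rp["ags"].items():
        ag_rel = ag_weight / total_ag_weight                      # 0.09 ... 1.00
        print(f"{rp_name}/{ag_name}: Wgt = {rp_rel * ag_rel:.0%}")

# Output: L 6%, M 12%, R 48% (the slide shows 49 because it multiplies
# the already-rounded factors 0.67 × 0.73), X 33%
```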
13. Performance Periods by CPU Usage
Automatic change based on CPU usage since session logon:
- Performance Period 0: usage milestone 100 CPU seconds, Allocation Group 9 (Alloc Grp 9 weight = 20, policy Default)
- Performance Period 1: usage milestone 0 seconds (no further milestone), Allocation Group 33 (Alloc Grp 33 weight = 5, policy Default)
[Timeline: the user logs on under Period 0; when the session's CPU accumulation reaches 100 seconds (on a 0-125 second axis), it drops into Period 1.]
(A sketch of this milestone logic follows.)
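A minimal sketch of the milestone logic on this slide: a session starts in Period 0 and is demoted to Period 1 once its accumulated CPU passes 100 seconds. The function and table are illustrative, not PSF's mechanism.

```python
# Each entry: (CPU-seconds milestone, allocation group, weight).
# A None milestone marks the final period.
PERIODS = [
    (100, "AG9", 20),    # Period 0: until 100 CPU seconds are consumed
    (None, "AG33", 5),   # Period 1: for the rest of the session
]

def current_period(cpu_seconds_used):
    """Return the (AG, weight) a session runs under, given its
    accumulated CPU since logon."""
    for milestone, ag, weight in PERIODS:
        if milestone is None or cpu_seconds_used < milestone:
            return ag, weight

print(current_period(50))    # ('AG9', 20)  - still in the high-weight period
print(current_period(125))   # ('AG33', 5)  - demoted after 100 CPU seconds
```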
14. Performance Periods by Time
- T is the type for Time; VALUE is the END TIME
- Define periods within the Performance Group; up to four per Performance Group
- Performance Period 1: end-time 1700 hours → Allocation Group 9 (Alloc Grp 9 weight = 60, policy Immediate)
- Performance Period 2: end-time 2300 hours → Allocation Group 5 (Alloc Grp 5 weight = 20, policy Default)
- Performance Period 3: end-time 0700 hours → Allocation Group 6 (Alloc Grp 6 weight = 10, policy Absolute)
(A time-of-day selection sketch follows.)
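A sketch of picking the active period from the three END TIMEs above. The selection function is illustrative; only the end times, AGs, weights, and policies come from the slide.

```python
from datetime import time

# (end time, allocation group, weight, policy), sorted by end time.
PERIODS = [
    (time(7, 0),  "AG6", 10, "ABSOLUTE"),   # Period 3: until 0700
    (time(17, 0), "AG9", 60, "IMMEDIATE"),  # Period 1: 0700-1700
    (time(23, 0), "AG5", 20, "DEFAULT"),    # Period 2: 1700-2300
]

def period_for(now):
    """Pick the first period whose END TIME has not yet passed;
    after the last end time (2300), wrap to the 0700-ended period."""
    for end, ag, weight, policy in PERIODS:
        if now < end:
            return ag, weight, policy
    return PERIODS[0][1:]

print(period_for(time(9, 30)))   # ('AG9', 60, 'IMMEDIATE')
print(period_for(time(23, 45)))  # ('AG6', 10, 'ABSOLUTE')
```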
15. Allocation Group Policies
Unrestricted policies:
- DEF (DEFAULT)
  - Keeps track of past usage of each process
  - Seeks out and adjusts over- or under-consumers
  - May benefit complex queries or mixed work
  - Is the default policy
- IMD (IMMEDIATE)
  - All processes in the Allocation Group are treated as equal
  - Ignores uneven process usage
  - Preferable for short, repetitive work
Restricted policies:
- ABS (ABSOLUTE)
  - The assigned weight of the allocation group becomes the ceiling value
  - This ceiling is the same no matter what other allocation groups are active
  - The ceiling only has an impact when the group's CPU allocation would otherwise exceed the ABS value
- REL (RELATIVE)
  - Makes the Allocation Group Wgt value the ceiling on CPU
  - This ceiling changes based on which allocation groups are active at the time
(A ceiling comparison sketch follows.)
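A sketch contrasting the ceilings of the two restricted policies for a group with assigned weight 10, using the weights from the slide-16 test scenario. The function is illustrative, not PSF's scheduler.

```python
def ceiling(policy, assigned_weight, active_weights):
    """CPU ceiling (fraction of the machine) for a group, given the
    weights of all currently active groups."""
    if policy == "ABSOLUTE":
        return assigned_weight / 100                  # fixed, regardless of activity
    if policy == "RELATIVE":
        return assigned_weight / sum(active_weights)  # moves with activity
    return 1.0                                        # DEFAULT / IMMEDIATE: no ceiling

# M (weight 10) while H (20) and R (40) are also active:
print(ceiling("ABSOLUTE", 10, [10, 20, 40]))  # 0.10
print(ceiling("RELATIVE", 10, [10, 20, 40]))  # 10/70 ~ 0.143
# With M alone, the RELATIVE ceiling relaxes all the way to 100%:
print(ceiling("RELATIVE", 10, [10]))          # 1.0
```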
16. Same Job Running under Different Policies
All tests were run in M with the same assigned weight of 10, using 10 concurrent streams of short 3-second queries (actual tests).
[Bar chart: elapsed time in seconds under each policy. The unrestricted policies (DEFAULT and IMMEDIATE) finished in 87 and 89 seconds, free to consume up to 100% of the CPU. RELATIVE, capped at its 14% relative weight, took 585 seconds; ABSOLUTE, capped at its 10% assigned weight, took 719 seconds.]
Assume only M, H, and R are active and the sum of weights is 70. The relative weight of M would then be 10/70, or 14%.
17. All Users Share the CPU Allocated to the Allocation Group
[Bar chart (actual test): elapsed time versus the number of active H users, rising from 7 seconds with 1 H user to 11, 18, and 39 seconds with 5, 10, and 20 H users.]
- The higher the number of active users in the Allocation Group, the less CPU each user will receive
18. Dispersing Surplus Resources
[Diagram: the groups from slide 12 (RP0-R Wgt 49, RP1-X Wgt 33, RP0-M Wgt 12, RP0-L Wgt 6); when one group leaves part of its entitlement unused, that surplus spills over to the others.]
- When an Allocation Group cannot consume all the resource it is entitled to, unused CPU is shared based on Wgt (a redistribution sketch follows)
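A sketch of the spill-over idea: entitlement a group cannot consume is re-shared among the groups that want more, in proportion to their Wgt. The Wgt values are the slide-12 results; the demand figures and the redistribution function are invented for the example.

```python
WGT = {"RP0-R": 49, "RP1-X": 33, "RP0-M": 12, "RP0-L": 6}

def effective_share(wgts, demand):
    """Hand unused entitlement to the groups that want more,
    split in proportion to their Wgt."""
    share = {g: w / 100 for g, w in wgts.items()}
    surplus = sum(max(share[g] - demand[g], 0) for g in wgts)
    hungry = {g: w for g, w in wgts.items() if demand[g] > share[g]}
    total = sum(hungry.values()) or 1
    return {g: min(demand[g], share[g] + surplus * hungry.get(g, 0) / total)
            for g in wgts}

# RP1-X only wants 10% of the machine; its unused 23% spills to the rest:
print(effective_share(WGT, {"RP0-R": 1.0, "RP1-X": 0.10, "RP0-M": 1.0, "RP0-L": 1.0}))
# -> roughly R 0.66, X 0.10, M 0.16, L 0.08
```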
19. Example: Priority Scheduler Aiding Response Time Consistency
[Line chart (actual test): average query time for 100,000 single-customer reads, with and without high/low priorities, against 0, 5, 10, 20, and 30 background streams. With everything at the same priority, average time climbs toward 0.25 seconds as background streams are added; with the reads run at high priority over low-priority background work, response time stays nearly flat.]
20. SCHMON Utility: A UNIX-Based Facility
- Priority Scheduler Facility (PSF) is managed by issuing UNIX commands
- UNIX root privileges are required
- Two key UNIX commands (see the monitoring sketch below):
  - schmon -d displays the current PSF settings
  - schmon -m reports current resource usage by group
- All changes in setup take place immediately across all nodes
- No perceivable performance overhead
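A hypothetical wrapper around the two commands named above. Only `schmon -d` and `schmon -m` come from the slide; the function names, polling loop, and output handling are illustrative, and running them requires UNIX root privileges.

```python
import subprocess
import time

def psf_settings():
    """Capture the current PSF settings (schmon -d)."""
    return subprocess.run(["schmon", "-d"], capture_output=True,
                          text=True, check=True).stdout

def watch_usage(interval_seconds=60):
    """Print resource usage by group (schmon -m) once a minute."""
    while True:
        out = subprocess.run(["schmon", "-m"], capture_output=True,
                             text=True, check=True).stdout
        print(out)
        time.sleep(interval_seconds)
```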
21. Implementing Priority Scheduler
- Basic Steps
- Be Prepared
- Generalized Rules
22. Basic Steps
- Understand the goals of using the ADW
  - What workloads should have what priority at what time (day, night, weekend)?
- Set up Service Level Agreements (SLAs)
- Understand the current performance before making changes
- Take small steps and grow
23. Be Prepared
- Collect data on the current performance of the environment:
  - CPU and I/O by user, by hour (source: ampusage)
  - Overall system usage (source: resusage)
  - Current query wall-clock time
- Collect data to understand the following (an aggregation sketch follows):
  - Hourly CPU consumption by workload, by day, by week
  - Number of queries per hour per session
  - Complexity of queries (CPU-intensive versus I/O-intensive)
- As stated so many times before: EXPLAIN, EXPLAIN
24. Key Recommendations
- 1. There is rarely just one way to implement Priority Scheduler
- 2. Understand your goals and current performance before implementing Priority Scheduler
- 3. Priority Scheduler is not a band-aid for poorly written queries and locking bottlenecks
- 4. Try to minimize user activity in RP0, especially in the High and Rush groups
- 5. Unless there is a good reason not to, keep RP0 the highest weighted resource partition
- 6. If consistent times are required for active queries (single- and few-AMP), assign RP weights such that there is at least a 4-to-1 ratio between the partition running response-sensitive work and all other user partitions