Soft Real-Time Scheduling on Simultaneous Multithreaded Processors

1
Soft Real-Time Scheduling on Simultaneous
Multithreaded Processors
  • Rohit Jain,
  • Christopher J. Hughes
  • Sarita V. Adve
  • IEEE Real-Time Systems Symposium (RTSS '02)
  • Borrowed from Yen-Sheng Chang
  • and presented by Cristiano Pereira
  • 4/22/2005

2
Outline
  • Introduction
  • SMT review (hyperthreading)
  • Resource Sharing Algorithms
  • Co-Scheduling Algorithms
  • Experimental Methodology
  • Results
  • Conclusions

3
Introduction
  • Simultaneous multithreading (SMT) improves
    processor throughput by processing instructions
    from multiple threads each cycle.
  • Two decisions with SMT
  • Co-schedule selection
  • Affects each thread's utilization
  • Resource sharing
  • How processor resources are shared among
    co-scheduled threads
  • Problem unique to SMT processors
  • The choice of co-scheduling and resource sharing
    algorithm may be tightly coupled.

4
Introduction - Objective
  • To find the best algorithm for increasing the
    schedulability of soft real-time tasks by
    exploiting co-scheduling and resource sharing on
    SMT processors

5
SMT review
[Figure: comparison of traditional (single-issue), superscalar, multithreaded, and simultaneous multithreaded processors]
6
Related Work
  • ICOUNT (seeks to maximize throughput)
  • Without consideration of any real-time deadline.
  • Symbiotic Job Scheduling for SMT
  • One interactive task with other non-real-time
    tasks
  • Multiprocessor scheduling (without SMT)

7
Resource Sharing Algorithms (1)
  • Threads share most processor resources
  • Instruction fetch mechanism
  • Instruction window
  • Functional units
  • Caches
  • Previous work has focused on resource sharing
    algorithms that maximize total throughput (IPC)
  • No deadline concerns
  • May affect schedulability either negatively or
    positively

8
Resource Sharing Algorithms (2)
  • Throughput-driven (dynamic)
  • ICOUNT gives priority to the thread that has the
    fewest instructions in the instruction window
  • Performance prediction is difficult.
  • Performance guarantees (static)
  • Fixed set of resources is reserved for a given
    job
  • May be suboptimal!
  • Performance prediction is easy (identical to
    uni-processor).
  • Resources controlled per thread in this work
  • Fetch bandwidth and instruction window (ICOUNT)
  • Other resources are thread-blind (e.g. give the
    functional unit to the oldest instruction)
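The ICOUNT rule above can be sketched in a few lines. This is only an illustration of the selection rule, not the hardware mechanism: the function name `icount_pick` and the per-thread occupancy dictionary are assumptions made for the example.

```python
def icount_pick(window_occupancy):
    """Return the thread id with the fewest instructions in the window.

    window_occupancy: dict mapping thread id -> number of that thread's
    instructions currently in the instruction window (hypothetical input).
    Ties go to the lower thread id, which is an arbitrary choice here.
    """
    # Iterating over sorted ids makes min() return the lowest id on ties.
    return min(sorted(window_occupancy), key=lambda t: window_occupancy[t])


occupancy = {0: 12, 1: 5, 2: 9}
print(icount_pick(occupancy))  # thread 1 holds the fewest instructions
```

Because the choice depends on what every co-scheduled thread is doing at that moment, a thread's progress is hard to predict, which is the drawback the slide notes for dynamic sharing.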

9
Co-scheduling algorithms (1)
  • Partitioning (bin-packing flavor algorithms)
  • Allows admission control
  • Global scheduling
  • Task migration on an SMT processor is free.
  • Symbiosis-aware vs. Symbiosis-oblivious
  • On SMT processors, execution time of a job
    depends on jobs co-scheduled with it.

10
Co-scheduling algorithms (2)
  • Design space explored
  • EDF as the underlying algorithm.
  • Partitioning → EDF schedules the tasks within a
    context
  • Global → EDF chooses the next task.

11
Predicting Execution Time, Utilization, and
Symbiosis (1)
  • These algorithms need to know exec. time,
    utilizations and symbiosis relations

[Figure: predictions are derived from IPC and instruction count]
12
Predicting Execution Time, Utilization, and
Symbiosis (2)
  • IPC
  • Job IPC in single-thread mode
  • → profiling one frame of each frame type.
  • Job IPC with static resource sharing
  • → profiling each allocation in single-threaded
    mode
  • Job IPC with dynamic resource sharing
  • → profiling all possible co-schedules (N-tuples)
    to obtain the task IPCs.
  • Average job IPC with dynamic resource sharing
  • → when the IPCs depend on the yet unknown
    co-schedule, approximate as job IPC averaged
    across all possible co-schedules
  • Instruction count
  • Use the average instruction count of a large
    number of frames as the prediction.
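The way these two predictions combine can be sketched as follows. The relationship execution time = instruction count / IPC is standard; the function names, the use of cycles as the time unit, and the sample numbers are assumptions for illustration only.

```python
def average_ipc(ipc_by_coschedule):
    """Average a job's profiled IPC over every possible co-schedule."""
    return sum(ipc_by_coschedule.values()) / len(ipc_by_coschedule)


def execution_time(instruction_count, ipc):
    """Predicted execution time in cycles: instructions / (instructions per cycle)."""
    return instruction_count / ipc


def utilization(instruction_count, ipc, period_cycles):
    """Fraction of each period the job is predicted to occupy."""
    return execution_time(instruction_count, ipc) / period_cycles


# Hypothetical profile of one job when co-scheduled with task B or task C.
profiled = {("B",): 2.0, ("C",): 3.0}
ipc = average_ipc(profiled)
cycles = execution_time(3000, ipc)
print(ipc, cycles, utilization(3000, ipc, 4800))  # 2.5 1200.0 0.25
```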

13
Partitioning Algorithms (1)
  • A partition in SMT is a set of tasks such that no
    two will execute simultaneously.
  • An SMT processor with N contexts supports up to N
    partitions

14
Partitioning Algorithms (2)
  • PART-NOSYM-DYN-b
  • Bin-packing-based algorithm that uses the
    first-fit-decreasing-utilization (FFDU)
    heuristic.
  • IPC averaged across all co-schedules
  • PART-NOSYM-DYN-e
  • Corrects for underestimated IPC by raising the
    utilization threshold by the smallest possible
    amount
  • No task-set is ever rejected
  • Simulates the schedule for a hyper-period, to
    determine if it would meet the deadlines.
  • Complexity is high, but it gives partitioning the
    fairest showing against global scheduling.
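The FFDU heuristic named above can be sketched as ordinary first-fit-decreasing bin packing, with one bin per hardware context. This is a minimal sketch, not the authors' implementation: the function name, the dictionary input, and the default threshold of 1.0 (the uniprocessor EDF utilization bound) are assumptions. Raising `threshold` corresponds loosely to the adjustment the enhanced variant makes.

```python
def ffdu_partition(utilizations, n_contexts, threshold=1.0):
    """First-fit-decreasing-utilization bin packing.

    utilizations: dict mapping task name -> predicted utilization.
    Returns one list of task names per context, or None when some task
    fits nowhere (the task set is rejected).
    """
    partitions = [[] for _ in range(n_contexts)]
    load = [0.0] * n_contexts
    # Place tasks in order of decreasing utilization.
    for task in sorted(utilizations, key=utilizations.get, reverse=True):
        for k in range(n_contexts):
            if load[k] + utilizations[task] <= threshold:
                partitions[k].append(task)
                load[k] += utilizations[task]
                break
        else:
            return None  # no context can accommodate this task
    return partitions


tasks = {"A": 0.6, "B": 0.5, "C": 0.3, "D": 0.3}
print(ffdu_partition(tasks, 2))  # [['A', 'C'], ['B', 'D']]
```

Returning `None` is what gives partitioning its admission control: a task set that cannot be packed is refused up front rather than discovered to miss deadlines at run time.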

15
Partitioning Algorithms (3)
  • PART-NOSYM-STAT
  • Independent of co-schedule because of static
    resource allocation
  • IPC of each configuration
  • FFDU heuristic with an EDF admission test.
  • C1, C2, ..., CN denote the N hardware contexts.
  • Initially, all resources are allocated to C1
  • Resources are re-allocated from C1 to Ck so that
    Ck can accommodate the new task
  • If C1 does not have enough resources → fail
  • Move the smallest-utilization task from C1 to
    another context that can accommodate it.
  • If no such context is found → fail

16
Partitioning Algorithms (4)
  • PART-SYM-DYN-b
  • Utilization average across all co-schedules
  • Maximizes average symbiosis among tasks in
    different partitions, while keeping the total
    utilization of tasks in each partition balanced.
  • Weighted hypergraph with nodes representing the
    tasks
  • A hypergraph is a graph in which generalized
    edges (called hyperedges) may connect more than
    two nodes.
  • The weight on a hyperedge (u1, u2, ..., uN) is the
    inverse of the symbiosis factor of the
    co-schedule formed by tasks u1, u2, ..., uN.
  • Each node is weighted with its task's
    utilization.
  • A hypergraph-partitioning algorithm is used.
  • The sum of node-weights (utilization) is
    balanced.
  • The weight of the hyperedges is minimized
    (maximizing symbiosis)
  • PART-SYM-DYN-e

Reference: B. L. Chamberlain, "Graph Partitioning
Algorithms for Distributing Workloads of
Parallel Computations"
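The objective described for PART-SYM-DYN-b can be illustrated on a toy case with N = 2 contexts, where every hyperedge connects just two tasks. The sketch below uses brute-force enumeration as a stand-in for a real hypergraph partitioner; it is not the algorithm from the cited reference, and the function name, the `max_imbalance` tolerance, and all numbers are assumptions. It also assumes a larger symbiosis factor means the tasks run better together.

```python
from itertools import product


def best_two_way_partition(util, symbiosis, max_imbalance=0.2):
    """Brute-force search for a balanced two-way split with minimum cut weight.

    util: dict task -> utilization (node weight).
    symbiosis: dict (task_a, task_b) -> symbiosis factor of that pair.
    Edge weight is 1 / symbiosis, and only edges whose endpoints land in
    different partitions (tasks that may be co-scheduled) count as cut.
    Returns (cut_weight, assignment) or None if no split is balanced enough.
    """
    tasks = sorted(util)
    best = None
    for bits in product((0, 1), repeat=len(tasks)):
        side = dict(zip(tasks, bits))
        loads = [sum(util[t] for t in tasks if side[t] == s) for s in (0, 1)]
        if abs(loads[0] - loads[1]) > max_imbalance:
            continue  # node weights (utilizations) must stay balanced
        cut = sum(1.0 / s for (a, b), s in symbiosis.items() if side[a] != side[b])
        if best is None or cut < best[0]:
            best = (cut, side)
    return best


util = {"A": 0.4, "B": 0.4, "C": 0.4, "D": 0.4}
symbiosis = {("A", "B"): 2.0, ("C", "D"): 2.0, ("A", "C"): 0.5,
             ("B", "D"): 0.5, ("A", "D"): 1.0, ("B", "C"): 1.0}
cut, side = best_two_way_partition(util, symbiosis)
print(cut, side)
```

In this example the highly symbiotic pairs (A, B) and (C, D) end up in different partitions, so they can run simultaneously, while the poorly matched pairs (A, C) and (B, D) share a partition and never overlap.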
17
Global scheduling algorithms (1)
18
Global scheduling algorithms (2)
  • Symbiosis-Oblivious Global Scheduling
  • GLOB-NOSYM-PLAIN
  • IPC averaged across all co-schedules
  • EDF: the N tasks with the earliest deadlines are
    chosen
  • Task sets with arbitrarily low utilization can
    miss deadlines (Dhall effect)
  • GLOB-NOSYM-US
  • EDF-US[m/(2m-1)] algorithm
  • If Ti has utilization > N/(2N-1), give it high
    priority
  • Giving the highest priority to high utilization
    tasks in the task set.

Reference: A. Srinivasan and S. Baruah,
"Deadline-based Scheduling of Periodic Task
Systems on Multiprocessors"
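The priority rule for GLOB-NOSYM-US can be sketched as a sort: tasks above the N/(2N-1) utilization bound come first, and the rest follow in EDF order. The tuple layout and function name are assumptions for the example, and ordering the heavy tasks among themselves by deadline is an arbitrary tie-break chosen here.

```python
def edf_us_order(tasks, n_contexts):
    """Order ready tasks by EDF-US style priority.

    tasks: list of (name, utilization, absolute_deadline) tuples.
    Tasks with utilization above N / (2N - 1) are placed first; all
    remaining tasks follow in earliest-deadline-first order.
    """
    bound = n_contexts / (2 * n_contexts - 1)
    # False sorts before True, so heavy tasks (u > bound) lead the list.
    return sorted(tasks, key=lambda t: (t[1] <= bound, t[2]))


ready = [("A", 0.2, 10), ("B", 0.7, 50), ("C", 0.3, 20)]
order = edf_us_order(ready, n_contexts=2)
print([name for name, _, _ in order])  # ['B', 'A', 'C']
```

With two contexts the bound is 2/3, so B runs first despite having the latest deadline. Plain EDF would have picked A and C, which is the situation in which a heavy task can be starved.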
19
Global Scheduling Algorithms (3)
  • Symbiosis-Aware Global Scheduling
  • GLOB-SYM-PLAIN
  • Extends EDF to exploit symbiosis in a
    straightforward way.
  • It first selects the task with the earliest
    deadline.
  • For the other (N-1) tasks, it chooses the set
    that maximizes symbiosis when running with the
    first task.
  • Positive → improves schedulability (improves
    overall throughput)
  • Negative → potentially reduces schedulability (no
    real-time characteristic)
  • GLOB-SYM-US
  • In the presence of high utilization tasks,
    GLOB-SYM-PLAIN impairs schedulability
  • Addresses this drawback of GLOB-SYM-PLAIN
  • Defaults to GLOB-NOSYM-US if a task Ti has
    utilization Ui > N/(2N-1)
  • Otherwise, it defaults to GLOB-SYM-PLAIN
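The two symbiosis-aware rules above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the frozenset-keyed symbiosis table, and the sample values are assumptions, and a larger symbiosis factor is assumed to mean a better co-schedule.

```python
from itertools import combinations


def glob_sym_plain(ready, symbiosis, n_contexts):
    """Pick the earliest-deadline task, then its most symbiotic companions.

    ready: dict task -> absolute deadline.
    symbiosis: dict frozenset(tasks) -> symbiosis factor of that co-schedule.
    """
    first = min(ready, key=ready.get)
    others = [t for t in ready if t != first]
    k = min(n_contexts - 1, len(others))
    best = max(combinations(others, k),
               key=lambda c: symbiosis[frozenset((first,) + c)])
    return (first,) + best


def glob_sym_us_mode(utilizations, n_contexts):
    """Name the policy GLOB-SYM-US falls back to for a given task set."""
    bound = n_contexts / (2 * n_contexts - 1)
    if any(u > bound for u in utilizations.values()):
        return "GLOB-NOSYM-US"
    return "GLOB-SYM-PLAIN"


ready = {"A": 10, "B": 20, "C": 30}
symbiosis = {frozenset({"A", "B"}): 0.8, frozenset({"A", "C"}): 1.3}
print(glob_sym_plain(ready, symbiosis, n_contexts=2))              # ('A', 'C')
print(glob_sym_us_mode({"A": 0.2, "B": 0.7, "C": 0.3}, 2))         # GLOB-NOSYM-US
```

The first call shows the trade-off noted on the slide: C is chosen over B for throughput even though B has the earlier deadline.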

20
Comparison of properties
21
Experiment Setup (1)
  • Randomly generated task sets
  • Two workloads: utilizations follow either a normal
    or a bimodal distribution
  • Number of tasks follows a uniform distribution
    (mean 8)
  • Periods drawn from the set {100, 200, ..., 1600}
    with uniform probability
  • Randomly generated IPCi (mean 3) and co-schedule
    effects on IPCij
  • Metric
  • Success ratio: percentage of tasks successfully
    scheduled by an algorithm (at most 5% deadline
    misses)

22
Experiment Setup (2)
RSIM simulator
Real Workload
23
Results
  • Best algorithm → GLOB-SYM-US
  • Partitioning vs. global algorithms
  • Global is generally better
  • Enhanced versions and symbiosis awareness make
    partitioning more competitive with GLOB-SYM-US.
  • For the bimodal workload, enhanced partitioning
    algorithms are the best (they distribute high
    utilization tasks)
  • Symbiosis awareness
  • Partitioning → often helps
  • Global scheduling → helpful for high
    utilization, not helpful for medium utilization
  • PLAIN vs. US: US does better for the bimodal
    distribution, as expected

24
Experimental Methodology
  • Metrics
  • Critical serial utilization (CSU)
  • The total utilization obtained by uniformly
    increasing the utilization of all tasks until a
    further increase causes the task-set to become
    unschedulable (up to 5% deadline misses are
    tolerated → soft real-time)
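The CSU definition above amounts to a search for the largest uniform scale factor that keeps the task set schedulable. A minimal sketch using bisection is shown below. It assumes schedulability is monotonic in the scale factor, and the `is_schedulable` predicate is a hypothetical placeholder for whatever test or simulation (including the tolerated deadline misses) the scheduler under evaluation provides.

```python
def critical_serial_utilization(utils, is_schedulable, hi=64.0, tol=1e-6):
    """Total utilization at the largest uniform scaling that is still schedulable.

    utils: list of per-task utilizations.
    is_schedulable: callable taking a list of scaled utilizations and
        returning True or False (hypothetical placeholder).
    hi: a scale factor assumed to be large enough to be unschedulable.
    """
    if is_schedulable([u * hi for u in utils]):
        raise ValueError("upper bound 'hi' is still schedulable; raise it")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_schedulable([u * mid for u in utils]):
            lo = mid   # still schedulable: scale up further
        else:
            hi = mid   # too much load: back off
    return lo * sum(utils)


# Stand-in test: pretend a 2-context processor accepts total utilization up to 2.0.
csu = critical_serial_utilization([0.2, 0.3, 0.5], lambda us: sum(us) <= 2.0)
print(round(csu, 3))  # 2.0
```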

25
Results
  • Best algorithm → GLOB-SYM-US
  • Static vs. dynamic resource sharing
  • Static resource sharing generally implies lower
    throughput than dynamic resource sharing.
  • Partitioning vs. global algorithms
  • The enhanced version is more competitive with
    GLOB-SYM-US.
  • Symbiosis awareness
  • Partitioning → often helps
  • Global scheduling → it depends

26
Conclusions
  • Best algorithm
  • Global scheduling, exploits symbiosis,
    prioritizes high utilization tasks, uses dynamic
    resource sharing.
  • Requires a lot of profiling
  • Two alternatives
  • Partitioning algorithm that uses static
    resource sharing
  • (PART-NOSYM-STAT)
  • Worse schedulability and somewhat more complex.
  • Provides strict admission control and requires
    less profiling.
  • Earliest-deadline-first global algorithm
  • (GLOB-NOSYM-PLAIN)
  • Does not provide strict admission control, but
    requires no profiling.

27
Conclusions
  • Dynamic resource sharing is better than static
    for schedulability
  • Partitioning algorithms can be made competitive
    with global scheduling algorithms, but with more
    complexity.
  • Symbiosis awareness
  • Beneficial for partitioning algorithms because
    they do not entirely ignore real-time constraints
  • Can hurt or help global scheduling algorithms,
    depending on the relative magnitude of the
    symbiosis factors and total utilization of the
    applications.

28
  • Thank you