Optimizing Matrix Multiplication with a Classifier Learning System - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Optimizing Matrix Multiplication with a
Classifier Learning System
  • Xiaoming Li (presenter)
  • María Jesús Garzarán
  • University of Illinois at Urbana-Champaign

2
Tuning library for recursive matrix multiplication
  • Use cache-aware algorithms that take architectural features into account
  • Memory hierarchy
  • Register file
  • Take input characteristics into account
  • Matrix sizes
  • The tuning process is automatic.

3
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

[Figure: matrices A and B, unpartitioned]
4
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

[Figure: matrices A and B after Step 1 (divide by half)]
5
Recursive Matrix Partitioning
  • Previous approaches
  • Multiple recursive steps
  • Only divide by half

[Figure: matrices A and B after Step 2]
6
Recursive Matrix Partitioning
  • Our approach is more general
  • No need to divide by half
  • May use a single step to reach the same partition
  • Faster and more general

[Figure: matrices A and B reaching the same partition in a single Step 1]
7
Our approach
  • A general framework that describes a family of
    recursive matrix multiplication algorithms
    where, given the input dimensions of the
    matrices, we determine
  • Number of partition levels
  • How to partition at each level
  • An intelligent search method based on a
    classifier learning system
  • Search for the best partitioning strategy in a
    huge search space

8
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

9
Recursive layout framework
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

10
Recursive layout framework
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

[Figure: the four quadrants, labeled 1–4]
11
Recursive layout in our framework
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

12
Recursive layout framework
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

13
Recursive layout framework
1 2 5 6 17 18 21 22
3 4 7 8 19 20 23 24
9 10 13 14 25 26 29 30
11 12 15 16 27 28 31 32
33 34 37 38 49 50 53 54
35 36 39 40 51 52 55 56
41 42 45 46 57 58 61 62
43 44 47 48 59 60 63 64
  • Multiple levels of recursion
  • Takes into account the cache hierarchy

[Figure: sub-block labels, row by row: 1 2 5 6 / 3 4 7 8 / 9 10 13 14 / 11 12 15 16]
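The storage order shown on this slide can be reproduced with a short index computation. Below is a sketch (the helper name is mine, not from the talk) that maps a (row, column) position to its offset in a recursive layout, dividing by 2 at each of two levels as in the 8 × 8 example:

```python
def recursive_offset(i, j, n, factors):
    """Offset of element (i, j) in an n x n matrix stored recursively:
    at each level the matrix is split into p x p blocks (p = next
    partition factor), blocks are laid out contiguously in row-major
    block order, and the innermost tile is plain row-major.
    Assumes n is divisible by every factor."""
    if not factors:
        return i * n + j              # row-major inside the final tile
    p = factors[0]
    b = n // p                        # sub-block dimension at this level
    block = (i // b) * p + (j // b)   # which block (i, j) falls into
    return block * b * b + recursive_offset(i % b, j % b, b, factors[1:])

# The slide's 8 x 8 example: two levels, divide by 2 each time.
# Element (0, 4) starts the top-right quadrant, i.e. value 17 above.
print(recursive_offset(0, 4, 8, [2, 2]) + 1)  # 17
```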
14
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

[Figure: dimension 2000, divided by 3]
15
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

[Figure: padded to 2001; three tiles of 667, divided by 3]
16
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

[Figure: dimension 2001, tiles of 667, divided by 4]
17
Padding
  • Necessary when the partition factor is not a
    divisor of the matrix dimension.

[Figure: tiles padded to 668; dimension 2004, divided by 4]
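The padded sizes on these slides follow from rounding a dimension up to the next multiple of the partition factor. A minimal sketch (the function name is mine, not from the talk):

```python
import math

def pad_dimension(n, p):
    """Round dimension n up to the nearest multiple of the partition
    factor p, so the matrix divides evenly into p tiles."""
    return math.ceil(n / p) * p

# The slides' example: a 2000-wide dimension divided by 3 is padded
# to 2001, giving three tiles of 667 each; dividing the result by 4
# pads 2001 up to 2004.
padded = pad_dimension(2000, 3)
print(padded, padded // 3)       # 2001 667
print(pad_dimension(2001, 4))    # 2004
```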
18
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tile → rectangular tile
  • Fits non-square matrices

19
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tile → rectangular tile
  • Fits non-square matrices

[Figure: 8 × 9 matrix]
20
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tile → rectangular tile
  • Fits non-square matrices

[Figure: 8 × 9 matrix padded to 8 × 10]
21
Recursive layout in our framework
  • Multiple levels of recursion
  • Supports the cache hierarchy
  • Square tile → rectangular tile
  • Fits non-square matrices

[Figure: 4 × 3 rectangular tiles]
22
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

23
Two methods to partition matrices
  • Partition by Block (PB)
  • Specify the size of each tile
  • Example
  • Dimensions (M, N, K) = (100, 100, 40)
  • Tile size (bm, bn, bk) = (50, 50, 20)
  • Partition factors (pm, pn, pk) = (2, 2, 2)
  • Tiles need not be square
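Under Partition by Block, the partition factors follow directly from the chosen tile sizes, as in the example above. A sketch (the function name is mine, not from the talk):

```python
import math

def partition_by_block(dims, tile):
    """Partition factors for Partition by Block (PB): each factor is
    the number of tiles needed to cover that dimension (rounding up
    when the tile size does not divide the dimension)."""
    return tuple(math.ceil(d / b) for d, b in zip(dims, tile))

# The slide's example: (M, N, K) = (100, 100, 40), tiles (50, 50, 20)
print(partition_by_block((100, 100, 40), (50, 50, 20)))  # (2, 2, 2)
```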

24
Two methods to partition matrices
  • Partition by Size (PS)
  • Specify the maximum size of the three tiles
  • Keep the ratios between dimensions constant
  • Example
  • (M, N, K) = (100, 100, 50)
  • Maximum tile size for M, N = 1250
  • (pm, pn, pk) = (2, 2, 1)
  • Generalization of the divide-by-half approach
  • Tile size = 1/4 of the matrix size
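The divide-by-half generalization can be sketched as follows, under simplifying assumptions of mine: only M and N are halved in lock-step (K is left unpartitioned, matching the example's pk = 1), and the bound counts elements of the output tile. The slide's exact "1250" bookkeeping is not fully specified, so the example below uses a bound of 2500 elements (1/4 of a 100 × 100 output):

```python
def partition_by_size(m, n, k, max_tile_elems):
    """Sketch of Partition by Size (PS): halve the M and N dimensions
    together (keeping their ratio constant) until the output tile
    fits under the element-count bound.  The exact tile-size
    accounting is an assumption, not taken from the talk."""
    pm = pn = 1
    while (m // pm) * (n // pn) > max_tile_elems:
        pm *= 2
        pn *= 2
    return pm, pn, 1

# One halving step shrinks the 100 x 100 output tile to 50 x 50.
print(partition_by_size(100, 100, 50, 2500))  # (2, 2, 1)
```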

25
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

26
Classifier Learning System
  • Use the two partition primitives to determine how
    the input matrices are partitioned
  • Determine the partition factors at each level
  • f: (M, N, K) → (pm_i, pn_i, pk_i), i = 0, 1, 2
    (only three levels are considered)
  • The partition factors depend on the matrix size
  • E.g., the partition factors of a (1000 x 1000)
    matrix should differ from those of a (50 x 1000)
    matrix.
  • The partition factors also depend on
    architectural characteristics, such as cache size.

27
Determine the best partition factors
  • The search space is huge → exhaustive search is
    infeasible
  • Our proposal: use a multi-step classifier
    learning system
  • It creates a table that, given the matrix
    dimensions, determines the partition factors

28
Classifier Learning System
  • The result of the classifier learning system is a
    table with two columns
  • Column 1 (Pattern): a string of 0s, 1s, and
    don't-care symbols that encodes the dimensions
    of the matrices
  • Column 2 (Action): the partition method for one step
  • Actions are built from the partition-by-block and
    partition-by-size primitives with different
    parameters.
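The pattern column can be matched as a ternary string. The sketch below is an assumption-laden illustration (helper names and the '#' don't-care symbol are the usual classifier-system convention, not spelled out in the slides): dimensions are binary-encoded and compared bit by bit against a pattern:

```python
def encode(dim, bits=5):
    """Binary-encode a matrix dimension into a fixed-width bit string
    (the later slide notes 5 bits per dimension)."""
    return format(dim, f'0{bits}b')

def matches(pattern, bitstring):
    """Ternary pattern match: '0'/'1' must match exactly,
    '#' is a don't-care that matches either bit."""
    return all(p == '#' or p == b for p, b in zip(pattern, bitstring))

# Hypothetical rule: pattern '10###' covers encoded values 16..23,
# so its action would apply to any matrix dimension in that range.
print(matches('10###', encode(20)))  # True
print(matches('10###', encode(8)))   # False
```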

29
Learn with Classifier System
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)

30
Learn with Classifier System
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)

(5 bits per dimension)
31
Learn with Classifier System
[Figure: matrix of size 24 × 16]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)
32
Learn with Classifier System
[Figure: matrix of size 24 × 16]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)
33
Learn with Classifier System
[Figure: matrix of size 12 × 8]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)
34
Learn with Classifier System
[Figure: matrix of size 12 × 8]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)
35
Learn with Classifier System
[Figure: matrix of size 12 × 8]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)
36
Learn with Classifier System
[Figure: matrix of size 4 × 4]
Pattern       Action
(10, 11)      PS 100
(010, 011)    PB (4, 4)

37
How does the classifier learning algorithm work?
  • Change the table based on performance and
    accuracy feedback from previous runs.
  • Mutate the condition part of a table entry to adjust
    the range of matching matrix dimensions.
  • Mutate the action part to find the best partition
    method for the matching matrices.
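The feedback loop above can be sketched as a simple mutate-and-keep-if-better search. This is an illustrative simplification, not the talk's algorithm: the candidate actions, the mutation rates, and the `evaluate` callback (a stand-in for timing the generated code on training matrices) are all my assumptions:

```python
import random

def evolve(table, evaluate, generations=50):
    """Sketch of the feedback loop: mutate either the condition
    (pattern) or the action of one random rule, and keep the mutated
    table only if the measured score improves.  Each rule is a dict
    {'pattern': ternary string, 'action': partition-method string}."""
    best = evaluate(table)
    for _ in range(generations):
        trial = [rule.copy() for rule in table]
        rule = random.choice(trial)
        if random.random() < 0.5:
            # Mutate the condition: widen the matching range by turning
            # a concrete bit into '#', or narrow it by doing the reverse.
            i = random.randrange(len(rule['pattern']))
            bits = list(rule['pattern'])
            bits[i] = '#' if bits[i] in '01' else random.choice('01')
            rule['pattern'] = ''.join(bits)
        else:
            # Mutate the action: try another partition method
            # (hypothetical candidate set, for illustration only).
            rule['action'] = random.choice(
                ['PS 100', 'PS 400', 'PB (2,2)', 'PB (4,4)'])
        score = evaluate(trial)
        if score > best:
            table, best = trial, score
    return table
```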

38
Outline
  • Background
  • Partition Methods
  • Classifier Learning System
  • Experimental Results

39
Experimental Results
  • Experiments on three platforms
  • Sun UltraSparcIII
  • P4 Intel Xeon
  • Intel Itanium2
  • Matrices of sizes from 1000 x 1000 to 5000 x 5000

40
Algorithms
  • Classifier MMM: our approach
  • Includes the overhead of copying into and out of
    the recursive layout
  • ATLAS: library generated by ATLAS's search
    procedure, without hand-written kernels
  • Has some form of blocking for L2
  • L1: one level of tiling
  • Tile size the same as the one ATLAS chooses for L1
  • L2: two levels of tiling
  • L1 and L2 tile sizes the same as the one ATLAS
    chooses for L1

41
(No Transcript)
42
(No Transcript)
43
Conclusion and Future Work
  • Preliminary results demonstrate the effectiveness
    of our approach
  • Sun UltraSparcIII and Xeon: 18% and 5%
    improvement, respectively
  • Itanium: -14%
  • The padding mechanism needs improvement
  • Reduce the amount of padding
  • Avoid unnecessary computation on the padding

44
Thank you!