Title: Implementation of Morton Layout for Large Arrays
1Implementation of Morton Layout for Large Arrays
Bowling Green State University
- Presented by Sharad Ratna Bajracharya
- Advisor Prof. Larry Dunning
23rd April 2004
2Outline
- Introduction
- Objectives
- Implementation
- Samples
- Improvement
- Recommendation
- Conclusion
3Introduction
- Morton layout is a storage layout for two-dimensional arrays.
- The performance of Morton layout is comparatively better than that of row-major or column-major array representation.
4Introduction continues...
- Reports analyzing the performance and efficiency of Morton layout:
  - An exhaustive evaluation of row-major, column-major and Morton layouts for large two-dimensional arrays. Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.
  - Is Morton layout competitive for large two-dimensional arrays? Jeyarajan Thiyagalingam and Paul H. J. Kelly.
  - Improving the performance of Morton layout by array alignment and loop unrolling. Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.
5Introduction continues...
- General row-major array representation
- Row-major ordering assigns successive elements, moving across the rows and then down the columns, to successive memory locations.

     0  1  2  3
     4  5  6  7
     8  9 10 11
    12 13 14 15
6Introduction continues...
- Column-major array representation.

     0  4  8 12
     1  5  9 13
     2  6 10 14
     3  7 11 15
7Introduction continues...
- Morton layout is a compromise storage layout between the layouts mandated by programming languages, such as row-major and column-major.

    (Row Major)      (Morton Storage Layout)
     0  1  2  3       0  1  4  5
     4  5  6  7       2  3  6  7
     8  9 10 11       8  9 12 13
    12 13 14 15      10 11 14 15
8Introduction continues...
- Morton storage layout works with almost equal overhead whether traversed row-wise or column-wise.
- Morton layout works well with square two-dimensional arrays whose size is a power of 2, such as 2x2, 4x4, 8x8, etc.
9Introduction continues...
- For a non-square matrix, it wastes a lot of memory space.

    (Row Major)      (Morton Storage Layout)
     0  1  2  3       0  1  4  5
     4  5  6  7       2  3  6  7
     8  9 10 11       8  9  X  X
                     10 11
10Introduction continues...
- How does Morton layout work?
- For any subscript of a two-dimensional array, such as array[2][3]:

    Binary value of row 2 -> 1 0
    Binary value of col 3 -> 1 1

  Interleaving the row and column bits, Morton layout stores the element at location 1 1 0 1, i.e. memory location 13.
- Also known as Zip Fastening Array Layout.
11Introduction continues...
- Consider a large row-major array:

       1    2    3    4    5    6    7 ...  1000
    1001 1002 1003 1004 1005 1006 1007 ...  2000
    2001 2002 ...
    ...
    9001 9002 9003 9004 9005 9006 9007 ... 10000

- Moving down a column touches elements that lie far apart in memory. The result is cache misses, page faults and poor performance.
12Objectives
- Improve cache miss and page fault characteristics of large arrays using Morton array layouts.
- Reduce wasted memory in Morton layout.
- Improve the extendibility of arrays.
13Implementation
- Interleaved bit patterns

     4 -> 0 1 0 0 -> 0 0 1 0 0 0 0
     9 -> 1 0 0 1 -> 1 0 0 0 0 0 1
    15 -> 1 1 1 1 -> 1 0 1 0 1 0 1
                     (Interleaved Bits)
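The patterns above insert a zero between consecutive bits of the index. A minimal sketch of that spreading step follows, assuming 16-bit inputs; the name `dilate` and the shift-and-mask sequence are a common way to do this and are not necessarily what the author's class uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: spread the low 16 bits of x so that bit k of the
// input ends up at bit 2k of the result, with zeros in between.
std::uint32_t dilate(std::uint32_t x) {
    assert(x <= 0xFFFFu);
    x = (x | (x << 8)) & 0x00FF00FFu;  // split into two bytes
    x = (x | (x << 4)) & 0x0F0F0F0Fu;  // split into nibbles
    x = (x | (x << 2)) & 0x33333333u;  // split into bit pairs
    x = (x | (x << 1)) & 0x55555555u;  // split into single bits
    return x;
}
```

With this sketch, 4 (0100) becomes 0010000, 9 (1001) becomes 1000001 and 15 (1111) becomes 1010101, matching the three rows above.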
14Implementation continues
- Bit interleaved increment and decrement
- Bit interleaved increment

    1 0 1 + 1 -> 1 0 0 0 1 + 1
    = 1 1 0   -> 1 0 1 0 0
    (Changes are in interleaved bits)

- For any value a, bit interleaved increment is given by
    a + 1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555
- 0xAAAAAAAA = 1010...10101010 (32 bits)
- 0x55555555 = 0101...01010101 (32 bits)
15Implementation continues
- Bit interleaved increment: a + 1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555

            0 0 0 1   -> bit interleaved 1 (0 1)
    OR      1 0 1 0
          = 1 0 1 1
    + 1   = 1 1 0 0
    AND     0 1 0 1
          = 0 1 0 0   -> bit interleaved 2 (1 0)
16Implementation continues
- More examples of bit interleaved increment

    0 0 0 0 0 + 1 = 0 0 0 0 1
    0 0 0 0 1 + 1 = 0 0 1 0 0
    0 0 1 0 0 + 1 = 0 0 1 0 1
    0 0 1 0 1 + 1 = 1 0 0 0 0
    1 0 0 0 0 + 1 = 1 0 0 0 1
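The increment formula from these slides translates directly into C++. The sketch below assumes the value occupies the even bit positions (mask 0x55555555), as in the worked examples; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit interleaved increment as given on the slides. Filling the odd
// (unused) positions with ones lets the carry ripple across them.
std::uint32_t interleaved_increment(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // value must live in the even bits only
    return ((a | 0xAAAAAAAAu) + 1u) & 0x55555555u;
}
```

Starting from 0 and applying the function repeatedly walks through 00000, 00001, 00100, 00101, 10000, 10001, the same sequence as the examples above.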
17Implementation continues
- Bit interleaved decrement. For example,

    1 0 0 - 1 -> 1 0 0 0 0 - 1
    = 1 1     -> 0 0 1 0 1
    (Changes are in interleaved bits)

- For any value a, bit interleaved decrement is given by
    a - 1 = (a - 1) & 0x55555555
  where
- 0x55555555 = 0101...01010101 (32 bits)
18Implementation continues
- Bit interleaved decrement: a - 1 = (a - 1) & 0x55555555

            0 1 0 0 0 0   -> bit interleaved 4 (100)
    - 1   = 0 0 1 1 1 1
    AND     0 1 0 1 0 1
          = 0 0 0 1 0 1   -> bit interleaved 3 (11)
19Implementation continues
- More examples of bit interleaved decrement

    1 0 0 0 0 - 1 = 0 0 1 0 1
    0 0 1 0 1 - 1 = 0 0 1 0 0
    0 0 1 0 0 - 1 = 0 0 0 0 1
    0 0 0 0 1 - 1 = 0 0 0 0 0
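The decrement is even shorter in code: the borrow propagates through the zero gaps by itself, so only the final mask is needed. As before, this is a sketch with a hypothetical function name, assuming the value sits in the even bit positions.

```cpp
#include <cassert>
#include <cstdint>

// Bit interleaved decrement as given on the slides: subtract one, then
// clear the odd positions that the borrow filled with ones.
std::uint32_t interleaved_decrement(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // value must live in the even bits only
    return (a - 1u) & 0x55555555u;
}
```

Applied repeatedly to 10000 it steps back through 00101, 00100, 00001 and 00000, as in the examples above.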
20Implementation continues
- Morton layout array representation can be implemented in two ways.
- The first method maintains a lookup table of bit-interleaved array subscripts for address calculation. For example,

    0 -> 0 0 0 0
    1 -> 0 0 0 1
    2 -> 0 1 0 0
    3 -> 0 1 0 1
21Implementation continues
- For example, for the array subscript [2, 3]:

    Value of 2 (1 0) from lookup table -> 0100
    Value of 3 (1 1) from lookup table -> 0101

  To get the Morton layout address, compute (ROW << 1) | COL:

    (0100 << 1) | 0101 = 1000 | 0101, that is,

        1 0 0 0
      | 0 1 0 1
      = 1 1 0 1   (zipped address)
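The lookup-table method can be sketched as follows. The function names and the use of `std::vector` for the table are assumptions made for illustration; the table is filled with the bit interleaved increment, and the address is formed as (ROW << 1) | COL exactly as in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: table[k] holds k with its bits spread into the
// even bit positions (0 -> 0000, 1 -> 0001, 2 -> 0100, 3 -> 0101, ...).
std::vector<std::uint32_t> make_dilation_table(std::size_t size) {
    assert(size <= 65536);  // 16-bit subscripts keep the address in 32 bits
    std::vector<std::uint32_t> table(size);
    std::uint32_t dilated = 0;
    for (std::size_t k = 0; k < size; ++k) {
        table[k] = dilated;
        dilated = ((dilated | 0xAAAAAAAAu) + 1u) & 0x55555555u;  // interleaved +1
    }
    return table;
}

// Morton address of (row, col): shift the row pattern into the odd bits
// and merge it with the column pattern.
std::uint32_t morton_address(const std::vector<std::uint32_t>& table,
                             std::size_t row, std::size_t col) {
    assert(row < table.size() && col < table.size());
    return (table[row] << 1) | table[col];
}
```

One table serves both subscripts here because rows and columns use the same pattern, shifted by one bit.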
22Implementation continues
- The second method implements the Morton array layout representation using only bit interleaved increment and decrement, without a lookup table.
23Implementation continues
- Implemented in C++ as a two-dimensional array (matrix) class with Standard Template Library (STL) compatibility so as to make it generic, that is, it is not tied to any particular data structure or object type.
- Internally, data are stored sequentially in an STL vector.
24Implementation continues
- Direct access to an element of the matrix by array subscript is implemented using the lookup table.
- Random access iterators are defined that use bit interleaved increment and decrement without the lookup table.
- Iterators are a generalization of pointers: they are objects that point to other objects.
25Implementation continues
- Different types of random access iterators are implemented to provide flexibility in using the matrix class, such as:
- Row Major iterator
- Column Major iterator
- Diagonal iterator
- Row iterator / Super row iterator
- Column iterator / Super column iterator
- Reverse Row Major iterator
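The slides do not show the iterator classes themselves, so the following is only a sketch of the stepping logic a row-major iterator could use. It assumes a square buffer whose side is a power of 2, stored in Morton order, and the function name `row_major_order` is hypothetical. The column position advances with the interleaved increment in the even bits and the row position with the same trick in the odd bits, so no lookup table is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a Morton-ordered side x side buffer in row-major order.
std::vector<int> row_major_order(const std::vector<int>& morton, std::uint32_t side) {
    assert(side != 0 && side <= 65536 && (side & (side - 1)) == 0);  // power of 2
    assert(morton.size() == static_cast<std::size_t>(side) * side);
    std::vector<int> out;
    out.reserve(morton.size());
    std::uint32_t row_bits = 0;  // row index spread into the odd bits
    for (std::uint32_t r = 0; r < side; ++r) {
        std::uint32_t col_bits = 0;  // column index spread into the even bits
        for (std::uint32_t c = 0; c < side; ++c) {
            out.push_back(morton[row_bits | col_bits]);
            col_bits = ((col_bits | 0xAAAAAAAAu) + 1u) & 0x55555555u;
        }
        row_bits = ((row_bits | 0x55555555u) + 1u) & 0xAAAAAAAAu;
    }
    return out;
}
```

A column-major walk would swap the two loops, and a reverse walk would use the interleaved decrement instead.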
26Samples
Original Data

     6 -9 -8 -1
    -8 -6 -9 -2
    -2 -5 -6 -4
     2  3 -4 -8
    -2  1 -7  5
     5 -8  1  7

Sorted Data

    -9 -9 -8 -8
    -8 -8 -7 -6
    -6 -5 -4 -4
    -2 -2 -2 -1
     1  1  2  3
     5  5  6  7

    // Row-major sorting using STL sort()
    mat1 = matori;
    cout << mat1 << endl;
    sort(mat1.begin(), mat1.end());
    cout << "Sorted Data" << endl;
    cout << mat1 << endl;
27Samples continues...
- Using Column Major iterator

Original Data

     6 -9 -8 -1
    -8 -6 -9 -2
    -2 -5 -6 -4
     2  3 -4 -8
    -2  1 -7  5
     5 -8  1  7

Sorted Data

    -9 -7 -2  2
    -9 -6 -2  3
    -8 -6 -2  5
    -8 -5 -1  5
    -8 -4  1  6
    -8 -4  1  7

    // Column-major sorting using STL sort()
    mat1 = matori;
    cout << mat1 << endl;
    sort(mat1.cbegin(), mat1.cend());
    cout << "Sorted Data" << endl;
    cout << mat1 << endl;
28Samples continues...
Original Data

     6 -9 -8 -1
    -8 -6 -9 -2
    -2 -5 -6 -4
     2  3 -4 -8
    -2  1 -7  5
     5 -8  1  7

Sorted Data

    -9 -8 -1  6
    -9 -8 -6 -2
    -6 -5 -4 -2
    -8 -4  2  3
    -7 -2  1  5
    -8  1  5  7

    // Row-by-row sorting using STL sort()
    mat1 = matori;
    cout << mat1 << endl;
    for (riter = mat1.r2rbegin(); riter != mat1.r2rend(); ++riter)
        sort((*riter).begin(), (*riter).end());
    cout << mat1 << endl;
29Samples continues...
- Using super column iterator

Original Data

     6 -9 -8 -1
    -8 -6 -9 -2
    -2 -5 -6 -4
     2  3 -4 -8
    -2  1 -7  5
     5 -8  1  7

Sorted Data

    -8 -9 -9 -8
    -2 -8 -8 -4
    -2 -6 -7 -2
     2 -5 -6 -1
     5  1 -4  5
     6  3  1  7

    // Column-by-column sorting using STL sort()
    mat1 = matori;
    cout << mat1 << endl;
    for (citer = mat1.c2cbegin(); citer != mat1.c2cend(); ++citer)
        sort((*citer).begin(), (*citer).end());
    cout << mat1 << endl;
30Samples continues...
Original Data

     6 -9 -8 -1
    -8 -6 -9 -2
    -2 -5 -6 -4
     2  3 -4 -8
    -2  1 -7  5
     5 -8  1  7

Resized Data

     6 -9 -8 -1  0  0
    -8 -6 -9 -2  0  0
    -2 -5 -6 -4  0  0
     2  3 -4 -8  0  0
    -2  1 -7  5  0  0
     5 -8  1  7  0  0
     0  0  0  0  0  0
     0  0  0  0  0  0

    // Resizing the matrix
    mat1 = matori;
    cout << mat1 << endl;
    mat1.resize(8, 8, 0);
    cout << mat1 << endl;
31Improvement
- Morton array representation can be improved if we can utilize the wasted space for non-square matrices.
- This can be achieved to some extent by using partially interleaved bit patterns.
- A portion of the bits is interleaved and the remaining bits are left as they are. This helps in utilizing the wasted space.
32Improvement continues
- For example, consider a matrix of size 20 x 4 (actual required space: 80). Using Morton layout, it will require

      1000001010
    + 0000000101
    = 1000001111 = 527, + 1 = 528 spaces

  With the modified version, it will require

      1001010
    + 0000101
    = 1001111 = 79, + 1 = 80 spaces -> Improved!
33Improvement continues
- More details

    1000001010 -> 19 (row)
    0000000101 ->  3 (col)
    1000001111 -> 527 (Morton location)

    1001010 -> 19 (row)
    0000101 ->  3 (col)
    1001111 -> 79 (Improved Morton)

  Extra interleaving bits removed
34Improvement continues
- In the improved version, only N bits are interleaved, where N is the number of bits in the smaller of (rows - 1) and (columns - 1) of a row x column matrix.
- For example, in a 20x4 matrix, the smaller dimension is 4 and 4 - 1 = 3, which is 11 in binary; that is, N = 2, as 3 is represented by the 2 bits 11.
35Improvement continues
- Interleaving N bits and leaving the remaining bits. For example,

    For rows:    20 - 1 = 19 = 10011 -> 100 10 10   (2 bits are interleaved; N = 2 row interleaved bits)
    For columns:  4 - 1 =  3 =    11 -> 000 01 01   (2 bits are interleaved; N = 2 column interleaved bits)
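A sketch of this partial interleaving is given below. All names are hypothetical, and the code assumes that the resulting address fits in 32 bits. N is taken as the number of bits of the smaller dimension minus one, the low N bits of each index are interleaved, and the remaining bits of the longer index are placed unchanged above them.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent v (0 for v == 0).
unsigned bits_needed(std::uint32_t v) {
    unsigned n = 0;
    while (v != 0) { ++n; v >>= 1; }
    return n;
}

// Spread the low n bits of x into even bit positions 0, 2, 4, ...
std::uint32_t dilate_low(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned b = 0; b < n; ++b) out |= ((x >> b) & 1u) << (2 * b);
    return out;
}

// Partially interleaved address of (row, col) in a rows x cols matrix.
std::uint32_t partial_morton(std::uint32_t row, std::uint32_t col,
                             std::uint32_t rows, std::uint32_t cols) {
    assert(row < rows && col < cols);
    const std::uint32_t smaller = rows < cols ? rows : cols;
    const unsigned n = bits_needed(smaller - 1);  // N on the slides
    assert(n <= 15);                              // keeps every shift below 32
    const std::uint32_t low_mask = (1u << n) - 1u;
    // Low N bits are interleaved; higher bits stay as they are, above them.
    const std::uint32_t row_part =
        ((row >> n) << (2 * n)) | (dilate_low(row & low_mask, n) << 1);
    const std::uint32_t col_part =
        ((col >> n) << (2 * n)) | dilate_low(col & low_mask, n);
    return row_part + col_part;
}
```

For the 20x4 case, row 19 maps to 100 10 10 and column 3 to 000 01 01, so element (19, 3) lands at 79 and the 80 elements fill locations 0 to 79.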
36Improvement continues
- Bit interleaved increment/decrement still works.
- For bit interleaved increment:

            001 1010   -> bit interleaved 7 (111)
    OR      000 0101   -> bit mask
          = 001 1111
    + 1   = 010 0000
    AND     111 1010   -> bit mask (complement)
          = 010 0000   -> bit interleaved 8 (1000)
37Improvement continues
- For bit interleaved decrement:

            010 0000   -> bit interleaved 8 (1000)
    - 1   = 001 1111
    AND     111 1010   -> bit mask
          = 001 1010   -> bit interleaved 7 (111)
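The same increment and decrement can be written for an arbitrary mask, which is what the partially interleaved patterns need. This is a sketch with hypothetical names: `mask` has ones at the bit positions that belong to the index being stepped (111 1010 for the row in the 20x4 example).

```cpp
#include <cassert>
#include <cstdint>

// Increment the value stored in the bits selected by mask. The positions
// outside the mask are filled with ones so the carry passes over them.
std::uint32_t masked_increment(std::uint32_t a, std::uint32_t mask) {
    assert((a & ~mask) == 0);  // a may only use bits inside the mask
    return ((a | ~mask) + 1u) & mask;
}

// Decrement the value stored in the bits selected by mask.
std::uint32_t masked_decrement(std::uint32_t a, std::uint32_t mask) {
    assert((a & ~mask) == 0);
    return (a - 1u) & mask;
}
```

With mask 0x7A (111 1010), incrementing 001 1010 gives 010 0000 and decrementing it again returns 001 1010, as in the two worked examples.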
38Improvement continues
- The improved array location is calculated by adding the partially bit-interleaved row and column.

      100 10 10 -> 19
    + 000 01 01 ->  3
    = 100 11 11 =  79

- This method utilizes the wasted space to some extent, but it does no better than the original Morton layout for square matrices whose size is not a power of 2.
39Improvement continues
- Improvement for square matrices
- Consider an NxN matrix in which n bits are to be interleaved. The remaining bits of the column bit patterns are unchanged, but for the row bit patterns the remaining bits take special bit patterns that are multiples of ⌈N/2^n⌉. So separate lookup tables are required for row and column bit patterns.
- Row and column bit patterns are added to get the modified storage location.
40Improvement continues
- For example, a 17x17 matrix with n = 2 interleaved bits (actual required space: 289).
- Space required by normal Morton layout will be

      1000000000
    + 0100000000
    = 1100000000 = 768, + 1 = 769

- With the improved version, we have ⌈17/2^2⌉ = 5.

    Row lookup table   Index   Col lookup table
    0000 0000          0       0000 0000
    0000 0010          1       0000 0001
    0000 1000          2       0000 0100
    0000 1010          3       0000 0101
    0101 0000          4       0001 0000
    0101 0010          5       0001 0001
    ...                ...     ...

  (Upper bits of the row pattern: changed by 5 = 101)
41Improvement continues
- For a 17x17 matrix,
- 16 from the row lookup table will be 10100 0000
- 16 from the col lookup table will be 00100 0000
- Total space required will be

      10100 0000
    + 00100 0000
    = 11000 0000 -> 384, + 1 = 385 spaces required. Improved!
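The two lookup tables for the square case can be generated as sketched below. The struct and function names are hypothetical, and the code assumes the addresses fit in 32 bits. The upper bits of each row entry are multiplied by ⌈N/2^n⌉, while the column entries keep their upper bits unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SquareTables {
    std::vector<std::uint32_t> row;
    std::vector<std::uint32_t> col;
};

// Spread the low n bits of x into even bit positions 0, 2, 4, ...
std::uint32_t dilate_low(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned b = 0; b < n; ++b) out |= ((x >> b) & 1u) << (2 * b);
    return out;
}

// Build row and column tables for a size x size matrix with n interleaved bits.
SquareTables make_square_tables(std::uint32_t size, unsigned n) {
    assert(n >= 1 && n <= 8);
    const std::uint32_t low_mask = (1u << n) - 1u;
    const std::uint32_t step = (size + low_mask) >> n;  // ceil(size / 2^n)
    SquareTables t;
    t.row.reserve(size);
    t.col.reserve(size);
    for (std::uint32_t k = 0; k < size; ++k) {
        const std::uint32_t low = dilate_low(k & low_mask, n);
        const std::uint32_t high = k >> n;
        t.row.push_back(((high * step) << (2 * n)) | (low << 1));  // multiples of step
        t.col.push_back((high << (2 * n)) | low);                  // plain upper bits
    }
    return t;
}
```

The storage location of element (r, c) is then `t.row[r] + t.col[c]`; for the 17x17 example the last element (16, 16) lands at 384.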
42Improvement continues
- This technique for square matrices still leaves some extra space, as shown in the 17x17 example, although in some cases it works perfectly. Overall, it is an improvement over Morton layout for square matrices whose size is not a power of 2.
43Improvement continues
- Generalized improvement for both square and non-square matrices
- Each row and column has its own partially interleaved bit pattern.
- Either the row or the column, whichever is greater, will have some non-interleaved bits and some special bit patterns.
- Different lookup tables for rows and columns are required to implement this.
44Improvement continues
- Consider an RxC matrix with n interleaved bits; then r = ⌈R/2^n⌉ and c = ⌈C/2^n⌉.
- If r > c, the row will have i regular non-interleaved bits and some special bit patterns that are multiples of j, or vice versa.
- If r > c: (diagram of the bit patterns for row and for column, both ending in the n interleaved bits)
45Improvement continues
- For r > c:
    i is chosen such that abs(r - c x 2^i) is the least, where i = 1, 2, 3, ...
    j = ⌈MAX(r/2^i, c)⌉
- For c > r:
    i is chosen such that abs(c - r x 2^i) is the least, where i = 1, 2, 3, ...
    j = ⌈MAX(r, c/2^i)⌉
46Improvement continues
- For example, consider a 70x13 matrix with n = 2 interleaved bits (actually 910 spaces required). Space required by normal Morton layout will be

      10000000100010
    + 00000001010000
    = 10000001110010 = 8306, + 1 = 8307

  Here, R = 70, C = 13, r = ⌈70/2^2⌉ = 18 and c = ⌈13/2^2⌉ = 4. We have r > c.

    When i = 1, abs(r - c x 2^1) = 10
    When i = 2, abs(r - c x 2^2) = 2
    When i = 3, abs(r - c x 2^3) = 14

  Therefore i = 2 (only used by row in this case)
  and j = ⌈MAX(r/2^2, c)⌉ = 5
47Improvement continues
- Lookup tables

    Row lookup table   Index   Col lookup table
    00000 00 0000      0       00000 00 0000
    00000 00 0010      1       00000 00 0001
    00000 00 1000      2       00000 00 0100
    00000 00 1010      3       00000 00 0101
    00000 01 0000      4       00001 00 0000
    00000 01 0010      5       00001 00 0001
    ...                ...     ...
    00000 11 1010      15
    00101 00 0000      16
    ...

  Leading group of the row pattern: changed by 5 = 101.
  Middle group: only used by row because row > col.
48Improvement continues
- For a 70x13 matrix,
- 69 from the row lookup table will be 10100 01 0010
- 12 from the col lookup table will be 00011 00 0000
- Total space required will be

      10100 01 0010
    + 00011 00 0000
    = 10111 01 0010 -> 1490, + 1 = 1491 spaces. Improved!
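The generalized scheme can be sketched for the r > c case as follows. This is an illustration only: the struct, the function names and the tie-breaking toward the smaller i are assumptions, the c > r case is left out, and the addresses are assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low n bits of x into even bit positions 0, 2, 4, ...
std::uint32_t dilate_low(std::uint32_t x, unsigned n) {
    std::uint32_t out = 0;
    for (unsigned b = 0; b < n; ++b) out |= ((x >> b) & 1u) << (2 * b);
    return out;
}

struct GeneralLayout {
    unsigned n;       // interleaved bits per index
    unsigned i;       // regular non-interleaved row bits
    std::uint32_t j;  // multiplier for the remaining row bits
};

// Derive i and j from R, C and n as on the slides (row-heavy case only).
GeneralLayout make_layout(std::uint32_t R, std::uint32_t C, unsigned n) {
    assert(n >= 1 && n <= 8 && R <= 65536 && C <= 65536);
    const std::uint32_t block = 1u << n;
    const std::uint32_t r = (R + block - 1) / block;  // ceil(R / 2^n)
    const std::uint32_t c = (C + block - 1) / block;  // ceil(C / 2^n)
    assert(r > c && c >= 1);
    auto distance = [&](unsigned i) {
        const std::uint64_t scaled = static_cast<std::uint64_t>(c) << i;
        return scaled > r ? scaled - r : r - scaled;  // abs(r - c * 2^i)
    };
    unsigned best = 1;
    // Once c * 2^(i-1) has reached r, larger i can only increase the distance.
    for (unsigned i = 2; i < 32 && (static_cast<std::uint64_t>(c) << (i - 1)) < r; ++i) {
        if (distance(i) < distance(best)) best = i;
    }
    const std::uint32_t scaled_r = (r + (1u << best) - 1) >> best;  // ceil(r / 2^i)
    return {n, best, scaled_r > c ? scaled_r : c};
}

// Storage location: row pattern plus column pattern.
std::uint32_t general_address(const GeneralLayout& g, std::uint32_t row, std::uint32_t col) {
    const std::uint32_t low_mask = (1u << g.n) - 1u;
    const std::uint32_t plain_mask = (1u << g.i) - 1u;
    const std::uint32_t row_low = dilate_low(row & low_mask, g.n) << 1;
    const std::uint32_t row_plain = ((row >> g.n) & plain_mask) << (2 * g.n);
    const std::uint32_t row_high = ((row >> (g.n + g.i)) * g.j) << (2 * g.n + g.i);
    const std::uint32_t col_low = dilate_low(col & low_mask, g.n);
    const std::uint32_t col_high = (col >> g.n) << (2 * g.n + g.i);
    return (row_low | row_plain | row_high) + (col_low | col_high);
}
```

For the 70x13 example with n = 2 this yields i = 2 and j = 5, and element (69, 12) maps to 10111 01 0010, that is, 1490.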
49Recommendations
- Devise more efficient algorithms to utilize the space wasted by Morton array layout.
- If an optimal compromise algorithm is devised that works with both non-square and square matrices, it could be a new research paper or graduate research project.
50Conclusion
- Morton array layout and a variant that reduces the space wasted by Morton layout were implemented in C++.
- Improvements on Morton layout for non-square and square matrices were introduced.
- An optimal algorithm is still to be researched.
51Conclusion continues
- The C++ header file of the Morton array layout matrix class can be downloaded and evaluated from http://www.sharad.info/cs691
- For any defects or feedback regarding this header file, please email me at sharadb_at_bgnet.bgsu.edu
52Any Questions ?
53Thank You !