Title: Two-way External Merge Sort
1Two-way External Merge Sort
2Outline
- Assumptions
- Two-way external merge sort algorithm
- Input and output file format
- File manipulation in JAVA
- Code skeleton
3Assumptions
- The main memory is split into three fixed-size blocks
- Two input blocks hold the records to be sorted
- One output block holds the records that have been sorted
- A record has a fixed size
- A file comprises 2^k blocks of records
[Figure: main memory holding input block 1, input block 2, and the output block, next to a file of 2^k blocks (k = 2) made up of records]
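The 2^k assumption keeps the pass arithmetic simple. As a rough sketch (the class and method names are hypothetical, not part of the course skeleton), and assuming the usual scheme in which every block is first sorted on its own and each later pass merges pairs of runs, the run length doubles per pass and k merge passes are needed:

```java
// Hypothetical helper, not part of the course skeleton.
class PassPlan {
    // Number of merge passes for a file of 2^k blocks,
    // assuming each block has already been sorted internally.
    static int mergePasses(int numBlocks) {
        if (numBlocks < 1 || Integer.bitCount(numBlocks) != 1) {
            throw new IllegalArgumentException("block count must be a power of two");
        }
        return Integer.numberOfTrailingZeros(numBlocks); // this is k
    }

    // Length, in blocks, of each sorted run after the given merge pass.
    static int runLengthAfterPass(int pass) {
        return 1 << pass;
    }

    public static void main(String[] args) {
        int numBlocks = 4; // k = 2, as in the figure
        for (int pass = 1; pass <= mergePasses(numBlocks); pass++) {
            System.out.println("after merge pass " + pass + ": runs of "
                    + runLengthAfterPass(pass) + " block(s)");
        }
    }
}
```

For the four-block file in the figure this reports runs of 2 blocks after the first merge pass and a single run of 4 blocks after the second.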
4Two-way External Merge Sort Algorithm
[Figure: unsorted input file of 12 records, with keys 1 through 12]
5Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
6Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
7Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
8Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
9Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
10Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
11Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
12Two-way External Merge Sort Algorithm (Cont)
[Figure: merge-step animation frame showing input block 1, input block 2, the output block, and the output file]
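The animation frames above walk through one merge of two sorted runs using two input blocks and one output block. A minimal in-memory sketch of that step follows. It is an illustration under stated assumptions, not the course solution: plain int arrays stand in for the runs on disk, a list stands in for the output file, and the block size of 3 records and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory simulation; arrays stand in for the files on disk.
class TwoWayMergeDemo {
    static final int BLOCK_SIZE = 3; // records per block (an assumption)

    // Copy the next block of a run; the result is empty once the run is used up.
    static int[] loadBlock(int[] run, int start) {
        return Arrays.copyOfRange(run, start, Math.min(start + BLOCK_SIZE, run.length));
    }

    // Merge two sorted runs while holding only one block of each in "memory".
    static List<Integer> merge(int[] run1, int[] run2) {
        List<Integer> outputFile = new ArrayList<>();
        int[] in1 = loadBlock(run1, 0), in2 = loadBlock(run2, 0);
        int next1 = in1.length, next2 = in2.length; // start of the next unread block
        int p1 = 0, p2 = 0;                         // cursors inside the input blocks
        int[] outBlock = new int[BLOCK_SIZE];
        int outCount = 0;

        while (p1 < in1.length || p2 < in2.length) {
            // Take from block 1 if run 2 is finished or block 1 has the smaller key.
            boolean takeFirst = p2 == in2.length
                    || (p1 < in1.length && in1[p1] <= in2[p2]);
            outBlock[outCount++] = takeFirst ? in1[p1++] : in2[p2++];

            if (outCount == BLOCK_SIZE) {           // output block full: flush it
                for (int v : outBlock) outputFile.add(v);
                outCount = 0;
            }
            if (p1 == in1.length && next1 < run1.length) { // input block 1 used up: refill
                in1 = loadBlock(run1, next1);
                next1 += in1.length;
                p1 = 0;
            }
            if (p2 == in2.length && next2 < run2.length) { // input block 2 used up: refill
                in2 = loadBlock(run2, next2);
                next2 += in2.length;
                p2 = 0;
            }
        }
        for (int i = 0; i < outCount; i++) outputFile.add(outBlock[i]); // partial last block
        return outputFile;
    }

    public static void main(String[] args) {
        System.out.println(merge(new int[] {1, 3, 5}, new int[] {2, 4, 6}));
        // prints [1, 2, 3, 4, 5, 6]
    }
}
```

Only three blocks are ever held at once, which is the point of the memory assumption: the runs themselves can be arbitrarily long.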
13Input and Output File Format
- Input and output files consist of records
- They are binary files, not text files
- Student record structure
- sid: int (4 bytes), PRIMARY KEY
- fName: String (16 bytes)
- lName: String (16 bytes)
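To make the 36-byte layout concrete, here is one possible way to pack and unpack such a record with java.nio.ByteBuffer. This is a sketch under assumptions the slides do not state: the sid is stored big-endian, names are ASCII, and short names are padded with spaces. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating the record layout; not part of the skeleton.
class StudentRecordLayout {
    static final int SID_BYTES = 4;
    static final int NAME_BYTES = 16;
    static final int RECORD_BYTES = SID_BYTES + 2 * NAME_BYTES; // 36

    // Pack one record: 4-byte sid, then two 16-byte name fields.
    static byte[] encode(int sid, String fName, String lName) {
        ByteBuffer bb = ByteBuffer.allocate(RECORD_BYTES); // big-endian by default
        bb.putInt(sid);
        bb.put(fixedWidth(fName));
        bb.put(fixedWidth(lName));
        return bb.array();
    }

    // Pad with spaces (assumed) or truncate so the field is exactly 16 bytes.
    static byte[] fixedWidth(String s) {
        byte[] field = new byte[NAME_BYTES];
        Arrays.fill(field, (byte) ' ');
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, field, 0, Math.min(raw.length, NAME_BYTES));
        return field;
    }

    static int decodeSid(byte[] record) {
        return ByteBuffer.wrap(record).getInt();
    }

    // offset is 4 for fName and 20 for lName.
    static String decodeName(byte[] record, int offset) {
        return new String(record, offset, NAME_BYTES, StandardCharsets.US_ASCII).trim();
    }

    public static void main(String[] args) {
        byte[] rec = encode(7, "Ada", "Lovelace");
        System.out.println(rec.length + " " + decodeSid(rec) + " "
                + decodeName(rec, 4) + " " + decodeName(rec, 20));
        // prints 36 7 Ada Lovelace
    }
}
```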
14File Manipulation in JAVA
- Create a file
- RandomAccessFile f = new RandomAccessFile(filename, mode)
- Open a file
- RandomAccessFile f = new RandomAccessFile(filename, "r")
- Seek
- f.seek(pos)
- Read
- f.read(buffer, offset, length)
- f.read(buffer)
- Write
- f.write(buffer, offset, length)
- Close a file
- f.close()
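The calls listed above can be combined as in the following small sketch. The temporary file and the class and method names are placeholders chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical demo of the RandomAccessFile calls listed on the slide.
class RandomAccessDemo {
    // Writes two 4-byte chunks, seeks back, and reads the second chunk again.
    static byte[] writeThenReadSecondChunk(File file) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        // Mode "rw" opens the file for reading and writing, creating it if needed.
        try (RandomAccessFile f = new RandomAccessFile(file, "rw")) {
            f.write(data, 0, 4);           // write(buffer, offset, length)
            f.write(data, 4, 4);           // continues at the current file position
            f.seek(4);                     // jump to byte offset 4
            byte[] buffer = new byte[4];
            int n = f.read(buffer, 0, buffer.length); // returns the number of bytes read
            return Arrays.copyOf(buffer, n);
        } // try-with-resources calls f.close()
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-demo", ".bin");
        tmp.deleteOnExit();
        System.out.println(Arrays.toString(writeThenReadSecondChunk(tmp)));
        // prints [5, 6, 7, 8]
    }
}
```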
15Code Skeleton
- A JAVA console application
- Three classes
- MyRecord
- MyBlockMgr
- Sort
16MyRecord Class
public class MyRecord implements Comparable {
    /* Record structure
     *   sid   int    (4 bytes) -- search key
     *   fName String (16 characters - 16 bytes)
     *   lName String (16 characters - 16 bytes)
     */
    public static int length = 36;
    private byte[] buf;

    public MyRecord() {
        buf = new byte[length];
    }

    public MyRecord(byte[] b) {
        buf = b;
    }
17MyRecord Class (Cont)
    // Rebuild the int from its four bytes, most significant byte first
    public int getSid() {
        int value = 0;
        for (int i = 0; i < 3; i++) {
            value = value | (buf[i] & 0x00FF);
            value = value << 8;
        }
        value = value | (buf[3] & 0x00FF);
        return value;
    }

    // Store the int as four bytes, most significant byte first
    public void setSid(int id) throws IOException {
        buf[0] = (byte) ((id >>> 24) & 0x00FF);
        buf[1] = (byte) ((id >>> 16) & 0x00FF);
        buf[2] = (byte) ((id >>> 8) & 0x00FF);
        buf[3] = (byte) ((id >>> 0) & 0x00FF);
    }
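The 0x00FF mask in getSid matters because Java bytes are signed: without it, a byte of 128 or more is sign-extended when promoted to int. The standalone sketch below (hypothetical names, not part of the skeleton) contrasts the masked and unmasked decoding for sid 200, whose low byte is stored as -56.

```java
// Hypothetical demo of why each byte is masked before it is combined.
class SidCodecDemo {
    static void putSid(byte[] buf, int id) {
        buf[0] = (byte) (id >>> 24);
        buf[1] = (byte) (id >>> 16);
        buf[2] = (byte) (id >>> 8);
        buf[3] = (byte) id;
    }

    // Correct: mask each byte to 0..255 before shifting.
    static int getSid(byte[] buf) {
        return ((buf[0] & 0xFF) << 24) | ((buf[1] & 0xFF) << 16)
                | ((buf[2] & 0xFF) << 8) | (buf[3] & 0xFF);
    }

    // Deliberately wrong: negative bytes are sign-extended and corrupt the result.
    static int getSidUnmasked(byte[] buf) {
        return (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        putSid(buf, 200);
        System.out.println(getSid(buf));         // prints 200
        System.out.println(getSidUnmasked(buf)); // prints -56
    }
}
```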
18MyRecord Class (Cont)
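    // First name occupies bytes 4..19 of the record
    public String getFirstName() {
        String fn = new String();
        for (int ii = 4; ii < 20; ii++)
            fn += (char) buf[ii];
        return fn.trim();
    }

    // Last name occupies bytes 20..35 of the record
    public String getLastName() {
        String ln = new String();
        for (int j = 20; j < 36; j++)
            ln += (char) buf[j];
        return ln.trim();
    }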
19MyRecord Class (Cont)
    // Get byte-array representation of a record
    public byte[] getBytes() {
        return buf;
    }

    // Import a record from its byte-array representation.
    // "buffer" indicates the byte array containing the record content
    // "offset" indicates where the record content starts
    public void importRecord(byte[] buffer, int offset) {
        for (int i = 0; i < length; i++)
            buf[i] = buffer[offset + i];
    }

    // Order records by sid; Integer.compare avoids the overflow that
    // plain subtraction can cause for sids of opposite sign
    public int compareTo(Object o) {
        return Integer.compare(this.getSid(), ((MyRecord) (o)).getSid());
    }
}
20MyBlockMgr Class
- Manage the three memory blocks
- Provide methods to read/write a block from/to a relation
- You implement the following methods
- int getNumOfBlocks(FileStream f)
- Return the number of blocks in a file
- void readBlock(FileStream fin, int blockIndex, int inBlock)
- Read the block specified by blockIndex from FileStream fin into inB1 (if inBlock = 1) or inB2 (if inBlock = 2).
- It first seeks to the position blockIndex * blockSize, and then reads blockSize bytes into inB1 or inB2.
- void writeBlock(FileStream fout)
- Append the content of outB (blockSize bytes) to file fout.
- No seek is performed in this routine.
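One way the three methods described above might look is sketched below. It is not the reference solution. It uses RandomAccessFile where the slide says FileStream, since that is the file class the rest of the deck uses, and it assumes a block of 3 records (108 bytes); the slides do not give the actual block size. The class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// A sketch only: BLOCK_SIZE and the class name are assumptions.
class BlockMgrSketch {
    static final int RECORD_SIZE = 36;
    static final int BLOCK_SIZE = 3 * RECORD_SIZE; // assumed: 3 records per block

    final byte[] inB1 = new byte[BLOCK_SIZE];
    final byte[] inB2 = new byte[BLOCK_SIZE];
    final byte[] outB = new byte[BLOCK_SIZE];

    // Number of whole blocks in the file.
    int getNumOfBlocks(RandomAccessFile f) throws IOException {
        return (int) (f.length() / BLOCK_SIZE);
    }

    // Seek to blockIndex * BLOCK_SIZE, then fill inB1 (inBlock == 1) or inB2.
    void readBlock(RandomAccessFile fin, int blockIndex, int inBlock) throws IOException {
        byte[] target = (inBlock == 1) ? inB1 : inB2;
        fin.seek((long) blockIndex * BLOCK_SIZE);
        fin.readFully(target); // throws EOFException if the block is incomplete
    }

    // Write outB at the current file position; no seek, so successive calls append.
    void writeBlock(RandomAccessFile fout) throws IOException {
        fout.write(outB, 0, BLOCK_SIZE);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("blocks", ".bin");
        tmp.deleteOnExit();
        BlockMgrSketch bm = new BlockMgrSketch();
        try (RandomAccessFile f = new RandomAccessFile(tmp, "rw")) {
            Arrays.fill(bm.outB, (byte) 1);
            bm.writeBlock(f);
            Arrays.fill(bm.outB, (byte) 2);
            bm.writeBlock(f);
            bm.readBlock(f, 1, 2);
            System.out.println(bm.getNumOfBlocks(f) + " blocks, inB2[0] = " + bm.inB2[0]);
            // prints 2 blocks, inB2[0] = 2
        }
    }
}
```

readFully is used instead of read so that a short read cannot leave a block partly filled without an error.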
21Sort Class
- Sort a given file on attribute sid in ascending order
- The execute method is the entry point of your code
- You are free to develop additional classes.
22Sort Class (Cont.)
public class Sort {
    RandomAccessFile fin, fout;
    String inputFile;
    MyBlockMgr bm;
    int blockNum;
    int recNumPerBlock;

    public Sort(String inputFile) {
        this.inputFile = inputFile;
    }

    // Each round writes to its own file: <inputFile>.out<round>
    private String getOutputFileName(int round) {
        return inputFile + ".out" + round;
    }
23Sort Class (Cont.)
    public void execute() throws IOException {
        try {
            fin = new RandomAccessFile(inputFile, "r");
        } catch (IOException e) {
            System.out.println("Error opening the input file! Program aborted");
            System.exit(1);
        }
        bm = new MyBlockMgr();
        blockNum = bm.getNumberOfBlocks(fin);
        recNumPerBlock = bm.getInBlock1().length / MyRecord.length;
        internalSortBlocks();
        fin.close();
        externalSortBlocks();
    }
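The slides leave internalSortBlocks() to the reader. As a hedged sketch of what that phase could do for a single block already in memory, the code below sorts the 36-byte records inside a byte array by their leading 4-byte sid. The class name, the helper names, and the big-endian sid encoding are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of sorting one in-memory block of fixed-size records.
class BlockSortSketch {
    static final int RECORD_SIZE = 36;

    // Read the sid of the i-th record in the block (assumed big-endian).
    static int sidAt(byte[] block, int i) {
        return ByteBuffer.wrap(block, i * RECORD_SIZE, 4).getInt();
    }

    // Sort the records of one block in place, ascending by sid.
    static void sortBlock(byte[] block) {
        int n = block.length / RECORD_SIZE;
        byte[][] records = new byte[n][];
        for (int i = 0; i < n; i++) {
            records[i] = Arrays.copyOfRange(block, i * RECORD_SIZE, (i + 1) * RECORD_SIZE);
        }
        Arrays.sort(records, Comparator.comparingInt((byte[] r) -> ByteBuffer.wrap(r).getInt()));
        for (int i = 0; i < n; i++) {
            System.arraycopy(records[i], 0, block, i * RECORD_SIZE, RECORD_SIZE);
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[3 * RECORD_SIZE];
        int[] sids = {11, 1, 7};
        for (int i = 0; i < sids.length; i++) {
            ByteBuffer.wrap(block, i * RECORD_SIZE, 4).putInt(sids[i]);
        }
        sortBlock(block);
        System.out.println(sidAt(block, 0) + " " + sidAt(block, 1) + " " + sidAt(block, 2));
        // prints 1 7 11
    }
}
```

In the skeleton's terms, each block would be read into an input block, sorted like this, copied to the output block, and written out before the merge passes begin.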
24Sort Class (Cont.)
    // Print every record of a file as: index sid firstName lastName
    public void printFile(String fileName) throws IOException {
        RandomAccessFile fin = new RandomAccessFile(fileName, "r");
        byte[] b = new byte[36];
        for (int i = 0; i < fin.length() / MyRecord.length; i = i + 1) {
            fin.read(b);
            MyRecord m = new MyRecord(b);
            System.out.println(i + " " + m.getSid() + " " + m.getFirstName() + " " + m.getLastName());
        }
        fin.close();
    }
}