Title: CSE 710 Progress Report Topic: Prime Numbers
1. CSE 710 Progress Report, Topic: Prime Numbers
- Philip Matuskiewicz
- November 3, 2009
- Available online (CSE710 Progress Nov 3, 2009)
- http://famousphil.com/school.php
2. My Desires (the overall problems to be solved)
- Gain knowledge of prime number calculations on computers
- Learn CUDA and write parallel code
- Familiarize myself with supercomputer terminology, e.g. OpenMP, MPI
3. Overall Goals of this semester
- Devise a sequential algorithm (computer procedure) for calculating prime numbers
- Operable range: 1-1,000,000
- Obtain the runtime of the algorithm
- Sequential languages
- Java: high level with object-oriented features (easy)
- C: low-level procedural language (slightly more difficult)
- Output prime numbers to
- A text file
- The console: System.out.println(), printf()
- Verify output with a known prime number table
- Parallelize the algorithm
- Same constraints as the sequential algorithm
- Written in CUDA using multiple nodes of Magic.cse.buffalo.edu
- Report findings on sequential vs. parallel runtime
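As a complement to checking against a prime table, the total count can be compared with well-known values of the prime-counting function (for example, there are 78,498 primes up to 1,000,000). The sketch below is only an illustration: it uses a sieve of Eratosthenes, which is not the algorithm from this report, so that the check is independent of the code under test. The function name `count_primes_sieve` is hypothetical.

```c
#include <stdlib.h>

/* Count the primes <= limit with a sieve of Eratosthenes.
   This is a different technique from the trial division used in the
   report; it is used here only as an independent cross-check.
   Returns -1 if the working array cannot be allocated. */
long long count_primes_sieve(long long limit)
{
    if (limit < 2) {
        return 0;
    }
    /* composite[n] becomes 1 once n is known to have a smaller factor */
    unsigned char *composite = calloc((size_t)limit + 1, 1);
    if (composite == NULL) {
        return -1;
    }
    long long count = 0;
    for (long long i = 2; i <= limit; i++) {
        if (composite[i]) {
            continue;
        }
        count++;
        /* mark multiples of i, starting at i*i */
        for (long long m = i * i; m <= limit; m += i) {
            composite[m] = 1;
        }
    }
    free(composite);
    return count;
}
```

If the sequential program reports a different total for the same upper limit, either the program or its special-case handling deserves a second look.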
4. Plan of Attack / Status
- Green is done, Red is not complete, Yellow is mostly complete
- Find an algorithm to calculate prime numbers
- Implement the algorithm in Java
- Implement the algorithm in C
- Reference the CUDA documentation for possible assistance in the following steps
- Implement the algorithm in CUDA on a single node
- Migrate the algorithm to multiple nodes
5. Procedure: Finding an Algorithm
- A simple method to figure out if a number is prime is to divide the candidate by every integer from 2 up to the square root of the candidate.
- If every one of those divisions leaves a nonzero remainder, the number is prime.
- This works up to 1,000,000
- Sources
- http://science.jrank.org/pages/5482/Prime-Numbers-Finding-prime-numbers.html - Algorithm explanation
- http://primes.utm.edu/lists/small/millions/ - verification table
6. Procedure: Pseudo Code
- Temp = Ceiling( square root( number ) )
- For j from 2 to Temp, test
- If (number != j) AND (number mod j == 0) ARE TRUE
- This is NOT a prime number, break out of the loop now
- Else this could be a prime number
- continue the loop until the end
- If the loop finished without a break, echo the number and test the next candidate
- Multiples of 2 except 2 can be ignored (never prime)
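The pseudo code above can be sketched as a small C function. This is a minimal illustration, not the report's source file: the name `is_prime` is hypothetical, and the loop bound is written as `j * j <= n` instead of Ceiling(sqrt(number)), an assumption made here to avoid floating point.

```c
#include <stdbool.h>

/* Trial division: n is prime if no integer j with 2 <= j and j*j <= n
   divides it evenly. Even numbers other than 2 are rejected up front,
   so only odd divisors are tried. Intended for the report's range;
   j * j is computed in long long to stay clear of overflow there. */
bool is_prime(long long n)
{
    if (n < 2) {
        return false;          /* 0 and 1 are not prime */
    }
    if (n % 2 == 0) {
        return n == 2;         /* 2 is the only even prime */
    }
    for (long long j = 3; j * j <= n; j += 2) {
        if (n % j == 0) {
            return false;      /* found a divisor: stop early */
        }
    }
    return true;               /* loop finished without a divisor */
}
```

For a candidate near 1,000,000 the loop tries at most about 500 odd divisors, which matches the square-root bound in the pseudo code.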
7. Procedure: Implement the algorithm in Java
    import java.io.*;

    public class Finder {
        // set above 3 - max number to go to
        private int max = 1000000;

        public Finder() {
            try {
                File f = new File("D:\\primejavaresults.txt");
                FileWriter out = new FileWriter(f);
                BufferedWriter writer = new BufferedWriter(out);
                long start = System.currentTimeMillis();
                // special cases - (2,3)
                System.out.println("2");
                writer.write("2"); writer.newLine();
                System.out.println("3");
                writer.write("3"); writer.newLine();

                int i = 5;     // 2 and 3 are handled above, so testing starts at 5
                int count = 2; // see 2 special cases above
                // The loop header and divisor test were lost from the slide text;
                // the lines down to "if (isPrimeNumber)" are reconstructed from
                // the pseudo code slide and may differ from the original source.
                while (i <= max) {
                    boolean isPrimeNumber = true;
                    int temp = (int) Math.ceil(Math.sqrt(i));
                    for (int j = 2; j <= temp; j++) {
                        if (i != j && i % j == 0) {
                            isPrimeNumber = false;
                            break;
                        }
                    }
                    if (isPrimeNumber) {
                        System.out.println(i);
                        writer.write("" + i); writer.newLine();
                        count++;
                    }
                    i = i + 2; // skip even numbers (4,6,8,etc)
                } // end while
                long end = System.currentTimeMillis();
                System.out.println("Execution time was " + (end - start) + " ms.");
                System.out.println("Total prime numbers found in this query " + count);
                writer.newLine();
                writer.write("Total prime numbers found in this query " + count); writer.newLine();
                writer.close();
            } catch (IOException e) {
                System.out.println("problem writing to file");
            }
        }

        public static void main(String[] args) {
            new Finder();
        }
    } // source files are available upon request
8. Procedure: Results in Java
- Execution time was 3618 ms on my Laptop.
- Execution time was 2630 ms on Magic
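9. Procedure: Implement the algorithm in C
    #include <stdio.h>
    #include <stdlib.h> /* atoi */
    #include <math.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        clock_t start = clock();
        if (argc != 2) {
            printf("Please type in ./a.out <number>\n");
            printf("<number> being how far to count up\n");
            return (0);
        }
        FILE *file;
        file = fopen("coutprime.txt", "w"); /* w overwrites file */
        if (file == NULL) {
            perror("fopen");
            return (1);
        }
        int max = atoi(argv[1]);

        /* base case */
        /* The slide text for the base case and the outer loop was lost;
           the lines down to the inner for loop are reconstructed from the
           Java slide and the pseudo code slide and may differ from the
           original source. */
        int i = 3, j, isPrime, count = 0;
        if (max >= 2) {
            fprintf(file, "2\n");
            printf("2\n");
            count = 1;
        }
        while (i <= max) {
            int ii = (int)ceil(sqrt((double)i)); /* largest divisor to try */
            isPrime = 1;
            for (j = 2; j <= ii; j++) {
                if ((i != j) && (i % j == 0)) {
                    isPrime = 0;
                    break;
                }
            } /* end for */
            if (isPrime == 1) {
                fprintf(file, "%d\n", i);
                printf("%d\n", i);
                count = count + 1;
            }
            i = i + 2; /* increment i by 2 skipping evens */
        }
        printf("Time elapsed (accurate to 1 second) %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
        fprintf(file, "Time elapsed (accurate to 1 second) %f\n", ((double)clock() - start) / CLOCKS_PER_SEC);
        fprintf(file, "total prime numbers found is %d \n", count);
        printf("total prime numbers found is %d \n", count);
        fclose(file);
        return (0);
    } /* source files are available upon request */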
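10. Procedure: Results in C
- I don't have Linux fully configured on my laptop yet. Sorry, no laptop results as of yet
- Time elapsed: 1.070000 s
- The time.h C library is accurate to only 1 second when using time(); clock(), as used in the code, counts processor time in ticks of 1/CLOCKS_PER_SEC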
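A small sketch of how two clock() readings convert to seconds and milliseconds may help when comparing against the Java timings. The helper names `elapsed_seconds` and `elapsed_ms` are hypothetical. Note that clock() reports processor time used by the program rather than wall-clock time, so the numbers are not strictly comparable with System.currentTimeMillis().

```c
#include <time.h>

/* Convert two clock() readings into elapsed processor time in seconds.
   clock() counts ticks; CLOCKS_PER_SEC is the number of ticks per second. */
double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same interval expressed in milliseconds, to line up with the
   millisecond figures reported for the Java version. */
double elapsed_ms(clock_t start, clock_t end)
{
    return elapsed_seconds(start, end) * 1000.0;
}
```

Typical use would be to call clock() once before the prime loop and once after it, then pass both readings to `elapsed_ms`.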
11. Completion Schedule: Implement the Algorithm in CUDA
- The Plan
- Read the documentation more thoroughly
- Get the algorithm working on a single node
- Timeframe: before Thanksgiving week
- Begin using MPI at some level
- Timeframe: during Thanksgiving week
- Report finalized results
- Timeframe: week after Thanksgiving
12. Parallelization Strategy
- Divide the problem
- Give all available processors a subset of the problem
- Devise an algorithm to give each processor an amount of work based on the size of its numbers
- A processor handling small numbers will get more small numbers
- A processor handling large numbers will get fewer large numbers
- TODO: implement this strategy this upcoming week
- I'd like to use all 13 nodes and all available 12,000 processors to divide the problem and see what kind of speedup there is
- Uncertain if I will get to implementing MPI this semester; it depends on how long CUDA takes
- Maximum number of calculations per test will be 1000, i.e. Sqrt(1,000,000)
- Most numbers will be weeded out and never make it to 1000
- mod (2) and (3) are run first and tend to weed out non-primes quickly
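One possible way to divide the range so that processors with small numbers get more of them is sketched below. It assumes a simple cost model that is not taken from the report: testing n costs about floor(sqrt(n)) divisions, the worst case noted above. The function name `split_by_cost` and the `bounds` layout are hypothetical.

```c
/* Split the candidates 2..max into `parts` contiguous chunks of roughly
   equal estimated cost, where the cost of testing n is assumed to be
   floor(sqrt(n)) trial divisions.
   On return, chunk p covers bounds[p] <= n < bounds[p + 1];
   `bounds` must have room for parts + 1 entries. */
void split_by_cost(long long max, int parts, long long *bounds)
{
    /* First pass: total estimated cost of the whole range. */
    long long total = 0;
    long long root = 1;                 /* running floor(sqrt(n)) */
    for (long long n = 2; n <= max; n++) {
        while ((root + 1) * (root + 1) <= n) {
            root++;
        }
        total += root;
    }

    /* Second pass: close a chunk each time the running cost crosses
       the next multiple of total / parts. */
    long long running = 0;
    int next = 1;
    root = 1;
    bounds[0] = 2;
    for (long long n = 2; n <= max && next < parts; n++) {
        while ((root + 1) * (root + 1) <= n) {
            root++;
        }
        running += root;
        while (next < parts && running * parts >= total * next) {
            bounds[next++] = n + 1;
        }
    }
    /* Any boundaries not yet set, including the last, end the range. */
    while (next <= parts) {
        bounds[next++] = max + 1;
    }
}
```

With this model the first chunk is the widest and the last is the narrowest. Because most composites are rejected after one or two divisions, the real cost is lower than the model suggests, so the split would likely need tuning against measured runtimes.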
13. Foreseeable Problems
- Will I need to increase my 1 million target to 1 billion?
- Computers are becoming very fast
- The C run only just gets over the 1 second mark needed for any type of timing accuracy.
- CUDA may not have the accuracy I need (milliseconds)
- Possibly run an external analysis application?
- Does my algorithm work to 1 billion?
14. Foreseeable Problems Continued
- Race condition
- The file/console will be written as results are fed back from the processors
- Try to delegate the problem appropriately so the first processors return their results first
- It's going to be mostly trial and error
- Make multiple files (pieces) for each processor and merge them at the end?
- This will likely be how I solve this problem
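The piece-per-processor idea can be sketched in C with in-memory arrays standing in for the per-processor files. This is only an illustration under two assumptions: each worker owns a contiguous range of candidates, and the ranges are assigned in increasing order, so joining the pieces by worker index yields sorted output without any locking. The names `struct piece` and `merge_pieces` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One worker's private result list. Because no two workers share a
   piece, nothing is written concurrently and no race can occur. */
struct piece {
    const long *values;   /* primes found by this worker, ascending */
    size_t count;         /* number of entries in values */
};

/* Concatenate the pieces in worker order into `out`, which must have
   room for the sum of all counts. Returns the number of values written.
   The result is sorted only if worker p handled smaller numbers than
   worker p + 1 (the assumption stated above). */
size_t merge_pieces(const struct piece *pieces, size_t parts, long *out)
{
    size_t total = 0;
    for (size_t p = 0; p < parts; p++) {
        if (pieces[p].count > 0) {
            memcpy(out + total, pieces[p].values,
                   pieces[p].count * sizeof(long));
        }
        total += pieces[p].count;
    }
    return total;
}
```

The merge runs once, after every worker has finished, so the order in which workers return no longer affects the final file.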
15. Expected Results
- When running the algorithm on CUDA
- I expect to see the runtime go from roughly 2 seconds to roughly 100 milliseconds, provided I overcome all the foreseeable problems and other unexpected issues.
16. Questions? Comments? Concerns?