Title: Infinite Mixture Model-Based Clustering of DNA Microarray Data Using OpenMP
1 Infinite Mixture Model-Based Clustering of DNA Microarray Data Using OpenMP
2 DNA microarrays
3 Clustering MA data (not computers)
[Figure: expression matrix of genes by conditions, with clustering along both axes]
4 Why do I need sooo much computing power?
- Goal: Determine all posterior pairwise probabilities of two genes/samples belonging to the same cluster
- IMMs cannot be solved analytically
- Use a sampling method to approximate the posterior probabilities
- Typically:
- 10,000 iterations
- Several 1,000 genes
- 10-200 samples
- Compute O(genes^2 x samples^2) probabilities per iteration
- Some overhead for cluster reassignment and other model parameters
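The goal on this slide can be sketched as follows: if the sampler stores the cluster label of every gene at every iteration, the posterior probability that two genes share a cluster is approximated by the fraction of iterations in which their labels agree. The function name `coclustering_probabilities`, the row-major `labels` layout, and the flat output matrix are assumptions made for illustration, not the implementation used in the talk. The pragma is simply ignored when the code is compiled without OpenMP support.

```c
#include <stddef.h>

/* labels: n_iter x n_genes, row-major (hypothetical layout);
 *         labels[t * n_genes + g] is the cluster label of gene g
 *         at sampler iteration t.
 * prob:   n_genes x n_genes output; prob[a * n_genes + b] receives the
 *         fraction of iterations in which genes a and b share a label. */
void coclustering_probabilities(const int *labels, int n_iter,
                                int n_genes, double *prob)
{
    int a;
    /* Row a compares gene a with genes a..n_genes-1, so later rows are
     * cheaper; dynamic scheduling evens out the load across threads. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (a = 0; a < n_genes; a++) {
        for (int b = a; b < n_genes; b++) {
            int together = 0;
            for (int t = 0; t < n_iter; t++) {
                const int *row = labels + (size_t)t * (size_t)n_genes;
                if (row[a] == row[b])
                    together++;
            }
            double p = (n_iter > 0) ? (double)together / n_iter : 0.0;
            /* Each (a, b) pair is written by exactly one iteration of
             * the outer loop, so no synchronization is needed. */
            prob[(size_t)a * n_genes + b] = p;
            prob[(size_t)b * n_genes + a] = p;
        }
    }
}
```

The same counting applies to samples instead of genes. The three nested loops also make the cost per stored iteration visible: it grows with the square of the number of items being compared.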
5 Open Multi-Processing (OpenMP)
- Facilitates parallelization of C and Fortran code for shared-memory environments
- e.g. multi-processor machines
- Set of compiler directives, environment variables, and library functions
- Platform-independent
- Website: http://www.openmp.org/
- Parallelize sequential code by using compiler directives
- Relatively small programming effort
- Reduced risk for programming errors
- Use of shared memory
- Reduces the communication overhead required to synchronize multiple threads
- But cannot run threads on multiple nodes
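As a minimal sketch of what "parallelize sequential code by using compiler directives" means in practice: the function below is ordinary sequential C, and the single pragma asks the compiler to split the loop iterations across threads that all see the same arrays. The function name `scale_and_add` is hypothetical and chosen only for illustration.

```c
/* Computes y[k] += alpha * x[k] for k in [0, n).
 * x and y live in shared memory and are visible to every thread;
 * the loop index k is made private to each thread automatically. */
void scale_and_add(double alpha, const double *x, double *y, int n)
{
    int k;
    #pragma omp parallel for
    for (k = 0; k < n; k++) {
        y[k] += alpha * x[k];
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result, which is part of why the programming effort and the risk of errors stay small.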
6 Hello World in OpenMP
    #include <stdio.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        int id, nthreads;
        #pragma omp parallel private(id)
        {
            /* every thread prints its own id */
            id = omp_get_thread_num();
            printf("Hello World from thread %d\n", id);
            #pragma omp barrier
            /* only the master thread reports the team size */
            if (id == 0) {
                nthreads = omp_get_num_threads();
                printf("There are %d threads\n", nthreads);
            }
        }
        return 0;
    }
7 OpenMP examples
- Header
    #include <omp.h>
- Library functions
    omp_set_num_threads(4);
    printf("number of threads %d\n", omp_get_num_threads());
- Compiler directives
    #pragma omp for schedule(dynamic, 1)
    for (j = 0; j < Q; j++) {
        clusterProbabilities[j] = getProbCsMissing2(i, j, Contexts);
    }
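One detail about the library functions is worth a sketch: `omp_get_num_threads()` reports the size of the current thread team, so it returns 1 when called outside a parallel region, even after `omp_set_num_threads(4)`. The helper below, with the hypothetical name `team_size`, queries the team size from inside a parallel region instead. The `#ifdef _OPENMP` guard is an assumption added so that the sketch also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `requested` threads and returns the number of threads that a
 * parallel region actually gets. Returns 1 when built without OpenMP. */
int team_size(int requested)
{
    int n = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        /* one thread records the team size; the others wait at the
         * implicit barrier that ends the single construct */
        #pragma omp single
        n = omp_get_num_threads();
    }
#else
    (void)requested; /* unused in a sequential build */
#endif
    return n;
}
```

The runtime may grant fewer threads than requested, so the return value should be treated as an upper-bounded result rather than a guarantee.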
8 OpenMP examples
- More compiler directives
    #pragma omp for
    for (j = 0; j < Q; j++) {
        /* only one thread at a time may execute the next statement */
        #pragma omp critical
        sigmas[i][j] = 1.0 / gengam(beta[i] * v[i] / 2.0, beta[i] / 2.0);
    }
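9 OpenMP examples
- More compiler directives
    int i;
    #pragma omp parallel for private(i, pos)
    for (j = 0; j < T; j++) {
        /* loop body */
    }
- same as
    #pragma omp parallel for private(pos)
    for (j = 0; j < T; j++) {
        int i;  /* declared inside the loop body, so private to each thread */
        /* loop body */
    }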
10 Some more compiler directives
- Reduction
    #pragma omp for reduction(+:sum)
- Combines the per-thread partial results into the shared variable sum
- Parallel region
    #pragma omp parallel
    {
        /* executed by every thread */
    }
- Sections
    #pragma omp sections
    {
        #pragma omp section
        { /* Code block 1 */ }
        #pragma omp section
        { /* Code block 2 */ }
    }
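The reduction and sections directives can be sketched with two small functions. The names `sum_of_squares` and `min_and_max` are hypothetical and serve only as illustration; both functions assume n >= 1 and give the same results when compiled without OpenMP support.

```c
/* Reduction: each thread accumulates into a private copy of `sum`, and
 * OpenMP adds the private copies into the shared variable at the end. */
double sum_of_squares(const double *x, int n)
{
    double sum = 0.0;
    int k;
    #pragma omp parallel for reduction(+:sum)
    for (k = 0; k < n; k++) {
        sum += x[k] * x[k];
    }
    return sum;
}

/* Sections: two independent code blocks, each executed exactly once by
 * whichever thread picks it up. The blocks write to different outputs,
 * so they do not need to synchronize with each other. */
void min_and_max(const double *x, int n, double *min_out, double *max_out)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            double lo = x[0];
            for (int k = 1; k < n; k++)
                if (x[k] < lo) lo = x[k];
            *min_out = lo;
        }
        #pragma omp section
        {
            double hi = x[0];
            for (int k = 1; k < n; k++)
                if (x[k] > hi) hi = x[k];
            *max_out = hi;
        }
    }
}
```

Without the reduction clause, all threads would update the shared `sum` concurrently, which is a data race. With floating-point data, the order in which the partial sums are combined can change the last digits of the result from run to run.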
11 The making-of
- Start an interactive session
    jfreuden@fructose> qsub -I -l nodes=1:opteron
- Intel compiler
    jfreuden@bmi-opt2-01> module load openmpi-intel
    jfreuden@bmi-opt2-01> icpc -w -openmp
- g++ compiler
    jfreuden@bmi-opt2-01> module load gcc-4.2.3
    jfreuden@bmi-opt2-01> g++ -fopenmp
12 The batch file
    #PBS -S /bin/csh
    #PBS -l nodes=1:opteron:ppn=2
    #PBS -l walltime=18:00:00
    #PBS -e /users/jfreuden/runGimm/stderr.txt
    #PBS -o /users/jfreuden/runGimm/stdout.txt
    setenv OMP_NUM_THREADS `cat $PBS_NODEFILE | grep $HOST | wc -l`
    module load intel
    cd /users/jfreuden/runGimm/
    R CMD BATCH runGimm.R
13 Simulation study: Non-informative samples
- 4 gene clusters of sizes 20, 20, 80, and 80
- 3 sample clusters of size 5
- Additional samples
- m = 5, 10, 20, 50, 100
- No change in expression
- Same noise level
- 100 repeats for each level
14 Simulation study: Non-informative samples
15 Simulation study: Non-informative samples
16 Simulation study: Non-informative samples
17 Questions? Comments?
18 Additional Slides
19 Clustering
D'haeseleer (2005)
20 Example for Gibbs Sampling: BUGS
21 Simulation study: Simple Case
22 Simulation study: Simple Case
23 Simulation study: Time course 1
24 Simulation study: Time course 2
25 Simulation study: Non-informative samples
26 Simulation study: Non-informative samples