Title: Research Overview SC08
1Research Overview SC08
- Guang R. Gao
- ACM Fellow and IEEE Fellow
- Endowed Distinguished Professor, University of Delaware
- ggao_at_capsl.udel.edu
3Selected Research Topics
- Open64
- Cyclops-64 (C64) Applications
- Sorting
- LU Decomposition
- Ray Tracing
- Mstack Benchmark
- SWSWEEP3D
- Cyclops-64 Architecture
- FlashNode I/O
- OpenOpell
- The following slides give a brief look at each of these topics by providing
- A selected slide
- Members of the team associated with the project
4OPEN64 SELECTED SLIDE
5This slide describes the history of Open64 and how it came to the University of Delaware
6Open64 Team
Ge Gan
Jean Christophe Beyler
Juergen Ributzka
Tom St. John
Handong Ye
7C64 APP1 SORTING SELECTED SLIDE
8This slide describes our sorting algorithm which
is optimized for the memory hierarchy of the
Cyclops-64
9C64 App1 Sorting Team
Jean Christophe Beyler
Kelly Livingston
Joseph Manzano
10C64 APP2 LU DECOMPOSITION SELECTED SLIDE
11This slide shows the performance of LU decomposition on the C64 after each optimization was added.
12C64 App2 LU Team
Daniel Orozco
Ioannis Vennetis
Juergen Ributzka
13C64 APP3 RAY TRACING SELECTED SLIDE
14This slide describes a ray tracing optimization that will be used in our Cyclops-64 implementation
15C64 App3 Ray Tracing Team
Kelly Livingston
16C64 APP4 MSTACK BENCHMARK SELECTED SLIDE
17This slide shows the memory access pattern of the Mstack benchmark across the three-dimensional input array
18C64 App4 Mstack Team
Ioannis Vennetis
Joseph Manzano
Mark Pellegrini
Ryan Taylor
Tom St. John
19C64 APP5 SWSWEEP3D SELECTED SLIDE
20Sweep3D is an application with wavefront-type dependencies. So far, existing architectures have failed to exploit its fine-grain parallelism. Cyclops-64 has adequate support for this kind of parallelism, and it is likely to achieve good scalability, even at a fine-grain level.
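The wavefront dependency pattern can be sketched with a small 2D toy. This is not Sweep3D itself: the grid shape, the update rule, and the names `wavefronts` and `sweep` are assumptions made for illustration. Each cell depends on its west and north neighbors, so all cells on one anti-diagonal are mutually independent and could be processed in parallel.

```python
def wavefronts(rows, cols):
    """Yield lists of (i, j) cells; all cells in one list share the same i + j."""
    for d in range(rows + cols - 1):
        lo = max(0, d - cols + 1)
        hi = min(rows, d + 1)
        yield [(i, d - i) for i in range(lo, hi)]


def sweep(rows, cols):
    """Toy sweep with a hypothetical update rule: cell = 1 + west + north."""
    grid = [[0] * cols for _ in range(rows)]
    for front in wavefronts(rows, cols):
        # Every cell in `front` reads only cells from the previous front,
        # so this inner loop is the part that could run in parallel.
        for i, j in front:
            west = grid[i][j - 1] if j > 0 else 0
            north = grid[i - 1][j] if i > 0 else 0
            grid[i][j] = 1 + west + north
    return grid
```

The amount of parallel work per front is small near the corners and largest along the middle diagonals, which is why fine-grain hardware support matters for this kind of code.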
21C64 App5 SWSWEEP3D Team
Daniel Orozco
22C64 ARCHITECTURE FLASHNODE I/O SELECTED SLIDE
23The following slide shows the storage hierarchy of the C64 with the addition of Memory Mapped Flash Memory and File Cache Flash Memory
24FlashNode I/O Team
Yuhei Hayashi
Brian Lucas
Dimitrij Krepis
25OPENOPELL SELECTED SLIDE
26The following slide shows how the OpenOpell toolchain generates code for the Cell Broadband Engine's PPU and SPU processors from a single source code.
27OpenOpell Team
Joseph Manzano
Ge Gan
Ziang Hu
Yi Jiang
28CAPSL Alumni
[Map: CAPSL alumni locations (1996-2007): Seattle, Edmonton, Portland, Boston, New York, Philadelphia, San Francisco, Washington, Los Angeles, Phoenix]
Overseas: H. Sakane (Japan), R. Yakay (Turkey), I. Dogru, I. Vennetis (Greece), Xin Wang (China), Yan. Xie (China)
29DAPLDS
- Dynamically Adaptive Protein Ligand Docking
30Overview
- The DAPLDS project aims to build a
computational environment to assist scientists in
understanding the atomic details of
protein-ligand interactions
Global Computing Lab 2008
31Objectives (I)
- Explore the multi-scale nature of dynamic
protocol model adaptations for protein-ligand
docking
32Objectives (II)
- Develop methods and models that efficiently accommodate computational adaptations in volunteer computing (VC) environments supported by BOINC
- Extend knowledge with respect to protein-ligand complexes and make this knowledge accessible to the scientific community via cyber-infrastructures
33Protein-Ligand Docking
- Computational methods simulate the protein-ligand interaction
- The resulting structural information can be used for the design of new drugs
[Figure: a protein, a ligand, and the unknown ("?") protein-ligand complex]
34Docking Algorithm
[Flowchart boxes: MD Simulated Annealing Conformational Search; Dock onto 3D Grid Protein Model; Restore All-Atom Protein Model; Minimize Energy; Sort Conformations by Energy; Model Initial 3D Conformation; Generate Random Rotations; Generate Random Conformations]
35Docking@Home
- High-throughput protein-ligand docking simulations are performed in a computational environment that deploys a large number of volunteer computers connected to the Internet
36Docking@Home
37Result Post-Processing
- Protein-ligand docking complexes are scored based on energy values.
- Estimation of energy values is inaccurate because of modelling assumptions
- A structure with a minimum energy is not always a native-like structure
[Figure: protein docked as in nature]
38Result Post-Processing
Thousands of results provided by Docking@Home
Hypothesis: If protein-ligand docking is simulated using a sufficiently accurate model, a large number of independent simulations can eventually converge to a native-like structure
39Result Post-Processing
Clustering thousands of results provided by Docking@Home
- Adaptive k-means clustering is used to group similar ligand conformations
- If the simulations converge, then the largest cluster with minimum energy is also the most likely to contain native-like structures
- The centroid of the biggest cluster is selected as a probable native-like structure
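The selection step can be sketched as follows. This is a minimal illustration, not the Docking@Home post-processing code: it uses plain k-means with a fixed `k` and a deterministic start rather than the adaptive variant named on the slide, and the point representation (a tuple of numeric features per conformation) and function names are assumptions.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def probable_native_like(points, k):
    """Return the centroid and members of the most populated cluster."""
    centroids, clusters = kmeans(points, k)
    biggest = max(range(k), key=lambda c: len(clusters[c]))
    return centroids[biggest], clusters[biggest]
```

For example, `probable_native_like(conformations, 2)` returns the centroid of the larger of two clusters; a real pipeline would also take cluster energy into account, as the slide states.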
40Result Post-Processing
Adaptive clustering is a promising method to aid in the selection of native-like ligand conformations from a large set of candidates
[Figure: 2D representation of protein-ligand conformations (Energy, RMSD), with the selected cluster highlighted; centroid RMSD = 0.102]
41Docking@Home Screensaver
43Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Obaidur Rahaman, Kevin Kreiser
DAPLDS collaborators: Sandeep Patel (UD), David Anderson (UC Berkeley), Kevin Reed (World Community Grid, IBM), Charles L. Brooks III and Roger Armen (U Mich), Pat Teller (UTEP)
Group webpage: http://gcl.cis.udel.edu
44RNAVLab
Case Study: Using Genetic Algorithms to Generate Training Data
Abel Licon, Reed Martz, and Michela Taufer
45RNAVLab
- A collection of tools written in Java for
- Prediction
- Sampling
- Analysis
- Provides a high-level interface to parallel resources
46RNAVLab Framework
47RNAVLab Framework
- Provide users with a Web interface
- Supply services with the RNAVLab backend
- Parallel resources
- Sequence alignment
- Structure comparison
- Pseudoknot classification
48RNAVLab
49RNAVLab
50Web-Page Interface
51Web-Page Interface
[Screenshot of the web-page interface. Listed entries: HIV Type 1, beet soil-borne virus, tobacco mild green mosaic virus, foot-and-mouth disease virus serotype C, Visna-Maedi virus, Bacillus subtilis, Escherichia coli, human coronavirus 229E, SARS coronavirus, cucurbit aphid-borne yellows virus, oilseed rape mosaic virus, E.Coli, Nemesia ring necrosis virus, pepper mild mottle virus. Column values as extracted: 8/17/87, 3/15/07, 5/6/08, 11/19/08, 11/28/08, 3/29/09, 5/3/09, 1/15/10, 12/24/11, 7/5/12, 12/27/12]
52Web Service Backend
53Web Service Backend
- Provide RNAVLab services via a REST interface
- Enable Java clients and other applications to use the Web service
54Case Study
- Predict very long RNA secondary structures
- Attempt to build large structures from small sub-structures
- Challenges
- Search space is huge
- Possible combinations are 2^(n^2)
- Searching the entire space is infeasible
55Genetic Sampling
- Use a genetic algorithm to search the space of possible sub-structures
- Submit predictions to Condor Grid via RNAVLab interface
- Use generated training data to train a classifier
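A genetic search of this kind can be sketched generically. The encoding below (a bit string marking which candidate sub-structures are selected), the parameters, and the function name `evolve` are assumptions for illustration; in the actual workflow the fitness would come from predictions submitted through RNAVLab rather than from a local function.

```python
import random


def evolve(fitness, length, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Elitist genetic algorithm over bit strings; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability (mutation).
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history
```

Calling `evolve(sum, 16)` uses the number of set bits as a placeholder fitness. Because the better half always survives, the best fitness never decreases from one generation to the next.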
56GA Evolution
57Future Work
- Use training data to train classifier
- Introduce more prediction algorithms
- MFE based
- Alignment based
- Use trained predictor on unknown set and quantify
results
58Acknowledgments
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Kevin Kreiser, Obaidur Rahaman
RNAVLab collaborators: Ming-Ying Leung, Kyle L. Johnson, David Mireles, Roberto Araiza, and Olac Fuentes (UTEP); Thamar Solorio (UT Dallas)
Group webpage: http://gcl.cis.udel.edu
59MD on GPUs
Molecular Dynamics Simulations on Graphics Processing Units
Joe Davis, Adnan Ozsoy, Sandeep Patel, and Michela Taufer
60Introduction
- Graphics Processing Units (GPUs) have been extensively used in graphics-intensive applications
- Development driven by economics, e.g., the video game and motion picture industries
- The inherent parallelism of GPUs makes them suitable for scientific applications
- Recent exploration of the potential of GPUs for mathematics and scientific computing
- Medical diagnostics
- GPUs coupled to MRI hardware (Stone et al., Proc. of 2007 Computing Frontiers conference, 7-9 May, 2008)
- Molecular modeling
- Electrostatic potential calculation (Stone et al., J. Comp. Chem. 28, 16, pp. 2618-2640)
- Ion placement (Stone et al., J. Comp. Chem. 28, 16, pp. 2618-2640)
- Van der Waals fluids / polymers (Anderson et al., J. Comput. Physics 2008)
61GPGPUs
- Special-purpose hardware for specific types of calculations
- Protein Explorer system and its LSI 'MDGRAPE-3' chip (Taiji et al., in Proc. of 2003 ACM/IEEE Supercomputing Conference, 15-21 Nov. 2003)
- Anton and its 12 identical MD-specific ASICs (Shaw et al., in Proc. of the 34th Annual International Symposium on Computer Architecture, 9-13 June, 2007)
- General-purpose GPUs (GPGPUs): cost-effective and readily available in recent workstations
- Quadro FX 5600
- 1.5 GBytes memory
- Cost: $2,795
- GeForce 9800 GX2
- Dual-GPU graphics card
- 512 MBytes memory per GPU
- Cost: $665
62Programming GPUs
- Past: APIs originally through graphics interfaces, e.g., OpenGL
- Not easy to use for general-purpose work: computation must be cast in terms of graphics operations
- Draw the calculation
- Interpret the image post-calculation
- Present: NVIDIA CUDA (Compute Unified Device Architecture) language/library
- Easy to use: CUDA provides a minimal set of extensions necessary to expose the power of GPGPUs
- Includes C compiler and development tools
- CUDA optimization strategy
- Maximize independent parallelism
- Maximize arithmetic-intensive computation
- Take advantage of on-chip per-block shared memory
- Do computation on the GPU and avoid data transfers
From CUDA Programming Guide, NVIDIA
63MD on GPUs
- Why MD on GPU?
- Non-bond expand scales of time and physical dimension (system complexity)
- All-atom resolution (micro to milliseconds)
- Coarse-graining (seconds)
- Continuum physics with molecular detail?
- MD on GPU: non-bond interactions (pair interactions)
- Non-bond list is generated by checking all pair distances against the cut-off in parallel (efficient tiling approach)
- A thread iterates through the non-bond list for a single atom and accumulates the non-bonded interactions
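The list-based evaluation can be sketched sequentially. This is an illustration under assumptions, not the GPU code: the tiling scheme and CUDA kernels are not reproduced, only a Lennard-Jones term is evaluated, and `epsilon`, `sigma`, and the function names are placeholders in reduced units. The outer loop over atoms corresponds to the work one thread would do for its atom.

```python
import math


def build_nonbond_list(coords, cutoff):
    """For each atom, list every other atom closer than `cutoff` (all-pairs check)."""
    n = len(coords)
    return [
        [j for j in range(n) if j != i and math.dist(coords[i], coords[j]) < cutoff]
        for i in range(n)
    ]


def lj_energy_per_atom(coords, nonbond_list, epsilon=1.0, sigma=1.0):
    """Accumulate the Lennard-Jones energy seen by each atom from its list."""
    energies = []
    for i, neighbors in enumerate(nonbond_list):  # one thread per atom i on a GPU
        e = 0.0
        for j in neighbors:
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            e += 4.0 * epsilon * (sr6 * sr6 - sr6)
        energies.append(e)
    return energies
```

Each pair appears in both atoms' lists, so the total pair energy is half the sum of the per-atom values.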
64Water Model
- Flexible water: SPC/Fw (Wu et al., J. Chem. Phys., 2006)
- Intra-molecular potential
- Computed on GPU using lists (bond/angle lists)
- Non-bonded potential
- Lennard-Jones
- Shifted-force electrostatics with cut-off only (no Ewald)
- List-based evaluation
- Computing system
- GPU: NVIDIA Quadro FX 5600
- CPU (CHARMM): Intel Xeon 5150 2.66 GHz (Woodcrest)
65Performance
- Performance metric: number of MD time steps calculated in one second
GPU is 7x faster on average!
66Accuracy
67Conclusions
- Current achievements
- Implementation of a local version of MD code on current generation of GPUs
- Straightforward, naive implementation
- Promising results
- Work in progress
- Optimization and tuning of performance
- Expand MD options (additional potentials, PME)
- Final goals
- Effective compilation of CHARMM on GPU
- Study of large solvent systems for long simulation times, up to 100 ns, with CHARMM
68GCL at UD
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Obaidur Rahaman, Kevin Kreiser, Michela Taufer
GPU@GCL collaborators: Sandeep Patel (UD), Charles L. Brooks III and Roger Armen (U Mich)
Group webpage: http://gcl.cis.udel.edu
69jTopaz
Plug your PC into the Grid using Mozilla
Patrick McClory, Martin Swany, Michela Taufer
70What is GridFTP?
- Extension of the standard File Transfer Protocol (FTP)
- Designed with three main principles in mind
- Security
- Reliability
- High performance
71Current Software
- globus-url-copy: script provided by the Globus Toolkit
- Can only transfer one file at a time
- Requires reauthenticating/reauthorizing for each transfer
- UberFTP: interactive client
- Both require the Globus Toolkit libraries to be installed on the user's machine
72The Challenge
- Although GridFTP has numerous advanced features, there is a lack of easy-to-use client software that lets end users take advantage of the Grid.
73jTopaz
- Our GridFTP client software addresses this challenge by providing a simple, easy-to-use interface to GridFTP servers
- jTopaz is packaged as a Firefox extension
- jTopaz is portable across platforms
- Works on Linux, Windows, and Mac machines
75Java CoG Toolkit
- Java Commodity Grid Toolkit
- Allows Grid users, administrators, and developers to work with the Grid from a higher abstraction level
- jGlobus Library
- Provides basic APIs for interacting with Grid services such as GridFTP and MyProxy
- Key component in jTopaz
76Future Work
- Currently jTopaz only implements a simple client-server file transfer model
- Future work includes advanced features
- Third-party transfers
- Parallel transfers
- Partial file transfers
77jTopaz Demo
78jTopaz Demo
Select jTopaz in the Tools menu
79jTopaz Demo
Enter GridFTP server info
80jTopaz Demo
Local Files
Remote Files
81Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis, Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas, Reed Martz, Kevin Kreiser, Obaidur Rahaman
jTopaz collaborators: Martin Swany (UD), Karan Bhatia (SDSC)
Group webpage: http://gcl.cis.udel.edu
82Intelligent Compilers
John Cavazos (cavazos_at_cis.udel.edu)
The Adaptive Compilation Environment (ACE) Project
Dept. of Computer and Information Sciences, University of Delaware
83Motivation
- Architectures are becoming increasingly complex
- Finding efficient heuristics to solve hard compiler problems is challenging
- Quick retargeting of optimizing compilers to new architectures is needed
84Solution: Intelligent Compilers
- Machine learning
- Automates the process of tuning optimizing compilers
- Allows specialization of the compiler to the targeted hardware
85Overview: Intelligent Compiler
86Methodology Description
- Phrase as a Machine Learning Problem
- Feature Construction
- Generate Training Instances
- Feed Instances to Learning Algorithm
- Integrate the Learned Heuristic
- Evaluate the Learned Heuristic
87Case Study: PathScale
88Case Study
- PC model trained using (1) program characteristics (from performance counters), (2) the best optimization sequences for each program, and (3) the speedups obtained from the best sequences
- The predictive model predicts which optimizations will be beneficial for each application
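The idea of predicting optimizations from program characteristics can be sketched with a nearest-neighbor lookup. The slides do not state which learning algorithm the PC model uses, so nearest neighbor is substituted here purely for illustration. The feature vectors, optimization labels (plain labels, not real PathScale flags), and speedups are all made-up values.

```python
import math

# Hypothetical training instances:
# (performance-counter features, best optimization sequence, observed speedup)
TRAINING = [
    ((0.9, 0.1), ("unroll", "prefetch"), 1.30),
    ((0.2, 0.8), ("inline",), 1.10),
    ((0.5, 0.5), ("unroll",), 1.05),
]


def predict_sequence(features, training=TRAINING):
    """Return the sequence and speedup recorded for the most similar training program."""
    _, sequence, speedup = min(training, key=lambda inst: math.dist(features, inst[0]))
    return sequence, speedup
```

A new program whose counters resemble those of the first training program, for example `predict_sequence((0.85, 0.2))`, is assigned that program's best sequence.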
89SPEC C/C++ and Fortran Benchmarks
Obtained a 17% average improvement over the most aggressive optimization level (-Ofast) of an industry-strength compiler. Experiments were performed using the PathScale compiler.
90Future Work
- Optimization Phase-Ordering
- Multicore optimizations
For more information: http://www.cis.udel.edu/cavazos
91Research Overview SC08
- Murat Bolat, Liang Gu, Jakob Siegel, Ryan Taylor
- Principal Investigator: Xiaoming Li
- University of Delaware
- xli_at_ece.udel.edu
92Library Generation and Optimization Projects
- Model-driven optimization for FFTW
- Lattice Boltzmann Method for CUDA
93Model-driven optimization for FFTW
94Model-driven optimization for FFTW
- Goal
- Understand why FFTW produces high-performance code.
- Discover the role of FFTW's empirical search engine in code optimization.
- Generate an FFT library of equally high quality without empirical search.
95Performance of our model-driven FFTW
[Chart labels: "Better", "Our Code"]
96Search time of our model-driven FFTW
[Chart labels: "Better", "Our Code"]
97Accelerate Lattice Boltzmann Method (LBM) on CUDA
98Accelerate LBM on CUDA
- The LBM models Boltzmann particle dynamics on a 2D or 3D lattice.
- LBM is one of the most important physical simulation methods, used notably for fluid flow.
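A single LBM time step can be sketched in plain Python. This is a sequential illustration under assumptions, not the CUDA implementation: it uses the standard D2Q9 lattice with a BGK collision, periodic boundaries, and a relaxation time `tau` chosen arbitrarily. The streaming step, which moves each value to a neighboring cell, is the data exchange between lattice cells.

```python
# D2Q9 lattice: nine discrete velocities and their standard weights
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def step(f, tau=0.6):
    """One collide-and-stream step on a periodic grid indexed as f[y][x][i]."""
    ny, nx = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            cell = f[y][x]
            rho = sum(cell)
            ux = sum(fi * ex for fi, (ex, _) in zip(cell, E)) / rho
            uy = sum(fi * ey for fi, (_, ey) in zip(cell, E)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (ex, ey) in enumerate(E):
                post = cell[i] - (cell[i] - feq[i]) / tau    # BGK collision
                new[(y + ey) % ny][(x + ex) % nx][i] = post  # streaming to neighbor
    return new
```

Collision only redistributes values within a cell and streaming only moves them between cells, so the total mass on a periodic grid stays constant, which makes a convenient sanity check.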
99Challenges of Optimizing LBM on CUDA (1)
- Extensive and irregular data exchange between lattice cells
Our Optimization Techniques
(1) Co-optimize global memory and shared memory layout.
(2) Coalesce memory accesses.
(3) 2-D data padding and buffering.
100Challenges of Optimizing LBM on CUDA (2)
- Boundary testing and barrier detection
Our Optimization Techniques
(1) Control-structure splitting.
(2) Kernel splitting.
(3) Adaptive thread grid and block size selection.
101Performance of our LBM code on CUDA
- 140X speedup
- Scales well with problem size
[Chart labels: "Better", "Our Code"]