Research Overview SC08 - PowerPoint PPT Presentation

Title: Research Overview SC08

1
Research Overview SC08

Guang R. Gao
ACM Fellow and IEEE Fellow
Endowed Distinguished Professor, University of Delaware
ggao@capsl.udel.edu

2
(No Transcript)
3
Selected Research Topics

Open64
Cyclops-64 (C64) Applications
Sorting
LU Decomposition
Ray Tracing
Mstack Benchmark
SWSWEEP3D
Cyclops-64 Architecture
FlashNode I/O
OpenOpell

The following slides will provide a brief look at
these topics by providing
A selected slide
Members of the team associated with this project

4
OPEN64 SELECTED SLIDE
5
This slide describes the history of Open64 and
how it came to University of Delaware
6
Open64 Team
Ge Gan
Jean Christophe Beyler
Juergen Ributzka
Tom St. John
Handong Ye
7
C64 APP1 SORTING SELECTED SLIDE
8
This slide describes our sorting algorithm which
is optimized for the memory hierarchy of the
Cyclops-64
9
C64 App1 Sorting Team
Jean Christophe Beyler
Kelly Livingston
Joseph Manzano
10
C64 APP2 LU DECOMPOSITION SELECTED SLIDE
11
This slide shows the performance of the LU
Decomposition on the C64 after each optimization
was added.
12
C64 App2 LU Team
Daniel Orozco
Ioannis Vennetis
Juergen Ributzka
13
C64 APP3 RAY TRACING SELECTED SLIDE
14
This slide describes an optimization in Ray
Tracing that will be used in our Cyclops-64
implementation of Ray Tracing
15
C64 App3 Ray Tracing Team
Kelly Livingston
16
C64 APP4 MSTACK BENCHMARK SELECTED SLIDE
17
This slide shows the memory access pattern of the
Mstack benchmark across the three-dimensional input
array
18
C64 App4 Mstack Team
Ioannis Vennetis
Joseph Manzano
Mark Pellegrini
Ryan Taylor
Tom St. John
19
C64 APP5 SWSWEEP3D SELECTED SLIDE
20
Sweep3D is an application with wavefront-type
dependencies. So far, existing architectures have
failed to exploit its fine-grain parallelism.
Cyclops-64 has adequate support for this kind of
parallelism and is likely to achieve good
scalability, even at a fine-grain level.
21
C64 App5 SWSWEEP3D Team
Daniel Orozco
22
C64 ARCHITECTURE FLASHNODE I/O SELECTED SLIDE
23
The following slide shows the storage hierarchy
of the C64 with the additions of Memory Mapped
Flash Memory and File Cache Flash Memory
24
FlashNode I/O Team
Yuhei Hayashi
Brian Lucas
Dimitrij Krepis
25
OPENOPELL SELECTED SLIDE
26
The following slide shows how the OpenOpell
toolchain generates code for the Cell Broadband
Engine's PPU and SPU processors from a single
source.
27
OpenOpell Team
Joseph Manzano
Ge Gan
Ziang Hu
Yi Jiang
28
CAPSL Alumni
Seattle
Edmonton
Portland
CAPSL (1996-2007)
Boston
New York
Philadelphia
San Francisco
Washington
Los Angeles
Phoenix
Overseas: H. Sakane (Japan), R. Yakay (Turkey), I.
Dogru, I. Vennetis (Greece),
Xin Wang (China), Yan Xie (China)
29
DAPLDS

Dynamically Adaptive Protein Ligand Docking

30
Overview

The DAPLDS project aims to build a
computational environment to assist scientists in
understanding the atomic details of
protein-ligand interactions

Global Computing Lab 2008
31
Objectives (I)

Explore the multi-scale nature of dynamic
protocol model adaptations for protein-ligand
docking

32
Objectives (II)

Develop methods and models that efficiently
accommodate computational adaptations in volunteer
computing (VC) environments supported by BOINC
Extend knowledge with respect to protein-ligand
complexes and make this knowledge accessible to
the scientific community via cyber-infrastructures

33
Protein-Ligand Docking

Computational methods simulate the protein-ligand
interaction
The resulting structural information can be used
for the design of new drugs

(Figure: a protein, a ligand, and the docked protein-ligand complex)
34
Docking Algorithm
MD Simulated Annealing Conformational Search
Dock onto 3D Grid Protein Model
Restore All-Atom Protein Model
Minimize Energy
Sort Conformations by Energy
Model Initial 3D Conformation
Generate Random Rotations
Generate Random Conformations
35
Docking@Home

High-throughput, protein-ligand docking
simulations are performed on a computational
environment that deploys a large number of
volunteer computers connected to the Internet

36
Docking@Home
37
Result Post-Processing

Protein-ligand docking complexes are scored based
on energy values.
Estimation of energy values is inaccurate because
of modelling assumptions
A structure with a minimum energy is not always a
native-like structure

(Figure: the protein docked as in nature)
38
Result Post-Processing
Thousands of results provided by Docking@Home
Hypothesis: If protein-ligand docking is
simulated using a sufficiently accurate model, a
large number of independent simulations can
eventually converge to a native-like structure
39
Result Post-Processing
Clustering thousands of results provided by
Docking@Home

Adaptive k-means clustering is used to group
similar ligand conformations
If the simulations converge, then the largest
cluster with minimum energy is also the most
likely to contain native-like structures
The centroid of the biggest cluster is selected
as a probable native-like structure
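
The selection step above can be sketched as follows. This is a minimal illustration, not the Docking@Home implementation: it uses plain k-means with a fixed k (the adaptive part of the method is not described in the slides and is not reproduced), it assumes at least k input points, and the two-dimensional `conformations` data are hypothetical stand-ins for ligand conformations.

```python
import math

def kmeans(points, k, iters=50):
    """Plain k-means (Lloyd's algorithm) on tuples of floats; k is fixed here."""
    # Deterministic start: take k points spread evenly through the input.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        updated = [
            tuple(sum(xs) / len(members) for xs in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def probable_native_like(points, k):
    """Return the centroid and members of the biggest cluster."""
    centroids, clusters = kmeans(points, k)
    biggest = max(range(k), key=lambda c: len(clusters[c]))
    return centroids[biggest], clusters[biggest]

# Hypothetical data: five conformations agree, two small groups are outliers.
conformations = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
    (5.0, 5.1), (5.2, 4.9),
    (1.1, 1.2),
    (9.0, 0.5), (9.1, 0.4),
    (1.0, 0.8),
]
centroid, members = probable_native_like(conformations, k=3)
print(centroid, len(members))
```

In this toy input the biggest cluster holds the five mutually similar points, and its centroid is the candidate that would be reported as the probable native-like structure.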

40
Result Post-Processing
The adaptive clustering is a promising method to
aid in the selection of native-like ligand
conformations from a significantly large set of
candidates
Protein-ligand conformations
Selected cluster
Centroid RMSD = 0.102
2D representation of protein-ligand conformations
(Energy, RMSD)
41
Docking@Home Screensaver
42
(No Transcript)
43
Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis,
Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas,
Reed Martz, Obaidur Rahaman, Kevin Kreiser
Sponsors
DAPLDS Collaborators: Sandeep Patel (UD), David
Anderson (UC Berkeley), Kevin Reed (World
Community Grid, IBM), Charles L. Brooks III and
Roger Armen (U Mich), Pat Teller (UTEP)
Group Webpage: http://gcl.cis.udel.edu
44
RNAVLab
Case Study: Using Genetic Algorithms to Generate
Training Data
Abel Licon, Reed Martz, and Michela Taufer
45
RNAVLab

A collection of tools written in Java for
Prediction
Sampling
Analysis
Provides high-level interface to parallel
resources

46
RNAVLab Framework
47
RNAVLab Framework

Provide users with a Web interface
Supply services with the RNAVLab backend
Parallel resources
Sequence alignment
Structure comparison
Pseudoknot classification

48
RNAVLab
49
RNAVLab
50
Web-Page Interface
51
Web-Page Interface
(Screenshot of the Web-page interface: a dated list
of sequence entries, including HIV type 1, beet
soil-borne virus, tobacco mild green mosaic virus,
foot-and-mouth disease virus serotype C,
Visna-Maedi virus, Bacillus subtilis, Escherichia
coli, human coronavirus 229E, SARS coronavirus,
cucurbit aphid-borne yellows virus, oilseed rape
mosaic virus, Nemesia ring necrosis virus, and
pepper mild mottle virus)
52
Web Service Backend
53
Web Service Backend

Provide RNAVLab services via the REST protocol
Enable Java clients and other applications to use
the Web service

54
Case Study

Predict very long RNA secondary structures
Attempt to build large structures from small
sub-structures
Challenges
Search space is huge
Possible combinations are 2^(n^2)
Searching the entire space is infeasible

55
Genetic Sampling

Use a genetic algorithm to search the space of
possible sub-structures
Submit predictions to Condor Grid via RNAVLab
interface
Use generated training data to train a classifier
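
The genetic search described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the RNAVLab code: candidates are encoded as bit lists, the `fitness` function is a hypothetical stand-in for the quality of a prediction (here simply the number of set bits), and the Condor submission step is left out. The operators chosen (keep the better half, one-point crossover, bit-flip mutation) are one common configuration, not one stated in the slides.

```python
import random

def evolve(fitness, n_bits, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over bit lists; returns the best candidate
    and the best fitness seen at each generation."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Selection: the better half survives unchanged (elitism).
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_bits)      # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history

# Hypothetical fitness: count of set bits, standing in for prediction quality.
best, history = evolve(sum, n_bits=16)
print(sum(best), history[0], history[-1])
```

Because the surviving half is carried over unchanged, the best fitness never decreases from one generation to the next, which is the property one would expect a plot of the evolution to show.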

56
GA Evolution
57
Future Work

Use training data to train classifier
Introduce more prediction algorithms
MFE based
Alignment based
Use trained predictor on unknown set and quantify
results

58
Acknowledgments
GCL members (Fall 2008): Trilce Estrada, Joe Davis,
Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas,
Reed Martz, Kevin Kreiser, Obaidur Rahaman
Sponsors
RNAVLab Collaborators: Ming-Ying Leung, Kyle L.
Johnson, David Mireles, Roberto Araiza, and
Olac Fuentes (UTEP), Thamar Solorio (UT Dallas)
Group Webpage: http://gcl.cis.udel.edu
59
MD on GPUs
Molecular Dynamics Simulations on Graphics
Processing Units
Joe Davis, Adnan Ozsoy, Sandeep Patel, and Michela Taufer
60
Introduction

Graphics Processing Units (GPUs) have been
extensively used in graphics-intensive
applications
Development driven by economy, e.g. the video game
and motion picture industries
The inherent parallelism of GPUs makes them
suitable for scientific applications
Recent exploration of the potential of GPUs for
mathematics and scientific computing
Medical diagnostics
GPUs coupled to MRI Hardware (Stone et al. Proc.
of 2007 Computing Frontiers conference, 7-9 May,
2008)
Molecular modeling
Electrostatic Potential Calculation (Stone et al.
J. Comp. Chem. 28, 16, pp. 2618-2640)
Ion Placement (Stone et al. J. Comp. Chem. 28,
16, pp. 2618-2640)
Van der Waals Fluids / Polymers (Anderson et al.
J. Comput. Physics 2008)

61
GPGPUs

Special-purpose hardware for specific types of
calculations
Protein Explorer system and its LSI MDGRAPE-3
chip (Taiji et al. in Proc. of 2003 ACM/IEEE
Supercomputing Conference, 15-21 Nov. 2003)
Anton and its 12 identical MD-specific ASICs
(Shaw et al. in Proc. of the 34th Annual
International Symposium on Computer Architecture,
9-13 June, 2007)
General-purpose GPUs (or GPGPUs) are cost-effective
and readily available in recent workstations
Quadro FX 5600
1.5 GBytes memory
Cost: $2,795

GeForce 9800 GX2
Dual GPU-based graphics card
512 MBytes memory per GPU
Cost: $665

62
Programming GPUs

Past: APIs originally through graphics interfaces,
e.g., OpenGL
Not easy to use for general usage: computation
must be cast in terms of graphics operations
Draw the calculation
Interpret image post-calculation
Present: NVIDIA CUDA (Compute Unified Device
Architecture) language/library
Easy to use: CUDA provides a minimal set of
extensions necessary to expose the power of GPGPUs
Includes C compiler and development tools
CUDA optimization strategy
Maximize independent parallelism
Maximize arithmetic intensive computation
Take advantage of on-chip per-block shared memory
Do computation on the GPUs and avoid data transfer

From CUDA Programming Guide, NVIDIA
63
MD on GPUs

Why MD on GPU?
Non-bond expand scales of time and physical
dimension (system complexity)
All-atom resolution (micro to milliseconds)
Coarse-graining (seconds)
Continuum physics with molecular detail?
MD on GPU: Non-bond interactions (pair
interactions)
Non-bond list is generated by checking all pair
distances against the cut-off in parallel
(efficient tiling approach)
A thread iterates through the non-bond list for a
single atom and accumulates the non-bonded
interactions
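
The two steps above (build a non-bond list with a cut-off, then let each atom accumulate over its own list) can be sketched on the CPU as follows. This is a sequential Python illustration of the data flow only, not the CUDA code: the GPU tiling is not reproduced, the loop over atoms stands in for one thread per atom, and the coordinates, cut-off, and reduced-unit Lennard-Jones parameters are hypothetical.

```python
import math

def build_nonbond_list(coords, cutoff):
    """For each atom, list the other atoms closer than the cut-off.
    Every pair distance is checked (O(n^2)); on a GPU these checks
    are independent and can run in parallel."""
    n = len(coords)
    return [
        [j for j in range(n)
         if j != i and math.dist(coords[i], coords[j]) < cutoff]
        for i in range(n)
    ]

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy in reduced units (assumed parameters)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def per_atom_energy(coords, nonbond_list):
    """One 'thread' per atom: walk that atom's list and accumulate."""
    energies = []
    for i, neighbours in enumerate(nonbond_list):
        total = 0.0
        for j in neighbours:
            total += lennard_jones(math.dist(coords[i], coords[j]))
        energies.append(total)
    return energies

# Hypothetical coordinates: three nearby atoms and one beyond the cut-off.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
nonbond = build_nonbond_list(coords, cutoff=2.5)
energies = per_atom_energy(coords, nonbond)
print(nonbond, energies)
```

Each pair appears in both atoms' lists, so the total pair energy is half the sum of the per-atom values. Storing the pair twice is what lets every atom be processed independently, without writes to a shared accumulator.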

64
Water Model

Flexible water: SPC/Fw (Wu et al., J. Chem. Phys.,
2006)
Intra-molecular potential
Computed on GPU using lists (bond/angle lists)
Non-bonded potential
Lennard-Jones
Shifted-force electrostatics with cut-off only
(no Ewald)
List-based evaluation
Computing system
GPU: NVIDIA Quadro FX 5600
CPU (CHARMM): Intel Xeon 5150 2.66 GHz (Woodcrest)
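
The slides do not give the functional form of the shifted-force electrostatics, so the sketch below assumes one common form, in which both the pair energy and the pair force go smoothly to zero at the cut-off. The function name, the unit constant `k`, and the example charges are hypothetical.

```python
def shifted_force_coulomb(r, qi, qj, cutoff, k=1.0):
    """Energy and radial force for an assumed shifted-force Coulomb form:
        E(r) = k*qi*qj * (1/r - 1/rc + (r - rc)/rc**2)   for r < rc
        F(r) = -dE/dr = k*qi*qj * (1/r**2 - 1/rc**2)
    Both vanish at r = rc, so no jump occurs when a pair crosses the cut-off.
    """
    if r >= cutoff:
        return 0.0, 0.0
    energy = k * qi * qj * (1.0 / r - 1.0 / cutoff + (r - cutoff) / cutoff**2)
    force = k * qi * qj * (1.0 / r**2 - 1.0 / cutoff**2)
    return energy, force

# Two unit charges at half the cut-off distance (reduced units).
energy, force = shifted_force_coulomb(5.0, 1.0, 1.0, cutoff=10.0)
print(energy, force)
```

Algebraically this form equals `k*qi*qj/r * (1 - r/rc)**2`. Because it needs only pair distances within the cut-off, it fits the list-based evaluation named above and avoids the reciprocal-space work of Ewald summation.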

65
Performance

Performance metric: number of MD time steps
calculated in one second

GPU is 7x faster on average!
66
Accuracy
67
Conclusions

Current achievements
Implementation of a local version of MD code on
current generation of GPUs
Straightforward, naive implementation
Promising results
Work in progress
Optimization and tuning of performance
Expand MD options (additional potentials, PME)
Final goals
Effective compilation of CHARMM on GPU
Study of large solvent systems for
long simulation times, up to 100 ns, with CHARMM

68
GCL at UD
GCL members (Fall 2008): Trilce Estrada, Joe Davis,
Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas,
Reed Martz, Obaidur Rahaman, Kevin Kreiser,
Michela Taufer
Sponsors
GPU@GCL Collaborators: Sandeep Patel (UD), Charles
L. Brooks III and Roger Armen (U Mich)
Group Webpage: http://gcl.cis.udel.edu
69
jTopaz
Plug your PC into the Grid using Mozilla
Patrick McClory, Martin Swany, Michela Taufer
70
What is GridFTP?

Extension of the standard File Transfer Protocol
(FTP)
Designed with three main principles in mind
Security
Reliability
High Performance

71
Current Software

globus-url-copy: script provided by the Globus
Toolkit
Can only transfer one file at a time
Requires reauthenticating/reauthorizing for each
transfer
UberFTP: interactive client
Both require having the Globus Toolkit libraries
installed on the user's machine

72
The Challenge

Although GridFTP has numerous advanced features,
there is a lack of easy to use client software
for end users to take advantage of the Grid.

73
jTopaz

Our GridFTP client software addresses this
challenge by providing a simple, easy to use
interface to GridFTP servers
jTopaz is packaged as a Firefox extension
jTopaz is portable across platforms
Works on Linux, Windows, and Mac machines

74
(No Transcript)
75
Java CoG Toolkit

Java Commodity Grid Toolkit
Allows Grid users, administrators, and developers
to work with the Grid from a higher abstraction
level
jGlobus Library
Provides basic APIs for interacting with Grid
services such as GridFTP and MyProxy
Key component in jTopaz

76
Future Work

Currently jTopaz only implements a simple
client-server file transfer model
Future work includes advanced features
Third party transfers
Parallel transfers
Partial file transfers

77
jTopaz Demo
78
jTopaz Demo
Select jTopaz in the Tools menu
79
jTopaz Demo
Enter GridFTP server info
80
jTopaz Demo
Local Files
Remote Files
81
Acknowledgements
GCL members (Fall 2008): Trilce Estrada, Joe Davis,
Abel Licon, Pat McClory, Adnan Ozsoy, James Atlas,
Reed Martz, Kevin Kreiser, Obaidur Rahaman
Sponsors
jTopaz Collaborators: Martin Swany (UD), Karan
Bhatia (SDSC)
Group Webpage: http://gcl.cis.udel.edu
82
Intelligent Compilers
John Cavazos (cavazos@cis.udel.edu)
The Adaptive Compilation Environment (ACE) Project
Dept. of Computer and Information Sciences, University of Delaware
83
Motivation

Architectures are getting increasingly more
complex
Finding efficient heuristics to solve hard
compiler problems is challenging
Quick retargeting of optimizing compilers for new
architectures is needed

84
Solution: Intelligent Compilers

Machine learning
Automates the process of tuning optimizing
compilers
Allows specialization of compiler to targeted
hardware

85
Overview: Intelligent Compiler
86
Methodology Description

Phrase as a Machine Learning Problem
Feature Construction
Generate Training Instances
Feed Instances to Learning Algorithm
Integrate the Learned Heuristic
Evaluate the Learned Heuristic

87
Case Study: PathScale
88
Case Study

PC Model: trained using (1) program
characteristics (from performance counters), (2)
best optimization sequences for each program, and
(3) speedups obtained from the best sequences.
The predictive model predicts which optimizations
will be beneficial for each application
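
A model of this shape can be sketched as follows. The slides do not name the learning algorithm, so a 1-nearest-neighbour lookup is used here purely as a stand-in. The feature meanings, the optimization names, and all numbers in `TRAINING` are hypothetical and are not PathScale flags or measured results.

```python
import math

# Hypothetical training instances, one per program:
# (performance-counter features, best optimization sequence found, speedup).
# Assumed features: cache-miss rate, branch-mispredict rate, FP-op ratio.
TRAINING = [
    ((0.80, 0.05, 0.10), ("loop_unrolling", "prefetching"), 1.20),
    ((0.10, 0.60, 0.30), ("inlining", "if_conversion"), 1.15),
    ((0.30, 0.10, 0.90), ("vectorization",), 1.05),
]

def predict_optimizations(features, training=TRAINING):
    """Return the optimization sequence of the most similar training
    program (1-nearest neighbour in feature space)."""
    nearest = min(training, key=lambda inst: math.dist(features, inst[0]))
    return nearest[1]

# A new program whose counters resemble the first training program.
predicted = predict_optimizations((0.75, 0.10, 0.10))
print(predicted)
```

The three parts of each training tuple mirror items (1) to (3) above. The speedup is unused by this simple lookup, but a fuller model could use it to weight or filter training instances.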

89
SPEC C/C++ and Fortran Benchmarks
Obtained a 17% average improvement over the most
aggressive optimization level (-Ofast) in an
industry-strength compiler. Experiments were
performed using the PathScale compiler.
90
Future Work

Optimization Phase-Ordering
Multicore optimizations

For more information: http://www.cis.udel.edu/cavazos
91
Research Overview SC08

Murat Bolat, Liang Gu, Jakob Siegel, Ryan Taylor
Principal Investigator: Xiaoming Li
University of Delaware
xli@ece.udel.edu

92
Library Generation and Optimization Projects
- Model-driven optimization for FFTW
- Lattice Boltzmann Method for CUDA
93
Model-driven optimization for FFTW
94
Model-driven optimization for FFTW

Goal
Understand why FFTW produces high-performance
code.
Discover the role of FFTW's empirical search
engine in code optimization.
Generate an FFT library of equally high quality
without empirical search.

95
Performance of our model-driven FFTW
Better
Our Code
96
Search time of our model-driven FFTW
Better
Our Code
97
Accelerate Lattice Boltzmann Method (LBM) on CUDA
98
Accelerate LBM on CUDA

The LBM models Boltzmann particle dynamics on a
2D or 3D lattice.
LBM is one of the most important physical
simulation methods.

99
Challenges of Optimizing LBM on CUDA (1)

Extensive and irregular data exchange between
lattice cells

Our Optimization Techniques: (1) Co-optimize
global memory and shared memory layout. (2)
Coalesce memory accesses. (3) 2-D data padding
and buffering.
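
The effect of layout and padding on memory access can be illustrated with plain index arithmetic. This is a sketch under assumptions, not the authors' layout: it assumes a lattice with 9 distribution values per cell (as in the common D2Q9 model), compares an array-of-structures with a structure-of-arrays layout, and pads rows to a multiple of 16 elements. The alignment value and all function names are hypothetical.

```python
def padded_width(width, alignment=16):
    """Round a row width up to the next multiple of `alignment` elements."""
    return ((width + alignment - 1) // alignment) * alignment

def aos_index(cell, direction, n_dirs=9):
    """Array of structures: all directions of one cell are adjacent."""
    return cell * n_dirs + direction

def soa_index(cell, direction, n_cells):
    """Structure of arrays: one contiguous array per direction."""
    return direction * n_cells + cell

width, height = 100, 4
pitch = padded_width(width)   # padded row length, in elements
n_cells = pitch * height      # padded lattice size

# Four neighbouring threads (cells 0..3) each read direction 2.
threads = range(4)
aos = [aos_index(t, 2) for t in threads]
soa = [soa_index(t, 2, n_cells) for t in threads]
print(pitch, aos, soa)
```

With the array-of-structures layout, neighbouring threads touch addresses 9 elements apart. With the structure-of-arrays layout they touch consecutive addresses, which is the pattern coalescing requires. Padding each row to an aligned width keeps the start of every row on the same alignment boundary.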
100
Challenges of Optimizing LBM on CUDA (2)