Title: The Telescience Project
1The Telescience Project™
Lu Dai, Alex Kulungowski, Adam Lathers, Vikrum Nijjar, Abel W. Lin, Jeff Mock, Tomas Molina, George Yang
National Center for Microscopy and Imaging Research, University of California San Diego
Sandeep Chandras, Kurt Mueller
GEON/NBCR
Gaurang Mehta, Mei Su, Ewa Deelman
Information Science Institute, University of Southern California
2- Mission
- Develop and implement technologies to determine and reveal supramolecular details in their subcellular, cellular, and tissue contexts.
- NCMIR is focused on accelerating the process to fill in information about biological systems in the mesoscales between 3 nm³ and 30 µm³.
- NCMIR forges innovations in instrumentation, labeling technologies and specimen preparation, and information technology to enable solutions for grand challenges in biomedical imaging research.
3NCMIR is an NIH Biomedical Technology Research
and Development Resource
- NCMIR makes available for use intermediate voltage electron microscopes (IVEMs) and associated technologies for correlated microscopy, 3D reconstruction, and visualization, as well as advice and training in the application of these technologies.
- NCMIR has >100 collaborative and service projects in operation.
- Research from NCMIR and its collaborators has been published in a variety of journals and is frequently presented in local, national, and international forums.
- NCMIR is a highly interdisciplinary environment (biologists, biochemists, physicists, mathematicians, software developers, engineers, etc., all working hand in hand).
- 70 personnel (staff, students, academic/faculty)
4NCMIR is an Accessible Resource Center for
Advanced Biomedical Research Fielding
High-throughput Imaging Instruments,
Computational Analysis Tools and Databases
5For more information
6Telescience is a Methodology
Integrate resources, technologies and
applications using standardized Grid middleware
technologies and advanced networking to provide
an end-to-end solution for challenges like
multi-scale biomedical imaging.
7Electron Microscopic Tomography was the 1st
Testbed for Telescience
- Derive 3D information about a sample from a series of 2D projections.
- Perfect application for driving the integration of technologies
- Computation and data intensive
- Requires increased access to unique, expensive instrumentation
- Requires advanced visualization tools for segmentation and analysis of the data
- Detailed process that is natively collaborative
- Demand from neuroscience community for accelerated population of databases of biological structure
8Telescience ATOMIC
9Interacting with the Grid
10Interacting with the Grid
That fragmentation is a significant burden on
applications developers. In our experience,
getting applications "onto the Grid" is the
single most rate-limiting step in the growth of
scientific Grid-based Projects
- Two primary challenges when attempting to integrate and use the Grid:
  - The tools required to develop applications for any Grid infrastructure are complicated; often, different tools provide overlapping functionality
  - The "Grid" is still a relatively young discipline: the lifecycle of Grid software outraces the lifecycle of established scientific software
11Challenges facing Application/Portal Developers
Ratio of time spent on "Grid maintenance" versus core application development is too high
12Telescience ATOMIC addresses those challenges by
bridging the Gap between middleware and
applications
Portal and Applications
ATOMIC: Applications to Middleware Interaction Components
NMI Collective Services: MyProxy, Pegasus, DataCutter, GridFTP, etc.
Local (Services): GSI, Globus, Condor, RLS, SRB, NWS, etc.
Physical Resources: Data Storage, Compute Resources, etc.
13Telescience ATOMIC
- Application to Middleware Interaction Components
- Enabling Ubiquitous Uniform Developer Access
14ATOMIC works together with Interface layers to
bring applications to the Grid
Telescience Portal and Applications: provides an intuitive GUI for end-users to launch jobs and manage data.
ATOMIC (TeleAuth/GAMA, TeleWrap, TeleRun, etc.): provides a programming interface and other tools for developers (Telescience Portal and others) to access the Telescience Grid (compatible with other scientific Grids).
15Telescience is reducing the threshold to the use
of Grid technologies
ATOMIC services scientific applications developers (mathematicians, physicists, etc.).
ATOMIC supports legacy applications and native Grid applications with equal intensity.
Motivation for ATOMIC: The Grid should be brought to bear for scientific processes... scientific processes should not necessarily have to conform to the Grid.
16Telescience Infrastructure
Richly integrated user environment
ATOMIC applications (portal) enabler
NMI
Data Devices generate, compute, store
17Telescience Building a Unified Grid
18ATOMIC is built out of Applied Experience, not
only leading CS concepts
ATOMIC is more than just a packaging of technologies; it is the bundling of the interdisciplinary experience required to bring these technologies to bear for scientific applications.
19Telescience Workflows
20Tomography Algorithms were the 1st testbed for
Telescience workflow methodologies
- Derive 3D information about a sample from a series of 2D projections.
- Perfect application for driving the integration of technologies
- Computation and data intensive
- Requires increased access to unique, expensive instrumentation
- Requires advanced visualization tools for segmentation and analysis of the data
- Detailed process that is natively collaborative
- Demand from neuroscience community for accelerated population of databases of biological structure
21GTomo: Our original proof-of-concept efforts
Pleasantly parallel analysis algorithm, perfectly suited for Grid deployment
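"Pleasantly parallel" means the work can be cut into pieces that never need to talk to each other. The sketch below illustrates that idea only; it is not GTomo code. It assumes the reconstruction divides into independent slices (as is typically the case for a single-tilt-axis reconstruction, where each slice perpendicular to the tilt axis depends only on the matching row of every projection), and the function name `split_slices` is hypothetical.

```python
def split_slices(n_slices, n_workers):
    """Partition slice indices 0..n_slices-1 into contiguous (start, stop) ranges.

    Ranges are half-open, cover every slice exactly once, and differ in
    size by at most one slice.
    """
    if n_slices < 0 or n_workers < 1:
        raise ValueError("need n_slices >= 0 and n_workers >= 1")
    base, extra = divmod(n_slices, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional slice each.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break  # more workers than slices: the remaining workers stay idle
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Each printed range could become one independent Grid job.
    for start, stop in split_slices(10, 4):
        print(f"reconstruct slices [{start}, {stop})")
```

Because no range overlaps another, each one can be handed to a different compute resource without any coordination beyond collecting the results.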
The Lessons from GTomo
- Early version of the GTomo code was built onto the Grid using direct ties into Globus (1.1.3) on a 32-bit architecture
- Grid portability means that an application must function independent of its middleware foundation
- It's already a slow and grueling process to develop and refine the mathematics behind these algorithms.
- Domain scientists should focus on algorithm design, not Grid paradigms/models
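One common way to keep an application independent of its middleware is to have it talk to a small interface and plug the middleware in behind that interface. The sketch below shows the pattern in general terms; it is not the ATOMIC API. All class and function names (`JobBackend`, `LocalBackend`, `DryRunBackend`, `reconstruct`) are hypothetical, and a real Globus or Condor adapter would be another subclass.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class JobBackend(ABC):
    """Hypothetical interface: the only thing the application knows about the Grid."""

    @abstractmethod
    def submit(self, command):
        """Run `command` (a list of strings) and return its standard output."""


class LocalBackend(JobBackend):
    """Runs the command immediately on the local machine."""

    def submit(self, command):
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True
        )
        return completed.stdout


class DryRunBackend(JobBackend):
    """Records commands instead of running them (stand-in for a Grid adapter)."""

    def __init__(self):
        self.submitted = []

    def submit(self, command):
        self.submitted.append(list(command))
        return ""


def reconstruct(backend, first_slice, last_slice):
    """Application code: builds the command and never names a middleware."""
    script = f"print('slices {first_slice}-{last_slice} done')"
    return backend.submit([sys.executable, "-c", script])


if __name__ == "__main__":
    print(reconstruct(LocalBackend(), 0, 3), end="")
```

Swapping `LocalBackend()` for another backend changes where the work runs without touching `reconstruct`, which is the property the portability lesson asks for.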
22NCMIR/Telescience Computing Infrastructure
Enabling Uniform Access to Computational Resources
- Telescience CA accepted on a mix of resources (each running different versions of software):
  - TeraGrid
  - OptIPuter
  - NBCR
  - BIRN
  - NCMIR
- This is an increased burden on the portals and applications developer
23Using the Infrastructure
Use of workflow (Pegasus) and scheduling tools (Condor DAGMan) provides a natural separation between core application development and Grid development:
- Applications developers provide core functionality in modular components using any language
- Grid developers "encode" parallelism using workflow tools
More than just efficient computing (not just for cycle scraping): also the efficient use of man-hours
Result is an accelerated time-to-solution for Grid applications
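To make "encoding parallelism" concrete, the sketch below writes a small Condor DAGMan input file for a fan-out/fan-in workflow: one preprocessing job, several independent reconstruction jobs, and one merge job. It uses the standard DAGMan `JOB`, `VARS`, and `PARENT ... CHILD` statements. The job names, the submit file names (`preprocess.sub`, `reconstruct.sub`, `merge.sub`), and the `chunk` variable are illustrative assumptions, and in a Pegasus-based setup a file like this would normally be produced by the planner rather than by hand.

```python
def build_dag(n_chunks):
    """Return DAGMan input text: preprocess -> n_chunks reconstructions -> merge."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunk_jobs = [f"reconstruct{i}" for i in range(n_chunks)]
    lines = ["JOB preprocess preprocess.sub"]
    for index, name in enumerate(chunk_jobs):
        # Every chunk reuses one submit file; VARS tells each job its chunk.
        lines.append(f"JOB {name} reconstruct.sub")
        lines.append(f'VARS {name} chunk="{index}"')
    lines.append("JOB merge merge.sub")
    # Fan out: all chunks wait for preprocessing, then run in parallel.
    lines.append("PARENT preprocess CHILD " + " ".join(chunk_jobs))
    # Fan in: merging waits for every chunk to finish.
    lines.append("PARENT " + " ".join(chunk_jobs) + " CHILD merge")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(3), end="")
```

The application modules behind the submit files can be written in any language; only this dependency description needs to know that the chunks may run side by side.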
24Parallel TxBR - Quadratic Reconstruction
Putting the Infrastructure to Use
Acute need for Grid computing: 12 days of computation time on a high-powered dual-CPU workstation (approximately 5 GB of raw data resulting in a 35 GB 3D volume)
Challenge 1: Applications developers are interested in the mathematical algorithms to produce better 3D volumes, NOT in Grid computing paradigms
Challenge 2: Primary application is developed with MATLAB
Lawrence, A., Bouwer, J.C., Perkins, G., and Ellisman, M.H. (2005) Transform Based Backprojection for Volume Reconstruction of Large Format Electron Microscope Tilt Series, Journal of Structural Biology (accepted for publication December 2005)
Lawrence, A., Bouwer, J., Perkins, G., Kulungowski, A., Peltier, S., and Ellisman, M.H. (2005) Electron Microscope Tomography: Calculating and Inverting the Generalized Ray Transform, Proceedings of the SIAM Conference on Imaging Sciences, May 15-17, 2006 (submission accepted December 2005)
25Parallel TxBR - Quadratic Reconstruction
Putting the Infrastructure to Use
Immediate Solution (3-5 days): Use Condor to submit MATLAB M-files to local NCMIR resources
Result: Same reconstruction runs in < 2 days, a 6x speedup
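A rough sketch of what submitting MATLAB work through Condor can look like: the function below generates a Condor submit description that queues several MATLAB runs, each told which piece of the work it owns through Condor's `$(Process)` macro. This is an illustration under stated assumptions, not the TxBR setup. The MATLAB path, the M-file function name `txbr_chunk`, and the output file names are hypothetical; `-nodisplay`, `-nosplash`, and `-r` are standard MATLAB startup options.

```python
def matlab_submit_description(matlab_path, function_name, n_jobs):
    """Return the text of a Condor submit description for n_jobs MATLAB runs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # Condor expands $(Process) to 0, 1, ..., n_jobs - 1, one value per job.
    matlab_command = f"{function_name}($(Process)); exit"
    lines = [
        "universe = vanilla",
        f"executable = {matlab_path}",
        # Single quotes keep the MATLAB command together as one argument.
        f"arguments = \"-nodisplay -nosplash -r '{matlab_command}'\"",
        "output = txbr_$(Process).out",
        "error = txbr_$(Process).err",
        "log = txbr.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical install path and M-file function name.
    print(matlab_submit_description("/usr/local/bin/matlab", "txbr_chunk", 8), end="")
```

The resulting text would be saved to a file and handed to `condor_submit`; the M-files themselves stay unchanged apart from accepting the chunk index.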
26Parallel TxBR - Quadratic Reconstruction
Putting the Infrastructure to Use
Long-Term Solution:
- Transform MATLAB M-files to a compiled language (i.e. C/C++)
- Perform additional pre-processing (managed by Pegasus)
- Use Pegasus to manage workflow, allowing for more dynamic resource discovery and scheduling
Result: Same reconstruction runs in hours
This framework applies to other applications (e.g. visualization)
27Telescience Workflow Summary
- Workflow and job distribution tools have helped shorten the time-consuming steps that limit throughput in tomography
- Independence from middleware is a requirement to enable rapid deployment of Grid-enabled codes
- Pipelined development process with the help of Condor and Pegasus has helped reduce effective time to Grid
- We can leverage the same tool set for other compute-intensive processes
28Portals Powered by Telescience
29Portals Building dynamic user and applications
environments
"The portal ecosystem and portal fabric will become the dominant models of application delivery by 2005."
- Telescience driven Portals are built on the JSR 168-compliant GridSphere Project, which:
  - extends the suite of richly integrated, Grid-enabled tools for the end user
  - provides a new level of flexibility, customization, and seamless (administration free) harnessing of the power of global grids
  - accomplishes this while also reducing the complexity of its own creation and management
  - amplifies the extensibility of its underpinnings to other scientific domains with analogous needs
30NCMIR Multi-Scale Imaging
31GIS Portal for analyzing Toxicological Events in the aftermath of Hurricanes Katrina/Rita
32Grid Portals by design are meant to provide a
simple and intuitive web-interface for users to
launch different applications and access shared
resources with the convenience of single sign-on
and a unified security architecture.
But not all portals are built the same. Is there
really an advantage here?
33Richly integrated, multi-tiered workflows
Telescience produces an environment oriented/tailored to the researcher. Design emphasizes simplicity, ergonomics, and a unified look-and-feel. Grid complexity and heterogeneity of technologies are abstracted.
- Generalized Teleinstrumentation
- Parallel Distributed Batch Computation
- Collaboration/Administration
- Interactive Visualization/Analysis
- Federated Databases
- Virtual Data Grid
34Telescience driven Portals are User centric
Portals...
The Telescience methodology enables a rich user environment, where state information from user actions in the scientific-process-driven workflow management portlets is also reflected in other portlets, and can be further passed to external applications and "CS workflows".
...not Grid middleware Portals
35More than just Middleware Interfaces
Data Grid Portlet is not just another "SRB
Interface"... but rather works in tandem with
all other portlets, in particular the main
workflow controller
36The Next Generation Moving beyond the
hub-and-spoke
1st generation portals and Grid providers treated the scientific users/process as a "hub and spoke" environment: users have to "login" to resources (via either command line or web) and must manually transfer data to and from each remote resource.
Scientific processes, however, are more akin to point-to-point processes, where data output from one component naturally flows to the next. Telescience aims to unify the fragmented Grid tools to allow for a user environment that mirrors this natural working paradigm.
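The point-to-point idea can be pictured as a chain of stages in which each stage consumes the previous stage's output directly, with no return trip to the user in between. The sketch below is only an illustration of that data flow; the stage names (`acquire`, `reconstruct`, `segment`) and the string placeholders standing in for real data are hypothetical.

```python
from functools import reduce


def acquire(specimen):
    """Stand-in for collecting a tilt series from the instrument."""
    return f"tilt-series({specimen})"


def reconstruct(tilt_series):
    """Stand-in for computing a 3D volume from the tilt series."""
    return f"volume({tilt_series})"


def segment(volume):
    """Stand-in for segmenting structures out of the volume."""
    return f"segmentation({volume})"


def run_pipeline(stages, data):
    """Feed `data` through each stage in order, passing outputs straight on."""
    return reduce(lambda value, stage: stage(value), stages, data)


if __name__ == "__main__":
    print(run_pipeline([acquire, reconstruct, segment], "specimen"))
```

In a hub-and-spoke setup the user would call each stage separately and move the intermediate result by hand; here the hand-off between stages is part of the pipeline itself.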
37More Resources
http://telescience.ucsd.edu
http://ncmir.ucsd.edu
IBM developerWorks Series "Building a Unified Grid": http://www.ibm.com/developerworks/grid/library/gr-unified1
All Telescience software is freely available via CVS: http://ncmir.ucsd.edu/Downloads
38Acknowledgments