Focus Study: Mining on the Grid with ADaM

1
Focus Study: Mining on the Grid with ADaM
Sara Graves, Sandra Redman
Information Technology and Systems Center and
Information Technology Research Center
University of Alabama in Huntsville
National Space Science and Technology Center
256-961-7806
sgraves_at_itsc.uah.edu, sredman_at_itsc.uah.edu
www.itsc.uah.edu
2
Data Mining
  • Automated discovery of patterns and anomalies
    from vast observational data sets
  • Derived knowledge for decision making,
    predictions and disaster response
  • http://datamining.itsc.uah.edu

3
Creating a Successful Environment for Data Mining
  • Provide scientists with the capabilities to allow
    the flexibility of creative scientific analysis
  • Provide data mining benefits of
      • Automation of the analysis process
      • Reducing data volume
  • Provide a framework to allow a well defined
    structure to the entire process
  • Provide a suite of mining algorithms for creative
    analysis that can adapt to new hypotheses
  • Provide capabilities to add science algorithms to
    the environment
  • Exploit emerging technologies in computational
    and data grids, high-performance networks, and
    collaborative environments

4
Challenges for Next-generation Mining
  • Develop and document common/standard interfaces
    for interoperability of data and services
  • Design new data models for handling
      • real-time/streaming input
      • data fusion/integration
  • Design and develop distributed standardized
    catalog capabilities
  • Develop advanced resource allocation and load
    balancing techniques
  • Exploit the grid concept for enhanced data mining
    functionality
  • Develop more intelligent and intuitive user
    interfaces
  • Integrate with collaborative environments
  • Develop ontologies of scientific data, processes
    and data mining techniques for multiple domains
  • Support language and system independent
    components
  • Incorporate data mining into science and
    engineering curricula

5
Algorithm Development and Mining System (ADaM) -
System Overview
  • Consists of over 100 interoperable mining and
    image processing components
  • Each component is provided with a C application
    programming interface (API) and an executable in
    support of scripting tools (e.g. Perl, Python,
    Tcl, Shell)
  • ADaM components are lightweight and autonomous,
    and have been used successfully in a grid
    environment (NASA IPG, TeraGrid, lab)
  • ADaM has several translation components that
    provide data level interoperability with other
    mining systems (such as WEKA and Orange), and
    point tools (such as libSVM and svmLight)
  • Web service interfaces in development
  • Executes in multiple environments (e.g.
    workstation, cluster, grid, on-board, etc.)
  • NMI Integration Testbed test cases
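
The data-level interoperability mentioned above can be illustrated with a minimal sketch. This is not ADaM's translation component; it only assumes the publicly documented WEKA ARFF text format and the libSVM sparse text format, and the relation and attribute names are hypothetical.

```python
def to_arff(relation, attribute_names, rows):
    """Render numeric rows as WEKA ARFF text (all attributes numeric)."""
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


def to_libsvm(labels, rows):
    """Render labelled rows in libSVM sparse format.

    Feature indices are 1-based and zero-valued features are omitted.
    """
    out = []
    for label, row in zip(labels, rows):
        feats = " ".join(
            f"{i}:{v}" for i, v in enumerate(row, start=1) if v != 0
        )
        out.append(f"{label} {feats}".rstrip())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical feature vectors, for illustration only.
    rows = [[0.5, 0, 2], [0, 1.5, 0]]
    print(to_arff("storm_features", ["f1", "f2", "f3"], rows))
    print(to_libsvm([1, -1], rows))
```

Writing the same rows in both formats lets a result produced by one tool be read by another without touching the underlying algorithm code.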

6
MEAD: Modeling Environment for Atmospheric
Discovery
  • One of the NSF PACI Alliance research Expeditions
  • Expeditions ensure intense collaboration among
    technology developers and application scientists
    and focus on the deployment of infrastructure
    that supports computational science and
    engineering in a variety of disciplines
  • MEAD's focus is on retrospective analysis of
    hurricanes and severe storms using the TeraGrid,
    integrating computation, grid workflow
    management, data management, model coupling, data
    analysis/mining, and visualization

7
MEAD Mining Example: Mesocyclone Detection
Algorithm
  • Science Objective
      • To investigate different thunderstorm cell
        interactions favorable for subsequent tornado
        (mesocyclone) formation
  • Goals
      • Develop a mesocyclone detection algorithm (in
        both 2D and 3D)
      • Develop an algorithm to track the temporal
        evolution of the mesocyclone features
      • Investigate the use of clustering techniques to
          • Summarize differences in simulation runs
          • Provide an overview of all the simulations

8
Approach
  • Mining Approach
      • Use idealized WRF model simulations with
        different initial conditions
      • Create a large parameter space of thunderstorm
        cell interaction and storm behavior
      • Mine this search space for patterns and trends
  • Grid Approach
      • Application scripts were developed in Python and
        tested on Linux, then modified for the Globus
        environment by writing a simple Globus RSL file
      • Application scripts were constructed to run each
        combination of tools in parallel on a different
        node on the grid
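
As a rough sketch of the grid approach above, the snippet below builds one RSL job description per combination of tools, so that each combination could be submitted to a different node. It is not the authors' script: the script path, tool names, and output file names are hypothetical, and only a few common RSL attributes (`executable`, `arguments`, `count`, `stdout`) are used. Submitting the strings to Globus is left out.

```python
from itertools import product


def rsl_for(executable, arguments, stdout):
    """Build an RSL string describing one single-process job."""
    args = " ".join(f'"{a}"' for a in arguments)
    return (
        f'&(executable="{executable}")'
        f"(arguments={args})"
        "(count=1)"
        f'(stdout="{stdout}")'
    )


def jobs_for_combinations(detectors, clusterers,
                          script="/home/user/run_mining.py"):
    """Return one RSL string per (detector, clusterer) pair.

    The script path and the tool names passed as arguments are
    placeholders chosen for this example.
    """
    jobs = []
    for det, clu in product(detectors, clusterers):
        jobs.append(rsl_for(script, [det, clu], f"{det}_{clu}.out"))
    return jobs


if __name__ == "__main__":
    for rsl in jobs_for_combinations(["meso2d", "meso3d"],
                                     ["kmeans", "dbscan", "maximin"]):
        print(rsl)
```

Because every combination gets its own independent job description, the jobs share no state and can run in parallel.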

9
Example MEAD Workflow
[Workflow diagram. Labels: Initial Setup; Model Execution; Post Run Analysis; Initial Data and Parameters; Multiple WRF Models (Weather); Multiple ROMS Models (Ocean); Inter-model communications; Model Results; Data Mining (ADaM); Visualization]
Grid environment supports the demanding
computational, data storage and post analysis
requirements
10
Using the TeraGrid
  • Excellent user documentation at
    http://www.teragrid.org/userinfo/
  • Account Management - Procedures vary per site
      • Get account at each site
      • Obtain certificate (from one of several sites,
        X.509 or KX.509)
      • Establish Distinguished Name in grid-mapfile at
        each site
      • Create certificate proxy (grid-proxy-init,
        MyProxy, kinit)
  • Programming Environment - Know your systems
      • Compilers (you have a number of choices)
      • Environment Variables (SoftEnv)
      • Message Passing (several flavors available)
  • Executing Jobs
      • Condor-G
      • Globus

11
WRF Initializations
  • 230 WRF runs were made, two control
    (single-cell)
  • Each corresponded to a particular
    arrangement of a pair of initial storm cells
  • In figure at left
      • Each square = 1 simulation
      • 1st storm in the middle
      • 2nd at one of blue squares
      • Center cell stronger

Figure: Matrix of WRF simulations
Slide Source: Brian Jewett
12
Example Tracking Results
13
Mesocyclone Detection and Tracking Results
Features with time durations of a single time
step are filtered out
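
The filtering rule above can be sketched as follows. The data layout is an assumption made for illustration (a mapping from a feature identifier to the time steps at which that feature was detected); the slides do not describe the actual track representation.

```python
def filter_short_tracks(tracks, min_steps=2):
    """Keep only features that persist for at least `min_steps` time steps.

    `tracks` maps a feature id to the list of time steps at which the
    feature was detected (a hypothetical layout for this example).
    """
    return {
        fid: steps
        for fid, steps in tracks.items()
        if len(set(steps)) >= min_steps
    }


if __name__ == "__main__":
    tracks = {"m1": [3, 4, 5], "m2": [7], "m3": [10, 11]}
    # "m2" appears at a single time step and is dropped.
    print(filter_short_tracks(tracks))
```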
14
Summary: Mesocyclone Detection
  • Higher numbers of long-duration mesocyclones tend
    to be associated with initializations where the
    second cell is closer to the first
  • Mesocyclones found in the storm simulations are
    sensitive to the particular arrangement of a pair
    of initial storm cells (secondary storm placement
    at 45 degrees to the primary storm)
  • Clustering techniques are useful to
      • Summarize differences in simulation runs
      • Provide an overview of all the simulations
  • Limitations of clustering algorithms
      • Investigated K-Means, DBSCAN, Maximin and
        Hierarchical clustering algorithms
      • K-Means clustering quality is inferior but
        provides useful cluster centers or profiles
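
To show what "cluster centers or profiles" means in practice, here is a minimal k-means sketch in plain Python. It is a textbook version of Lloyd's algorithm, not ADaM's implementation; the deterministic initialization (first k points) and the two-value feature vectors per simulation are assumptions made to keep the example small.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Returns (centers, assignment).

    Initial centers are simply the first k points, which keeps the
    example deterministic; real use would pick a better seeding.
    """
    centers = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


if __name__ == "__main__":
    # Hypothetical per-simulation summaries, for illustration only.
    runs = [(0, 0), (10, 10), (0, 1), (10, 11)]
    centers, assignment = kmeans(runs, k=2)
    print(centers, assignment)
```

Each returned center is the average feature vector of the runs assigned to it, so it can be read as a profile of that group of simulations even when the cluster boundaries themselves are imperfect.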

15
LEAD: Linked Environments for Atmospheric
Discovery
  • A cyberinfrastructure for mesoscale
    meteorology
  • real-time, on-demand, and dynamically adaptive
    needs for mesoscale weather research
  • High volume data sets and streams
  • Computationally demanding numerical models and
    data assimilation systems

16
LEAD: NSF Information Technology Research (ITR)
program. Multi-disciplinary team contributing
expertise in meteorological applications,
analysis tools, forecast tools, data distribution
and management, portal development, workflow
orchestration, education and outreach.
17
LEAD: An integrated framework for identifying,
accessing, preparing, assimilating, predicting,
managing, analyzing, mining, and visualizing
meteorological data, independent of format and
physical location. Dynamic workflow
orchestration and data management are key
elements.
18
LEAD GWSTBs: Grid and Web Services Testbeds
  • Local User Environment: customized portal,
    control of information flows, collaboration
    tools, managing processes
  • Productivity Environment: models, tools, and
    algorithms
  • Data Services Environment: data transport, data
    formatting, and interoperability
  • Distributed Technologies Environment: workflow
    infrastructure to autonomously acquire resources
    and adapt to changing plans
  • Data Archive: recent and historical data,
    products, and tools

19
The Portal as a Grid Access Point
  • The Portal Server provides the user's Grid Context.

[Diagram labels: OGCE or GridSphere Grid Portal Server; https; SOAP, WS-Security; Web Services Resource Framework, Web Services Notification; Physical Resource Layer]
20
Services Oriented Architecture
  • User interfaces with portal via browser
  • Portal provides tools for users to build and
    launch workflows
  • Portlets (JSR-168) provide interface between user
    and grid services
  • Applications can be wrapped as services via a
    Portal Factory Service Generator
      • Requires application, script to run it, input
        parameters, output parameters
      • Write an AppService document and upload to Portal
        Factory Service Generator (in portal)
      • Service is created as well as the portal client
        interface
  • Security model integral to design

21
Data Integration and Mining: From Global
Information to Local Knowledge
Emergency Response
Precision Agriculture
Bioinformatics
Urban Environments
Weather Prediction