Title: Experiments in Utility Computing: Hadoop and Condor
1 Experiments in Utility Computing: Hadoop and Condor
- Sameer Paranjpye
- Y! Web Search
2 Outline
- Introduction
  - Application environment, motivation, development principles
- Hadoop and Condor
  - Description, Hadoop-Condor interaction
3 Introduction
4 Web Search Application Environment
- Data-intensive distributed applications
  - Crawling, Document Analysis and Indexing, Web Graphs, Log Processing, ...
- Highly parallel workloads
  - Bandwidth to data is a significant design driver
- Very large production deployments
  - Several clusters of 100s-1000s of nodes
  - Lots of data (billions of records, input/output of 10s of TB in a single run)
5 Why Condor and Hadoop?
- To date, our Utility Computing efforts have been conducted using a command-and-control model
  - Closed, cathedral-style development
  - Custom-built, proprietary solutions
- Hadoop and Condor
  - Experimental effort to leverage open source for infrastructure components
  - Current deployment: a cluster for supporting research computations
  - Multiple users running ad-hoc, experimental programs
6 Vision - Layered Platform, Open APIs
Applications (Crawl, Index, ...)
Programming Models (MPI, DAG, MW, MR)
Batch Scheduling (Condor, SGE, SLURM, ...)
Distributed Store (HDFS, Lustre, Ibrix, ...)
7 Development philosophy
- Adopt, Collaborate, Extend
- Open source commodity software
- Open APIs for interoperability
- Identify and use existing robust platform components
- Engage community and participate in developing nascent and emerging solutions
8 Hadoop and Condor
9 Hadoop
- Open source project developing:
  - A distributed store
  - An implementation of the Map/Reduce programming model (see the sketch below)
- Led by Doug Cutting
- Implemented in Java
- Alpha (0.1) release available for download
  - Apache distribution
- Genesis
  - Lucene and Nutch (open source search)
  - Hadoop (factors out the distributed compute/storage infrastructure)
- http://lucene.apache.org/hadoop
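The Map/Reduce model is easiest to see in a small example. Below is a minimal word-count sketch in Java, written against the classic org.apache.hadoop.mapred API of later Hadoop releases; the interfaces in the 0.1 alpha described here may differ, so treat it as illustrative rather than as the release's exact API.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// Minimal word count using the classic (org.apache.hadoop.mapred) API.
public class WordCount {

  // Map phase: emit (word, 1) for every token in an input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      StringTokenizer tok = new StringTokenizer(value.toString());
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken());
        out.collect(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      out.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}
```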
10 Hadoop DFS
- Distributed storage system
- Files are divided into uniform-sized blocks and distributed across cluster nodes
- Block replication for failover
- Checksums for corruption detection and recovery
- DFS exposes details of block placement so that computation can be migrated to the data (see the sketch below)
- Notable differences from mainstream DFS work
  - Single storage/compute cluster vs. separate clusters
  - Simple I/O-centric API vs. attempts at POSIX compliance
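One concrete consequence of exposing block placement is that an application can ask the filesystem where each block of a file lives and schedule its work there. A minimal sketch follows, using the FileSystem API of later Hadoop releases (the 0.1 alpha API may differ).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ask the DFS where a file's blocks live, so computation can be moved to
// the nodes (or racks) that already hold the data.
public class BlockPlacement {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]);
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per block: byte offset, length, and datanode hosts.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d len=%d hosts=%s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
  }
}
```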
11 Hadoop DFS Architecture
- Master/slave architecture
- DFS master: the Namenode
  - Manages all filesystem metadata
  - Controls read/write access to files
  - Manages block replication
- DFS slaves: the Datanodes
  - Serve read/write requests from clients (see the client-side sketch below)
  - Perform replication tasks on instruction from the Namenode
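From the client's point of view this split of responsibilities looks like ordinary file I/O: opening a file consults the Namenode for metadata, while the bytes themselves are streamed from Datanodes. A minimal read sketch, again assuming the FileSystem API of later Hadoop releases rather than the 0.1 alpha interface:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Client-side view of a DFS read: open() goes through the Namenode for
// block metadata, while the data stream is served by the Datanodes.
public class DfsRead {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path(args[0]));
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```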
12 Hadoop DFS Architecture
[Diagram: the Namenode holds filesystem metadata (file name, replication factor, block list; e.g. /home/sameerp/foo with 3 replicas, /home/sameerp/docs with 4). Clients send metadata operations to the Namenode and perform block I/O directly against the Datanodes, which are spread across Rack 1 and Rack 2.]
13 Benchmarks
14 Deployment
- Research cluster of 600 nodes
  - A billion web pages
  - Several months' worth of logs
  - 10s of TB of data
- Multiple users running ad-hoc research computations
  - Crawl experiments, various kinds of log analysis, ...
- Commodity platform: Intel/AMD, Linux, locally attached SATA drives
- Testbed for the open source approach
  - Still early days; the deployment exposed many bugs
- Future releases:
  - First, stabilize at the current size
  - Then, scale to 1000 nodes
15 Hadoop-Condor interactions
- DFS makes data locations available to applications
  - Applications generate job descriptions (ClassAds) to schedule jobs close to the data (see the sketch after this list)
- Extensions to enable Hadoop programming models to run in the scheduler universe
  - Master/Worker, MPI-universe-like meta-scheduling
- Condor enables sharing among applications
  - Priority, accounting, and quota mechanisms to manage resource allocation among users and apps
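As a sketch of how an application might combine the two ideas above, the fragment below asks HDFS for the datanodes holding each block of an input file and prints a per-block condor_submit description whose requirements expression prefers those machines. The FileSystem calls follow later Hadoop releases, analyze_block is a hypothetical user executable, and it assumes datanode hostnames line up with Condor's Machine attribute.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Turn HDFS block locations into per-block Condor submit descriptions whose
// requirements expressions prefer the machines already holding the data.
public class LocalityAwareSubmit {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path input = new Path(args[0]);
    FileStatus status = fs.getFileStatus(input);

    for (BlockLocation block :
         fs.getFileBlockLocations(status, 0, status.getLen())) {
      // Build a ClassAd requirements expression naming this block's datanodes.
      // Assumes datanode hostnames match Condor's Machine attribute.
      StringBuilder req = new StringBuilder();
      for (String host : block.getHosts()) {
        if (req.length() > 0) req.append(" || ");
        req.append("(Machine == \"").append(host).append("\")");
      }

      // One condor_submit description per block; analyze_block is a
      // hypothetical executable that processes a byte range of the file.
      System.out.println("universe     = vanilla");
      System.out.println("executable   = analyze_block");
      System.out.println("arguments    = " + input + " " + block.getOffset()
          + " " + block.getLength());
      System.out.println("requirements = " + req);
      System.out.println("queue");
      System.out.println();
    }
  }
}
```

In practice a softer Rank expression could be used instead of requirements, so a job still matches elsewhere when the preferred datanodes are busy.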
16 Hadoop-Condor interactions
[Diagram: (1) scheduler-universe applications obtain data locations (d, e) from HDFS; (2) they submit ClassAds asking Condor to schedule on d and e; (3)-(4) Condor performs the resource allocation.]
17 The end