Title: Cluster Computing in the Classroom: Topics, Guidelines, and Experiences
1. Cluster Computing in the Classroom: Topics, Guidelines, and Experiences
- Amy Apon
- Department of Computer Science and Computer Engineering
- University of Arkansas
2. Clusters and Data Engineering
- A cluster is a set of whole computers connected
via a network and used as an integrated resource
to solve a single application
- Increase throughput for massive data processing
- Inexpensive: uses commodity computers with lots
of disks and disk space
3. Teaching Challenges
- Prerequisites are difficult to establish
- One course does not fit all!
We propose:
- Cluster teaching material organized as modules
- Accessible to a variety of situations
4. Outline
- Overview of target audience for the proposed
teaching materials
- Description of course modules
- Problem areas
- Conclusions
- Acknowledgements and references
5. Courseware developed with
Dr. Amy Apon
Dr. Jens Mache
Dr. Hai Jin
Dr. Rajkumar Buyya
6. Who our students are
- Juniors, seniors, graduate students
- With a variety of preparation
- Operating Systems?
- Maybe haven't seen threads
- Computer Networks?
- Maybe haven't seen sockets
- Computer Architecture?
- Maybe don't understand how cache works
7. Course Units
- Needed because of the diversity of institutions
and student preparation
- Matched to the Computing Curricula 2001 to avoid
overlap with existing courses
- Basic Units (have overlap with ACM Core)
- Core Units (essential to cluster computing)
- Extended Units (more advanced, optional)
8. Course Units Can Be Combined
- We propose sample courses with an emphasis in one of:
- Architecture
- Programming
- Algorithms and Applications
9. Five Basic Units
- Programming Fundamentals (PF2, PF5)
- Algorithms and problem-solving
- Event-driven programming (3 hours total)
- Architecture and Organization 4 (AR4)
- Memory system organization (1 hour)
- Architecture and Organization 7 (AR7)
- Multiprocessing architectures (1 hour)
- Operating Systems 3 (OS3)
- Concurrency (1 hour)
- Net-Centric Computing 2 (NC2)
- Communication and networking (2 hours)
10. Ten Core Units
- Algorithms and Complexity 4 (AL4)
- Distributed algorithms (1 hour)
- Algorithms and Complexity 11 (AL11)
- Parallel algorithms (3 to 7 hours)
- Architecture and Organization 7 (AR7)
- Multiprocessing and alternative architectures (2 hours)
- Architecture and Organization 9 (AR9)
- Architectures for networks and distributed systems
(1-4 hours)
- Operating Systems 11 (OS11)
- System performance evaluation (1-2 hours)
11. Ten Core Units, continued
- Net-Centric Computing 2 (NC2)
- Communication and networking (1 hour)
- Net-Centric Computing 6 (NC6)
- Network management (1-2 hours)
- Social and Professional Issues 9 (SP9)
- Economic issues in computing (2 hours)
- Software Engineering 2 (SE2)
- Using APIs: basic MPI or PVM, basic PVFS (2 hours)
- Computational Science 4 (CN4)
- High-performance computing (6 or more hours)
12. Many Choices for Extended Units!
- Software Engineering (SE3), Software tools and
environments
- Debugging tools
- Operating Systems (OS8)
- Parallel file systems
- Algorithms (AL11)
- Advanced parallel algorithms.
- Architecture and Organization (AR9)
- Architecture for networks and distributed systems
- Graphics and Visualization (GV9)
- Intelligent Systems 4 (IS4), Advanced search
- Information Management (IM8, IM9, IM10, IM11)
- Distributed databases, physical database design,
data mining, and information storage and
retrieval on clusters
- Computational Science (CN1, CN3)
13. Cluster Architecture Emphasis
- Requirements similar to those for a course in advanced
computer architecture
- Suited for advanced undergraduates and graduate
students who have completed:
- Computer organization
- Computer networks
- Operating systems
- Programming
14. Cluster Architecture Topics
15. Programming Emphasis
- Suited for undergraduates with exposure to:
- Data structures and algorithms
- Computer organization
- Can use general access computer lab/LAN (if
performance is not an issue)
- Can use generally available programming
environments
16. Cluster Programming Topics
- Shared memory programming
- Leading to a discussion of NUMA
- Sockets
- Leading to a discussion of network overhead and
low-latency protocols
- Parallel programming using MPI
- Middleware: Java RMI, CORBA
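The step from sockets to a discussion of network overhead can be made concrete with a small timing exercise. The following is a minimal sketch, not part of the original courseware: it uses Python's standard `socket.socketpair()` so that it runs on a single machine, which means it measures only software overhead (system calls and copies), not a real cluster interconnect. The names `recv_exact`, `echo_server`, and `mean_round_trip` are hypothetical helpers chosen for illustration.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, looping over partial reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(sock, payload_size, rounds):
    """Echo each fixed-size message back to the sender."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, payload_size))


def mean_round_trip(payload_size, rounds=200):
    """Average seconds for one send-and-echo of payload_size bytes."""
    # Assumption: a local socket pair stands in for two cluster nodes.
    client, server = socket.socketpair()
    worker = threading.Thread(
        target=echo_server, args=(server, payload_size, rounds)
    )
    worker.start()
    payload = b"x" * payload_size
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            client.sendall(payload)
            recv_exact(client, payload_size)
        elapsed = time.perf_counter() - start
    finally:
        client.close()   # unblocks the echo thread if the loop failed
        worker.join()
        server.close()
    return elapsed / rounds
```

Students could call `mean_round_trip` for several payload sizes (for example 1, 1024, and 65536 bytes) and compare the fixed per-message cost with the per-byte cost, then repeat the experiment over TCP between two lab machines.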
17. Algorithms and Applications
- Suited for:
- Advanced undergraduates with a strong algorithms
and programming background
- Graduate students
- Can be:
- Parallel algorithms
- With a focus on topics from a particular domain
18. Algorithms and Applications Topics
- Application overview
- Compression, data mining, image rendering,
genetic algorithms, ...
- Techniques of algorithm design
- Partitioning, divide and conquer, communication
and synchronization, ...
- Modeling and visualization
- Performance tuning
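Partitioning and divide-and-conquer combining can be illustrated without any cluster hardware. The sketch below is an assumption-laden illustration rather than material from the course: `block_partition` splits a range into near-equal contiguous blocks (one per worker), and `tree_reduce` combines the per-worker partial results pairwise, which is the general pattern behind collective reductions. Both function names are hypothetical, and the workers are simulated serially.

```python
def block_partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) blocks
    whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_workers)
    blocks = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers each take one additional item.
        stop = start + base + (1 if rank < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def tree_reduce(partials, combine):
    """Combine partial results pairwise until one value remains.

    Returns (result, rounds), where rounds is the number of combining
    steps, each of which could run in parallel across pairs.
    """
    values = list(partials)
    if not values:
        raise ValueError("tree_reduce needs at least one partial result")
    rounds = 0
    while len(values) > 1:
        merged = []
        for i in range(0, len(values) - 1, 2):
            merged.append(combine(values[i], values[i + 1]))
        if len(values) % 2 == 1:
            merged.append(values[-1])  # odd one out waits for the next round
        values = merged
        rounds += 1
    return values[0], rounds


# Usage: sum 0..99 with 8 simulated workers.
data = list(range(100))
blocks = block_partition(len(data), 8)
partials = [sum(data[lo:hi]) for lo, hi in blocks]
total, rounds = tree_reduce(partials, lambda x, y: x + y)
```

With 8 workers the combining finishes in 3 rounds instead of 7 sequential additions, which gives students a concrete starting point for discussing communication cost versus computation.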
19. Classroom Favorites
- Build your own cluster
- Using old lab machines, install PVM or MPI
- Parallel matrix multiply, sort
- Implement these using MPI, evaluate the
performance using data of varying size, present
results graphically
- Term programming project
- Can have students select their own!
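The matrix multiply exercise above could be prototyped before students move to MPI. This is a minimal sketch under stated assumptions, not the assignment as given in the course: threads from Python's standard `concurrent.futures` stand in for MPI ranks, each owning a contiguous band of rows, and `time_sizes` collects (size, seconds) pairs that can be plotted. The function names are hypothetical, and because of CPython's global interpreter lock no speedup should be expected from this version; it only shows the decomposition and the measurement loop.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_stop):
    """Compute rows [row_start, row_stop) of the matrix product a * b."""
    inner = len(b)
    cols = len(b[0])
    block = []
    for i in range(row_start, row_stop):
        row = [0.0] * cols
        for k in range(inner):
            a_ik = a[i][k]
            for j in range(cols):
                row[j] += a_ik * b[k][j]
        block.append(row)
    return block


def parallel_matmul(a, b, workers=4):
    """Row-block multiply: each worker owns a contiguous band of rows."""
    n_rows = len(a)
    workers = max(1, min(workers, n_rows))
    base, extra = divmod(n_rows, workers)
    bounds = []
    start = 0
    for rank in range(workers):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(multiply_rows, a, b, lo, hi) for lo, hi in bounds]
        result = []
        for future in futures:  # gather bands in rank order
            result.extend(future.result())
    return result


def time_sizes(sizes, workers=4):
    """Return (n, seconds) pairs for n x n products, ready to plot."""
    timings = []
    for n in sizes:
        a = [[float(i + j) for j in range(n)] for i in range(n)]
        b = [[float(i - j) for j in range(n)] for i in range(n)]
        t0 = time.perf_counter()
        parallel_matmul(a, b, workers)
        timings.append((n, time.perf_counter() - t0))
    return timings
```

Calling `time_sizes([16, 32, 64])` yields the data for a size-versus-time plot; in an MPI version the row bands would be scattered to ranks and the result bands gathered, and the same timing loop would apply.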
20. Problem Areas
- Cluster setup and administration
- Cluster usage (especially for performance
experiments)
- Security
21. Conclusions
- Cluster computing is a low-cost approach to
massive data processing
- Cluster computing can be taught at the
undergraduate level
- Modules help to organize the material so that it
is appropriate for your institution
- Modules can be mixed and matched
22. References and Acknowledgements
- Cluster Computing in the Classroom: Topics,
Guidelines, and Experiences
- by Amy Apon, Rajkumar Buyya, Hai Jin, Jens Mache,
First International Workshop on Cluster Computing
Education, Cluster.Edu 2001
- See http://citeseer.nj.nec.com/395286.html