Title: COSC 3P93 Seminar:
1 COSC 3P93 Seminar
Brandon Visser
2 Distributed Computing
Seminar Overview
- Distributed Computing: What is it?
- Why it's so useful
- Its relation to the world of Parallel Computing
- How Distributed Computing Works
- Different DC architectures
- Good DC Problems vs. Bad DC Problems
- Applications of Distributed Computing
- The Future of Distributed Computing
3 Before we start
- Distributed Computing vs Grid Computing vs P2P
- Grid Computing
- Computational grid
- Usually focused on dedicated workstations,
servers, and mainframes
- Huge datasets that run for days
- Distributed Computing
- Subset of Grid Computing
- Geared to pooling the resources of networked
end-user PCs
- Much more limited in memory/CPU power
- Primary usage is not distributed computing but
serving their user
- Peer to Peer (P2P)
- Network computing system in which all PCs are
treated as equal on the network
- May share resources such as hard drives, CD-ROMs,
etc.
- P2P protocol for sharing MP3s and other media
over the Internet
- Kazaa/Napster
4 Distributed Computing: What is it?
- Many ways to define Distributed Computing
- Been around for years
- Various vendors
- General DC Definition
- Distributed Computing is any computing that
involves multiple computers remote from each
other that each have a role in a computation
problem or information processing.
- Seminar will focus on DC systems distributed
across the internet
- Recent technological jumps have made DC more
attractive
- Increased bandwidths
- Extremely fast CPUs
5 Why Distributed Computing?
- Distributing a problem over a large network has
many advantages
- Easy on the wallet
- SETI@home FAQ
- Reliability
- Raw Performance
6 Why Distributed Computing? (contd)
- Case Study
- Brock University
- Utilization of Brock's User Services Lab
Computers
- Several computer labs, all containing P4 computers
with speeds averaging between 1.6 and 2.4 GHz
7 Why Distributed Computing? (contd)
- 417 Intel P4 computers with an average speed of
1.8 GHz
- Using commercial DC software, equivalent to speed
of
Source: http://www.ud.com (United Devices)
8 Why Distributed Computing? (contd)
Source: www.extremetech.com
9 Why Not Distributed Computing?
- However, we must not fool ourselves into thinking
it's the best parallel solution for any
application
- Central server still needed for coordination
- Finding client machines is not an automatic
process
- Data dependencies
- Slow communication channels compared to typical
parallel architectures
10 DC's Relation to Parallel Computing
- Similar in concept to Parallel Computing, but
we must distinguish between the two
- Parallel computing has the advantage over
Distributed computing because of the close range
of the processors
- Communication between processors much faster
- Better suited than DC for problems requiring
inter-processor communication and dependent
variables
11 DC's Relation to Parallel Computing (contd)
- Beginning to see support for parallel machines
- Windows XP now has support for up to 2 CPUs
- Linux/Unix: many CPUs
- For now, not widespread, and applications must be
programmed with multiple CPUs in mind. This can
create platform dependencies.
- Doom 3, Quake 3, Adobe Photoshop
- Clustering
- Grouping of workstations connected together in a
local-area network with applied middleware to
make them act like a parallel machine.
12 DC's Relation to Parallel Computing (contd)
- Beowulf is the most popular example of a
clustering system
- Runs on Linux/Unix systems
- Inexpensive form of parallel computing
- Support for these systems is still fairly
limited
- Closest parallel architecture example to DC
13 How Distributed Computing Works
- DC Systems Today consist of
- Lightweight software agents
- Dedicated DC Management Servers
- Role of Client End
- Agent notifies server when system is idle (often
a screen saver)
- Agent requests data from server
- Computes when it has spare CPU cycles
- Control given back to user immediately upon input
from mouse or keyboard
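The client-side steps above can be sketched roughly as follows. This is a minimal illustration, not any vendor's actual agent: `run_agent`, `is_idle`, `fetch_work`, and `send_result` are hypothetical names, and squaring numbers stands in for the real computation.

```python
from typing import Callable, List, Optional


def run_agent(
    is_idle: Callable[[], bool],
    fetch_work: Callable[[], Optional[List[int]]],
    send_result: Callable[[int], None],
) -> int:
    """Process work units while the machine is idle.

    Returns the number of units completed. Stops as soon as is_idle()
    reports user activity, abandoning the unit in progress.
    """
    completed = 0
    while is_idle():
        unit = fetch_work()            # agent requests data from the server
        if unit is None:
            break                      # server has nothing left to hand out
        total = 0
        for value in unit:
            if not is_idle():          # mouse or keyboard input detected
                return completed       # give control back immediately
            total += value * value     # stand-in for the real computation
        send_result(total)
        completed += 1
    return completed
```

Checking `is_idle` between small steps, rather than once per work unit, is what keeps the hand-back to the user prompt. The unfinished unit is simply dropped here; the assumption is that the server will notice the silence and reissue it.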
14 How Distributed Computing Works (contd)
- Important that control is returned as soon as
user requests
- Any delay would likely be unacceptable
- Role of Distributed Computing Management Server
- Divide large tasks into smaller tasks
- Monitor jobs currently being run
- Receive results from clients and assemble
- Usually a database would help with this
- If a server doesn't hear from a client for a long
time, it can
- Assume the user is on the machine
- Send the same package to another client
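The server duties listed above (divide, monitor, reassign on silence, assemble) might look something like this in miniature. It is a sketch under assumptions: `WorkServer` and its methods are hypothetical names, an in-memory dictionary stands in for the database, and the final result is assumed to be a simple sum of partial results.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WorkServer:
    timeout: float  # seconds of silence before a unit is reissued
    pending: List[Tuple[int, List[int]]] = field(default_factory=list)
    in_flight: Dict[int, Tuple[List[int], float]] = field(default_factory=dict)
    results: Dict[int, int] = field(default_factory=dict)

    def split(self, data: List[int], chunk_size: int) -> None:
        """Divide a large task into smaller numbered work units."""
        for start in range(0, len(data), chunk_size):
            unit_id = start // chunk_size
            self.pending.append((unit_id, data[start:start + chunk_size]))

    def request(self, now: float) -> Optional[Tuple[int, List[int]]]:
        """Hand out the next unit, first requeueing any that timed out."""
        for unit_id, (chunk, sent_at) in list(self.in_flight.items()):
            if now - sent_at > self.timeout:
                del self.in_flight[unit_id]
                self.pending.append((unit_id, chunk))
        if not self.pending:
            return None
        unit_id, chunk = self.pending.pop(0)
        self.in_flight[unit_id] = (chunk, now)
        return unit_id, chunk

    def submit(self, unit_id: int, value: int) -> None:
        """Record a result; late duplicates are ignored."""
        self.in_flight.pop(unit_id, None)
        self.pending = [(i, c) for i, c in self.pending if i != unit_id]
        self.results.setdefault(unit_id, value)

    def assemble(self) -> Optional[int]:
        """Combine partial results once every unit has reported."""
        if self.pending or self.in_flight:
            return None
        return sum(self.results.values())
```

Passing `now` in explicitly, instead of reading the clock inside the class, is a choice made only to keep the sketch easy to test.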
15 How Distributed Computing Works (contd)
- Other things to keep in mind
- Architecture requirements increase with size of
network
- Server
- Client
- Network
- Security and authentication
- Resource identification
- Know client PC characteristics
16 Distributed Computing Architectures
- Several different solutions for DC available
- Some commercial, some open source
- Current Vendors of DC Systems
- Entropia
- DataSynapse
- Sun
- Parabon
- Avaki
- United Devices
17 Distributed Computing Architectures
- We will take a look at two types of Architectures
- Entropia
- DataSynapse's LiveCluster
- Entropia's System
- Known as a Hub and Spoke with the server at the
hub.
- No communication between individual nodes
- Data communicated back and forth between server
and clients as batch jobs
- Works on virtually any computer with a connection
to the internet (dial-up or dedicated line)
18 Distributed Computing Architectures (contd)
Picture from www.entropia.com
19 Distributed Computing Architectures (contd)
- LiveCluster
- Inter-client communication as well as
communications between client and server
- Inter-client communication comes in 20 ms
bursts
- Advantages of this
- Applications can be divided into tasks that have
mutual dependencies
- Takes some load off server
- Drawbacks
- Most effective on internal network or broadband
internet.
20 Distributed Computing Architectures (contd)
Picture from www.datasynapse.com
21 Distributed Computing Problems
- Bad DC Problems
- The closer an application is to running in real
time, the less appropriate DC is
http://www.extremetech.com
- Systems that run for only a couple of hours may
not see much of a benefit from DC
- Overhead
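The overhead point can be made concrete with a deliberately simple model: wall-clock time is a fixed overhead (splitting, distributing, collecting) plus the serial work divided evenly among clients. The function names and all numbers below are assumptions chosen for illustration, not measurements from any real system.

```python
def distributed_time(serial_hours: float, clients: int, overhead_hours: float) -> float:
    """Idealized wall-clock time: fixed overhead plus perfectly divided work."""
    return overhead_hours + serial_hours / clients


def speedup(serial_hours: float, clients: int, overhead_hours: float) -> float:
    """How many times faster the distributed run is than the serial run."""
    return serial_hours / distributed_time(serial_hours, clients, overhead_hours)


# A 2-hour job on 100 clients with 1 hour of overhead barely doubles in speed,
# while a 2000-hour job with the same overhead gets close to the full 100x.
short_job = speedup(2.0, 100, 1.0)
long_job = speedup(2000.0, 100, 1.0)
```

Under these assumed figures the short job finishes in about 1.02 hours instead of 2, and the long job in 21 hours instead of 2000, which is why jobs of only a few hours gain little.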
22 Distributed Computing Problems (contd)
- Good DC Problems
- "Most appropriate applications are those which
exhibit loosely coupled, non-sequential tasks in
batch processes with a high compute-to-data
ratio." www.entropia.com
- High compute-to-communication ratio also
important
- Any problem that fully exploits the Coarse-Grain
Parallelism principle
- It should be possible to partition the
application into independent tasks or processes
that can be computed concurrently
http://www.extremetech.com
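As one illustration of partitioning into independent tasks, an exhaustive search over a numeric range can be cut into chunks that never need to talk to each other. This is a sketch with hypothetical names (`partition`, `search_chunk`, `search_all`); the chunks are searched in a plain loop here, where a real system would hand each one to a different client.

```python
from typing import Callable, List, Optional, Tuple


def partition(total: int, chunk: int) -> List[Tuple[int, int]]:
    """Split range(0, total) into half-open [start, stop) chunks."""
    return [(start, min(start + chunk, total)) for start in range(0, total, chunk)]


def search_chunk(start: int, stop: int, is_match: Callable[[int], bool]) -> Optional[int]:
    """Search one chunk on its own; it needs no data from any other chunk."""
    for candidate in range(start, stop):
        if is_match(candidate):
            return candidate
    return None


def search_all(total: int, chunk: int, is_match: Callable[[int], bool]) -> Optional[int]:
    """Run every chunk (sequentially in this sketch) and merge the answers."""
    hits = [search_chunk(a, b, is_match) for a, b in partition(total, chunk)]
    found = [h for h in hits if h is not None]
    return min(found) if found else None
```

Each chunk is described by just two integers, so very little data travels per unit of computation, which is the high compute-to-communication ratio the slide asks for.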
23 Distributed Computing Problems (contd)
- Examples of good DC Problems
- Complex Modeling and Simulation techniques
- Car crash simulations
- Weather forecasting
- AI Exhaustive Search techniques
- Life Sciences
- Sequencing the human genome
- As a result of sequencing the human genome,
the number of identifiable biological targets for
today's drugs is expected to increase from about
500 to about 10,000. Pharmaceutical firms have
repositories of millions of different molecules
and compounds, some of which may have
characteristics that make them appropriate for
inhibiting newly found proteins. The process of
matching all these "ligands" to their appropriate
targets is an ideal task for distributed
computing, and the quicker it's done, the quicker
and greater the benefits will be. Another related
application is the recent trend of generating new
types of drugs solely on computers.
http://www.extremetech.com
24 Applications of Distributed Computing
- Commercial and Non-Commercial
- Commercial
- Market themselves to any corporation, engineer or
scientist who needs to crunch huge amounts of
numbers but cannot afford a supercomputer
- Often the company promises to pay the end-system
users for borrowing wasted CPU cycles
- Several Commercial DC Companies
- United Devices
- http://www.ud.com
- Parabon Computation
- http://www.parabon.com/
25 Applications of Distributed Computing (contd)
- A quick word on For-Profit or Commercial DC
- Concerns as to viability of for-profit
Distributed Computing
- Anyone would choose to get paid for running DC
software
- Process Tree Network
- Started a for-pay DC system in January 2001
- Paid clients 12.50 per month to run their
software
- Parent company went bankrupt in May 2001 (lack of
funding)
- Perhaps Distributed Computing is best suited to
non-profit purposes
- Personal hobbies and interests.
26 Applications of Distributed Computing (contd)
- DC has seen most success in volunteer-based
projects
- SETI@home
- Arguably most successful active project
- Search for Extraterrestrial Intelligence
- Google Toolbar!
- Folding@home
- Simulates protein folding
- Supported by Intel
- 150 000 active CPUs
- United Devices Cancer Research
- distributed.net
- Cryptography
- Complete list of current and finished projects
http://distributedcomputing.info/projects.html
27 Applications of Distributed Computing (contd)
- Let's take a closer look at SETI@home
- Homepage: http://setiathome.ssl.berkeley.edu/
- Project Statistics
28 Applications of Distributed Computing (contd)
29 Applications of Distributed Computing (contd)
- Notable Completed Projects
- RSA Factoring By Web
- First large-scale project to factor a 130-digit
number
- Completed on April 10, 1996
- Internet Animation 99
- Proof of concept
- Using DC system as a render farm
- Used nothing more than email and a web page
- Completed August, 1999
30 Applications of Distributed Computing (contd)
- Safer Markets Project
- Ran on Entropia platform
- April 2001-Jan 2002
- Goal was to find a formula which could predict
stock market volatility
- As soon as the project ended, the site was taken down
and the URL was forwarded to Entropia's homepage.
31 Future of Distributed Computing
- Distributed Computing is becoming recognized as a
practical platform for solving large
computational problems
- Some of the biggest names in the industry are
getting their feet wet and are currently in the news
- IBM
- World Community Grid Project
- Intel
- Intel Peer-to-Peer Accelerator Kit
- Middleware for DC Applications
32 Future of Distributed Computing (contd)
- Eventually, inter-communication between client
nodes on large projects over the internet
- Currently we share information over the net with
projects such as SETI@home, but not computational
resources: too risky
- Thanks for your time!
33 Useful Links and Resources
- DC Central
- http://library.thinkquest.org/C007645/english/0-welcome.htm
- Wikipedia
- http://en.wikipedia.org/wiki/Distributed_computing
- Distributed.net
- http://www.distributed.net
- Extreme Tech website
- http://www.extremetech.com/
- Distributed Computing
- http://distributedcomputing.info/