2. Server Architecture
- Objectives of these slides
- examine suitable hardware architectures for servers
- Background Reading
- Chapter 4 of Berson
- Operating Systems, Computer Architectures
Overview
- 1. Server Functions
- 2. Server Requirements
- 3. Hardware Choices
- 4. Why Multiprocessor Systems?
- 5. Server OS Requirements
1. Server Functions
- Resource sharing and management
- file sharing
- printer sharing
- Database manipulation
- Communication services
2. Server Requirements
- Multi-user support
- Scalability
- vertical and horizontal
- Performance, throughput
- Storage capacity
- Availability
- robustness, on-line administration
- Complex data types
- Networking and communication
2.1. A Performance Measure
- Performance = 1 / (CT * PL * CI)
- CT (cycle time) = inverse of clock rate
- faster clock -> reduced cycle time
- PL (path length) = no. of instructions required to execute one command
- CI (cycles/instruction) = no. of cycles necessary to execute one instruction
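- Worked example (illustrative numbers only, not from any particular machine): a 100 MHz clock gives CT = 10 ns; with PL = 10,000 instructions per command and CI = 2 cycles per instruction, Performance = 1 / (10e-9 * 10,000 * 2) = 5,000 commands per second.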
3. Hardware Choices
- Mainframes were traditionally used as servers, but modern microcomputers have much better price/performance measures
- SPARC: 1,500 transactions/sec (TPS)
- IBM: 150,000 TPS
- But, SPARCs are slower, and come with less well-developed transaction processing software.
continued
- Three popular server architectures
- RISC (Reduced Instruction Set Computer)
- SMP (Symmetric MultiProcessing)
- MPP (Massively Parallel Processing)
4. Why Multiprocessor Systems?
- Specialised processors
- FPU, graphics
- Parallel execution of large databases
- Concurrent processing of users
- Higher transaction rates
- Complex applications
- virtual reality, weather forecasting
4.1. Classifying the Architectures
- Interconnection structure between processors
- bus, star, ring, etc.
- Component interdependencies
- Degree of coupling between processors and memory
- shared memory, distributed memory
continued
- Architectures can be classified by the way that instructions (code) and data are distributed. Two main designs:
- MIMD
- SIMD
4.2. Shared Memory Architecture
[diagram: several processors connected by a global bus to a global (shared) memory]
- bus must be wide (64 or 128 bits)
- could be in one machine or many
Features
- Large memory
- 4-30 processors
- If all the processors have similar capabilities, then called a symmetric multiprocessor (SMP) architecture.
- Seamless distribution (to the users)
- Limited scalable performance
- system bus, memory, cache size, cache coherence
continued
- Fairly easy to program since the memory model is similar to a uniprocessor's
- techniques include UNIX forks, threads, shared memory
- Many application areas involving parallel, mostly independent, tasks
- e.g. ATM funds withdrawal
- Implementations
- Sequent Symmetry, Sun SparcCenter
Programming Shared Memory
- Can use conventional languages with simple concurrency and locking of shared data (a sketch follows below)
[diagram: many processes communicating through a global (shared) memory]
- Where to put processes?
- Many-to-many communication
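- As a rough illustration (my own sketch, not from the slides): in C with POSIX threads, two threads share a counter and a mutex provides the locking of shared data mentioned above.

  #include <pthread.h>
  #include <stdio.h>

  int count = 0;                                   /* shared data        */
  pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  void *worker(void *arg)                          /* run by each thread */
  {
      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&lock);               /* lock shared data   */
          count++;
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("count = %d\n", count);               /* 200000 with the lock held */
      return 0;
  }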
4.3. Distributed Memory Architecture
- network could be inside one machine or many
Features
- 100s of processors (massively parallel)
- New programming paradigms are needed
- new OSes, new compilers, new programming languages
- Programming languages are often based on processes (concurrent objects) and message passing.
Programming with Distributed Memory
- Harder to program since the programmer must specify how processes communicate.
- Poor UNIX tools
- Where to put processes?
4.4. MIMD
- Multiple Instruction, Multiple Data
- Multiple processors simultaneously execute different instructions on different data.
- Best for large-grained parallel applications
- e.g. concurrent objects communicating
- Can be employed by shared-memory and distributed
architectures.
4.5. SIMD
- Single Instruction, Multiple Data
- Multiple processors simultaneously execute the same instruction on different data.
- Best for fine-grained parallel applications
- e.g. parallel pixel modification in a graphic, many mathematical applications
- Can be employed by shared-memory and distributed
architectures.
5. Server OS Requirements
- Support for multiple concurrent users
- Preemptive multitasking
- a (higher priority) task/process can interrupt another
- Light-weight concurrency
- threads, (forks)
- Memory protection
- Scalability
- load balancing
continued
- Security
- Reliability/availability
- Two examples
- UNIX, Windows NT
3. Language Constructs for Parallelism
- Objectives of these slides
- give an overview of the main ways of adding
parallelism to programming languages
Overview
- 1. Relevance to Client/Server
- 2. Co-routines
- 3. Parallel Statements
- 4. Processes
- 5. Linda Tuple Spaces
- 6. More Details
1. Relevance to Client/Server
- Most server architectures are multiprocessor
- e.g. shared-memory, distributed architectures
- Most server applications require parallelism
- e.g. concurrent accesses to databases
- Client applications can also benefit from parallel coding
- e.g. concurrent access to different Web search
engines
2. Co-routines
- A routine is a function, procedure, method.
- Two co-routines are executed in an interleaved fashion
- part of the first, part of the second, part of the first, etc.
- A co-routine may block (suspend) and resume
another co-routine.
Example

  void odd()
  { int i;
    for (i = 1; i < 1000; i = i + 2) {
      printf("%d ", i*i);
      resume even();
    }
  }

  void even()
  { int i;
    for (i = 2; i < 1000; i = i + 2) {
      printf("%d ", i*i);
      resume odd();
    }
  }
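- resume is not part of standard C; as a rough illustration only (my own sketch, not from the slides), the same interleaving can be approximated with the POSIX ucontext routines. The stack sizes and the small loop bound are arbitrary choices.

  #include <stdio.h>
  #include <ucontext.h>

  static ucontext_t ctx_main, ctx_odd, ctx_even;

  static void odd(void)
  { int i;
    for (i = 1; i < 10; i = i + 2) {
      printf("%d ", i*i);
      swapcontext(&ctx_odd, &ctx_even);     /* plays the role of "resume even()" */
    }
  }

  static void even(void)
  { int i;
    for (i = 2; i < 10; i = i + 2) {
      printf("%d ", i*i);
      swapcontext(&ctx_even, &ctx_odd);     /* plays the role of "resume odd()" */
    }
  }

  int main(void)
  { static char stk_odd[65536], stk_even[65536];

    getcontext(&ctx_odd);
    ctx_odd.uc_stack.ss_sp = stk_odd;
    ctx_odd.uc_stack.ss_size = sizeof stk_odd;
    ctx_odd.uc_link = &ctx_main;            /* return to main when odd() finishes */
    makecontext(&ctx_odd, odd, 0);

    getcontext(&ctx_even);
    ctx_even.uc_stack.ss_sp = stk_even;
    ctx_even.uc_stack.ss_size = sizeof stk_even;
    ctx_even.uc_link = &ctx_main;
    makecontext(&ctx_even, even, 0);

    swapcontext(&ctx_main, &ctx_odd);       /* start with odd(); the output alternates
                                               odd and even squares: 1 4 9 ... 81 */
    printf("\n");
    return 0;
  }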
Features
- Very hard to write co-routines correctly since the programmer must remember to give all of them a chance to execute.
- Easy to starve a co-routine
- it never gets resumed
- Deadlocked programs are ones where no co-routines can proceed
- perhaps because they are all suspended
3. Parallel Statements

  parbegin
    statement1;
    statement2;
    statement3;
  parend

  parfor (i = 0; i < 10; i++)
    statement using i;

- Other constructs are possible
- parallel if
Examples
- Add two arrays (vectors) in parallel

  int a[10], b[10], c[10];

  parfor (i = 0; i < 10; i++)
    c[i] = a[i] + b[i];
continued
- Parallel matrix multiplication

  float x[10][10], y[10][10], z[10][10];

  parfor (i = 0; i < 10; i++)
    parfor (j = 0; j < 10; j++) {
      z[i][j] = 0.0;
      for (k = 0; k < 10; k++)
        z[i][j] = z[i][j] + x[i][k] * y[k][j];
    }
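- As a concrete illustration (my own sketch, not from the slides), the parfor vector addition corresponds roughly to an OpenMP parallel loop in C (compile with, e.g., cc -fopenmp); the compiler and runtime decide how many processors actually run it:

  #include <stdio.h>
  #define N 10

  int main(void)
  {
      int a[N], b[N], c[N], i;

      for (i = 0; i < N; i++) { a[i] = i; b[i] = 2*i; }   /* some test data */

      #pragma omp parallel for          /* plays the role of parfor */
      for (i = 0; i < N; i++)
          c[i] = a[i] + b[i];

      for (i = 0; i < N; i++)
          printf("%d ", c[i]);          /* prints 0 3 6 ... 27 */
      printf("\n");
      return 0;
  }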
Features
- Very fine-grained parallelism is typical
- can adjust it to some degree
- Most common as a way of parallelising numerical applications (e.g. in FORTRAN, C).
- Automatic parallelisation is a research aim
- e.g. compiler automatically changes for to parfor
continued
- Speed-ups depend on code, data and the underlying architecture
- creation of processes may take longer than the execution of the corresponding sequential code
- data dependencies may force a sequential order
- x = 2*y;  x += 3;
- no point having 100 parallel processes if there
are only 2 actual processors
4. Processes
- A process is a separately executing piece of code. Each process can have its own data.
- The main issues are how processes communicate and how they share resources.
- Threads are light-weight versions of processes
- they store less OS information
- they have simpler communication facilities
Process Example

  void main()
  {
    fork compile("file1.c", "file1.o");
    fork compile("file2.c", "file2.o");
    join;
    link("file1.o", "file2.o");
  }

  void compile(char *fnm, char *ofnm)
  { . . . }
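- The fork/join above is pseudocode; a rough sketch of the same structure with POSIX threads (my own, with a dummy compile()):

  #include <pthread.h>
  #include <stdio.h>

  void compile(const char *fnm, const char *ofnm)        /* dummy stand-in */
  { printf("compiling %s -> %s\n", fnm, ofnm); }

  void *task1(void *arg) { compile("file1.c", "file1.o"); return NULL; }
  void *task2(void *arg) { compile("file2.c", "file2.o"); return NULL; }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, task1, NULL);   /* "fork" the two compiles */
      pthread_create(&t2, NULL, task2, NULL);
      pthread_join(t1, NULL);                   /* "join": wait for both   */
      pthread_join(t2, NULL);
      printf("linking file1.o and file2.o\n");  /* then link               */
      return 0;
  }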
Communication & Synchronisation
- Example: a parallel compiler
[pipeline diagram: source code -> front_end() -> assembly code -> assembler() -> object code]
4.1. Shared Variables

  int count = 0;       /* shared data */
  Line buffer[MAX];    /* shared data */

  process void front_end()
  { int write_pos = 0;
    Line ln;
    while (true) {
      /* generate a line of assembly */
      while (count == MAX)
        ;              /* busy wait */
      buffer[write_pos] = ln;
      write_pos = (write_pos+1) % MAX;
      count++;
    }
  }
continued
  process void assembler()
  { int read_pos = 0;
    Line ln;
    while (true) {
      while (count == 0)
        ;              /* busy wait */
      ln = buffer[read_pos];
      read_pos = (read_pos+1) % MAX;
      count--;
      /* process line of assembly code */
    }
  }
Issues
- Busy wait versus polling
- wait until count < MAX
- Race conditions
- seen in updates to count
- solution: lock variables, semaphores, monitors,
etc.
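- For illustration (my own sketch, not from the slides): with POSIX threads, a mutex removes the race on count and condition variables replace the busy waits. Line and MAX are assumed definitions.

  #include <pthread.h>

  #define MAX 100
  typedef char *Line;                        /* assumed definition of Line    */

  static Line buffer[MAX];                   /* shared data                   */
  static int count = 0, write_pos = 0, read_pos = 0;
  static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
  static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

  void put_line(Line ln)                     /* called by front_end()         */
  { pthread_mutex_lock(&m);
    while (count == MAX)                     /* sleep instead of busy waiting */
      pthread_cond_wait(&not_full, &m);
    buffer[write_pos] = ln;
    write_pos = (write_pos + 1) % MAX;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&m);
  }

  Line get_line(void)                        /* called by assembler()         */
  { Line ln;
    pthread_mutex_lock(&m);
    while (count == 0)
      pthread_cond_wait(&not_empty, &m);
    ln = buffer[read_pos];
    read_pos = (read_pos + 1) % MAX;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&m);
    return ln;
  }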
4.2. Message Passing

  process void front_end()
  { Message msg;
    Line ln;
    while (true) {
      /* generate a line of assembly code */
      msg = ln;
      send msg to assembler();
    }
  }

  process void assembler()
  { Message msg;
    Line ln;
    while (true) {
      receive msg from front_end();
      ln = msg;
      /* process line of assembly code */
    }
  }
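- A rough UNIX sketch of the same idea (my own, not from the slides): two processes created with fork() exchange data over a pipe, so all communication is explicit. A pipe carries a byte stream rather than discrete messages, which is one of the issues below.

  #include <stdio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void)
  {
      int fd[2];
      pipe(fd);                                  /* fd[0]: read end, fd[1]: write end */

      if (fork() == 0) {                         /* child plays the assembler */
          char line[128];
          ssize_t n;
          close(fd[1]);
          while ((n = read(fd[0], line, sizeof line)) > 0)
              printf("assembler received: %.*s", (int)n, line);
          return 0;
      }

      close(fd[0]);                              /* parent plays the front end  */
      write(fd[1], "mov r1, r2\n", 11);          /* send one "line of assembly" */
      close(fd[1]);                              /* end of stream */
      wait(NULL);
      return 0;
  }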
Issues
- Should the sender block (suspend) until a message arrives at its destination?
- how does the sender know it has arrived?
- How can the sender receive a reply?
- how is a reply associated with the original message?
- should a sender wait for a reply?
- Message ordering
- Testing received messages
- how are messages stored while being tested?
- what happens to a message that is rejected by a receiver?
- RPCs are a form of message passing where the
sender waits for an answer (the result of the
function call).
5. Linda Tuple Space
- Linda is the name for a set of operations that can be added to existing languages.
- e.g. C/Linda, FORTRAN/Linda
- The operations allow the creation/manipulation of
a tuple space by processes.
continued
- A tuple space is a form of (virtual) shared memory consisting of tuples (like structs, records) that are accessed by matching on their contents.
- Three atomic operations
- out adds a tuple to the space
- read reads in an existing tuple
- in reads in and deletes the tuple from the space
- read and in may block the process using them
Example

  process 1:  out("matrx", 2, 2, 3.4)
  process 2:  out("matrx", 2, 3, 3.1)
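- For illustration (my own continuation, not on the original slide), another process could later retrieve those tuples by matching on their contents, using the ? notation for formal (matched) fields as in C/Linda:

  float v;
  read("matrx", 2, 3, ?v)   /* copies ("matrx", 2, 3, 3.1); the tuple stays in the space     */
  in("matrx", 2, 2, ?v)     /* removes ("matrx", 2, 2, 3.4); blocks until such a tuple exists */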
6. More Details
- Programming Language Essentials, H.E. Bal & D. Grune, Addison-Wesley, 1994
- A more technical and detailed discussion:
- Models and Languages for Parallel Computation, D.B. Skillicorn & D. Talia, ACM Computing Surveys, Vol. 30, No. 2, June 1998