Title: CS514: Intermediate Course in Operating Systems
1. CS514: Intermediate Course in Operating Systems
- Professor Ken Birman
- Ben Atkin, TA
2. Perspectives on Computing Systems and Networks
- CS314: Hardware and architecture
- CS414: Operating systems, with a focus on
single-processor and multi-processor systems
- CS513: A course on security for operating systems
and networks
- CS514: Emphasis on middleware: networks,
distributed computing, technologies for building
reliable applications over the middleware
- CS614: A survey of current research frontiers in
the operating systems and middleware space
- CS444, CS476, CS644, CS676: Networks, routers,
theory of network protocols (not offered recently)
3. Styles of Course
- CS514 tries to be practical in emphasis
- We look at the tools used in real products and
real systems
- The focus is on technology one could build or buy
- But not specific products
- CS614 emphasis is on research opportunities
- We try to understand the state of the art
- Idea is to find good research topics
- Both have projects, but
- CS514 builds on popular middleware components
- CS614 tries to break new ground
4. Recent Trends
- Massive network rollout
- Larger and larger numbers of small devices,
web-compatible cell phones
- Object orientation and components emerge as
prevailing structural option
- Widespread use of transactions for reliability
and atomicity
- XML: the web-ization of everything
- Java/Jini, .NET: code can run on anything
- Client-server yielding to scalable replication
5. Understanding Trends
- Basically two options
- Study the fundamentals
- Then apply to specific tools
- Or
- Study specific tools
- Extract fundamental insights from examples
7. Ken's bias
- I work on reliable, secure distributed computing
- Air traffic control systems
- Stock exchanges
- Next generation electric power grid
- To me, the question is
- How can we build systems that do what we need
them to do, reliably, accurately, and in a secure
manner?
8. Butler Lampson's Insight
- Why computer scientists didn't invent the web
- CS researchers would have wanted it to work
- The web doesn't really work
- But it doesn't really need to!
- Gives some reason to suspect that Ken's bias
isn't widely shared!
9. World Wide Web
- A seductive pastime, but increasingly seen as a
serious business model
- Idea would be to put the information you need at
your fingertips to enable better, more informed,
more intelligent actions
- The Web can also replace paper entirely: a
world-wide tool for sharing knowledge
10. Relying on the Web: Banking
- Companies and individuals will need to rely on
the Web for this model to work
- Broker will rely upon up-to-the-minute stock
quotes and investment data and advice
- Back office will trade stocks based on what the
broker currently wants
- Criminals will try to violate security/privacy
to steal funds or manipulate trades
11. Relying on the Web: Medicine
- Web-style interface in a hospital
- Doctor relies on accuracy of patient status
records to make treatment decisions
- Nurse relies on accuracy of drug dosage and
frequency data to administer treatment
- Hospital legally obligated to provide for
security and privacy of the data
12. Relying on the Web: Publisher
- More and more publications will go electronic in
coming years (so will movies, MTV videos,
classical music, etc.)
- Publisher's edge: quality of authors, quality of
material. Will sell information
- But for this to work, need reliable ways to
charge for access and to limit access to
authorized individuals!
13. Air Traffic Control on the Web
- Web interface could easily show planes; natural
for controller interactions
- But clearly need to know that trajectory and
flight data are current and consistent
- Also need help with routing options
- Continuous availability is vital. Security and
privacy also needed
14. New Air Traffic Control System (AAS)
- Started by FAA in 1989 to replace existing ATC
system
- Current system has video display of radar for
controllers to use
- Database has information about each flight
- Telephones to talk to the planes
15. ATC systems divide country up
16. More details on ATC
- Each sector has a control center
- Centers may have few or many (50) controllers
- Data comes from a radar system that broadcasts
updates every 10 seconds
- Database keeps other flight data
- Controllers each own smaller sub-sectors
17. Current System has Problems!
- Overloaded computers that often crash
- Getting slow as volume of air traffic rises
- Inconsistent displays a problem: phantom planes,
missing planes, stale information
- Some major outages recently (Newark down for 1/2
hour, LA down for 1 hour in 1995). One near-miss
associated with LA outage
18. Concept of New System
- Replace video terminals with workstations
- Build a highly available real-time system
guaranteeing no more than 3 seconds downtime per
year
- Offer much better user interface to ATC
controllers, with intelligent course
recommendations and warnings about future course
changes that will be needed
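To put the 3-seconds-per-year target in perspective, the arithmetic can be sketched as below. This is a rough illustration only: the function names are hypothetical, a 365-day year is assumed, and the "five nines" figure is a common industry yardstick brought in for comparison, not something from the AAS specification.

```python
# Assumption: a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def availability(downtime_seconds_per_year: float) -> float:
    """Fraction of the year the system is up, given yearly downtime."""
    return 1.0 - downtime_seconds_per_year / SECONDS_PER_YEAR


def downtime_budget(avail: float) -> float:
    """Seconds of downtime per year allowed by an availability fraction."""
    return (1.0 - avail) * SECONDS_PER_YEAR


if __name__ == "__main__":
    # The AAS goal from the slide: at most 3 seconds of downtime per year.
    print(f"3 s/year    -> availability {availability(3.0):.9f}")
    # Comparison point (assumption): "five nines", i.e. 99.999% availability.
    print(f"five nines  -> {downtime_budget(0.99999):.1f} s/year of downtime")
    # The one-hour LA outage, measured in years of the 3-second budget.
    print(f"1 h outage  -> {3600 / 3.0:.0f} years of budget")
```

Three seconds per year works out to roughly 99.99999% availability (about seven nines), and a single one-hour outage consumes 1200 years of that budget.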
19. ATC Architecture
[Diagram; surviving labels: NETWORK INFRASTRUCTURE, DATABASE]
20. Technologies Used
- Based on standard, off-the-shelf workstations
(easier to maintain, upgrade, manage)
- IBM proposed software for fault-tolerance and
consistent system implementation
- Fancy graphical user interface much like the Web:
pop-up menus for control decisions, etc.
21. Project Was a Fiasco!!
- IBM unable to implement a fault-tolerant software
architecture! Problem was much harder than they
expected.
- Even a non-distributed interface turned out to be
very hard: major delays, scaled-back goals
- Resulting system is unsatisfactory even before
delivery
22. Free Flight
- Many think this is the next step in aviation
- Planes use GPS receivers to track own location
accurately
- Combine radar and a shared database to see each
other
- Each pilot makes own routing decisions
- ATC controllers only act in emergencies
23. Free Flight (cont)
- Now each plane is like an ATC workstation
- Each pilot must make decisions consistent with
those of other pilots
- ... but if FAA's project failed in 1994, why
should free flight succeed in 2010?
- Something is wrong with the distributed systems
infrastructure!
24. Other critical applications
- Banking, stock markets, stock brokers
- Health care, hospital automation
- Control of power plants, electric grid
- Telecommunications infrastructure
- Electronic commerce and electronic cash on the
Web (very important emerging area)
- Corporate information base: a company's memory
of decisions, technologies, strategy
- Military command, control, intelligence systems
25. We depend on distributed systems!
- If these critical systems don't work
- When we need them
- Correctly
- Fast enough
- Securely and privately
- ... then revenue, health and safety, and national
security may be at risk!
26. Signs of a Crisis in Computing
- Highly visible fiascos: ATC project, Denver
luggage handling system, London Stock Exchange.
- Hackers pose an increasingly serious threat:
disrupted telephone services, break-ins to
critical computing systems
- Vendors offering little in the way of reliability
(security situation is better)
27. Critical Needs of Critical Applications
- Security: can tell who is doing what and can use
this to enforce authorization
- Privacy: intruders can't see data or user IDs
- Availability: system is continuously up
- Recoverability: can restart failed components
- Consistency: actions of system at different
locations are consistent with each other.
28. Web Brownouts
- Domain name service (DNS) can overload (1-3)
- Servers or proxies can overload, crash (4-9)
- Communication lines can overload or break
- DNS or proxy can return stale data
29. Infrastructure Needs to Change
- To avoid brownouts, need to make more use of
replicated (cached) data
- DNS replication: caching of host addresses
- Web proxies replicate copies of documents
- Creates a new challenge
- Coherence: guarantee that a cached copy of an
object is up to date
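The coherence problem can be illustrated with a minimal sketch of a time-to-live (TTL) cache in front of an authoritative store. Everything here is an assumption made for illustration: the class names `Origin`, `TTLCache` and `FakeClock`, the 30-second TTL, and the example host name and addresses. Real DNS resolvers and Web proxies are considerably more involved.

```python
class Origin:
    """Authoritative copy of the data (stands in for a DNS zone or Web server)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data[key]


class FakeClock:
    """Manually advanced clock so the demo needs no real waiting."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class TTLCache:
    """Serves cached copies until they are older than ttl seconds."""

    def __init__(self, origin, ttl, clock):
        self.origin = origin
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # key -> (value, time fetched)

    def read(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # cache hit: possibly stale
        value = self.origin.read(key)  # miss or expired: refetch
        self.entries[key] = (value, now)
        return value


def demo():
    clock = FakeClock()
    origin = Origin()
    cache = TTLCache(origin, ttl=30.0, clock=clock)

    origin.write("www.example.com", "10.0.0.1")
    first = cache.read("www.example.com")   # miss: fetched from origin

    origin.write("www.example.com", "10.0.0.2")  # origin changes
    clock.advance(10)
    stale = cache.read("www.example.com")   # hit: cache still holds old value

    clock.advance(30)
    fresh = cache.read("www.example.com")   # entry expired: refetched
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

Between the origin update and the expiry of the cached entry, readers see the old address. A TTL only bounds how long a copy can be stale; it does not guarantee coherence.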
30. What this course is really about
- Distributed computing is rapidly transforming the
way we work and live, and the way that companies do
business.
- Increasingly, distributed computing systems are
the only ones you can buy.
- The challenge: build distributed systems which
can be relied upon in critical settings
31. What's the Story Today?
- Few distributed systems or Web applications
consider reliability issues
- The ones that do worry about reliability are
often naive about what they are getting into,
leading to highly visible failures
- But we do have technical answers to many of the
basic problems and some exciting initial options
32. Goals for this course?
- Understand the basic technologies from which
distributed systems are constructed
- Maintain a degree of emphasis on reliability
issues throughout: how reliable are the standard
technologies? Can they be used reliably despite
their limitations?
- Look at advanced technologies in context of real
systems built in standard ways
33. Trends are changing
- More and more pressure on industry
- When the network is down, your company won't make
money
- Clients want tools they can rely on
- This is creating pressure on vendors who offer
middleware
- Result is a new emphasis on scalability and
reliability
- We want reliability, as long as we can have
performance and scalability too.
34. Technologies we will cover
- RPC and client-server computing; streams
- Internet technologies (email, news, msg. bus) and
trends (the next generation Internet)
- DCE, CORBA, COM: object-oriented and component
environments
- Web technologies (HTTP, XML), how the popular
scalable architectures work
- Process group computing and scalability issues
- Transactions and reliability
- Just a Taste of Security
- System Management, Clusters, Realtime
35. Course Overview: 24 lectures
- Intro, basic technologies: 4 lectures
- Web and Internet: 2 lectures
- Reliability technologies
- Distributed group solutions: 6 lectures
- Security options: 2 lectures
- Real-time issues: 2 lectures
- Transactional systems: 2 lectures
- Management: 3 lectures
- Other topics: 3 lectures
36. Project
- CS514 has
- Homeworks, from time to time
- A reasonably ambitious software project (can be
used to satisfy your MEng project requirement)
- Projects can be done in groups
- Usually involve tackling reliability or
scalability with some popular technology
- This semester, hoping to use two Java-oriented
B2B technologies
- HP's eSpeak
- BEA Systems' WebLogic
- You'll teach yourself how to use them
37. Major Themes?
- Modularity (also known as object-orientation).
Better structured systems are more reliable.
- Performance. Technologies need to be fast to be
perceived as working well
- Exploiting group structures. These are common in
reliable distributed systems
- Rigor. We want to know why a technique works;
ad-hoc solutions often break under stress
38. Scalability
- Suddenly the hot issue for industry
- Basically, customers expect solutions that
- Can be developed on a small scale
- Continue to work during prime-time
- Scalability and stability can be considered from
many dimensions
- Today, most of the popular solutions scale
poorly!
39. The Prevailing Mindset
- Many developers believe that reliable systems are
clumsy, overengineered, slow
- Imagine a "robust bridge": sounds like some sort
of ugly, heavy eyesore
- The Web and the Net are about elegant,
light-weight, fast systems: the antithesis of
robust ones
- Reliability is also at odds with using standard
components and packages
40. Insights From Course?
- Reliability techniques are often very elegant
- Complexity is a challenge; modularity is used to
control these costs
- Can achieve high performance in reliable
distributed systems
- ... but they sometimes are hard to combine with
standard technologies
42. Lightweight but Resilient Bridges, Secure
Computing Enclaves
- A good way to imagine the technology we seek
- Our job is to build those enclaves
- Trick is to use the technical tools the right
way!
- In CS514, we won't study the security aspects of
the problem in more than a shallow manner
43. Recommended Reading
- Textbook: read the Introduction
- While surfing the Web, think about outages
- Keep a count over half an hour of surfing the
net: how often did you have problems? What sorts
of problems?
- Find the University of Michigan Web pages on
Internet availability. What does this data tell
you?
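One possible way to keep the count suggested above is a small tally script. This is only a sketch: the `summarize` helper and the problem labels are hypothetical, and the sample observations are made up purely to show the shape of the data, not measurements of anything.

```python
from collections import Counter


def summarize(observations):
    """Return (problem count, problem rate, breakdown by kind).

    Each observation is the string "ok" or a short label naming
    the kind of problem seen on one page load.
    """
    problems = [kind for kind in observations if kind != "ok"]
    rate = len(problems) / len(observations) if observations else 0.0
    return len(problems), rate, Counter(problems)


if __name__ == "__main__":
    # Made-up sample data: one entry per page load during a session.
    sample = ["ok", "ok", "timeout", "ok", "dns failure",
              "ok", "ok", "timeout", "ok", "ok"]
    count, rate, kinds = summarize(sample)
    print(f"{count} problems in {len(sample)} loads ({rate:.0%})")
    for kind, n in kinds.most_common():
        print(f"  {kind}: {n}")
```

Recording the kind of problem, not just the count, makes it easier to answer the second question on the slide: what sorts of problems occurred.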