Title: Slide Title
2 Microsoft Research Directions
Jim Gray, Senior Researcher, Microsoft Corporation
gray@microsoft.com http://www.Research.Microsoft.com/Gray
3 Microsoft Research
- Goal: pursue strategic technologies for Microsoft
- Founded in 1991
- 200 researchers in 12 areas
- Redmond, San Francisco, Cambridge England
- Growing to 600 by 2001
- Internationally recognized research teams
- Many publications, conference presentations
- Leadership roles in professional societies, journals, conferences
- Direct involvement with product and service groups at Microsoft
4 Microsoft Research Themes
- Programming tools, methodologies and techniques
- Basic block tool, program analysis, IP
- Advanced interactivity and intelligence
- Speech, natural language, vision
- Decision theory, 3D graphics, UI
- Systems and architecture
- OS, databases, scalable servers
5 Advanced Development Tools
- Analysis of executables
- Dynamic analysis driven by user scenarios
- Instrumented code
- Automatic reorganization of executables
- Reduction of code working set size
- Branch straightening
- Boot ordering for boot-time reduction
6 Initial Results
- Reduced code working sets up to 50%
- Improved throughput by 10%
- Delivered to 35 clients
[Figure: Windows NT working set size. Pages referenced (0 to 500) versus seconds (0 to 100), Original and Optimized curves.]
7 Speech Technology
- Speech recognition
- Speaker-independent, command, and control
- Dictation
- Speaker-independent, large vocabulary
- Discrete and continuous speech
- Trainable speech synthesis
- Prosody and concatenative speech units learned from corpus
- Download from MS Research web site
8 Natural Language
- Broad-coverage syntax analyzes unrestricted text
- Dictionary-based semantic network provides growing knowledge base
- Flexible underlying system for multiple languages
9 [Diagram labels: Robotics, Machine learning, Interactive movies, Info highway cruiser, Advanced summary, Discourse/pragmatics, Interactive games, UI, Peedy, Concept normalizing, NL query, Improved IR, Sense choosing, Bob, Enhanced help, Improved SR, SR, Logical form, Semantic critiques, Revised syntax, Auto indexing, Probs(DT), Syntactic critiques, Initial syntax, Phrase spacing, Morphology, Find and replace]
10 Levels Of Writing Critiques
- We scheduled the next meeting for noon.
- Each of the products are designed to help.
- I saw the Grand Canyon flying to Arizona.
- Ladies are requested not to have children in the
bar. (From a sign in a Norwegian cocktail lounge)
11 Comic Chat
- Comic panels based on chat input
- Users control character's emotions
- Comic strip acts as compelling record of the
conversation
- Automated
- Character placement
- Balloon construction
- Balloon layout
- Camera zoom
- Panel breaks
- Etc.
12 3D Graphics Research
- Bring very high-performance, high-quality graphics to PCs
- Interactive
- Uniform treatment of multimedia
- Modeling
- Representation of 3D models
- Automatic simplification
- Animation
13 Simplification Problem
[Figure labels: 70,000; 8,700; 34,100; 4,200; 2,600; 2,300]
Competing goals: accuracy and conciseness
14 Vision Projects
- 3D reconstruction from video and images
- Motion analysis for video compression
- Model acquisition for rendering
- Visual human/computer interaction
- Communication by gestures and expressions
- Multimodal speech/vision interfaces
15 Motion Analysis
- Convert masked images into a background sprite for content-based coding
Scrunch
- Working with Softimage on motion tracking
16 Video-Based 3D Modeling
- Convert a video sequence into a solid 3D model based on object silhouettes
- Being used in Lumigraph project
17 Systems Research Areas
- Scalable, fault-tolerant servers and services
- Most of this talk is about scalable servers
- Other OS projects
- Video and audio servers (NetShow)
- Real time OS for set-top boxes
- WindowsCE grew from an AT project
- High-performance distributed computing
- Zero Admin Windows
- IPv6
18 1987: 256 tps Benchmark
- $14 million computer (Tandem)
- A dozen people
- False floor, two rooms of machines
[Figure labels: Manager, Admin expert, Hardware experts, Network expert, Performance expert, OS expert, DB expert, Auditor; a 32 node processor array; a 40 GB disk array (80 drives); simulate 25,600 clients]
19 1997: 10 Years Later, One Person and One Box = 1250 tps
- One breadbox ~ 5x 1987 machine room
- 23 GB is hand-held
- One person does all the work
- Cost/tps is 1,000x less: 1 micro-dollar per transaction
[Figure labels: 4 x 200 MHz CPU, 1/2 GB DRAM, 12 x 4 GB disk; 3 x 7 x 4 GB disk arrays; one person acting as hardware expert, OS expert, net expert, DB expert, app expert]
20 Thesis: Many little beat few big
[Figure: processor classes Mainframe, Mini, Micro, Nano, Pico Processor; labels 1 MM 3, 1 million, 100,000, 10,000; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"]
[Figure: storage hierarchy: 10 picosecond ram, 1 MB; 10 nanosecond ram, 100 MB; 10 microsecond ram, 10 GB; 10 millisecond disc, 1 TB; 10 second tape archive, 100 TB]
- 1 M SPECmarks, 1 TFLOP
- 10^6 clocks to bulk ram
- Event-horizon on chip
- VM reincarnated
- Multiprogram cache, on-chip SMP
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
21 Future Super Server: 4T Machine
- Array of 1,000 4B machines
- 1 Bips processors
- 1 BB DRAM
- 10 BB disks
- 1 Bbps comm lines
- 1 TB tape robot
- A few megabucks
- Challenge
- Manageability
- Programmability
- Security
- Availability
- Scalability
- Affordability
- As easy as a single system
Cyber Brick: a 4B machine
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work
22 The Hardware Is In Place... And then a miracle occurs
- SNAP: scalable network and platforms
- Commodity distributed OS built on
- Commodity platforms
- Commodity network interconnect
- Enables parallel applications
?
23 Scalable Computers: BOTH SMP And Cluster
Grow up with SMP: 4xP6 is now standard
Grow out with cluster: cluster has inexpensive parts
SMP Super Server
Departmental Server
Cluster of PCs
Personal System
24 What TPC-Benchmarks Say
- PC technology 2.5x cheaper than high-end SMPs
- PC performance is 1/4 high-end SMPs
- 4xP6 vs 24x UltraSparc
- 9.1 ktpmC @ $49/tpmC vs 31 ktpmC @ $109/tpmC
- 6x more cpus, 3.5x more throughput.
- NT 2.3 ktpmC/cpu vs Solaris 1.3 ktpmC/cpu
- Still, UltraSparc performance IS impressive
- Commodity solutions will come
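The ratios on this slide can be re-derived from its own numbers. The sketch below is illustrative only: the dictionary layout and function names are assumptions, and the inputs are the figures as quoted (4xP6 at 9.1 ktpmC and $49/tpmC, 24x UltraSparc at 31 ktpmC and $109/tpmC).

```python
# Figures as quoted on the slide; the data layout is an assumption for illustration.
pc = {"cpus": 4, "ktpmc": 9.1, "dollars_per_tpmc": 49}
smp = {"cpus": 24, "ktpmc": 31.0, "dollars_per_tpmc": 109}


def per_cpu(system):
    """Throughput per processor, in ktpmC."""
    return system["ktpmc"] / system["cpus"]


def ratio(a, b, key):
    """How many times larger a[key] is than b[key]."""
    return a[key] / b[key]


cpu_ratio = ratio(smp, pc, "cpus")
throughput_ratio = ratio(smp, pc, "ktpmc")
price_ratio = ratio(smp, pc, "dollars_per_tpmc")

print(f"per-cpu: {per_cpu(pc):.1f} vs {per_cpu(smp):.1f} ktpmC")
print(f"cpus x{cpu_ratio:.0f}, throughput x{throughput_ratio:.1f}, $/tpmC x{price_ratio:.1f}")
```

With these inputs the per-CPU figures come out near 2.3 and 1.3 ktpmC, the throughput ratio near 3.4x, and the price ratio near 2.2x, so the slide's 3.5x and 2.5x appear to be rounded up.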
25 HP's New TPC-C Result
26 How Big Are Windows NT SQL Servers?
- Study found
- Several at 50 GB to 100 GB nodes
- A few multi-node up to one TB
- http://131.107.1.182/research/barc/gray/SQL Server Scaleability.doc
- None beyond 100 GB per node
- A survey shows relatively few operational DBs beyond 1 TB (1 TB = $500K of disk!) http://www.wintercorp.com/topten.html
- Want to pioneer large DBs on Windows NT
27 Goal
- Build a 1 TB SQL Server database
- Show off Windows NT and SQL Server scalability
- Stress test the product
- Demo it on the Internet
- WWW accessible by anyone
- So data must be
- 1 TB
- Unencumbered
- Interesting to everyone everywhere
- And not offensive to anyone anywhere
28 The Hardware
- DEC Alpha
- 324 StorageWorks Drives (1.4 TB)
- SQL Server 7.0
- USGS data
- Russian Space data
- Two-meter resolution images
29 Image Data Sources
300 GB. Src: USGS and UCSB
UCSB missing some DOQs
DOQ
30 Demo
31 Cluster Advantages
- Clients and servers made from the same stuff
- Inexpensive: built with commodity components
- Fault tolerance
- Spare modules mask failures
- Modular growth
- Grow by adding small modules
- Parallel data search
- Use multiple processors and disks
32 Cluster: Shared What?
- Shared memory multiprocessor
- Multiple processors, one memory
- All devices are local
- DEC, SGI, Sun, Sequent: 16..64 nodes
- Easy to program, not commodity
- Shared disk cluster
- An array of nodes
- All shared common disks
- VAXcluster Oracle
- Shared nothing cluster
- Each device local to a node
- Ownership may change
- Tandem, SP2, Wolfpack
33 Clusters Being Built
- Teradata 500 nodes ($50K/slice)
- Tandem, VMScluster 150 nodes ($100K/slice)
- Intel, 9,000 nodes @ $55 million ($6K/slice)
- Teradata, Tandem, DEC moving to Windows NT and a low slice price
- IBM 512 nodes @ $100 million ($200K/slice)
- PC clusters (bare handed) at dozens of nodes: Web servers (msn, PointCast...), DB servers
- Key technology is the applications
- Applications distribute data
- Applications distribute execution
- It's the applications, STUPID!
34 Billion Transactions per Day Project
- Built a 45-node Windows NT Cluster (with help from Intel and Compaq), > 900 disks
- All off-the-shelf parts
- Using SQL Server and DTC distributed transactions
- DebitCredit Transaction
- Each node has 1/20th of the DB
- Each node does 1/20th of the work
- 15% of the transactions are distributed
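One way to picture "each node has 1/20th of the DB" and "15% of the transactions are distributed" is a partitioned placement in which a transaction becomes distributed when the teller's branch and the account's branch live on different nodes. The sketch below is a hypothetical illustration, not the project's actual partitioning scheme: the modulo placement, the function names, and the sample workload are all assumptions.

```python
NUM_NODES = 20  # taken from "each node has 1/20th of the DB"


def home_node(branch_id: int) -> int:
    """Node that owns a branch and its accounts (hypothetical modulo placement)."""
    return branch_id % NUM_NODES


def is_distributed(teller_branch: int, account_branch: int) -> bool:
    """True when the transaction touches data on two different nodes."""
    return home_node(teller_branch) != home_node(account_branch)


def distributed_fraction(transactions) -> float:
    """Fraction of (teller_branch, account_branch) pairs that span nodes."""
    txs = list(transactions)
    return sum(is_distributed(t, a) for t, a in txs) / len(txs)


# Made-up workload: 85 local transactions, 15 against a neighbouring branch.
sample = [(b, b) for b in range(85)] + [(b, b + 1) for b in range(15)]
print(distributed_fraction(sample))
```

Only the transactions that cross nodes need a distributed commit; the local ones run entirely on the node that owns the data.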
35 Billion Transactions Per Day Hardware
- 45 nodes (Compaq Proliant)
- Clustered with 100 Mbps Switched Ethernet
- 140 cpu, 13 GB, 3 TB.
36 1.2 B tpd
- 1 B tpd ran for 24 hrs.
- Sized for 30 days
- Linear growth
- 5 micro-dollars per transaction
- Out-of-the-box software
- Off-the-shelf hardware
- AMAZING!
37 How Much Is 1 Billion Tpd?
- 1 billion tpd = 11,574 tps (transactions per second) ~ 700,000 tpm (transactions/minute)
- AT&T
- 185 million calls per peak day (worldwide)
- Visa does ~20 million tpd
- 400 million customers
- 250K ATMs worldwide
- 7 billion transactions (card + cheque) in 1994
- New York Stock Exchange
- 600,000 tpd
- Bank of America
- 20 million tpd checks cleared (more than any other bank)
- 1.4 million tpd ATM transactions
- Worldwide Airlines Reservations: 250 Mtpd
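The unit conversions on this slide are plain division by the number of seconds or minutes in a day. A minimal sketch follows; the function names and the `workloads` table are illustrative, and the daily volumes are the ones quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440


def per_second(tpd: float) -> float:
    """Average transactions per second for a daily volume."""
    return tpd / SECONDS_PER_DAY


def per_minute(tpd: float) -> float:
    """Average transactions per minute for a daily volume."""
    return tpd / MINUTES_PER_DAY


# Daily volumes as quoted on the slide.
workloads = {
    "1 billion tpd": 1_000_000_000,
    "Visa": 20_000_000,
    "New York Stock Exchange": 600_000,
    "Worldwide airline reservations": 250_000_000,
}

for name, tpd in workloads.items():
    print(f"{name}: {per_second(tpd):,.0f} tps, {per_minute(tpd):,.0f} tpm")
```

One billion per day works out to about 11,574 per second and about 694,000 per minute, which the slide rounds to 700,000.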
38 Clusters (Plumbing)
- Single-system image
- Naming
- Protection/security
- Management/load balance
- Fault tolerance
- Wolfpack demo
- Hot pluggable hardware and software
39 So, What's New?
- When slices cost $50,000, you buy 10 or 20
- When slices cost $5,000, you buy 100 or 200
- Manageability, programmability, usability become key issues (total cost of ownership)
- PCs are much easier to use and program
MPP vicious cycle: No customers!
CP/commodity virtuous cycle: Standards allow progress and investment protection
[Figure labels: Apps, Standard OS and Hardware, Customers]
40 Windows NT Server Clustering: High availability on standard hardware
- Standard API for clusters on many platforms
- No special hardware required
- Resource Group is unit of failover
- Typical resources
- Shared disk, printer...
- IP address, NetName
- Service (Web, SQL, File, Print, Mail, MTS)
- API to define
- Resource groups
- Dependencies
- Resources
- GUI administrative interface
- A consortium of 60 HW and SW vendors (everybody
who is anybody)
2-node cluster in beta test now, available H1 97; >2 node is next
SQL Server and Oracle: demo on it today
Key concepts
- System: a node
- Cluster: systems working together
- Resource: HW/SW module
- Resource dependency: resource needs another
- Resource group: fails over as a unit
- Dependencies do not cross group boundaries
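The key concepts above (resources, dependencies, and resource groups that fail over as a unit) can be modeled in a few lines. This is a hypothetical toy model, not the Windows NT clustering API: every class, method, and resource name below is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    depends_on: list = field(default_factory=list)  # names of resources in the same group


@dataclass
class ResourceGroup:
    name: str
    node: str                                      # node currently hosting the group
    resources: dict = field(default_factory=dict)  # insertion-ordered: name -> Resource

    def add(self, resource: Resource) -> None:
        # Dependencies may not cross group boundaries, so each one must already be here.
        missing = [d for d in resource.depends_on if d not in self.resources]
        if missing:
            raise ValueError(f"{resource.name} depends on {missing}, not in group {self.name}")
        self.resources[resource.name] = resource

    def fail_over(self, target_node: str) -> list:
        # The whole group moves together; because add() requires dependencies first,
        # insertion order is a valid restart order.
        self.node = target_node
        return list(self.resources)


group = ResourceGroup("sql-group", node="A")
group.add(Resource("disk"))
group.add(Resource("ip-address"))
group.add(Resource("sql-service", depends_on=["disk", "ip-address"]))
restart_order = group.fail_over("B")
print(group.node, restart_order)
```

Keeping dependencies inside one group is what lets the group move as a unit: nothing left behind on the failed node is needed to restart it.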
41 Where We Are Today
- Clusters moving fast
- OLTP
- Wolfpack
- Technology ahead of schedule
- CPUs, disks, tapes, wires...
- OR databases are evolving
- Parallel DBMSs are evolving
- HSM still immature
43 Metcalfe's Law: Network Utility = Users^2
- How many connections can it make?
- One user: no utility
- 100,000 users: a few contacts
- 1 million users: many on Net
- 1 billion users: everyone on Net
- That is why the Internet is so hot
- Exponential benefit
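A small sketch of the users-squared relationship, alongside the count of distinct user pairs that motivates it. The function names are illustrative; the only input taken from the slide is the rule that utility scales with the square of the number of users.

```python
def utility(users: int) -> int:
    """Metcalfe-style utility: the square of the number of users."""
    return users ** 2


def pairwise_connections(users: int) -> int:
    """Number of distinct pairs of users who could connect."""
    return users * (users - 1) // 2


for n in (1, 100_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} users: {pairwise_connections(n):,} possible connections")
```

A single user has zero possible connections, and every 10x growth in users gives roughly 100x more connections, so under this rule the benefit grows with the square of the user count.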
44 Moore's First Law
- XXX doubles every 18 months: 60% increase per year
- Micro processor speeds
- Chip density
- Magnetic disk density
- Communications bandwidth: WAN bandwidth approaching LAN speeds
- Exponential growth
- The past does not matter
- 10x here, 10x there, soon you're talking REAL change
- PC costs decline faster than any other platform
- Volume and learning curves
- PCs will be the building bricks of all future systems
[Figure: 1 chip memory size (2 MB to 32 MB). Axes: memory size from 8 KB to 1 GB; years 1970 to 2000; chip capacity in bits from 1K to 256M.]
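The link between "doubles every 18 months" and "60% increase per year" is a compound-growth identity. The sketch below works it through; the function names are illustrative and the 18-month doubling period is the only input taken from the slide.

```python
def annual_growth(doubling_months: float) -> float:
    """Fractional increase per year for a quantity that doubles every doubling_months."""
    return 2 ** (12 / doubling_months) - 1


def growth_factor(years: float, doubling_months: float = 18) -> float:
    """Total multiplication over a span of years."""
    return 2 ** (years * 12 / doubling_months)


print(f"per year: {annual_growth(18):.1%}")
print(f"per decade: {growth_factor(10):.0f}x")
```

Doubling every 18 months gives a little under 59% per year, which the slide rounds to 60%, and compounds to roughly 100x per decade.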
45 Bumps In The Moore's Law Road
- DRAM
- 1988: United States antidumping rules
- 1993-1995: ? price flat
- Magnetic disk
- 1965-1989: 10x/decade
- 1989-1996: 4x/3 years! 100x/decade
46 Gordon Bell's Seven Price Tiers
- $10: wrist watch computers
- $100: pocket/palm computers
- $1,000: portable computers
- $10,000: personal computers (desktop)
- $100,000: departmental computers (closet)
- $1,000,000: site computers (glass house)
- $10,000,000: regional computers (glass castle)
Super server: costs more than $100,000
Mainframe: costs more than $1 million
Must be an array of processors, disks, tapes, comm ports
47 Bell's Evolution Of Computer Classes
- Technology enables two evolutionary paths:
- 1. Constant performance, decreasing cost
- 2. Constant price, increasing performance
[Figure: log price versus time for Mainframes (central), Minis (dept.), WSs, PCs (personals), ??]
1.26x/year = 2x/3 yrs = 10x/decade; 1/1.26 = .8
1.6x/year = 4x/3 yrs = 100x/decade; 1/1.6 = .62
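The two growth rates at the bottom of this slide are the annual factors implied by "2x every 3 years" and "4x every 3 years", and their reciprocals are the matching yearly price declines at constant performance. A short sketch, with illustrative function names:

```python
def annual_factor(multiple: float, years: float) -> float:
    """Yearly growth factor for something that grows by `multiple` every `years` years."""
    return multiple ** (1 / years)


def price_decline(factor: float) -> float:
    """Next year's price as a fraction of this year's, at constant performance."""
    return 1 / factor


for multiple in (2, 4):
    f = annual_factor(multiple, 3)
    print(f"{multiple}x/3 yrs: {f:.2f}x/year, {f ** 10:.0f}x/decade, price x{price_decline(f):.2f}/year")
```

The exact values are about 1.26 and 1.59 per year (the slide rounds the second to 1.6), compounding to about 10x and 100x per decade.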
48 Software Economics
- An engineer costs about $150,000/year
- R&D gets 5%...15% of budget
- Need $3 million...$1 million revenue per engineer
[Figure: revenue breakdown charts for four companies]
Microsoft $9 billion: Profit 24%, R&D 16%, SG&A 34%, Tax 13%, Product and Service 13%
Intel $16 billion, IBM $72 billion, Oracle $3 billion. Chart labels in transcript order: Profit 15, Profit 6, R&D 9, R&D 8, Profit 22, Tax 7, SG&A 11, Tax, SG&A, 12, P&S 59, 43, P&S 47, P&S 26
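The revenue-per-engineer range follows directly from the engineer cost and the R&D share of budget. A minimal sketch, assuming revenue and budget are treated as the same quantity; the function name is illustrative.

```python
def revenue_per_engineer(engineer_cost: float, rd_percent: float) -> float:
    """Revenue needed so that one engineer's cost equals rd_percent of revenue."""
    return engineer_cost * 100 / rd_percent


ENGINEER_COST = 150_000  # dollars per year, as quoted on the slide

for rd_percent in (5, 15):
    needed = revenue_per_engineer(ENGINEER_COST, rd_percent)
    print(f"R&D at {rd_percent}% of budget: ${needed:,.0f} revenue per engineer")
```

At 5% the requirement is $3 million per engineer and at 15% it is $1 million, matching the range on the slide.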
49 Software Economics: Bill's Law
Price = Fixed_Cost / Units + Marginal_Cost
- Bill Joy's law (Sun): don't write software for less than 100,000 platforms @ $10 million engineering expense, $1,000 price
- Bill Gates' law: don't write software for less than 1,000,000 platforms @ $10 million engineering expense, $100 price
- Examples
- UNIX versus Windows NT: $3,500 versus $500
- Oracle versus SQL Server: $100,000 versus $6,000
- No spreadsheet or presentation pack on UNIX/VMS/...
- Commoditization of base software and hardware
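Reading the slide's formula as price = fixed cost / units + marginal cost, the two "Bill's laws" can be compared by spreading the engineering expense over the platform count. This is a sketch under two assumptions: the Gates figure is taken to be $10 million, parallel to the Joy line, and marginal cost is set to zero because the slide gives none. The function name is illustrative.

```python
def price_floor(fixed_cost: float, units: int, marginal_cost: float = 0.0) -> float:
    """Per-unit price needed to recover fixed cost over `units`, plus marginal cost."""
    return fixed_cost / units + marginal_cost


ENGINEERING = 10_000_000  # dollars; assumed identical for both laws

joy = price_floor(ENGINEERING, 100_000)      # Joy: 100,000 platforms, $1,000 price
gates = price_floor(ENGINEERING, 1_000_000)  # Gates: 1,000,000 platforms, $100 price

print(f"Joy: ${joy:,.0f} of engineering per unit against a $1,000 price")
print(f"Gates: ${gates:,.0f} of engineering per unit against a $100 price")
```

Under these assumptions engineering is about a tenth of the price in both cases; the 10x larger volume is what allows the 10x lower price.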
50 Gordon Bell's Platform Economics
- Traditional computers: custom or semi-custom, high-tech and high-touch
- New computers: high-tech and no-touch
[Figure: Price (K), Volume (K), and Application price by computer type (Mainframe, WS, Browser), log scale from 0.01 to 100,000]
51 Grove's Law: The New Computer Industry
- Horizontal integration is new structure
- Each layer picks best from lower layer
- Desktop (C/S) market
- 1991: 50%
- 1995: 75%
Function            Example
Operation           AT&T
Integration         EDS
Applications        SAP
Middleware          Oracle
Baseware            Microsoft
Systems             Compaq
Silicon and Oxide   Intel and Seagate