Title: Data Management in a Highly Connected World
1Data Management in a Highly Connected World
- James Hamilton
- JamesRH_at_microsoft.com
- Microsoft SQL Server
March 3, 2000
2Agenda
- Client Tier
- Number of devices
- Device interconnect fabric
- Standard programming infrastructure
- Client tier database issues
- Resource requirements
- Implementation language
- Administrative cost implications
- Development cost implications
- Middle Tier
- Server Tier
- Summary
3How Many Clients?
- 1998 US WWW users (IDC)
- US: 51M; worldwide: 131M
- 2001 estimates
- World Wide 319M users
- 515M connected devices
- ½ billion connected Clients
- Conservative estimate based upon conventional
device counts
4Other Device Types
- TVs, VCRs, stoves, thermostats, microwaves, CD
players, computers, garage door openers, lights,
sprinklers, appliances, driveway de-icers,
security systems, refrigerators, health
monitoring, etc.
- Sony evangelizing IEEE 1394 interconnect
- http://www.sel.sony.com/semi/iee1394wp.html
- Microsoft consortium evangelizing Universal Plug and Play
- www.upnp.org
- WAP: Wireless Application Protocol
- http://www.wap.net/
5Device Interconnect Infrastructure
- Power line control
- X10: http://www.x10.org
- Sunbeam Thalia
- http://www.thaliaproducts.com/
6Why Connect These Devices?
- TV guide & automatic VCR programming
- CD label info & song list download
- Sharing data resources
- Set clocks (flashing 12:00)
- Fire and burglar alarms
- Persist thermometer settings
- Feedback & data-sharing-based systems
- Temperature control & power blind interaction
- Occupancy directed heating and lighting
7Device Communication Implications
- The need is there
- Infrastructure is going in
- Wireless
- Power line communications
- Unused twisted pair (phone) bandwidth
- Connectable device infrastructure arriving & being deployed
- On the order of billions of client devices
8Device Interconnect Example
9Device Interconnect Example
10Device Interconnect Example
11Device interconnect Example
[Diagram labels: Den, Living Room, Bedroom, Deck; Windows NT Server, Ethernet hub, 56 kbps line, Ethernet backbone, X10 backbone; filtration plant, 660-gallon marine aquarium, 130-gallon freshwater aquarium, home sprinklers]
12Improvements For Example
- Cooperation of lighting, A/C & power blind systems
- Alarms & remote notification for failures in
- Circulation pumps
- Heating & cooling
- Salinity & other water chemistry changes
- Filtration system
- Feedback directed systems
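The alarm and feedback ideas above can be sketched as a few rule checks. This is a minimal illustration, not the system pictured on these slides: the `TankReading` fields, the threshold values, and the `check_tank` and `blind_position` helpers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical alarm thresholds for a marine aquarium; illustrative values only.
SALINITY_RANGE = (1.020, 1.026)   # specific gravity
TEMP_RANGE_C = (24.0, 27.0)

@dataclass
class TankReading:
    pump_running: bool
    temp_c: float
    salinity: float
    filter_flow_lpm: float  # litres per minute through the filtration plant

def check_tank(r: TankReading) -> list[str]:
    """Return one alarm message per failure condition; an empty list means OK."""
    alarms = []
    if not r.pump_running:
        alarms.append("circulation pump stopped")
    if not TEMP_RANGE_C[0] <= r.temp_c <= TEMP_RANGE_C[1]:
        alarms.append(f"temperature out of range: {r.temp_c:.1f} C")
    if not SALINITY_RANGE[0] <= r.salinity <= SALINITY_RANGE[1]:
        alarms.append(f"salinity out of range: {r.salinity:.3f}")
    if r.filter_flow_lpm <= 0:
        alarms.append("filtration system: no flow")
    return alarms

def blind_position(room_temp_c: float, setpoint_c: float, sunny: bool, occupied: bool) -> str:
    """Feedback rule coordinating power blinds with A/C and occupancy."""
    if sunny and room_temp_c > setpoint_c:
        return "closed"      # cut solar gain instead of running the A/C harder
    if sunny and room_temp_c < setpoint_c and occupied:
        return "open"        # let the sun help heat an occupied room
    return "unchanged"
```

For example, a reading with the pump running, 28.5 C water, salinity 1.024 and normal filter flow produces a single temperature alarm. A real deployment would route these messages to the remote-notification path rather than simply return them.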
13Palmtop Resource Trends
- Palmtops I've purchased through the years
- All about the same cost & physical size
[Figure: palmtop RAM (MB, log scale 0.1 to 100) vs. year, 1990 to 2002, with a Moore's Law reference line. Devices plotted: Sharp IQ7000 (0.125M), Sharp IQ8300 (0.25M), HP 95LX (0.5M), HP 100LX (1M), HP 200LX (2M), Everex A20 (4M), Casio E105 (32M)]
14O/S Memory Requirements
- Windows memory requirements over time
[Figure: desktop RAM (MB, log scale 0.1 to 100) vs. year, 1985 to 1999, with a Moore's Law reference line. Releases plotted: Windows 1.0 (256K), Windows 2.0 (512K), Windows 3.0 (2M), WFW 3.1 (3M), Windows 95 (4M), Windows 98 (16M), Windows 2000 (64M)]
15Smartcard Resource Trends
[Figure: smartcard memory size (bits, log scale 3 K to 300 M) vs. year, 1990 to 2004, with a "You are here" marker]
Source: Denis Roberson, PIN/Card-Tech/NCR
16Devices Smaller Than PDAs
- Qualcomm PDQ
- 2 MB total memory
- Same memory curve as PDAs, just 2 to 5 years behind
- Nokia 9000il
- 8 MB total Memory
17Digital Cameras
Make        Model               Memory
Agfa        CL30                60 to 360MB
Canon       PowerShot S20       6 to 176MB
Epson       PhotoPC 850Z        10 to 120MB
Kodak       DC-280              32 to 245MB
Olympus     D-340R              18 to 120MB
Panasonic   Palmcam PV-SD4090   450 to 1,500MB
Sanyo       VPC-SX500           19 to 120MB
18Resource Trend Implications
- Device resources at constant cost are growing at super-Moore rates
- Same rate, but 2 to 3 years behind desktop system growth
- Same is true of each class of devices
- Telephones trail PDAs but again grow at the same rate
- Memory growth is not the problem
- However, devices are always smaller than desktops
- Devices are more specialized, so resource consumption is lower: they can still run a standard vertical app slice
19Standard Infrastructure at Client
- Clearly specialized user interface S/W needed
- But we have the memory resources to support
- Standard communications stack (TCP/IP)
- Standard O/S software
- Standard data management S/W with query
- Transparent replication
- Symmetric multi-tiered infrastructure S/W
- Leverage best development environments
- No need to rewrite millions of redundant lines of code
- More heavily used & tested, so fewer bugs
- Better productivity in programming to a richer platform
- A full DBMS at the client is both practical & useful
20Client-Side Database Issues
- Honey, I shrunk the database (SIGMOD99)
- DB Footprint
- Implementation Language
- Both issues either largely irrelevant or soon to be
- Resource availability trends support standard infrastructure S/W
- Dominant costs: admin, operations, user training & programming
- Vertical slice of standard apps rather than full custom infrastructure
21DB Implementation Language
- The argument for a special DB implementation language (Java) centers on auto-installation of S/W infrastructure
- Auto-install is absolutely vital, but independent of implementation language
- Auto-install is not enough: the client should be a cache of recently used S/W & data
- Full DBMS at client
- Client-side cache of recently accessed data
- Optimizer-selected access path choice
- driven by accuracy & currency requirements
- balanced against connectivity state & communications costs
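A minimal sketch of optimizer-selected access path choice on the client, assuming each cached item records its age, each query states the staleness it can tolerate, and a remote fetch has a cost that depends on connectivity. The `Connectivity` states, the cost table, and `choose_access_path` are hypothetical illustrations, not a real DBMS API.

```python
from enum import Enum

class Connectivity(Enum):
    OFFLINE = 0
    WIRELESS = 1   # slow, expensive link
    LAN = 2        # fast, cheap link

# Hypothetical relative cost of one remote fetch in each connected state.
REMOTE_COST = {Connectivity.WIRELESS: 10.0, Connectivity.LAN: 1.0}

def choose_access_path(cached_age_s, max_staleness_s, conn, cost_budget):
    """Return 'local-cache', 'server', 'local-cache-stale' or 'unavailable'.

    cached_age_s: age in seconds of the cached copy, or None if not cached.
    max_staleness_s: currency requirement stated by the query.
    cost_budget: the most the caller will pay for one remote fetch.
    """
    if cached_age_s is not None and cached_age_s <= max_staleness_s:
        return "local-cache"        # cheapest path that meets the currency need
    if conn is not Connectivity.OFFLINE and REMOTE_COST[conn] <= cost_budget:
        return "server"             # fresh data is reachable and affordable
    # Currency cannot be met affordably: degrade to stale cached data if any.
    return "local-cache-stale" if cached_age_s is not None else "unavailable"
```

The same query can thus take different paths as the device moves between a LAN, a wireless link, and no connection at all, without the application changing.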
22Admin Costs Still Dominate
- 1960s large-system mentality still prevails
- Optimizing precious machine resources is false economy
- Admin & education costs more important
- TCO education from the PC world repeated
- Each app requires admin & user training: much cheaper to roll out 1 infrastructure across multiple form factors
- Sony PlayStation has 3MB RAM & Flash
- Nokia 9000il phone has 8MB RAM
- Trending towards 64MB palmtops in 2001
- Vertical app slice resource requirement can be met
23Dev Costs Over Memory Costs
- Specialty RTOSs have weak dev environments
- Quality & quantity of apps driven by
- Dev environment quality
- Availability of trained programmers
- Requirement for custom client development & configuration greatly reduces deployment speed
- Same apps across a wide range of device form factors
- Symmetric client/server execution environment
- DB components & data treated uniformly
- Both replicated to client as needed
24Client Side Summary
- On order of billions connected client devices
- Most are non-conventional computing devices
- All devices include standard DB components
- Standard physical & logical device interconnect standards will emerge
- DB implementation language irrelevant
- Device DB resource consumption much less important than ease of:
- Installation
- Administration
- Programming
- Symmetric client/server execution environment
25Agenda
- Client Tier
- Middle Tier
- High Availability via redundant data & metadata
- Fault Isolation domains
- XML
- Mid-tier Caching
- Server Tier
- Summary
26High Availability is Tough
Availability  Annual lost data access  Number of nines
90%           ~5 weeks (36.5 days)     1
99%           <4 days                  2
99.9%         <9 hours                 3
99.99%        <1 hour                  4
99.999%       ~5 minutes               5
99.9999%      ~30 seconds              6
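Each row of the availability table follows from multiplying the unavailable fraction of the year by its length. A short sketch, assuming a 365-day year, makes the arithmetic explicit; for instance, 90% availability works out to 36.5 days of lost access per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years

def annual_downtime_minutes(availability_pct: float) -> float:
    """Minutes per year a system with the given availability is unavailable."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60

for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct:>8}%  {minutes:10.2f} min  ({minutes / 60 / 24:.2f} days)")
```

Each additional nine divides the downtime by ten, which is why the last two rows are measured in minutes and seconds.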
27Server Availability & Heisenbugs
- Industry good at finding functional errors
- Multi-user application interactions hard
- Sequences of statistically unlikely events
- Heisenbugs (http://research.microsoft.com/gray/talks)
- Testing for these is exponentially expensive
- Server stack is nearing 100 MLOC
- Long testing & beta cycles delay software release
- System size & complexity growth inevitable
- Re-try operation (Microsoft Exchange)
- Re-run operation against redundant data copy (Tandem)
- Fail-fast design approach is robust but only acceptable with redundant access to redundant copies of data
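The two recovery strategies named on this slide can be combined in one helper: re-try on the same copy first, then re-run against a redundant copy. This is a hedged sketch; `TransientError`, `run_with_retry`, and the replica names are hypothetical and do not describe how Exchange or Tandem actually implement recovery.

```python
import time

class TransientError(Exception):
    """Stand-in for a Heisenbug-style failure that may not recur on retry."""

def run_with_retry(operation, replicas, retries_per_replica=2, backoff_s=0.0):
    """Run operation(replica): re-try on the same copy, then move to the next
    redundant copy. Raises the last error only if every attempt fails."""
    if not replicas or retries_per_replica < 1:
        raise ValueError("need at least one data copy and one attempt")
    last_error = None
    for replica in replicas:
        for attempt in range(retries_per_replica):
            try:
                return operation(replica)
            except TransientError as err:
                last_error = err
                time.sleep(backoff_s * (attempt + 1))  # optional linear backoff
    raise last_error
```

Because a Heisenbug depends on an unlikely interleaving, a plain re-try often succeeds; re-running against another copy also covers failures tied to one node's state.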
28The Inktomi Lesson
- Inktomi web search engine (Brewer, SIGMOD98)
- Quickly evolving software
- Memory leaks, race conditions, etc. considered normal
- Don't attempt to test & beta until quality is high
- System availability of paramount importance
- Individual node availability unimportant
- Shared nothing cluster
- Exploit ability to fail individual nodes
- Automatic reboots avoid memory leaks
- Automatic restart of failed nodes
- Fail fast: fail & restart when redundant checks fail
- Replace failed hardware weekly (mostly disks)
- Dark machine room
- No panicked midnight calls to admins
- Mask failures rather than futilely attempting to avoid them
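The Inktomi approach of masking failures rather than avoiding them can be sketched as a router over symmetric, shared-nothing nodes. Assumptions: `Node` and `Cluster` are hypothetical classes, a down node is simulated with a flag, and "restart" just clears that flag in place of a real reboot.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.restarts = 0

    def handle(self, request):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}:{request}"

class Cluster:
    """Symmetric shared-nothing nodes; any node can serve any request."""

    def __init__(self, names, seed=0):
        self.nodes = [Node(n) for n in names]
        self.rng = random.Random(seed)

    def request(self, req):
        candidates = self.nodes[:]
        self.rng.shuffle(candidates)          # spread load; no node is special
        for node in candidates:
            try:
                return node.handle(req)
            except ConnectionError:
                self.restart(node)            # fail fast, restart, route elsewhere
        raise RuntimeError("no live nodes")

    def restart(self, node):
        # Placeholder for an automatic reboot; the node serves later requests.
        node.restarts += 1
        node.up = True
```

The caller never sees an individual node failure unless every node is down; failed nodes come back through automatic restart rather than a midnight page.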
29Apply to High Value TP Data?
- Inktomi model
- Scales to 100s of nodes
- S/W evolves quickly
- Low testing costs and no-beta requirement
- Exploits ability to lose individual nodes without impacting system availability
- Ability to temporarily lose some data without significantly impacting query quality
- Can't lose data availability in most TP systems
- Redundant data allows node loss without loss of data availability
- Inktomi model with redundant data & metadata a potential solution
30Redundant Data & Metadata
- TP point access to data: a nearly solved problem
- TP systems scale with number of users, people on the planet, or business size
- All trending at sub-Moore rates
- Data analysis systems growing far faster than Moore's Law
- Greg's Law: 2x every 9 to 12 months (SIGMOD98, Patterson)
- Seriously super-Moore, implying that no single system can scale sufficiently: clusters are the only solution
- Storage trending to free, with access speed the limiting factor
- Detailed data distribution statistics need to be maintained
- Improve access speed & availability using redundant data (indexes, materialized views, etc.)
- Async update for stats, indexes & materialized views
- Data path choice based upon need for currency & accuracy
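The gap between Greg's Law and Moore's Law compounds quickly. A small calculation, assuming the commonly quoted 18-month doubling period for Moore's Law:

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Growth over `years` for a quantity that doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

years = 5
moore = growth_factor(years, 18)       # single-system capability
greg_slow = growth_factor(years, 12)   # data analysis demand, slow end
greg_fast = growth_factor(years, 9)    # data analysis demand, fast end
print(f"Moore: {moore:.0f}x, Greg: {greg_slow:.0f}x to {greg_fast:.0f}x")
```

Over five years this prints `Moore: 10x, Greg: 32x to 102x`: demand outgrows a single system by roughly 3x to 10x, which is the arithmetic behind "clusters are the only solution".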
31Affordable Availability
- Web-enabled direct access model driving high availability requirements
- Recent high-profile failures at eTrade and Charles Schwab
- Web model enabling competition in information access
- Drives much faster server-side software innovation, which negatively impacts quality
- Dark machine room approach requires auto-admin & data redundancy
- Inktomi model (Eric Brewer, SIGMOD98)
- 42% of system failures are admin errors (Gray)
- Paging an admin at 2am to fix a problem is dangerous
32Connection Model/Architecture
- Redundant data & metadata
- Shared nothing
- Single system image
- Symmetric server nodes
- Any client connects to any server
- All nodes SAN-connected
[Diagram: clients connecting to server nodes within the server cloud]
33Compilation & Execution Model
- Server thread: lex analyze, parse, normalize, optimize, code generate
- Query execution on many subthreads synchronized by root thread
[Diagram: client request compiled by a server thread, then executed across the server cloud]
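The root-thread/subthread execution model can be sketched with a thread pool: the root fans out one subthread per data partition, each returns a partial aggregate, and the root combines them once all have finished. The in-memory `PARTITIONS` data and the function names are hypothetical; threads in one process stand in for subthreads spread across the server cloud.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split across three nodes: rows of (region, amount).
PARTITIONS = [
    [("ca", 120), ("wa", 80)],
    [("ca", 40), ("or", 75)],
    [("wa", 10), ("or", 5)],
]

def scan_partition(rows, predicate):
    """Subthread: filter one partition and return its partial SUM(amount)."""
    return sum(amount for region, amount in rows if predicate(region))

def execute_query(predicate):
    """Root thread: start one subthread per partition, wait for all, combine."""
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        # list() blocks until every subthread has returned: the sync point.
        partials = list(pool.map(lambda rows: scan_partition(rows, predicate), PARTITIONS))
    return sum(partials)
```

For example, `execute_query(lambda r: r == "ca")` combines partials of 120, 40 and 0 into 160.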
34Node Loss/Rejoin
- Rejoin
- Node local recovery
- Rejoin cluster
- Recover global data at rejoining node
- Rejoin cluster
[Diagram: client and server cloud]
35Redundant Data Update Model
- Updates are standard parallel query plans
- Optimizer manages redundant access paths
- Query plan responsible for access path management
- No significant new technology
- Similar to materialized view & index updates today
[Diagram: client and server cloud]
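A minimal sketch of this update model, assuming one base table with a secondary index and a `SUM ... GROUP BY` materialized view. Each update changes the base row and every redundant access path together, the same work index and materialized-view maintenance does today. The `Table` class and its fields are hypothetical, and a real system would express this as a parallel query plan rather than a Python method.

```python
from collections import defaultdict

class Table:
    """Base table plus two redundant access paths kept in step by each update."""

    def __init__(self):
        self.rows = {}                           # primary key -> (region, amount)
        self.by_region = defaultdict(set)        # secondary index: region -> keys
        self.total_by_region = defaultdict(int)  # mat. view: SUM(amount) GROUP BY region

    def upsert(self, key, region, amount):
        """One 'update plan': change the base row and every redundant path with it."""
        old = self.rows.get(key)
        if old is not None:
            old_region, old_amount = old
            self.by_region[old_region].discard(key)
            self.total_by_region[old_region] -= old_amount
        self.rows[key] = (region, amount)
        self.by_region[region].add(key)
        self.total_by_region[region] += amount
```

Because every path is maintained in the same plan, any copy can answer a query after a node loss without first being repaired.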
36Fault Isolation Domains
- Trade single-node perf for redundant data checks
- Complex error recovery more likely to be wrong than original forward processing code
- Many redundant checks are compiled out of retail versions when shipped
- Fail fast rather than attempting to repair
- Bring down node for memory-based data structure faults
- Don't patch inconsistent data copies: keep system available
- If anything goes wrong, fire the node and continue
- Attempt node restart
- Auto-reinstall O/S & DB, and recreate DB partition
- Mark node dead for later replacement
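The escalation on this slide (restart, then reinstall, then mark dead) is a small state machine. The state names and helpers below are hypothetical; the point is that a fault never triggers in-place repair, only a move one step down the ladder while redundant copies keep serving.

```python
from enum import Enum, auto

class NodeState(Enum):
    RUNNING = auto()
    RESTARTING = auto()
    REINSTALLING = auto()
    DEAD = auto()

# Next recovery step after a fault observed in each state.
LADDER = {
    NodeState.RUNNING: NodeState.RESTARTING,       # fire the node, try a restart
    NodeState.RESTARTING: NodeState.REINSTALLING,  # reinstall O/S & DB, rebuild partition
    NodeState.REINSTALLING: NodeState.DEAD,        # give up: replace hardware later
    NodeState.DEAD: NodeState.DEAD,
}

def handle_fault(node_states: dict, node: str) -> NodeState:
    """Never repair in place; move the node one step down the ladder."""
    node_states[node] = LADDER[node_states[node]]
    return node_states[node]

def recovered(node_states: dict, node: str) -> None:
    """A successful restart or reinstall returns the node to service."""
    node_states[node] = NodeState.RUNNING
```

A node that keeps failing reaches `DEAD` after three faults and simply waits for hardware replacement.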
37Data Structure Matters
- Most internet content is unstructured text
- restricted to simple Boolean search techniques
- Docs have structure, just not explicit
- Yahoo hand categorizes content
- Indexing limited: human involvement doesn't scale well
- XML is a good mix of simplicity, flexibility & potential richness
- Structure description language of the internet
- DBMSs need to support as first class datatype
- Too few librarians in world
- so all information must be self-describing
38Relational to XML
- SELECT FOR XML
- FOR XML RAW (return an XML rowset)
- FOR XML AUTO (exploit RI, name matching, etc.)
- FOR XML EXPLICIT (maximal control)
- Annotated Schema
- Mapping between XML and relational schema expressed in XML
- Templates
- Encapsulated parameterized query
- XSL/T support
- XPATH support
- Direct URL access (SQL owned virtual root)
- SELECT FOR XML
- Annotated schema
- Templates
39XML to Relational
- XML bulk load
- Templates and Annotated Schema
- SQL Server hosted XML tree
- Directly insert document into SQL Server hosted XML tree
- Select from server-hosted XML tree rowset & insert into SQL tables
- XML data type support
- Hierarchical full text search
40XML Example
- http://SRV1/nwind?sql=SELECT+DISTINCT+ContactTitle+FROM+Customers+WHERE+ContactTitle+LIKE+'Sa%25'+ORDER+BY+ContactTitle+FOR+XML+AUTO
- Result set
- <Customers ContactTitle="Sales Agent"/>
- <Customers ContactTitle="Sales Associate"/>
- <Customers ContactTitle="Sales Manager"/>
- <Customers ContactTitle="Sales Representative"/>
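The example URL can be built and its result parsed with the Python standard library. Assumptions: no HTTP request is made, and the result text is copied from this slide. Because a `FOR XML AUTO` result is a fragment with one element per row and no single root element, the sketch wraps it in a hypothetical `<root>` element before parsing. `quote_plus` also percent-encodes the quote characters, which the hand-written URL leaves literal.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

sql = ("SELECT DISTINCT ContactTitle FROM Customers "
       "WHERE ContactTitle LIKE 'Sa%' ORDER BY ContactTitle FOR XML AUTO")
url = "http://SRV1/nwind?sql=" + quote_plus(sql)   # spaces -> '+', '%' -> '%25'

# Result text as shown on the slide: one element per row, no single root.
fragment = """<Customers ContactTitle="Sales Agent"/>
<Customers ContactTitle="Sales Associate"/>
<Customers ContactTitle="Sales Manager"/>
<Customers ContactTitle="Sales Representative"/>"""

root = ET.fromstring("<root>" + fragment + "</root>")  # hypothetical wrapper element
titles = [row.get("ContactTitle") for row in root.findall("Customers")]
```

With `FOR XML AUTO`, the table name becomes the element name and each selected column becomes an attribute, so the client needs no knowledge of the relational schema to read the result.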
41Mid-Tier Cache Requirements
- Non-proprietary multi-lingual programming
- Symmetric mid-tier server programming model
- Non-connected, stateless programming model
- High scale thread pool based
- Efficient main memory DB support
- Full query over local cache
- Query over just cached data, or
- Query over full corpus (server interaction reqd)
- Ability to handle network partitions & server failure
- Support for lifetime-attributed data
- Transactional (possibly multi-server)
- Near real time
- Every N time units
- Read only
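One way to model lifetime-attributed data in a mid-tier cache is to give each entry a refresh policy. This is a simplified sketch with hypothetical names (`Lifetime`, `MidTierCache`, `fetch`): "transactional" is reduced to always reading through to the server, and near-real-time push invalidation is not modeled.

```python
import time
from enum import Enum

class Lifetime(Enum):
    TRANSACTIONAL = "transactional"    # must always match the server
    NEAR_REAL_TIME = "near-real-time"  # refreshed by server push (not modeled)
    EVERY_N = "every-n"                # refreshed every N seconds
    READ_ONLY = "read-only"            # never changes once cached

class MidTierCache:
    def __init__(self, fetch, clock=time.monotonic):
        self.fetch = fetch    # callable(key) -> value from the back-end server
        self.clock = clock
        self.entries = {}     # key -> (value, lifetime, n_seconds, loaded_at)

    def put(self, key, lifetime, n_seconds=None):
        self.entries[key] = (self.fetch(key), lifetime, n_seconds, self.clock())

    def get(self, key):
        value, lifetime, n, loaded_at = self.entries[key]
        if lifetime is Lifetime.TRANSACTIONAL:
            return self.fetch(key)                 # always go to the server
        if lifetime is Lifetime.EVERY_N and self.clock() - loaded_at >= n:
            self.put(key, lifetime, n)             # stale: refresh from the server
            return self.entries[key][0]
        return value  # READ_ONLY, NEAR_REAL_TIME, or a still-fresh EVERY_N entry
```

Injecting the clock keeps the refresh logic testable without waiting in real time.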
42Agenda
- Client Tier
- Middle Tier
- Server Tier
- Affordable computing by the slice
- Everything online
- Disks are actually getting slower
- Processing moves to storage
- Approximate answers quickly
- Semi-structured storage support
- Administrative issues
- Summary
43Server-Side Changes
- Server databases more functionally rich than often required
- Trend reversal
- Less at the server-tier with richer mid-tier
- Focus at back-end shifts to
- Reliability, Availability, and Scalability
- Reducing administrative costs
- Server side trends
- Scalability over single-node performance
- Everything online
- Affordable availability in high scale systems
44Compaq/Microsoft TPC-C Benchmark
Top 5 TPC-C benchmark results as of February 17, 2000:
System                                                tpmC     $/tpmC   Total cost
ProLiant 8500 Cluster, Windows 2000, SQL Server 2000  227,079  $19.12   $4,341,603
ProLiant 8500 Cluster, Windows 2000, SQL Server 2000  152,207  $18.93   $2,880,431
IBM RS/6000 S80, AIX 4.3.3, Oracle v8.1.6             135,815  $52.70   $7,156,910
Escala EPC2400, AIX 4.3.3, Oracle v8.1.6              135,815  $54.94   $7,462,215
Enterprise 6500, Solaris 2.6, Oracle 8i v8.1.6        135,461  $97.10   $13,153,324
NOTE: All TPC-C results reported as of February 17, 2000
45Computing by the Slice
Source: TPC report executive summary
46Just Save Everything
- Able to store all Info produced on earth (Lesk)
- Paper sources less than 160 TB
- Cinema less than 166 TB
- Images 520,000 TB
- Broadcasting 80,000 TB
- Sound 60 TB
- Telephony 4,000,000 TB
- These data yield 5,000 petabytes
- Others estimate upwards of 12,000 petabytes
- Worldwide 1998 storage production: 13,000 petabytes
- No need to manage deletion of old data
- Most data never accessed by a human
- Access: aggregations & analysis, not point fetch
- More storage than data allows for greater redundancy
- indexes, materialized views, statistics & other metadata
47Disks are Becoming Black Holes
- Seagate Cheetah 73
- Fast: 10k RPM, 5.6 ms access, 16 MB cache
- But very large: 73.4 GB
- Result? Black hole: 2.4 accesses/sec/GB
- Large data caches required
- Employ redundant access paths
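The 2.4 accesses/sec/GB figure is the random-access rate divided by capacity. A quick check, assuming one outstanding request at a time and the 5.6 ms access time above; the 9.1 GB comparison disk is hypothetical:

```python
def accesses_per_sec_per_gb(access_ms: float, capacity_gb: float) -> float:
    """Random accesses/sec one disk can serve, divided by its capacity."""
    accesses_per_sec = 1000.0 / access_ms   # one random access in flight at a time
    return accesses_per_sec / capacity_gb

cheetah_73 = accesses_per_sec_per_gb(5.6, 73.4)    # Seagate Cheetah 73 figures
smaller_disk = accesses_per_sec_per_gb(5.6, 9.1)   # hypothetical 9.1 GB disk, same speed
print(f"{cheetah_73:.1f} vs {smaller_disk:.1f} accesses/sec/GB")
```

This prints `2.4 vs 19.6 accesses/sec/GB`: capacity grew about 8x while access time did not, so each GB gets about one eighth of the access rate, and caches and redundant access paths have to absorb the difference.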
48Processing Moves Towards Storage
- Trends
- I/O bus bandwidth is bottleneck
- Switched serial nets support very high bandwidth
- Processor/memory interface is bottleneck
- Growing CPU/DRAM perf gap leading to most CPU cycles spent in stalls
- Combine CPU, serial network, memory & disk in a single package
- E.g. David Patterson's ISTORE project
49Processing Moves Towards Storage
- Each disk forms part of a multi-thousand-node cluster
- Redundant data masks failure
- RAID-like approach
- Each cyberbrick: commodity H/W & S/W
- O/S, database & other server software
- Each slice plugged in with a personality set
- (e.g. database or SAP app server)
- No other configuration required
- On failure of S/W or H/W, redundant nodes pick up workload
- Replace failed components at leisure
- Predictive failure models
50Approximate Answers Quickly
- DB systems focus on absolute correctness
- As size grows, correct answer increasingly expensive
- Text search systems depend upon quick approx answer
- Approx answer with statistical confidence bound
- Steadily improve result until user satisfied
- Ripple Joins for Online Aggregation (Hellerstein, SIGMOD99)
- Allows rapid exploration of large search spaces
- Conventional full accuracy only when needed
- Run query on incomplete mid-tier cache?
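A minimal sketch of the approximate-answer idea for a single-table `AVG`: scan rows in random order and report a running estimate with a normal-approximation 95% confidence interval, stopping once the interval is narrow enough. This is the simpler online-aggregation idea that ripple joins extend to joins, not the ripple join algorithm itself; the function name and parameters are hypothetical.

```python
import math
import random

def online_average(values, batch=1000, z=1.96, target_halfwidth=None, seed=0):
    """Yield (rows_seen, estimate, half_width) for AVG(values) after each batch.

    Rows are visited in random order, so every prefix is a random sample.
    half_width uses a normal approximation (z=1.96 for ~95% confidence) and is
    0.0 once every row has been seen. Stops early when the interval is at most
    target_halfwidth.
    """
    total_rows = len(values)
    order = list(range(total_rows))
    random.Random(seed).shuffle(order)
    n, mean, m2 = 0, 0.0, 0.0
    for i in order:
        n += 1
        delta = values[i] - mean          # Welford's running mean/variance update
        mean += delta / n
        m2 += delta * (values[i] - mean)
        if n % batch == 0 or n == total_rows:
            if n == total_rows:
                half = 0.0                # full scan complete: exact answer
            else:
                var = m2 / (n - 1) if n > 1 else float("inf")
                half = z * math.sqrt(var / n)
            yield n, mean, half
            if target_halfwidth is not None and half <= target_halfwidth:
                return
```

On 100,000 values spread evenly over 0 to 99, a target half-width of 1.0 is reached after a few thousand rows, so the user sees a usable answer long before a full scan would finish.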
51Semi-Structured Storage Support
- Example applications
- Directory systems (e.g. Microsoft Active Directory)
- Document management systems
- Storage characteristics
- Flexible sparse schema support
- Fine grained security
- Recursive query
- Notification based extensibility common
- XML support important
- Particularly difficult to support when native SQL access is also allowed
- Important area for RDBMS expansion
52Examples: Performance w/o Admin
- Multiple cached plans for different parameter marker sub-domains
- Async statistics gathering
- Async optimization
- Feedback-directed techniques
- Adapting number of histogram buckets
- Re-optimizing when cardinality errors discovered during execution
- Re-optimize with additional data distribution info gained during previous execution
- Optimizer-created indexing structures
- Add indexes when needed (Exchange & AS/400)
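Two of the techniques above, multiple cached plans per parameter sub-domain and re-optimizing after a cardinality misestimate, can be sketched together. Assumptions: sub-domains are orders of magnitude of estimated selectivity, the 10x misestimate threshold is arbitrary, and `PlanCache` and the `optimize` callback are hypothetical rather than any product's interface.

```python
import math

class PlanCache:
    """Several cached plans per statement (one per parameter sub-domain);
    a plan is discarded when execution reveals a large cardinality error."""

    def __init__(self, optimize, reopt_ratio=10.0):
        self.optimize = optimize      # callable(sql, bucket) -> (plan, estimated_rows)
        self.reopt_ratio = reopt_ratio
        self.plans = {}               # (sql, bucket) -> (plan, estimated_rows)

    @staticmethod
    def bucket(selectivity):
        """Hypothetical sub-domain: order of magnitude of estimated selectivity."""
        return math.floor(math.log10(max(selectivity, 1e-9)))

    def get_plan(self, sql, selectivity):
        key = (sql, self.bucket(selectivity))
        if key not in self.plans:
            self.plans[key] = self.optimize(sql, key[1])
        return self.plans[key][0]

    def report_actual(self, sql, selectivity, actual_rows):
        """Execution feedback: True if the plan was dropped for re-optimization."""
        key = (sql, self.bucket(selectivity))
        _plan, estimated = self.plans[key]
        ratio = max(actual_rows, 1) / max(estimated, 1)
        if ratio > self.reopt_ratio or ratio < 1 / self.reopt_ratio:
            del self.plans[key]       # next get_plan re-optimizes with new stats
            return True
        return False
```

A highly selective parameter value and an unselective one thus get different plans for the same statement, and neither requires an administrator to notice a bad estimate.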
53Summary
- After 30 years, DB technology more relevant than ever
- Database innovations required at all tiers
- All devices run standard DB components
- Symmetric multi-tier programming model
- Hierarchical caching model
- Administration (including installation) disappears
- All info online & machine accessible
- Symmetric programming model on all tiers
- Redundant data for availability & performance
- Increased dependence on approximate answers
- Support for semi-structured apps
- Mid-tier & client: data moves to the processors
- Server tier: processing moves to data