Title: Analyst Tour June 1998
1 Databases in Internet Applications: Case Studies
Anil Nori, CTO, AserA Inc., Palo Alto, USA
anori_at_asera.com
2 Acknowledgements
- Sources for some of the material
- Oracle Corporation
- CNN Custom News
- Excite
- Cisco
3 Database Technology Timeline
[Figure: timeline running from Simple Data Management to Global Enterprise Management]
- Eras: Early 80s; Late 80s; Early - Mid 90s; Late 90s - 21st C
- Generations: Pre-relational; Early Relational; Client-server Relational; Enterprise-capable Relational; Internet Computing
- Labels: Data Warehouse, Hi-end OLTP; Packaged Vertical Applications; Simple OLTP; Active Database
- Middleware (messaging, queues, events), Java, CORBA, Web interfaces
- Scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability
- Simple transactions, on-line backup and recovery
- Support for all types of data, extensibility, objects
- Stored procedures, triggers
4 Current State of DBMSs
- OLTP applications
- Large amounts of data
- Simple data, simple queries and updates
- Update statement from a debit/credit transaction:
  UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
- Typically update intensive
- Large number of concurrent users (transactions)
- Data warehousing applications
- Large amounts of data
- Simple data but complex querying
- Typically read intensive
- Large number of users
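The debit/credit update on this slide can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the DBMS. Only the names accounts, abalance, and aid come from the slide; the column types, the sample row, and the apply_delta helper are illustrative assumptions.

```python
import sqlite3

def apply_delta(conn, aid, delta):
    """Apply one debit/credit update to a single account (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid",
            {"delta": delta, "aid": aid},
        )
        if cur.rowcount != 1:
            raise LookupError(f"no such account: {aid}")

# In-memory database with one sample account (assumed data, not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (aid INTEGER PRIMARY KEY, abalance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

apply_delta(conn, 1, -30)  # debit 30 from account 1
balance = conn.execute("SELECT abalance FROM accounts WHERE aid = 1").fetchone()[0]
```

Each call touches one row by primary key, which is the "simple data, simple updates" profile the slide describes; the difficulty in OLTP comes from running very many such transactions concurrently.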
5 Current State of DBMSs
- These applications require
- Large numbers of users/transactions
- High performance
- High availability (7x24 operations)
- Scalability
- High levels of security
- Administrative support
- Good utilities
6 Internet Applications Challenges
Data Warehousing
Users
Every Employee
Analysts
Size
7 Internet Applications Challenges
Site Operation
Management
Low TCO, Mission Critical
Availability
24X7
Occasional
8 Internet Challenges
- Availability
- Need near 100% availability
- Must be easy to manage
- Replication, hot standby, foolproof system?
- Scalability
- Number of users is orders of magnitude higher
- Security
- Global users
- Managing millions of users
- Encryption
- Performance
- Internet user expectations
- Speed vs correctness (e.g. search engines vs blade/cartridge/extender)
- Availability vs correctness
9 Internet Application Architecture Today
[Figure: three-tier architecture]
- Client Tier: browsers, authoring tools etc., connected over HTTP
- Physical Middle Tier: WEB/APP Server; data integration, storage, query, management; Middle Tier Application; application messages; remote messages; gateways
- Data Sources: ORDBMS; OLE/DB; other data sources
10 Case Studies
- CNN Custom News
- Excite
- Cisco Internet Applications
11 CNN Custom News
- On-line news service
- Allows users to customize news in a personalized manner
- Offers a variety of news items (e.g. national, international, business etc.)
12 Custom News Application Architecture
[Figure: three-tier architecture]
- Client Tier: browsers, connected over HTTP
- Hardware load balancing
- Physical Middle Tier: multiple WEB Servers
- Database Tier: OPS
13 CNN Custom News
- Backend
- SUN SOLARIS enterprise servers
- Oracle Parallel Server 7.3.4
- Middle-Tier (9 Machines)
- Web Servers
- Oracle Application Servers
- PL/SQL Cartridges
- Load Balancing
- Hardware based
- DNS router
- Round-robin
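Round-robin distribution, as named on this slide, can be sketched in a few lines. The slides describe hardware and DNS based balancing, so this software version only illustrates the rotation. The RoundRobinBalancer class and the web1..web9 server names are hypothetical; the count of nine matches the middle-tier machines listed above.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def pick(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._rotation)

# Nine hypothetical middle-tier machines.
balancer = RoundRobinBalancer([f"web{i}" for i in range(1, 10)])
first_ten = [balancer.pick() for _ in range(10)]  # tenth pick wraps to web1
```

Round-robin ignores server load and health, which is one reason it is usually paired with monitoring or replaced by weighted schemes in larger deployments.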
14 Oracle Application Server
[Figure labels: Adapter; CORBA Backend]
15 CNN Custom News
- Data feeds into the database
- Keeps text in the database
- Images in files
- Images accessed in the middle-tier
- PL/SQL Cartridge
16 PL/SQL Cartridge
[Figure labels: PL/SQL Cartridge; OAS; Oracle DBMS PL/SQL]
- Connection pooling
- Session caching
- Parameter marshalling
- Validation
- Result processing
17 PL/SQL
- Server-side
- Used to generate HTML
- Suited for database logic
18 Searching
- Uses Oracle ConText cartridge
- Content-based searching
- Uses bitmap indexes
19 CNN Custom News Observations
- Database-centric
- Uses PL/SQL based scripting
- Application Server for scalability
20 Excite
- Personalized online service that gives Web users everything they want, all in one place
- Builds tools that manage vast amounts of information available on the internet
- Provides a variety of user services (apps)
- News
- Money and Investing -- stock quotes
- Message boards and Chat
- Mail
- Communities
- Classifieds
- Jobs
21 Excite
- Supports suite of applications
- Each application uses three-tier architecture
- Federated approach
- Many databases
- Databases specific to applications
- Application logic in the middle-tier as multi-threaded embedded C programs (Pro*C programs)
22 Excite: An Application Architecture
[Figure: three-tier architecture]
- Client Tier: browsers, connected over HTTP
- Physical Middle Tier: multiple WEB Servers
- Database Tier
23 Excite - PFP Application
- Personalized front page application
- Application is deployed as 50 middle-tier daemon processes
- The middle-tier application daemons perform
- Application logic in C
- Connection pooling
- Each daemon keeps about 40 connections to the database (about 2000 total connections to the database)
- Load balancing
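The connection-pooling arithmetic on this slide (50 daemons, about 40 connections each) can be made concrete with a minimal sketch. The ConnectionPool class and the fake_connect factory are hypothetical stand-ins; the actual daemons are C programs, and nothing here reflects their real implementation.

```python
import queue

class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused by requests."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)

DAEMONS = 50                  # from the slide
CONNECTIONS_PER_DAEMON = 40   # "about 40" on the slide
total_connections = DAEMONS * CONNECTIONS_PER_DAEMON  # about 2000 in total

opened = []

def fake_connect():
    """Stand-in for opening a real database connection (assumption)."""
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(fake_connect, CONNECTIONS_PER_DAEMON)
conn = pool.acquire()   # borrow a connection for one request
pool.release(conn)      # hand it back for the next request
```

The point of the pool is that the database sees a bounded, stable number of sessions per daemon regardless of how many web requests arrive.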
24 Excite - PFP Database Configuration
- Oracle8 on SUN Solaris server
- 2 SUN 6500s -- 28 way SMP
- PFP database is split into multiple databases for load balancing and scalability
- Scalar data stored in the database in relational tables
- About 20 tables for storing user profiles, 100 tables for content
25 Excite - PFP Database Configuration
- Multi-media content (e.g. stock quotes or news item) stored in memory mapped files for fast access. File references stored in the database
- A lot of the content is read-only: it need not be backed up and can be reconstructed from the original sources
26 Excite - Scalability
- By partitioning the application across multiple databases
- Each application partition supported by multiple middle-tier daemon processes
- Multiple web servers to reduce traffic congestion
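One way to picture partitioning an application across multiple databases is to route each user to a partition with a stable hash. The slides do not say how Excite assigns users to databases, so the hashing scheme, the partition_for helper, and the pfp_db_* names below are assumptions for illustration only.

```python
import zlib

def partition_for(user_id, partitions):
    """Map a user to one database partition using a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    return partitions[zlib.crc32(user_id.encode("utf-8")) % len(partitions)]

# Hypothetical partition names.
PFP_PARTITIONS = ["pfp_db_0", "pfp_db_1", "pfp_db_2", "pfp_db_3"]

home = partition_for("alice", PFP_PARTITIONS)
```

A plain modulo scheme like this reshuffles most users when the number of partitions changes; a lookup table or consistent hashing avoids that at the cost of extra bookkeeping.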
27 Excite - Availability
- Using replication and hot standby
- Uses Oracle8 hot standby feature
- Uses asynchronous replication. Data replicated at 10 sec latency
- Almost every database is replicated for failover
- Replication preferred over hot standby. Hot standby cannot be used for normal usage
28 Excite - Other Applications
- Most of the Excite applications have similar
three-tier architecture
29 Excite - Observations
- Some content (especially for communities applications) could be stored in the database. Management benefits attractive. If content is stored in the database, access performance is very critical
- Need fast replication
- Currently not using middle-tier caching. Caching could be quite useful but coherency is an issue
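The coherency issue with middle-tier caching can be shown with a small time-to-live cache. This is a sketch under stated assumptions, not something the slides describe: the TTLCache class, the 10-second lifetime, and the dictionary standing in for the database are all hypothetical.

```python
import time

class TTLCache:
    """Middle-tier cache whose entries expire after a fixed lifetime."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # fetches a value from the database
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the example is deterministic
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]          # served from cache, possibly stale
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Explicit invalidation is the alternative to waiting for expiry.
        self._entries.pop(key, None)

database = {"headline": "v1"}      # stand-in for the real database
now = [0.0]                        # fake clock, advanced by hand
cache = TTLCache(database.__getitem__, ttl_seconds=10, clock=lambda: now[0])

first = cache.get("headline")      # loaded from the database
database["headline"] = "v2"        # the database changes underneath the cache
stale = cache.get("headline")      # the cache still returns the old value
now[0] = 11.0
fresh = cache.get("headline")      # entry expired, so the new value is loaded
```

The window between the update and the expiry is the coherency gap: shortening the lifetime narrows it but sends more reads to the database.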
30 Cisco
- Successfully implemented applications for the internet
- Internet commerce
- Order placement
- Checking order status
- On-line, guided product configuration
- Price quotes
- Employee self-service
- Provides all employee services electronically
- Employee directories
- Employee benefits
- Expense reports
31 Cisco
- Supply chain management
- Networked suppliers, resellers and customers
- Enables business partners to manage and operate major portions of its supply chain
- Entire supply chain works off one central demand forecast
- Customer care
- Exchange of technical information
- Software upgrades (90% of software upgrades via internet)
- On-line support (70% of support on-line)
- On-line, assisted trouble-shooting
32 Cisco
- Communications and collaboration
- Sales and technical training
- Virtual classrooms
- Company-wide meetings and broadcasts
33 Cisco Commerce Server Architecture
[Figure: three-tier architecture]
- Client Tier: browsers, connected over HTTP
- Physical Middle Tier: WEB Server
- Database Tier: Oracle DBMS; Oracle Applications
34 Cisco Commerce Server
- Typical three-tier architecture
- Proprietary web server
- Performs content aggregation
- Encryption
- Accesses Oracle DBMS
- Runs on a dedicated SUN server
- Proprietary commerce server
- Proprietary application server
- Performs variety of commerce functions
35 Cisco Commerce Server
- Scalability and availability
- Big servers for scalability
- Multiple commerce server processes for load balancing
- Databases replicated
- Hot standby for availability
36 Case Studies Observations
- Database is being used mostly for storage
- Application in the middle-tier
- Middle-tier also provides
- scalability
- load balancing
- large number of users
37 Analyzing Internet Applications
- Web integration
- Web publishing
- Application integration
- E-commerce
38 WEB Integration
- Heterogeneous data sources
- Heterogeneous data types
- 1000s of data sources
- Dynamic data
- Warehousing
39 Web Publishing
- Problem: the internet is placing new requirements on content management
- Heterogeneity: access different types of content from browsers, e.g. email, data warehouses, reports, HTML files
- Personalized: structured, dynamic, customized content
- Transactive content: blending with application
- Aggregation: portalization via major gateways
40 Application Integration
- Integrating multiple applications (e.g. ERP/Front Office)
- Application workflow specification
- Asynchronous communication
- Queuing and propagation
- Message tracking
- Message warehouse (persistence)
- Message broker/server
- Data transformation
- Transforming messages to different application
formats (e.g. SAP, CLARIFY, I
41 Electronic Commerce
- Automating business-to-business, business-to-consumer interactions
- Selling and buying
- Order management
- Product catalogs
- Product configuration
- Sales and marketing
- Education and training
- Service
- Communities
42 Database Technology Uses
- Business/workflow transactions
- Support across multiple database/ERP systems
- Transactional
- Tools to generate compensating actions
- Transformations
- Queuing
- Support for heterogeneous messages
- Transactional
- Querying, e.g. on (attribute, value) pairs
- Indexing, e.g. on (attribute, value) pairs
- Publish/subscribe
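Publish/subscribe over heterogeneous messages, with subscriptions expressed as attribute-value pairs, can be sketched as follows. The Broker class and the message fields are hypothetical; a database-backed queue would add persistence and transactions, which this in-memory version leaves out.

```python
class Broker:
    """Deliver each published message to every subscription whose
    attribute-value pairs all match the message."""

    def __init__(self):
        self._subscriptions = []   # list of (wanted attributes, callback)

    def subscribe(self, attributes, callback):
        self._subscriptions.append((dict(attributes), callback))

    def publish(self, message):
        delivered = 0
        for wanted, callback in self._subscriptions:
            # A subscription matches when every wanted pair is present.
            if all(k in message and message[k] == v for k, v in wanted.items()):
                callback(message)
                delivered += 1
        return delivered

broker = Broker()
orders, everything = [], []
broker.subscribe({"type": "order", "region": "EU"}, orders.append)
broker.subscribe({}, everything.append)   # empty filter matches all messages

broker.publish({"type": "order", "region": "EU", "id": 1})
broker.publish({"type": "quote", "region": "EU", "id": 2})
```

Messages are plain dictionaries, so publishers can add fields freely; this is the sense in which querying and indexing on attribute-value pairs supports heterogeneous messages.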
43 Database Technology Uses
- Rule engines
- Complex business processing rules
- Customization/profiling rules
- Business domain rules
- Presentation rules
- Repositories for Application Development
- Managing Java objects, interfaces, etc.
- Must for application integration
- Standardized object models and protocols
- Directories vs repositories
44 Database Technology Uses
- XML support
- XML schema/storage
- XML caching
- XML querying
- Coexistence with SQL -- current efforts seem disjoint
- Multiple caches
- Consistency of middle-tier and database caches
- Data mining
- Algorithms need to become more pragmatic
45 Database Technology Uses
- Internet user expectations
- Speed vs correctness (e.g. search engines vs blade/cartridge/extender)
- Availability vs correctness
- Component Architecture
- Caching
- XML support
- Querying
- Transactions
- Rule engines
- Metadata management
- Queueing
46 Database Technology Uses
- Availability
- Need near 100% availability
- Must be easy to manage
- Replication, hot standby, foolproof system?
- Scalability
- Number of users is orders of magnitude higher
- Security
- Global users
- Managing millions of users
- Encryption
- Performance
47 Internet Applications Architecture Future
[Figure: XML-based multi-tier architecture]
- Client Tier: browsers, XML enabled tools, authoring tools etc., connected over XML
- Logical Middle Tier: WEB/APP Server; XML enabled application messages; XML Integration; Query Server; XML Database; Warehouse Server; XML Transformer; XML Gateway
- Data Sources (labels): XML enabled; ORDBMS; OLE/DB; other data source; documents on the Web (e.g. HTML, WORD)
48 XML in the Database
- XML has the potential to impact four important markets
- Web integration
- Web publishing
- Application integration
- Electronic commerce
- XML-enable the DBMS
49 XML-enabled DBMS
- XML-enable the database system
- Store XML data/documents in the database server
- Querying and searching of structured and unstructured XML
- Generate XML data from the database server
- Add XML capabilities in supporting database facilities
[Figure labels: DBMS; Integrate with other facilities; Generate XML; Store XML]
50 Store XML Data
- Enhance XML storage facilities in the database with support in utilities
- Facilities to load XML data into the database
- Provide more efficient database storage (componentized storage, compression, indexing, ...)
- XML export facilities from the server
51 Search and Query XML Data
- Search XML data efficiently
- Special SQL queries over structured and unstructured XML
- Content-based indexing (e.g. text indexes) for searching XML data efficiently
- Support for XML query languages (e.g. XQL) on XML data
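Structured querying and content-based searching of XML can be contrasted in a short sketch. XQL is not available in Python's standard library, so this uses the XPath subset of xml.etree.ElementTree as a stand-in. The news document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical news document (not from the slides).
doc = ET.fromstring("""
<news>
  <item category="business"><title>Markets rally</title></item>
  <item category="national"><title>Election update</title></item>
  <item category="business"><title>Merger announced</title></item>
</news>
""")

# Structured query: select by element path and attribute value.
business_titles = [
    t.text for t in doc.findall("./item[@category='business']/title")
]

# Content-based search: a linear scan over the text. A real system would
# use a text index instead of visiting every item.
matches = [
    item.findtext("title")
    for item in doc.iter("item")
    if "update" in item.findtext("title", "").lower()
]
```

The first query relies on the document's structure, the second only on its text, which is why the slide lists both query-language support and content-based indexing.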
52 Generate XML
- Generate XML from the database server
- Map SQL92, SQL3 and PL/SQL datatypes to XML
- Provide mappings between Java, SQL and XML types
- Script XML content from the database
- Allow SQL queries to return XML results
- Provide embedded XML in stored procedures
- Java scripting support: embedded XML in Java
- Common APIs to access any XML content in databases
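Returning XML from a SQL query amounts to mapping each row and column to elements. The sketch below assumes Python's sqlite3 and xml.etree.ElementTree modules; the query_to_xml helper, the rowset/row element names, the null marker attribute, and the quotes table with its values are all illustrative assumptions rather than any product's actual mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, params=(), root_tag="rowset", row_tag="row"):
    """Run a query and serialize its result set as an XML string."""
    cur = conn.execute(sql, params)
    # Column names become element names, so they must be valid XML names.
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for record in cur:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            cell = ET.SubElement(row, name)
            if value is None:
                cell.set("null", "true")   # SQL NULL becomes a marker attribute
            else:
                cell.text = str(value)     # simplest possible type mapping
    return ET.tostring(root, encoding="unicode")

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
conn.executemany("INSERT INTO quotes VALUES (?, ?)",
                 [("ORCL", 25.5), ("CSCO", None)])

xml_text = query_to_xml(conn, "SELECT symbol, price FROM quotes ORDER BY symbol")
```

Converting every value with str() loses type information; a fuller mapping of SQL datatypes to XML, as the slide calls for, would carry types in attributes or a schema.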
53 XML and Supporting Facilities
- Provide XML capabilities in supporting database facilities
- Support XML in database utilities - loader, export/import ..
- Allow server-to-server replication of XML data
- Fine grained access to XML documents
54 XML Caching
- Need to temporarily cache XML, index it, update the cached copy, transact it
- Need to query XML caches
- Also requires a store for managing it in the middle-tier
- Provides XML logical views
55 DBMS Architecture for Internet Applications
- Monolithic architecture
- Enhance the DBMS with all the features necessary for supporting internet applications
- Component architecture
- Provide components for supporting internet applications
- Components can reside in the DBMS or in the middle-tier
56 Monolithic Approach
- (+) Database is the platform
- (+) Leverage DBMS infrastructure
- (+) Uniform management
- (-) Not flexible
- (-) Forces 2-tier architecture
- (-) May not be suitable for high-end configurations
- (-) Not suitable for heterogeneous application integration
57 Component Approach
- (+) Flexible
- (+) Accommodates multi-tier architecture - components can be deployed in the middle or database tier
- (+) Facilitates heterogeneous integration of applications
- (-) Need to manage multiple components
58 Looking Ahead
- Database technology has a lot to offer for building internet applications!
- Componentized databases?