Web Programming: A Short History

1
Web Programming: A Short History
  • Armando Fox
  • CS294-1 Fall 06

2
Travel photo
3
Web Programming 101: Outline
  • Basics: RPC, client-server, HTTP, Apache
  • Web sites are really programs: CGI & FastCGI
  • Server-side storage, cookies
  • Programming stacks: LAMP, J2EE, Rails
  • User tracking
  • Client-side fun: JavaScript, DHTML, DOM
  • Workload generation
  • Finding bottlenecks, Lab 1 discussion

4
The Web is largely RPC using HTTP
  • Review of Remote Procedure Call (RFC 707, 1976)
  • Problems RPC had to overcome
  • Engineering: argument marshaling, argument &
    result types
  • Fundamental: calling semantics (at-least-once,
    at-most-once, exactly-once)
  • Fundamental (for all distributed systems):
    failure semantics
  • How does HTTP fix these?

5
A Conversation With a Web Server
  • Open TCP connection to server on port 80
    (default)
  • Browser uses TCP to send the following chunk
    of text:
  • GET /index.html HTTP/1.0
  • User-Agent: Mozilla/4.73 [en] (X11; U; Linux
    2.0.35 i686)
  • Host: www.yahoo.com
  • Accept: image/gif, image/x-xbitmap, image/jpeg,
    image/pjpeg, image/png, */*
  • Accept-Encoding: gzip
  • Accept-Language: en
  • Accept-Charset: iso-8859-1,*,utf-8
  • Cookie: B=2vsconq5p0h2n

6
A Conversation With a Web Server
  • Server replies:
  • HTTP/1.0 200 OK
  • Content-Length: 16018
  • Content-Type: text/html
  • ...Yahoo!...href=http://www.yahoo.com/...
  • etc.
  • If there are embedded images, such as
  • m/us.yimg.com/a/an/anchor/icons2.gif"
  • then repeat the whole process with this new URL.
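The conversation on the last two slides can be reproduced by hand. The following is a minimal sketch, not part of the original deck: it starts a tiny local server (the hypothetical `HelloHandler`) so it runs without network access, then opens a TCP connection, sends a hand-written HTTP/1.0 GET, and splits the reply into status line, headers, and body. The `fetch` helper and the page content are assumptions made for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server so the example runs offline."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Suppress per-request logging to keep the example quiet.
        pass


def fetch(host, port, path):
    """Send a hand-written HTTP/1.0 GET; return (status_line, headers, body)."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Accept: */*\r\n"
        "\r\n"  # blank line ends the request headers
    )
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: server closes the connection when done
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, headers, body = fetch("127.0.0.1", server.server_port, "/index.html")
server.shutdown()
server.server_close()
print(status)
print(headers["content-type"], len(body))
```

A browser would repeat `fetch` once per embedded image URL it finds in the returned HTML.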

7
HTTP, a simple chatty protocol
  • ASCII-based commands over TCP/IP
  • a bad fit for TCP/IP...why?
  • Fundamentally request-reply (like RPC),
    client-initiated
  • precludes true server push
  • Stateless: every request completely independent
  • No intrinsic way to create associations between
    distinct requests
  • No provisions for maintaining state that persists
    across requests
  • Early addition to HTTP: cookies
  • Extra header added by server to HTTP response
  • Cookie is typically opaque to client
  • Client should hand cookie back on subsequent
    requests to same server
  • Client isn't obligated to honor (usually a user
    pref) but in practice most sites are now useless
    without it
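The cookie round trip described above can be sketched with Python's standard `http.cookies` module. This is an illustrative sketch only: the three helper functions are hypothetical names, and the cookie name `B` and its value are borrowed from the request shown earlier in the deck.

```python
from http.cookies import SimpleCookie


def server_response_headers(handle):
    """Server side: attach an opaque cookie to the HTTP response."""
    cookie = SimpleCookie()
    cookie["B"] = handle
    cookie["B"]["path"] = "/"
    return [
        ("Content-Type", "text/html"),
        ("Set-Cookie", cookie["B"].OutputString()),
    ]


def client_request_headers(response_headers):
    """Client side: store whatever the server set and hand it back unchanged."""
    jar = SimpleCookie()
    for name, value in response_headers:
        if name.lower() == "set-cookie":
            jar.load(value)
    cookie_header = "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())
    return [("Cookie", cookie_header)] if cookie_header else []


def server_read_cookies(request_headers):
    """Server side, next request: recover the cookie values from the header."""
    for name, value in request_headers:
        if name.lower() == "cookie":
            parsed = SimpleCookie()
            parsed.load(value)
            return {key: morsel.value for key, morsel in parsed.items()}
    return {}


response = server_response_headers("2vsconq5p0h2n")
next_request = client_request_headers(response)
seen_by_server = server_read_cookies(next_request)
print(next_request, seen_by_server)
```

The client never interprets the value; it only stores and returns it, which is what lets a stateless protocol associate distinct requests.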

8
Meet Apache (a patchy web server)
  • Naive server implementation (NCSA httpd): listen,
    accept, fork+exec, repeat ad infinitum - breaks
    down quickly in engineering
  • Open-source Apache evolved c.1996 from patches to
    original httpd, now most popular on Web (70% in
    2006)
  • Replace fork() with select() + thread management
  • Many processes, many threads, sophisticated
    memory management
  • Can be configured as a proxy or cache too (later)
  • Many, many functionalities and configuration
    options
  • Apache modules: compiled-in glue to other
    components without process fork and context
    switch
  • Eg, relational databases and interpreted
    languages like Perl

9
Web sites as programs: CGI
  • Idea: run a program and send its output back to
    browser
  • Need to name the program, pass input parameters,
    execute program, capture output, deal with errors
    (i.e. everything RPC does)
  • First cut: Common Gateway Interface
  • Allowable programs to execute are specified in a
    server config file that maps URLs to
    subdirectories
  • Parameters embedded in URLs or forms (later,
    cookies)
  • http://www.foo.com/search?term=white%20rabbit&show=10&page=1
  • fork()+exec() used to execute program (later, for
    performance, embed certain types of code in
    server process, FastCGI)
  • program must generate correct HTTP headers, etc.
  • join stderr to stdout to capture errors
  • Remote program may circumvent HTTP limitations
  • embed tokens or other hidden parameter to
    associate requests from same user
  • store data persistently on server, eg in
    filesystem, Berkeley DB, etc.

10
CGI & FastCGI
  • How many concurrent clients?
  • FCGI: number of dispatchers
  • CGI: limited by ability to fork()
  • open file descriptors (sockets)
  • Noteworthy
  • Logically, dispatching mechanism is orthogonal to
    app code
  • In practice, most middleware/app servers
    hardwire the choice
  • Rails works with any
  • Problems in moving cgi programs to fastcgi
    environment?
  • Primitive steps toward separating I/O resource
    management from app code

[Diagram: "your app" reached via fork+exec (CGI) or via TCP or Unix domain sockets (FastCGI); in both cases the app uses a filesystem or database]
11
Server-side storage
  • How to persist data across consecutive HTTP
    requests from same user?
  • Embed data in URL, cookie, or as hidden variables
    in forms
  • Better: store data on server side, embed just the
    handle ...why?
  • Message: truth is on the server; program
    defensively
  • Special case of data: session identifier that
    ties together requests from same user session
    (eg, generated on user login)
  • Why better to store data on server?
  • Untrustworthy client, unreliable client, size of
    data...
  • Handle can be authenticated/cryptographically
    signed, have an expiration time, etc. (client
    can't forge data if the only access to it is via
    authenticated handle)
  • General pattern: server = truth, client = hint
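One way to realize an authenticated, expiring handle is an HMAC over the session ID and expiry time. This is a minimal sketch under stated assumptions, not the deck's own method: `SECRET_KEY`, `sign`, and `verify` are hypothetical names, and the handle layout `id.expiry.digest` is chosen only for illustration.

```python
import hashlib
import hmac
import time

# Assumption: a secret known only to the server.
SECRET_KEY = b"hypothetical server-side secret"


def sign(session_id, expires_at):
    """Return 'session_id.expires_at.digest'; session_id must not contain '.'."""
    message = f"{session_id}.{expires_at}".encode("ascii")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{session_id}.{expires_at}.{digest}"


def verify(handle, now=None):
    """Return the session ID if the handle is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        session_id, expires_at, digest = handle.split(".")
        expires = int(expires_at)
    except ValueError:
        return None  # malformed handle: treat as a forgery
    expected = sign(session_id, expires).rsplit(".", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(digest.encode(), expected.encode()):
        return None
    if now > expires:
        return None
    return session_id


handle = sign("abc123", 2000)
print(verify(handle, now=1000))                             # valid
print(verify(handle, now=3000))                             # expired
print(verify(handle.replace("abc123", "abc124"), now=1000)) # tampered
```

The data itself stays in server-side storage keyed by the session ID; the client holds only the handle, so the client copy remains a hint.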

12
Programming Stacks & application servers
  • Observation: many programmers were reinventing
    common machinery for dynamic Web sites
  • Naming: routing between URLs and programs
  • Connections between programs and storage (eg
    database)
  • Presentation of output (wrapping in HTTP &
    HTML)
  • Managing concept of user sessions
  • Marshalling & unmarshalling stuff from cookies,
    forms, etc.
  • Programming environments capture commonalities
  • App writer creates business logic
  • Mechanics of above largely handled by programming
    stack
  • Improves business logic portability by
    virtualizing resources such as DB
  • Pick your favorite language, storage solution,
    etc.

13
LAMP (Linux, Apache, PHP, MySQL)
  • MySQL: excellent open-source RDBMS
  • PHP: Perl-like language that can be embedded into
    HTML pages
  • .php pages are passed through PHP interpreter
    before sent back to browser
  • Provides a lot of common machinery
  • Virtualizes connection to MySQL
  • Extract params from URLs, forms, etc
  • Lots of libraries available via PEAR
  • Mixing of code and HTML can get messy as app gets
    complex
  • How many interpreter instances?

... You have already voted. Go away.
... setcookie("test", "rated", time()+86400); ?>
<FONT COLOR=blue>You haven't voted before so I recorded your vote</FONT>
... $admin, $adpass); mysql_select_db($db, $mysql_link);
$result = mysql_query("SELECT impressions FROM tds_counter WHERE COUNT_ID='$cid'", $mysql_link);
if ($n = mysql_num_rows($result))
  for ($i = 0; $i ... mysql_fetch_row($result) .... ?>
14
Noteworthy about PHP
  • Virtualizes DB API, but not stored objects
  • Programmer still writes raw SQL queries to
    access objects; mapping of database fields to
    program objects not intrinsic
  • In contrast, cookies and sessions are first-class
    abstractions
  • Proprietary alternatives: Microsoft ASP.NET,
    ColdFusion

[Diagram: several .php pages pass through the PHP interpreter, which talks to the filesystem or database]
15
Example: Java 2 Enterprise Edition (a/k/a EJB)
  • Application: collection of Java components
    called beans
  • Different bean types encapsulate business logic
    (functions) or access to objects stored in
    underlying DB
  • At runtime, beans are deployed (instantiated)
    into containers
  • J2EE server manages deployment, memory, Java
    thread allocation, database, etc.
  • Java servlets or Java server pages activate
    appropriate bean(s)
  • Open but incomplete spec: open-source (eg JBoss)
    and commercial (eg WebLogic, WebSphere) J2EE
    servers compete on features and engineering

16
J2EE, continued
  • J2EE began life as Txn Monitors (eg Tuxedo)
  • Before: 100s of DB clients, each a client-side
    app with long-lived connection
  • Now: 100Ks of clients, each a Web browser?
    Doh!!
  • Transaction monitors, a/k/a txn-oriented
    middleware: multiplex a small pool of
    long-lived DB connections across Web servers
  • There are some gnarly scheduling issues here
  • J2EE = TM + more common functions + the blessing of
    Java
  • Noteworthy management features
  • can distinguish (e.g.) singleton beans from
    replicatable ones
  • session object beans can be marked stateful
    (manages a database row) or stateless
  • Accepted practice generally eschews stateful
    SBs...why?
  • Lets app developer talk about state management

17
J2EE vs. LAMP
  • J2EE much more heavyweight
  • Harder learning curve: HelloWorld involves
    setting up JSPs, mapping to EJBs, declaration
    of EJB types, mapping of EJB accesses to database
    tables, ...
  • Very challenging to engineer efficient J2EE
    server: the tail wagging the dog (typ. 90% of
    total programming stack)
  • Memory management is a bear: long-lived objects,
    references all over the place, no good time to do
    garbage collection, leaks a fact of life
  • Richness of Java language makes it harder
  • Arguably promotes good practices
  • Cleaner separation of logic (EJBs) from
    presentation (JSPs, servlet pages)
  • Java: platform independence (but some appservers'
    APIs nonstandard)
  • modularity, class-based OOP, mountains of files
    describing configurations of EJBs, etc.
  • 800-pound gorilla for enterprise, less common in
    e-commerce

18
The new kid: Ruby on Rails
  • Rails instantiates object-relational model over
    DB
  • concise code in terms of ORM, not SQL queries
  • Strongest separation yet of model, view,
    controller
  • Easy to write inefficient code (issues multiple
    queries)
  • High level abstraction makes data relationships
    obvious
  • Ruby reflection facilitates convention over
    configuration

MODEL
class Order ... belongs_to :customer
class Customer ... has_many :orders
CONTROLLER
def findOrder(dt)
  @ords = Order.find(:conditions => ['shipDate ... ?', Time.now])
end
VIEW
<% @ords.each do |o| %> Name: ... Date: ...

Equivalent PHP:
... $admin, $adpass); mysql_select_db($db, $mysql_link);
$result = mysql_query("SELECT c.name, o.shipDate FROM Customers c, Orders o
  WHERE c.name = $name AND o.customer_id = c.id AND o.shipDate ... NOW", $mysql_link);
if ($n = mysql_num_rows($result))
  for ($i = 0; $i ...
    echo "Name: " . $row[0] . " ship date: " . $row[1]; ... ?>
19
User-tracking techniques
  • Apache log scraping
  • Can't identify correlated requests, but can
    guess
  • Time granularity typically seconds, depends on
    httpd.conf
  • adsl-70-132-27-24.dsl.snfc21.sbcglobal.net - -
    [04/Sep/2006:21:44:27 +0000] "GET
    /includes/nav/images/link_press-over.gif
    HTTP/1.1" 200 186
  • adsl-70-132-27-24.dsl.snfc21.sbcglobal.net - -
    [04/Sep/2006:21:44:27 +0000] "GET
    /includes/nav/images/link_directions-over.gif
    HTTP/1.1" 200 161
  • adsl-70-132-27-24.dsl.snfc21.sbcglobal.net - -
    [04/Sep/2006:21:44:27 +0000] "GET
    /includes/nav/images/link_photo-over.gif
    HTTP/1.1" 200 174
  • HTTP redirect, the oldest trick in the book
  • Original page increments a counter, redirects to
    real page (possibly on different site)
  • Fat URLs: a cheaper way to do redirection
    (Google does this)
  • More ambitious: target page rewriting (Babelfish
    used to do this)
  • Cross-site cookies (eg DoubleClick)
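Log scraping as described above amounts to parsing each access-log line and guessing which requests belong together. The sketch below assumes the Apache Common Log Format with a bracketed timestamp; the 30-second gap threshold, the helper names, and the fourth log line (a later request from the same host) are hypothetical additions for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def guess_sessions(records, gap_seconds=30):
    """Group requests per host; a pause longer than gap_seconds starts a new group."""
    by_host = defaultdict(list)
    for record in records:
        by_host[record["host"]].append(record)
    sessions = []
    for host_records in by_host.values():
        host_records.sort(key=lambda r: r["time"])
        current = [host_records[0]]
        for prev, record in zip(host_records, host_records[1:]):
            if (record["time"] - prev["time"]).total_seconds() > gap_seconds:
                sessions.append(current)
                current = []
            current.append(record)
        sessions.append(current)
    return sessions


HOST = "adsl-70-132-27-24.dsl.snfc21.sbcglobal.net"
lines = [
    f'{HOST} - - [04/Sep/2006:21:44:27 +0000] "GET /includes/nav/images/link_press-over.gif HTTP/1.1" 200 186',
    f'{HOST} - - [04/Sep/2006:21:44:27 +0000] "GET /includes/nav/images/link_directions-over.gif HTTP/1.1" 200 161',
    f'{HOST} - - [04/Sep/2006:21:44:27 +0000] "GET /includes/nav/images/link_photo-over.gif HTTP/1.1" 200 174',
    # Hypothetical later request, added to show a second guessed session.
    f'{HOST} - - [04/Sep/2006:21:59:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
]
records = [r for r in map(parse, lines) if r is not None]
sessions = guess_sessions(records)
print([len(s) for s in sessions])
```

Grouping by host is only a guess: several users behind one proxy share a hostname, which is one reason the slide says correlated requests cannot be identified with certainty.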

20
Client Side Fun: DOM, JavaScript (or: Enough Rope
To Hang Yourself)
  • Document Object Model: most browsers since 1998
  • treats browser environment & delivered page as
    hierarchical collection of objects (page,
    embedded images, paragraphs, tables, forms, ...)
  • Namespace of these objects is available to
    JavaScript
  • eg if (navigator.platform == "MacPPC") // do
    MacOS-specific stuff
  • JavaScript interpreter embedded in most browsers
    c.1998
  • <script> tags mark scripts embedded in HTML pages
    or fetched from server (like embedded images)
  • Events corresponding to document manipulation (eg
    onLoad) and UI actions (onClick, onMouseOver) are
    dispatched to JS handlers
  • Key: modifying DOM elements changes the page
    content!
  • document.write("I appear in the page but not
    between HTML tags")
  • document.images[0].src = "http://.../foo.gif"
    // load new image
  • Great delivery mechanism for malware, viruses,
    etc.
  • Tail wags dog: MSDN homepage has 72 embedded
    JavaScripts

21
Other Client-side Fun
  • ActiveX controls (Microsoft IE only)
  • Native x86 code loaded into browser's address
    space
  • Can run with privileges of browser itself
  • Contrast: JavaScript is restricted in what
    operations are allowed
  • Browser plug-ins/helpers/etc.
  • Most browsers support them; all mutually incompatible

22
AJAX: Asynchronous JavaScript & XML
  • Idea: use JavaScript to perform HTTP requests in
    the background (popularized by Google Maps)
  • Already done for things like loading animated
    images
  • XMLHttpRequest (appeared in browsers c.2000)
    allows async HTTP request with callback; browser
    plugin or native impl.
  • Eg, use for prefetching or for asynchronously
    filling in page content after page frame has
    loaded
  • Not new functionality, but a set of practices and
    libraries
  • Implications
  • Complete separation of req/resp from rendering
    means browser-UI-based apps can be structured
    like desktop apps
  • Breaks workload modeling that assumes think
    time = idle
  • DANGER: collecting persistent input at client
    before committing to server

23
Workload Generation & Testing
  • Stress testing, functional testing, benchmarking
  • Stress: subject system to high offered load, what
    happens?
  • Functional: does app code work and handle cases
    correctly?
  • Rails provides particularly good support for this
  • Benchmark: how fast is my app/platform compared
    to alternate implementations?
  • Coverage: have all code paths in app been tested?
  • What to look for in a stress generator
  • Open loop: how many requests/second can be
    generated?
  • Closed loop: how many concurrent users can be
    simulated?
  • How accurately? (eg transition probabilities,
    AJAX requests)
  • Ability to simulate hotspots?
  • Read-only, or read-write workload?

24
Balancing the 3-tier Pipeline
  • Horizontal scaling concerns at each tier
  • Latency measurement as seen by each tier

25
Canonical Graphs & Parameters
  • Latency vs. offered load
  • Closed or open loop users with think times
  • Any system will eventually fall off cliff to
    high response times. The only question is what
    bottleneck was hit first.
  • Throughput is another metric that is less
    user-centric but important to admins
  • Bottlenecks
  • I/O in & out of server
  • Virtual machine traps
  • Database performance
  • CPU: executing mundane Ruby code (read the Ruby
    performance article for interesting notes on
    this)
  • RAM: used by MySQL for caching table rows &
    queries, by Rails for caching fragments & pages,
    and to hold footprints of Ruby interpreters

26
Finding CPU-related bottlenecks
  • Scan logs to map queries to operations. Where are
    hotspots?
  • latency breakdown by part of system. Where are
    bottlenecks? Can you do more detailed profiling?
  • trace will take this to new heights
  • relationship of latency to table sizes
  • Are these operations fundamentally slow or just
    badly written?
  • Are there quadratic relationships that could be
    linear?
  • What's in the session object? (It has to be
    serialized/deserialized)
  • What about trading this for additional query by
    storing object ID only?

27
Key messages
  • Web = RPC, with same liabilities & advantages.
  • Explicit state management has bred
    recovery-friendly app structure because you
    always know where the state is, and what kind of
    state it is.
  • Persistent/relational - business data
  • Persistent/single-key - user ID/profile
  • Session-lived - session state, shopping carts,
    etc
  • Transient (less than a session)
  • Dealing with replication is still a challenge,
    but at least now you know what could be
    replicated
  • Corollary: the truth is on the server; anything
    on the client is a hint. You generally can make
    guarantees about durability and integrity on the
    server side that you can't make on the client
    side.
  • Mashups and AJAX are rapidly obsoleting previous
    assumptions about user behavior, offered load,
    etc.

28
Observations
  • For the masses, shortest learning curve wins
  • PHP+MySQL beat out EJB
  • AJAX/XMLHttpRequest beat out WSDL/SOAP
  • Didn't help that SOAP over HTTP is 10x slower
    than RMI/CORBA and not clear if anyone
    understands WSDL
  • Google Web Toolkit has made it even easier by
    providing Java-to-Javascript/DOM/Ajax compiler
  • Ruby on Rails may beat out PHP?

29
Lab 1 discussion
  • VM vs. raw hardware overhead
  • Using the logs to find hotspots
  • Caching