Web Site Management Based on Declarative Specifications

Learn more at: https://dsf.berkeley.edu

1
Web Site Management Based on Declarative
Specifications
  • Alon Levy
  • University of Washington
  • Joint work with:
    • Strudel: Dana Florescu (INRIA), Mary Fernandez,
      Dan Suciu (AT&T), Khaled Yagoub (INRIA)
    • Tiramisu: Corin Anderson and Dan Weld (UW)

2
Problem: Building Web Sites
  • Building Web sites involves three tasks:
    • Selecting and managing the site's content.
    • Organizing the site's structure (pages and
      links).
    • Designing the graphical presentation of pages.
  • In current tools, these tasks are (mostly)
    interdependent.
  • Strudel's key ideas:
    • Separate the three tasks.
    • Manage content and structure declaratively.

3
Content Management and Graphical Presentation
  • Content may be derived from multiple sources:
    • Databases: relational, object-oriented.
    • Semi-structured sources (XML, Word, Excel,
      BibTeX).
    • Classical data integration problem!
      (See TSIMMIS, Garlic, Information Manifold,
      Tukwila.)
  • Graphical presentation:
    • Need to integrate with tools that create
      animations, images, Java applets.
    • Create sets of similar HTML pages using
      templates.

4
Web-Site Structure
  • The structure includes:
    • the set of pages and the contents of each page, and
    • the links between the pages.

5
Current practice
  • Current tools separate only content management
    from presentation:
    • Content managed by database.
    • Embed queries in HTML templates.
    • Simple tools to view and modify structure at the
      extensional level.
    • WYSIWYG tools for managing presentation.
  • But they still cannot
    • explicitly manage the site's global structure, or
    • flexibly choose a content-management system.
  • As a result, it's hard to modify the structure of a
    Web site, build multiple versions for different
    classes of users, or enforce integrity constraints.

6
Talk Outline
  • Problem definition
  • Strudel architecture
  • Advantages of declarative specifications:
    • Specifying and verifying integrity constraints.
    • Automatic generation of run-time plans for
      managing data-intensive Web sites.
  • Tiramisu:
    • Separating the design tool from the
      implementation.
    • Using a collection of tools to build a site.

7
Strudel Evolution
Strudel (Nov. 96) (AT&T)
Strudel AT&T release
Strudel-R (INRIA)
http://www.research.att.com/sw/tools/strudel
Tiramisu (Sept. 98) (U. Washington)
8

Strudel Architecture and System
9
Strudel
  • Features:
    • Integrates content from multiple sources.
    • High-level declarative language for managing
      a site's structure (StruQL).
  • Advantages:
    • Derives multiple sites from the same data.
    • Supports easy restructuring and modification.
    • Provides a platform for
      • enforcing integrity constraints, and
      • designing policies for efficient run-time
        management of sites.

10
Strudel Architecture
11
Data Model
  • Strudel is based on a semi-structured data model:
    labeled directed graphs.
    • Nodes in the graph represent objects.
    • Labels on arcs represent attribute names.
    • Named collections.
  • Why semi-structured data?
    • Raw data is often semi-structured (and I don't
      mean that it's embedded in HTML).
    • Convenient for data integration (à la TSIMMIS).
    • Web sites are ultimately graphs.
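
The data model above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Graph` and its methods are hypothetical and are not Strudel's implementation. It shows objects as nodes, attribute names as arc labels, and named collections as sets of nodes, using values from the raw-data example in this talk.

```python
from collections import defaultdict


class Graph:
    """Labeled directed graph with named collections (illustrative only)."""

    def __init__(self):
        # edges[source] is a list of (label, target) pairs
        self.edges = defaultdict(list)
        # collections[name] is the set of nodes in that named collection
        self.collections = defaultdict(set)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def add_to_collection(self, name, node):
        self.collections[name].add(node)

    def targets(self, source, label):
        """Every value reachable from source over one arc with this label."""
        return [t for (l, t) in self.edges[source] if l == label]


g = Graph()
g.add_to_collection("Articles", "article1")
g.add_edge("article1", "Title", "Clinton announces new")
g.add_edge("article1", "Images", "im1.gif")
g.add_edge("article1", "Images", "im.gif")
g.add_edge("article1", "RelatedArticle", "article2")
g.add_to_collection("Articles", "article2")
g.add_edge("article2", "Video", "vid1.avi")  # article2 has no Images arc at all
```

Because the model is semi-structured, an object may repeat an attribute (two `Images` arcs) or lack it entirely, and no schema has to be declared up front.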

12
The StruQL Query Language
  • A StruQL query is a function from a set of input
    graphs to an output graph.
  • A StruQL expression contains two parts:
    • a query component, and
    • a restructuring component.
  • Formally:
    • INPUT: graph names.
    • WHERE: conjunction of regular path expression
      atoms.
    • CREATE: name the nodes in the output graph using
      Skolem functions.
    • LINK: specify the links in the resulting
      graph.
  • StruQL evolved into XML-QL (see WWW8 Conference).
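
The role of Skolem functions in CREATE can be illustrated with a short Python sketch. The classes `Node` and `Skolem` are hypothetical names chosen for this example, not part of StruQL. The point is that applying the same function to the same arguments always names the same output node, so a page mentioned in several LINK clauses is created only once.

```python
class Node:
    """An output-graph node identified by the Skolem term that created it."""

    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return self.label


class Skolem:
    """A Skolem function: identical arguments always yield the same node."""

    def __init__(self, name):
        self.name = name
        self._nodes = {}

    def __call__(self, *args):
        # Create the node on first use, then reuse it for identical arguments.
        if args not in self._nodes:
            term = f"{self.name}({', '.join(map(str, args))})"
            self._nodes[args] = Node(term)
        return self._nodes[args]


ArticlePage = Skolem("ArticlePage")
p1 = ArticlePage("article1")
p2 = ArticlePage("article1")  # same arguments, so the same node as p1
p3 = ArticlePage("article2")  # different argument, so a new node
```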

13
Example Raw Data
  • Article 1
    • Date: 8/1/97
    • Title: Clinton announces new
    • Priority: Headline
    • Category: USA News
    • Images: im1.gif, im.gif
    • Text: President Clinton announced
    • Related article: article2
  • Article 2
    • Date: 8/2/97
    • Title: FDA approves new cure for
    • Priority: Top Story
    • Category: Health
    • Video: vid1.avi
    • Text: The Federal Drug Administration

14
CNN Web-site Query (part 1)
// Input graph of articles
INPUT CNN-ARTICLES
// Create web page for each article
WHERE Articles(a),
      // note arc variable l
      a -> l -> t,
      l in {"Title", "Abstract", "Date", "Text",
            "Image", "Topimage", "RelatedSite"},
      a -> "Category" -> c
CREATE ArticlePage(a)
LINK ArticlePage(a) -> l -> t
{ WHERE a -> "RelatedArticle" -> r
  LINK ArticlePage(a) -> "RelatedArticle" -> ArticlePage(r) }
15
CNN Site Schema
[Figure: site schema graph. Nodes: RootPage(), RootPageEntry(a), CategoryEntry(c), CategoryPage(c), ArticlePage(a), Data(t). Edge conditions: a -> priority -> headline; a -> category -> c; a -> l -> t, l in {title, top-image}; a -> l -> t, l in {"Title", "Abstract", ...}.]
CNN Web-site Query (part 2)
CREATE RootPage
{ WHERE a -> "Priority" -> "headline",
        l in {"Title", "Date", "Topimage"}
  CREATE RootEntry(a)
  LINK RootPage -> "HeadlineStory" -> RootEntry(a),
       // Link each headline story to its title, date,
       // top image and full article
       RootEntry(a) -> "FullStory" -> ArticlePage(a),
       RootEntry(a) -> l -> t }
17
HTML Templates
[Template example; the tag markup was lost. Surviving directives: EMBED; EMBED related-article ORDER=descend KEY=date @a LINK=title]

18
CNN Sports Query
INPUT CNN
WHERE TopCategory(c), c -> "CategoryName" -> cn,
      cn = "Sports", c -> "SubTopic" -> top,
      Articles(a), a -> l -> t,
      l in {"Title", "Abstract", "Date", "Text", "Image",
            "Topimage", "RelatedSite"},
      a -> "Category" -> c, c = top
CREATE ArticlePage(a)
LINK ArticlePage(a) -> l -> t
19
StruQL Details
  • Regular path expressions are constructed by a
    grammar:
  • R _
  • Atoms in the WHERE clause are of the form
    X -> R -> Y or C(X).
  • The LINK clause includes atoms of the form
    • LINK f(X) -> "new link" -> g(X), or
    • LINK f(X) -> L -> g(X).
  • Queries can be nested, inheriting the WHERE
    clauses of their outer blocks.
  • Note the separation between the querying part and
    the restructuring part!
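
A path atom X -> R -> Y can be evaluated by a small recursive function over the graph, sketched below in Python. The constructors used here (single label, any label, sequence, alternation, Kleene star) are the usual regular-expression ones and are an assumption, since the grammar itself is not legible on this slide. The tuple encoding and the names `step` and `reach` are hypothetical.

```python
def step(arcs, nodes, pred):
    """Nodes reachable from `nodes` over one arc whose label satisfies pred."""
    return {t for (s, l, t) in arcs if s in nodes and pred(l)}


def reach(arcs, nodes, expr):
    """Set of Y such that X -> expr -> Y holds for some X in `nodes`."""
    kind = expr[0]
    if kind == "label":   # one arc with a given label
        return step(arcs, nodes, lambda l: l == expr[1])
    if kind == "any":     # one arc with any label
        return step(arcs, nodes, lambda l: True)
    if kind == "seq":     # R1 followed by R2
        return reach(arcs, reach(arcs, nodes, expr[1]), expr[2])
    if kind == "alt":     # R1 or R2
        return reach(arcs, nodes, expr[1]) | reach(arcs, nodes, expr[2])
    if kind == "star":    # zero or more repetitions of R
        seen = set(nodes)
        frontier = set(nodes)
        while frontier:
            # Expand only newly found nodes until no new node appears.
            frontier = reach(arcs, frontier, expr[1]) - seen
            seen |= frontier
        return seen
    raise ValueError(f"unknown path expression: {kind}")


arcs = [
    ("Root", "HeadlineStory", "RootEntry(a1)"),
    ("RootEntry(a1)", "FullStory", "ArticlePage(a1)"),
    ("ArticlePage(a1)", "RelatedArticle", "ArticlePage(a2)"),
]
any_path = ("star", ("any",))
everything = reach(arcs, {"Root"}, any_path)
full_story = reach(
    arcs, {"Root"}, ("seq", ("label", "HeadlineStory"), ("label", "FullStory"))
)
```

The star case terminates on any finite graph because each iteration either adds a new node to `seen` or empties the frontier.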

20
More on StruQL
  • Bare-bones language for semi-structured data:
    includes the essential features.
  • More expressive than Lorel or UnQL (e.g., can
    reverse graphs).
  • Conceptually and in practice, the separation
    between the query component and the restructuring
    component is important.
  • Containment is decidable for StruQL-WHERE
    (Florescu, Levy, Suciu, PODS-98).

21

Advantages of Declarative Specifications
22
Enforcing Integrity Constraints
  • We often want to verify some constraints on site
    structure:
    • all articles from the last two days are reachable
      from the root;
    • all paths to confidential data must go through an
      authentication node.
  • Good site design principles are summarized as
    integrity constraints (Lohse, CACM, 98).
  • When site specs are long, constraints are hard to
    enforce.
  • Want to verify constraints intensionally.

23
Intensional IC Verification
  • Formally, we want to check whether
    $S(D) \models IC$
    • S is the site specification (e.g., a StruQL query),
    • IC is a formula describing the constraint, e.g.
      $\forall a,\ Article(a) \land date(a) \geq today - 2 \Rightarrow Root \rightarrow {*} \rightarrow ArticlePage(a)$,
    for any instance D of the underlying data.
  • Results:
    • Sound and complete algorithms for verification of
      a class of integrity constraints (path
      constraints).
    • Algorithms will also propose corrections when
      ICs are violated.
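
For contrast, the sketch below checks the reachability constraint on one materialized site graph only, that is, for a single instance D. It is not the verification algorithm referred to above, which reasons about the specification for every instance; it is a plain breadth-first search, and the link data and function names are hypothetical.

```python
from collections import deque


def reachable(links, root):
    """All nodes reachable from root by following links (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for (source, _label, target) in links:
            if source == node and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def unreachable_recent(links, root, recent_articles):
    """Recent articles whose ArticlePage cannot be reached from root."""
    seen = reachable(links, root)
    return sorted(a for a in recent_articles if f"ArticlePage({a})" not in seen)


links = [
    ("RootPage", "HeadlineStory", "RootEntry(a1)"),
    ("RootEntry(a1)", "FullStory", "ArticlePage(a1)"),
    ("ArticlePage(a1)", "RelatedArticle", "ArticlePage(a2)"),
]
# a3 is recent but no page links to it, so it violates the constraint.
violations = unreachable_recent(links, "RootPage", ["a1", "a2", "a3"])
```

A check like this has to be rerun whenever the data changes, which is the cost that verifying the specification itself avoids.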

24
Run-time Management of Sites
  • When do we compute web pages?
  • Static approach: completely precompute the site.
    • Doesn't work for large sites or forms; hard to
      update.
  • Dynamic approach: compute pages on request.
    • Users may wait, a lot of repeated computation,
      structure of the site is not exploited.
  • Current tools use one of the extremes, or specify
    a policy per collection of pages.
    • The specification is implicit in code.
  • Our goal: use the site specification to
    automatically find an optimal strategy.

25
Possible Run-time Optimizations
  • View materialization.
  • Function caching:
    • When web sites represent hierarchically
      structured data, successive queries in the site
      differ only in their projected attributes.
  • Simplification under preconditions:
    • Previous queries on the path may have already
      verified some conditions for the current query.
  • Lookahead computation:
    • Often it is possible, at little cost, to compute
      the data necessary for subsequent pages.
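
The function-caching idea can be sketched as follows: cache the selection part of a page query once, and let each page apply its own projection to the cached rows. This is an illustrative assumption about how such a cache might look, not the Strudel-R implementation; `ARTICLES`, `articles_in`, and `page_query` are made-up names, and `CALLS` only counts how often the underlying selection runs.

```python
from functools import lru_cache

ARTICLES = (
    {"Title": "Clinton announces new", "Date": "8/1/97", "Category": "USA News"},
    {"Title": "FDA approves new cure for", "Date": "8/2/97", "Category": "Health"},
)
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def articles_in(category):
    """Selection part of the page query, cached per category."""
    CALLS["count"] += 1
    return tuple(a for a in ARTICLES if a["Category"] == category)


def page_query(category, attributes):
    """Project the cached selection onto the attributes one page needs."""
    return [{k: a[k] for k in attributes} for a in articles_in(category)]


# A category index page and an article page differ only in projected attributes.
index_rows = page_query("Health", ("Title",))
detail_rows = page_query("Health", ("Title", "Date"))
```

Both calls share one evaluation of the selection, so `CALLS["count"]` stays at 1 after the two page requests.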

26
Problem Statement
  • Given:
    • site specification,
    • knowledge about browsing patterns,
    • cost function.
  • Produce:
    • Operational plan.
    • Operational schema: a set of queries to
      compute on a given page request.
  • Results (in the Strudel-R framework):
    • Performance study of the optimizations.
    • Algorithm for generating operational plans.
    • Identification of many open problems.
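
As a toy illustration of the problem statement, the sketch below picks a policy per page from request rates and a cost function. Everything in it is hypothetical: the two-way choice, the cost model (precomputing pays one recomputation per data update, computing on request pays one computation per request), and all numbers. It is not the plan-generation algorithm mentioned above.

```python
def choose_plan(pages, updates_per_day):
    """Choose 'precompute' or 'on_request' per page under a toy cost model.

    pages maps a page name to (requests_per_day, compute_cost).
    """
    plan = {}
    for name, (requests_per_day, compute_cost) in pages.items():
        precompute_cost = updates_per_day * compute_cost
        on_request_cost = requests_per_day * compute_cost
        if precompute_cost <= on_request_cost:
            plan[name] = "precompute"
        else:
            plan[name] = "on_request"
    return plan


# Hypothetical browsing pattern: the root page is hot, an old article is not.
plan = choose_plan(
    {"RootPage": (5000, 2.0), "ArticlePage(old)": (0.1, 1.0)},
    updates_per_day=24,
)
```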

27

Strudel Experience -- Tiramisu
28
Experiences with Strudel (except for the lousy
GUI)
  • Integrating data from multiple sources when
    building a Web site is a prime concern. Sources
    are semi-structured!
  • Declarative specification of site structure is
    very important because
    • site creation is a highly iterative process,
    • site owners often need to redesign after
      experience from deployment,
    • we often generate multiple versions of sites
      from the same data.
  • Design of Web sites is done in a top-down
    fashion.
  • Strudel can't be the all-encompassing Web-site
    management tool.

29
Tiramisu: the Second Generation
  • Strudel and its siblings (Araneus, YAT, WebOQL,
    WIRM) force the design and implementation of the
    site to be done in the same tool.
  • Furthermore, there will always be tools that are
    specialized for specific tasks.
  • Tiramisu:
    • Separate the design phase from implementation.
    • Allow the implementation to be done by a set of
      cooperating tools.

30
Tiramisu Architecture
[Figure: Tiramisu architecture. Components shown: mediator; data sources (three); wrappers (three); E/R-style diagram of the site (site schema); implementation manager; tools (ASP, FrontPage, Strudel); web site.]
31
Screenshot of a TERD
32
Conclusions
  • Web-site management is an important area for
    database research.
  • First-generation systems (Strudel, Araneus, YAT,
    WebOQL) offer important advantages:
    • easy modification, creation of multiple versions,
    • enforcing constraints, run-time management.
  • Second generation (Tiramisu):
    • Emphasize the design phase of a site.
    • Implement with a collection of cooperating tools.