Web Site Management Based on Declarative Specifications

Slides: 33

Provided by: alona8

Learn more at: https://dsf.berkeley.edu

Transcript and Presenter's Notes

1
Web Site Management Based on Declarative
Specifications

Alon Levy
University of Washington
Joint work with
Strudel: Dana Florescu (INRIA), Mary Fernandez, Dan Suciu (AT&T), Khaled Yagoub (INRIA)
Tiramisu: Corin Anderson and Dan Weld (UW)

2
Problem: Building Web Sites

Building Web sites involves three tasks:
Selecting and managing the site's content.
Organizing the site's structure (pages and links).
Designing the graphical presentation of pages.
In current tools, these tasks are (mostly) interdependent.
Strudel's key ideas:
Separate the three tasks.
Manage content and structure declaratively.

3
Content Management and Graphical Presentation

Content may be derived from multiple sources:
Databases: relational, object-oriented.
Semi-structured sources (XML, Word, Excel, BibTeX).
Classical data integration problem!
(See TSIMMIS, Garlic, Information Manifold, Tukwila.)
Graphical presentation:
Need to integrate with tools that create animations, images, Java applets.
Create sets of similar HTML pages using templates.

4
Web-Site Structure

The structure includes:
The set of pages and the contents of each page, and
The links between the pages.

5
Current practice

Current tools separate only content management from presentation:
Content is managed by a database.
Queries are embedded in HTML templates.
Simple tools view and modify structure at the extensional level.
WYSIWYG tools manage presentation.
But they still cannot:
Explicitly manage the site's global structure, or
Flexibly choose the content-management system.
As a result, it's hard to modify the structure of a web site, build multiple versions for different classes of users, or enforce integrity constraints.

6
Talk Outline

Problem definition
Strudel architecture
Advantages of declarative specifications:
Specifying and verifying integrity constraints.
Automatic generation of run-time plans for managing data-intensive web sites.
Tiramisu:
Separating the design tool from the implementation.
Using a collection of tools to build a site.

7
Strudel Evolution
Strudel (Nov. 96), AT&T
Strudel AT&T Release
Strudel-R (INRIA)
http://www.research.att.com/sw/tools/strudel
Tiramisu (Sept. 98) (U. Washington)
8

Strudel Architecture and System
9
Strudel

Features:
Integrates content from multiple sources.
High-level declarative language for managing the site's structure (StruQL).
Advantages:
Derives multiple sites from the same data.
Supports easy restructuring and modification.
Provides a platform for:
Enforcing integrity constraints.
Designing policies for efficient run-time management of sites.

10
Strudel Architecture
11
Data Model

Strudel is based on a semi-structured data model: labeled directed graphs.
Nodes in the graph represent objects.
Labels on arcs represent attribute names.
Named collections of objects.
Why semi-structured data?
Raw data is often semi-structured (and I don't mean that it's embedded in HTML).
Convenient for data integration (à la TSIMMIS).
Web sites are ultimately graphs.
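
The data model above can be illustrated with a small Python sketch. This is not Strudel's implementation; the `Graph` class, its method names, and the node identifiers `art1` and `art2` are assumptions made for illustration only. It shows a labeled directed graph with named collections, where objects need not share a fixed set of attributes.

```python
from collections import defaultdict


class Graph:
    """Toy labeled directed graph with named collections (illustrative only)."""

    def __init__(self):
        self.edges = defaultdict(list)       # node -> [(arc label, target)]
        self.collections = defaultdict(set)  # collection name -> {nodes}

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def add_to_collection(self, name, node):
        self.collections[name].add(node)

    def out(self, node, label=None):
        """Targets of arcs leaving `node`, optionally restricted to one label."""
        return [t for (l, t) in self.edges.get(node, [])
                if label is None or l == label]


# Two objects with different attribute sets: no fixed schema is required.
g = Graph()
g.add_to_collection("Articles", "art1")
g.add_edge("art1", "Category", "USA News")
g.add_edge("art1", "Image", "im1.gif")
g.add_edge("art1", "Image", "im.gif")      # an attribute may repeat
g.add_to_collection("Articles", "art2")
g.add_edge("art2", "Category", "Health")
g.add_edge("art2", "Video", "vid1.avi")    # art2 has a Video arc but no Image arc
```

Because attributes are just arc labels, a repeated or missing attribute needs no schema change, which is the convenience the slide points to for data integration.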

12
The StruQL Query Language

A StruQL query is a function from a set of input graphs to an output graph.
A StruQL expression contains two parts:
A query component, and
A restructuring component.
Formally:
INPUT: graph names
WHERE: conjunction of regular path expression atoms
CREATE: name the nodes in the output graph using Skolem functions
LINK: specify the links in the resulting graph.
StruQL evolved into XML-QL (see WWW8 Conference).

13
Example Raw Data

Article 1
Date: 8/1/97
Title: Clinton announces new
Priority: Headline
Category: USA News
Images: im1.gif, im.gif
Text: President Clinton announced
Related article: article2

Article 2
Date: 8/2/97
Title: FDA approves new cure for
Priority: Top Story
Category: Health
Video: vid1.avi
Text: The Federal Drug Administration

14
CNN Web-site Query (part 1)
// Input graph of articles
INPUT CNN-ARTICLES
// Create web page for each article
WHERE Articles(a),
      a -> l -> t,    // note arc variable l
      l in {"Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"},
      a -> "Category" -> c
CREATE ArticlePage(a)
LINK ArticlePage(a) -> l -> t
{ WHERE a -> "RelatedArticle" -> r
  LINK ArticlePage(a) -> "RelatedArticle" -> ArticlePage(r) }
15
CNN Site Schema
[Figure: CNN site schema. Nodes: RootPage(), RootPageEntry(a), CategoryEntry(c), CategoryPage(c), ArticlePage(a), Data(t). Arc conditions: a -> priority -> headline; a -> category -> c; a -> l -> t, l in {title, top-image}; a -> l -> t, l in {"Title", "Abstract", ...}.]
16
CNN Web-site Query (part 2)
CREATE RootPage
{ WHERE a -> "Priority" -> "headline",
        l in {"Title", "Date", "Topimage"}
  CREATE RootEntry(a)
  LINK RootPage -> "HeadlineStory" -> RootEntry(a),
       // Link each headline story to its title, date, top image and full article
       RootEntry(a) -> "FullStory" -> ArticlePage(a),
       RootEntry(a) -> l -> t }
17
HTML Templates
EMBED , EMBED related-article ORDER=descend KEY=date @a LINK=title

18
CNN Sports Query
INPUT CNN
WHERE TopCategory(c), c -> "CategoryName" -> cn, cn = "Sports",
      c -> "SubTopic" -> top,
      Articles(a), a -> l -> t,
      l in {"Title", "Abstract", "Date", "Text", "Image", "Topimage", "RelatedSite"},
      a -> "Category" -> c, c = top
CREATE ArticlePage(a)
LINK ArticlePage(a) -> l -> t
19
StruQL Details

Regular path expressions are constructed by a grammar:
R _
Atoms in the WHERE clause are of the form X -> R -> Y or C(X).
The LINK clause includes atoms of the form
LINK f(X) -> new link -> g(X) or
LINK f(X) -> L -> g(X)
Queries can be nested, inheriting the WHERE clauses of their outer blocks.
Note the separation between the querying part and the restructuring part!
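
As an illustration of how an atom X -> R -> Y can be evaluated, the following Python sketch computes the set of nodes Y reachable from a set of start nodes X along paths matching R. The grammar on the slide did not survive, so the operators used here (a single label, the wildcard `_`, sequence, alternation, and Kleene star) are the usual regular-expression ones and are an assumption, as are the tuple encoding and the name `follow`.

```python
def follow(graph, nodes, r):
    """Nodes reachable from `nodes` along some path matching path expression r.

    r is a nested tuple: ("label", name), ("any",), ("seq", r1, r2),
    ("alt", r1, r2) or ("star", r1).
    """
    kind = r[0]
    if kind == "label":
        return {t for n in nodes for (l, t) in graph.get(n, []) if l == r[1]}
    if kind == "any":                       # wildcard: any single arc
        return {t for n in nodes for (_, t) in graph.get(n, [])}
    if kind == "seq":                       # r1 followed by r2
        return follow(graph, follow(graph, nodes, r[1]), r[2])
    if kind == "alt":                       # r1 or r2
        return follow(graph, nodes, r[1]) | follow(graph, nodes, r[2])
    if kind == "star":                      # zero or more repetitions of r1
        reached = set(nodes)
        frontier = set(nodes)
        while frontier:                     # fixpoint; terminates on cycles
            frontier = follow(graph, frontier, r[1]) - reached
            reached |= frontier
        return reached
    raise ValueError(f"unknown path expression: {r!r}")


# Hypothetical site graph with a cycle between two article pages.
SITE = {
    "root": [("HeadlineStory", "entry1")],
    "entry1": [("FullStory", "page1")],
    "page1": [("RelatedArticle", "page2")],
    "page2": [("RelatedArticle", "page1")],
}

everything = follow(SITE, {"root"}, ("star", ("any",)))
full_stories = follow(SITE, {"root"},
                      ("seq", ("label", "HeadlineStory"), ("label", "FullStory")))
```

Treating each expression as a function from node sets to node sets keeps the evaluator short; the star case only revisits newly discovered nodes, so cycles in the graph do not cause an infinite loop.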

20
More on StruQL

Bare-bones language for semi-structured data: includes the essential features.
More expressive than Lorel or UnQL (e.g., can reverse graphs).
Conceptually and in practice, the separation between the query component and the restructuring component is important.
Containment is decidable for StruQL-WHERE (Florescu, Levy & Suciu, PODS-98).

21

Advantages of Declarative Specifications
22
Enforcing Integrity Constraints

We often want to verify some constraints on site structure:
All articles from the last two days are reachable from the root.
All paths to confidential data must go through an authentication node.
Good site design principles are summarized as integrity constraints (Lohse, CACM, 98).
When site specs are long, constraints are hard to enforce.
We want to verify constraints intensionally.

23
Intensional IC Verification

Formally, we want to check whether
$S(D) \models IC$
for any instance D of the underlying data, where
S is the site specification (e.g., a StruQL query), and
IC is a formula describing the constraint, e.g.
$\forall a,\ Article(a) \wedge date(a) \geq today - 2 \Rightarrow Root \rightarrow * \rightarrow ArticlePage(a)$.
Results:
Sound and complete algorithms for verification of a class of integrity constraints (path constraints).
Algorithms will also propose corrections when ICs are violated.
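
For contrast with the intensional approach, the sketch below checks the example constraint extensionally, on one concrete site graph. It is not the verification algorithm referred to above, which reasons over the specification for every instance D; it only tests a single generated site. The function names, the dictionary encoding, and the sample dates are assumptions, and "last two days" is read as date >= today - 2.

```python
from datetime import date, timedelta


def reachable(links, root):
    """All nodes reachable from `root`; links maps node -> list of targets."""
    seen, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def violations(article_dates, links, root, today):
    """Recent articles whose ArticlePage is not reachable from the root."""
    cutoff = today - timedelta(days=2)
    seen = reachable(links, root)
    return sorted(a for a, d in article_dates.items()
                  if d >= cutoff and ("ArticlePage", a) not in seen)


# One hypothetical instance: art2 is recent but was never linked from the root.
ARTICLE_DATES = {
    "art0": date(1997, 7, 20),
    "art1": date(1997, 8, 1),
    "art2": date(1997, 8, 2),
}
LINKS = {"Root": [("ArticlePage", "art1")]}

broken = violations(ARTICLE_DATES, LINKS, "Root", today=date(1997, 8, 3))
```

A check like this has to be rerun whenever the data changes, which is the motivation for verifying the constraint against the specification instead.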

24
Run-time Management of Sites

When do we compute web pages?
Static approach: completely precompute the site.
Doesn't work for large sites or forms, and is hard to update.
Dynamic approach: compute pages on request.
Users may wait, there is a lot of repeated computation, and the structure of the site is not exploited.
Current tools use one of the extremes, or specify a policy per collection of pages.
The specification is implicit in code.
Our goal: use the site specification to automatically find an optimal strategy.

25
Possible Run-time Optimizations

View materialization.
Function caching:
When web sites represent hierarchically structured data, successive queries in the site differ only in their projected attributes.
Simplification under preconditions:
Previous queries on the path may have already verified some conditions for the current query.
Lookahead computation:
Often it is possible, with little cost, to compute the data necessary for subsequent pages.
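
Two of these optimizations, function caching and lookahead computation, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Strudel-R mechanism: the in-memory `ARTICLES` table stands in for a database, `QUERY_LOG` records each simulated query, and the function names are hypothetical.

```python
from functools import lru_cache

# Stand-in for the underlying database (hypothetical contents).
ARTICLES = {
    "art1": {"title": "Clinton announces new", "related": ["art2"]},
    "art2": {"title": "FDA approves new cure for", "related": []},
}
QUERY_LOG = []  # one entry per simulated database query


@lru_cache(maxsize=None)
def fetch_article(article_id):
    """Function caching: each article is queried at most once."""
    QUERY_LOG.append(article_id)
    return ARTICLES[article_id]


def render_page(article_id, lookahead=True):
    """Render one page on request; optionally warm the cache for linked pages."""
    article = fetch_article(article_id)
    if lookahead:
        for related_id in article["related"]:
            fetch_article(related_id)   # lookahead: data for likely next pages
    return f"<h1>{article['title']}</h1>"
```

Requesting `render_page("art1")` issues two queries (art1 and, by lookahead, art2); a later request for `render_page("art2")` is then served from the cache without touching the data source.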

26
Problem Statement

Given:
A site specification.
Knowledge about browsing patterns.
A cost function.
Produce:
An operational plan.
An operational schema: a set of queries to compute on a given page request.
Results (in the Strudel-R framework):
Performance study of the optimizations.
Algorithm for generating operational plans.
Identification of many open problems.
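
The shape of this problem can be shown with a deliberately simple Python sketch. The cost model here is an assumption invented for illustration and is not the Strudel-R algorithm: each page is either precomputed (paying its compute cost on every data update) or computed dynamically (paying it on every request), and the cheaper option wins. `PageStats`, `choose_policy`, and `operational_plan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    requests_per_day: float  # knowledge about browsing patterns (assumed input)
    updates_per_day: float   # how often the underlying data changes
    compute_cost: float      # cost of running the page's queries once


def choose_policy(stats):
    """Pick the cheaper of two extremes under the toy cost model."""
    precompute_cost = stats.updates_per_day * stats.compute_cost
    dynamic_cost = stats.requests_per_day * stats.compute_cost
    return "precompute" if precompute_cost <= dynamic_cost else "dynamic"


def operational_plan(site):
    """Map every page in the site to a run-time policy."""
    return {page: choose_policy(stats) for page, stats in site.items()}


# Hypothetical statistics: a hot root page and a rarely visited article page.
SITE_STATS = {
    "RootPage": PageStats(requests_per_day=1000, updates_per_day=24, compute_cost=5.0),
    "ArticlePage(art1)": PageStats(requests_per_day=2, updates_per_day=24, compute_cost=1.0),
}
plan = operational_plan(SITE_STATS)
```

A realistic plan would also account for shared subqueries, caching, and lookahead between pages, which this per-page comparison ignores.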

27

Strudel Experience -- Tiramisu
28
Experiences with Strudel (except for the lousy GUI)

Integrating data from multiple sources when building a Web site is a prime concern. Sources are semi-structured!
Declarative specification of site structure is very important because:
Site creation is a highly iterative process.
Site owners often need to redesign after experience from deployment.
We often generate multiple versions of sites from the same data.
Design of web sites is done in a top-down fashion.
Strudel can't be the all-encompassing web-site management tool.

29
Tiramisu: the Second Generation

Strudel and its siblings (Araneus, YAT, WebOQL, WIRM) force the design and implementation of the site to be done in the same tool.
Furthermore, there will always be tools that are specialized for specific tasks.
Tiramisu:
Separate the design phase from implementation.
Allow the implementation to be done by a set of cooperating tools.

30
Tiramisu Architecture
[Figure: Tiramisu architecture. Components shown: three data sources, each with a wrapper; a mediator; an E/R-style diagram of the site (site schema); an implementation manager; tools (ASP, FrontPage, Strudel); and the resulting web site.]
31
Screenshot of a TERD
32
Conclusions

Web-site management is an important area for database research.
First-generation systems (Strudel, Araneus, YAT, WebOQL) offer important advantages:
Easy modification and creation of multiple versions.
Enforcing constraints and run-time management.
Second generation (Tiramisu):
Emphasize the design phase of the site.
Implement with a collection of cooperating tools.
