Title: XQuery and Hierarchical Naming
1XQuery and Hierarchical Naming
- Zachary G. Ives
- University of Pennsylvania
- CIS 455 / 555 Internet and Web Systems
- February 7, 2008
2Today
- Reminder: Homework 1 due 2/12 at 11:59 PM
- XQuery and joins
- Addressing vs. naming
- Hierarchical names
3XQuery's Basic Form
- The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes
- FLWOR statement pattern:
  - for: iterators that bind variables
  - let: collections
  - where: conditions
  - order by: order-conditions
  - return: output constructor
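As a rough analogy only (this is Python, not XQuery), the FLWOR pattern above can be mimicked with a loop: iterate over bindings, name a derived collection, filter, sort, and construct output. The sample records and field names are made up for illustration.

```python
# Hypothetical sample data standing in for XML nodes.
papers = [
    {"title": "B-trees", "year": 1997, "authors": ["Ann", "Bob"]},
    {"title": "Joins", "year": 1992, "authors": ["Cy"]},
    {"title": "Views", "year": 1997, "authors": ["Dee", "Eli", "Fay"]},
]


def flwor(papers):
    bindings = []
    for p in papers:                       # for: bind p to each node in turn
        authors = p["authors"]             # let: bind a whole collection
        if p["year"] == 1997:              # where: keep bindings meeting the condition
            bindings.append((p, authors))
    bindings.sort(key=lambda b: b[0]["title"])           # order by
    return [(p["title"], len(a)) for p, a in bindings]   # return: construct output


result = flwor(papers)
```

Each tuple in `result` plays the role of one constructed output node.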
4Example XML Data
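[Figure: tree view of the example dblp.xml document. The root holds the ?xml declaration and a dblp element. Labels under dblp: mastersthesis (mdate, key, author, title, year, school), inproceedings (mdate, key, author, title, crossref, year, ee), and university (key, name, country). Sample values shown include ms/Brown92, Kurt Brown, PRPL, 1992, wisc, Wisconsin, USA, sigmod-97, and 1997.]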
5XQuery and Joins
    for $i in doc("dblp.xml")/dblp/inproceedings,
        $r in $i/crossref/text(),
        $c in doc("dblp.xml")/dblp/conf,
        $n in $c/@name
    where $c = $r
    return ($i, $c)
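For comparison, the same value-based join can be sketched in Python with the standard library's ElementTree. The miniature document below is made up, and the sketch assumes the intended correspondence is between the crossref text and the conf name attribute.

```python
import xml.etree.ElementTree as ET

# Made-up miniature of dblp.xml; element and attribute names follow the query above.
DBLP = """
<dblp>
  <inproceedings key="p1"><title>Paper One</title><crossref>sigmod-97</crossref></inproceedings>
  <inproceedings key="p2"><title>Paper Two</title><crossref>vldb-98</crossref></inproceedings>
  <conf name="sigmod-97"><title>SIGMOD 1997</title></conf>
</dblp>
"""


def join_papers_with_confs(xml_text):
    root = ET.fromstring(xml_text)
    pairs = []
    for i in root.findall("inproceedings"):      # $i in .../dblp/inproceedings
        r = i.findtext("crossref")               # $r in $i/crossref/text()
        for c in root.findall("conf"):           # $c in .../dblp/conf
            n = c.get("name")                    # $n in $c/@name
            if r is not None and n == r:         # join on equal values (assumed condition)
                pairs.append((i.get("key"), c.findtext("title")))
    return pairs


pairs = join_papers_with_confs(DBLP)
```

The nested loops make the "each legal combination of bindings" model explicit; a real engine would typically use a hash or index join instead.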
6Some Uses for Join in XML
- Translation between values
  - SSN -> PennID
- Joining or combining information
  - Amazon invoice info + UPS tracking info
- Restructuring information
  - For example, separate authors from books, then join them back in upside-down fashion
7Changing Nesting of XML Content
- Re-nesting XML trees is a common operation
- Simply nest the query blocks and correlate them, similar to a join:
    for $u in doc("dblp.xml")/dblp/university
    let $n := $u/name/text(),
        $k := $u/@key
    where $u/country = "USA"
    return (
      $n,
      for $mt in $u/../mastersthesis,
          $inst in $mt/school/text()
      where $mt/year/text() = "1992" and _______________
      return $mt/title
    )
8Collections & Aggregation in XQuery
- Given a collection, we can compute an average, count, etc. of its members:
    for $paper in doc("dblp.xml")/dblp/inproceedings
    let $pauth := $paper/author          (: $pauth is a collection :)
    return ($paper/title, fn:count($pauth))
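A minimal Python sketch of the same aggregation, assuming a made-up two-paper document: the list of author elements plays the role of the collection, and its length is the count.

```python
import xml.etree.ElementTree as ET

# Made-up sample; only the element names come from the query above.
DBLP = """
<dblp>
  <inproceedings><title>Paper One</title><author>A</author><author>B</author></inproceedings>
  <inproceedings><title>Paper Two</title><author>C</author></inproceedings>
</dblp>
"""


def author_counts(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for paper in root.findall("inproceedings"):
        pauth = paper.findall("author")          # the collection bound by "let"
        out.append((paper.findtext("title"), len(pauth)))
    return out


counts = author_counts(DBLP)
```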
9Sorting in XQuery
- We can order the sequence of "result tuples" output by the return clause:
    for $x in doc("dblp.xml")/proceedings
    order by $x/title/text()
    return $x
10Querying & Defining Tags
- Can get a node's name by querying node-name():
    for $x in doc("dblp.xml")/dblp/*
    return node-name($x)
- Can construct elements and attributes using computed names:
    for $x in doc("dblp.xml")/dblp/*,
        $year in $x/year,
        $title in $x/title/text()
    return
      element { node-name($x) } {
        attribute { concat("year-", $year) } { $title }
      }
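The idea of computed names carries over to other XML APIs. A hedged Python sketch, using a made-up document: the new element's tag is copied from the node it was derived from, and the attribute name is built from a data value.

```python
import xml.etree.ElementTree as ET

# Made-up sample with two different element kinds under dblp.
DBLP = """
<dblp>
  <mastersthesis><title>PRPL</title><year>1992</year></mastersthesis>
  <inproceedings><title>Paper One</title><year>1997</year></inproceedings>
</dblp>
"""


def rebuild_with_computed_names(xml_text):
    root = ET.fromstring(xml_text)
    out = []
    for x in root:                               # every child of dblp
        year = x.findtext("year")
        title = x.findtext("title")
        if year is None or title is None:
            continue                             # no binding for this node
        elem = ET.Element(x.tag)                 # element name computed from the node
        elem.set("year-" + year, title)          # attribute name computed from a value
        out.append(elem)
    return out


rebuilt = rebuild_with_computed_names(DBLP)
```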
11XQuery Summary
- Very flexible and powerful language for XML
- Focus is on database-style operations like joins
- Performs tasks that can't be done with XPath or XSLT and that are tedious to program in Java:
  - Integrating information from multiple sources
  - Joins, based on correspondences of values
  - Computing count, average, etc.
- Today, XQuery is available:
  - In RDBMSs (SQL Server, Oracle, DB2) and XML DBMS systems (MarkLogic)
  - As the basis of research prototypes for XQuery full text
  - As the basis of XQueryP: a Web Services/AJAX programming language based on XQuery but with programming language features
    - http://2006.xmlconference.org/programme/presentations/38.html
- We will discuss data integration and middleware later in the course
12Hierarchical Naming Schemes
- Thus far, we've seen XPath as a hierarchical naming scheme
  - Content-based naming: describe the structure and values of a tree structure
  - Assumption: XML tree resides in (or is being sent to) one place
- But hierarchy is often used for naming and location
13How Do We Find Things on the Internet?
- Generally, using one of three means:
- Addresses or locations: specify where something is, assuming that we understand how to navigate
  - Just like a physical address, we may still need a map!
  - In the Internet, addresses are typically IP addresses; the routers know the map
- Names: mapped into addresses via lookup services
  - Best-known example on the Internet: DNS name
  - Cell phone numbers, email addresses, etc. are becoming names
- Content-based addressing/naming
  - The actual data value is somehow used to find its location
  - The basis of publish-subscribe systems and peer-to-peer architectures
14The Simplest Way of Going from Names or Content -> Locations
- Directory-based lookup protocols are very common
- Examples:
  - Napster 1.0: peer-to-peer storage with central directory
  - Inverted index: used to look up keywords in information retrieval
  - DNS: distributed hierarchical directory
  - LDAP: hierarchical Directory Information Tree
15Napster 1.0, ca 2002
- Hybrid of peer-to-peer storage with central directory showing what's currently available
- What are the trade-offs implicit in this model? Why did it fail?
[Figure: the Napster.com directory lists jjackson-lame and bspears-oops. Peer1 and Peer3 each hold jjackson-lame.mp3; Peer2 holds bspears-oops.mp3.]
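A minimal sketch of such a central directory, using the peer and file names from the figure. The class and method names are hypothetical, not Napster's actual protocol. The directory stores only who has what; the files themselves stay on the peers.

```python
from collections import defaultdict


class CentralDirectory:
    """Central index mapping a file name to the peers that currently hold it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        for name in filenames:
            self._holders[name].add(peer)

    def unregister(self, peer):
        # A peer going offline must be removed from every entry.
        for peers in self._holders.values():
            peers.discard(peer)

    def lookup(self, filename):
        return sorted(self._holders.get(filename, ()))


directory = CentralDirectory()
directory.register("Peer1", ["jjackson-lame.mp3"])
directory.register("Peer2", ["bspears-oops.mp3"])
directory.register("Peer3", ["jjackson-lame.mp3"])
before = directory.lookup("jjackson-lame.mp3")
directory.unregister("Peer1")
after = directory.lookup("jjackson-lame.mp3")
```

Every lookup goes through one service, which keeps search simple but makes the directory a single point of failure and control.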
16Other Services with Similar Directory + Peer Architectures
- FolderShare: now owned by Microsoft
- Google Desktop Search with multiple machines
- BitTorrent trackers are quite similar (we'll discuss BitTorrent more later)
17Inverted Indices
- A forward index: documents to words
- The inverted index: words to word-occurrences
  - The basis of most information retrieval engines, Google, etc.
  - Can handle positional predicates
  - But how can we reconstruct previews?
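A small sketch of an inverted index that records word positions, which is what makes positional predicates (such as "word B immediately follows word A") answerable. The documents and function names are made up for illustration, and tokenization is a plain whitespace split.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to a list of (doc_id, position) occurrences."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index


def phrase_match(index, first, second):
    """Documents where `second` occurs immediately after `first`."""
    follow = set(index.get(second, []))
    return sorted({d for d, p in index.get(first, []) if (d, p + 1) in follow})


docs = {
    "d1": "hierarchical naming on the web",
    "d2": "naming is hierarchical",
}
index = build_inverted_index(docs)
hits = phrase_match(index, "hierarchical", "naming")
```

Note that the index alone holds positions, not surrounding text, so producing a preview still requires access to the original document (the forward direction).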
18Naming People and Devices: LDAP
- Lightweight Directory Access Protocol
- Hierarchical naming system that can be
partitioned and replicated
19LDAP's Schema
- LDAP information has an XML-like schema
- A unique name in LDAP is called a Distinguished Name (dn) and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
  - o: organization; dc: domain component
  - ou: organizational unit
  - uid: user ID
  - cn: common name
  - c: country; st: state; l: locality
- Can also have objectClass: the type of entity
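To make the most-specific-to-least-specific ordering concrete, here is a simplified sketch that splits a DN into attribute/value pairs and derives the enclosing entry's DN. The example DN and helper names are assumptions, and the parser deliberately ignores escaped commas and multi-valued components that real DNs may contain.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most-specific first.

    Simplification: assumes no escaped commas or multi-valued components.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def parent_dn(dn):
    """Drop the most-specific component to get the enclosing entry's DN."""
    return ",".join(f"{a}={v}" for a, v in parse_dn(dn)[1:])


pairs = parse_dn("uid=zives, ou=penn, c=usa")
parent = parent_dn("uid=zives, ou=penn, c=usa")
```

Dropping leading components walks up the tree, much as dropping leading labels does for a DNS name.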
20LDAP Hierarchy
Brad Marshall, LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html
21Querying LDAP
- LDAP queries are mostly attribute-value predicates:
  - uid=zives, ou=penn, c=usa
  - (|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))
  - objectclass=posixAccount
  - (!(cn=Val Tannen))
- How does this differ from XPath?
- How might we process these queries?
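One naive way to process such queries is to evaluate the predicate tree against every entry. The sketch below represents filters as nested tuples rather than parsing the string syntax. The entries, the uid values, and the case-insensitive comparison are all assumptions made for illustration.

```python
def matches(entry, flt):
    """Evaluate a filter tree against one entry (dict: attribute -> list of values).

    Filter forms: ("=", attr, value), ("|", f1, f2, ...), ("&", f1, ...), ("!", f).
    """
    op = flt[0]
    if op == "=":
        _, attr, value = flt
        # Assumption: values compare case-insensitively.
        return value.lower() in (v.lower() for v in entry.get(attr, []))
    if op == "|":
        return any(matches(entry, f) for f in flt[1:])
    if op == "&":
        return all(matches(entry, f) for f in flt[1:])
    if op == "!":
        return not matches(entry, flt[1])
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical directory entries.
entries = [
    {"cn": ["Zachary Ives"], "uid": ["zives"], "objectclass": ["posixAccount"]},
    {"cn": ["Val Tannen"], "uid": ["val"], "objectclass": ["posixAccount"]},
    {"cn": ["printer-3"], "objectclass": ["device"]},
]
either = ("|", ("=", "cn", "Susan Davidson"), ("=", "cn", "Zachary Ives"))
not_val = ("&", ("=", "objectclass", "posixAccount"), ("!", ("=", "cn", "Val Tannen")))
either_hits = [e["cn"][0] for e in entries if matches(e, either)]
not_val_hits = [e["cn"][0] for e in entries if matches(e, not_val)]
```

A full scan like this is linear in the directory size; a real server would more likely keep per-attribute indexes and restrict the search to a subtree of the hierarchy.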
22The Backbone of Internet Naming: Domain Name Service
- A simple, hierarchical name system with a distributed database: each domain controls its own names
[Figure: DNS hierarchy. Top-level domains com and edu; below them the domains amazon, columbia, upenn, and berkeley; below those, hosts and subdomains such as www, cis, and sas.]
23Top-Level Domains (TLDs)
- Mostly controlled by Network Solutions, Inc. today:
  - .com: commercial
  - .edu: educational institution
  - .gov: US government
  - .mil: US military
  - .net: networks and ISPs (now also a number of other things)
  - .org: other organizations
  - 244 2-letter country suffixes, e.g., .us, .uk, .cz, .tv, ...
  - and a bunch of new suffixes that are not very common, e.g., .biz, .name, .pro, ...
24Finding the Root
- 13 root servers store entries for all top level domains (TLDs)
- DNS servers have a hard-coded mapping to root servers so they can get started
25Excerpt from DNS Root Server Entries
; This file is made available by InterNIC registration services
; under anonymous FTP as
;     file                /domain/named.root
;
; formerly NS.INTERNIC.NET
;
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
;
; formerly C.PSI.NET
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
(13 servers in total, A through M)
26Supposing We Were to Build DNS
- How would we start? How is a lookup performed?
- (Hint: what do you need to specify when you add a client to a network that doesn't do DHCP?)
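One possible answer, sketched as a toy simulation rather than real DNS: a resolver starts from a known root and follows referrals down the hierarchy until some server holds the record. The server names, zone contents, and addresses below are made up (the addresses come from a range reserved for documentation).

```python
# Hypothetical zone data: each server knows only its own part of the hierarchy.
SERVERS = {
    "root": {"referrals": {"edu": "edu-server", "com": "com-server"}, "records": {}},
    "edu-server": {"referrals": {"upenn.edu": "upenn-server"}, "records": {}},
    "upenn-server": {
        "referrals": {},
        "records": {"www.upenn.edu": "192.0.2.10", "www.cis.upenn.edu": "192.0.2.20"},
    },
    "com-server": {"referrals": {}, "records": {}},
}


def resolve(name, start="root"):
    """Iteratively follow referrals from the root; return (address, servers asked)."""
    server, path = start, []
    while True:
        path.append(server)
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], path
        # Find a referral whose domain is a suffix of the name being resolved.
        nxt = next(
            (s for d, s in zone["referrals"].items()
             if name == d or name.endswith("." + d)),
            None,
        )
        if nxt is None:
            return None, path          # no server knows this name
        server = nxt


address, path = resolve("www.cis.upenn.edu")
missing, missing_path = resolve("www.amazon.com")
```

The only thing the resolver must know in advance is where the root is, which mirrors the hard-coded root mapping; a client, in turn, only needs the address of one such resolver.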
27Issues in DNS
- We know that everyone wants to be "my-domain".com
  - How does this mesh with the assumptions inherent in our hierarchical naming system?
- What happens if things move frequently?
- What happens if we want to provide different behavior to different requestors (e.g., Akamai)?
28Next Time
- We'll look at alternative mechanisms for finding things:
  - Publish-subscribe models
  - Gossip protocols, such as in routers
  - Flooding
- ... and soon, peer-to-peer or content-based routing