Title: OCL4 Oracle 10g: SQL
1OCL4 Oracle 10gSQL PL/SQLSession 8
- Matthew P. Johnson
- CISDD, CUNY
- January, 2005
2Agenda
- Security web apps
- RegEx support in 10g
- Oracle XML
- Data warehousing
- More on the PL/SQL labs
- Any more lab?
3Review Why security is hard
- Its a negative deliverable
- Its an asymmetric threat
- Tolstoy Happy families are all alike every
unhappy family is unhappy in its own way. - Analogs homeland, jails, debugging,
proof-reading, Popperian science, fishing, MC
algs - So fix biggest problems first
4DB users have privileges
- SELECT read access to all columns
- INSERT(col-name) can insert rows with
non-default values in this column - INSERT can insert rows with non-default values
in all columns - DELETE
- REFERENCES(col-name) can define foreign keys
that refer to (or other constraints that mention)
this column - TRIGGER triggers can reference table
- EXECUTE can run function/SP
5Granting privileges (Oracle)
- One method of setting access levels
- Creator of object automatically gets all
privileges to it - Possible objects tables, whole databases, stored
functions/procedures, etc. - ltDB-namegt. - all tables in DB
- A privileged user can grant privileges to other
users or groups
GRANT privileges ON object TO user ltWITH GRANT
OPTIONgt
GRANT SELECT ON mytable TO someone WITH GRANT
OPTION
6Granting and revoking
- Privileged user has privileges
- Privileged-WGO user can grant them, w/wo GO
- Granter can revoke privileges or GO
- Revocation cascades by default
- To prevent, use RESTRICT (at end of cmd)
- If would cascade, command fails
- Can change owner
ALTER TABLE my-tbl OWNER TO new-owner
7Granting and revoking
- What we giveth, we may taketh away
- mjohnson (effects?)
- george (effects?)
- mjohnson (effects?)
GRANT SELECT, INSERT ON my-table TO george WITH
GRANT OPTION
GRANT SELECT ON my-table TO laura
REVOKE SELECT ON my-table FROM laura
8Role-based authorization
- In SQL-1999, privileges assigned with roles
- For example
- Student role
- Instructor role
- Admin role
- Each role gets to do same (sorts of) things
- Privileges assigned by assigning role to users
GRANT SELECT ON my-table TO employee
GRANT employee TO billg
9Passwords
- DBMS recognizes your privileges because it
recognizes you - how?
- Storing passwords in the DB is a bad idea
10Hashed or digested passwords
- One-way hash function
- computing f(x) is easy
- Computing f-1(y) is hard/impossible
- Finding some x2 s.t. f(x2) f(x) is hard/imposs
- collisions
- Intuitively seeing f(x) gives little (useful)
info on x - x looks random
- PRNGs
- MD5, SHA-1
- RFID for cars http//www.rfidanalysis.org/
11Built-in accounts
- Many DBMSs (and OSs) have built-in demo accounts
by default - In some versions, must opt out
- MySQL root/(blank) (closed on sales)
- http//lists.seifried.org/pipermail/security/2004-
February/001782.html - Oracle scott/tiger (was open on sales last year)
- SQLServer sa/(blank/null)
- http//support.microsoft.com/default.aspx?scidkb
EN-US313418
12Query-related Injection attacks
- Heres a situation
- Prompt for user/pass
- Do lookup
- If found, user gets in
- test.user table in MySQL
- http//pages.stern.nyu.edu/mjohnson/dbms/php/logi
nphp.txt - http//pages.stern.nyu.edu/mjohnson/dbms/php/logi
n.php - Apart from no hashing, is this safe?
SELECT FROM users WHERE useru AND passwordp
13Injection attacks
- We expect to get input of something like
- user mjohnson
- pass secret
- ?
SELECT FROM users WHERE user u AND password
p
SELECT FROM users WHERE user 'mjohnson' AND
password 'secret'
14Injection attacks MySQL/Perl/PHP
- Consider another input
- user ' OR 11 OR user '
- pass ' OR 11 OR pass '
- ?
SELECT FROM users WHERE user u AND password
p
SELECT FROM users WHERE user '' OR 11 OR
user '' AND password '' OR 11 OR pass
''
http//pages.stern.nyu.edu/mjohnson/dbms/php/logi
n.php http//pages.stern.nyu.edu/mjohnson/dbms/eg
/injection.txt
SELECT FROM users WHERE user '' OR 11 OR
user '' AND password '' OR 11 OR pass ''
15Injection attacks MySQL/Perl/PHP
- Consider this one
- user your-boss' OR 11
- pass abc
- ?
SELECT FROM users WHERE user u AND password
p
http//pages.stern.nyu.edu/mjohnson/dbms/php/logi
n.php
SELECT FROM users WHERE user 'your-boss' OR
11 ' AND password 'abc'
SELECT FROM users WHERE user 'your-boss' OR
11 ' AND password 'abc'
16Injection attacks MySQL/Perl/PHP
- Consider another input
- user your-boss
- pass ' OR 11 OR pass '
- ?
SELECT FROM users WHERE user u AND password
p
http//pages.stern.nyu.edu/mjohnson/dbms/php/logi
n.php
SELECT FROM users WHERE user 'your-boss' AND
password '' OR 11 OR pass ''
SELECT FROM users WHERE user 'your-boss' AND
password '' OR 11 OR pass ''
17Multi-command inj. attacks (other DBs)
- Consider another input
- user ' DELETE FROM users WHERE user 'abc'
SELECT FROM users WHERE password ' - pass abc
- ?
SELECT FROM users WHERE user u AND password
p
SELECT FROM users WHERE user '' DELETE FROM
users WHERE user 'abc' SELECT FROM users WHERE
password '' AND password 'abc'
SELECT FROM users WHERE user '' DELETE FROM
users WHERE user 'abc' SELECT FROM users WHERE
password '' AND password 'abc'
18Multi-command inj. attacks (other DBs)
- Consider another input
- user ' DROP TABLE users SELECT FROM users
WHERE password ' - pass abc
- ?
SELECT FROM users WHERE user u AND password
p
SELECT FROM users WHERE user '' DROP TABLE
users SELECT FROM users WHERE password '' AND
password 'abc'
SELECT FROM users WHERE user '' DROP TABLE
users SELECT FROM users WHERE password '' AND
password 'abc'
19Multi-command inj. attacks (other DBs)
- Consider another input
- user ' SHUTDOWN WITH NOWAIT SELECT FROM users
WHERE password ' - pass abc
- ?
SELECT FROM users WHERE user u AND password
p
SELECT FROM users WHERE user '' SHUTDOWN
WITH NOWAIT SELECT FROM users WHERE password
'' AND password 'abc'
SELECT FROM users WHERE user '' SHUTDOWN
WITH NOWAIT SELECT FROM users WHERE password
'' AND password 'abc'
20Injection attacks MySQL/Perl/PHP
- Consider another input
- user your-boss
- pass ' OR 11 AND user 'your-boss
- ? Delete your boss!
DELETE FROM users WHERE user u AND password p
http//pages.stern.nyu.edu/mjohnson/dbms/php/user
s.php
DELETE FROM users WHERE user 'your-boss' AND
pass '' OR 11 AND user 'your-boss'
DELETE FROM users WHERE user 'your-boss' AND
pass ' ' OR 11 AND user 'your-boss'
21Injection attacks MySQL/Perl/PHP
- Consider another input
- user ' OR 11 OR user '
- pass ' OR 11 OR user '
- ? Delete everyone!
DELETE FROM users WHERE user u AND pass p
http//pages.stern.nyu.edu/mjohnson/dbms/php/user
s.php
DELETE FROM users WHERE user '' OR 11 OR
user '' AND pass '' OR 11 OR user ''
DELETE FROM users WHERE user '' OR 11 OR user
'' AND pass '' OR 11 OR user ''
22Preventing injection attacks
- Ultimate source of problem quotes
- Soln 1 dont allow quotes!
- Reject any entered data containing single quotes
- Q Is this satisfactory?
- Does Amazon need to sell OReilly books?
- Soln 2 escape any single quotes
- Replace any ' with a '' or \'
- In Perl, use taint mode wont show
- In PHP, turn on magic_quotes_gpc flag in
.htaccess - show both PHP versions
23Preventing injection attacks
- Soln 3 use prepare parameter-based queries
- Supported in JDBC, Perl DBI, PHP ext/mysqli
- http//pages.stern.nyu.edu/mjohnson/dbms/perl/log
insafe.cgi - http//pages.stern.nyu.edu/mjohnson/dbms/perl/use
rssafe.cgi - Very dangerous using tainted data to run
commands at the Unix command prompt - Semi-colons, prime char, etc.
- Safest define set if legal chars, not illegal
ones
24Preventing injection attacks
- When to do security checking for quotes, etc.?
- Natural choice in client-side data validation
- But not enough!
- As saw earlier can submit GET and POST params
manually - ? Must do security checking on server
25More Info
- phpGB MySQL Injection Vulnerability
- http//www.securiteam.com/unixfocus/6X00O1P5PY.htm
l - "How I hacked PacketStorm
- http//www.wiretrip.net/rfp/txt/rfp2k01.txt
26SQLPlus settings
SQLgt SET RECSEP OFF SQLgt COLUMN text FORMAT A60
27New topic Regular Expressions
- In automata theory, Finite Automata are the
simplest weakest of computer, Turing Machines the
strongest - Chomskys Hierarchy
- FA are equivalent to a regular expression
- Expressions that specify a pattern
- Can check whether a string matches the pattern
28RegEx matching
- Use REGEX_LIKE
- Metachar for any char is .
- First, get employee_comment table
- http//pages.stern.nyu.edu/mjohnson/oracle/empcom
m.sql - Now do search
- So far, like LIKE
SELECT emp_id, text FROM employee_comment WHERE
REGEXP_LIKE(text,'...-....')
29RegEx matching
- Can also pull out the matching text with
REGEXP_SUBSTR - If want only numbers, can specify a set of chars
rather than a dot
SELECT emp_id, REGEXP_SUBSTR(text,'...-....')
text FROM employee_comment WHERE
REGEXP_LIKE(text,'...-....')
SELECT emp_id, REGEXP_SUBSTR(text,
'0123456789..-...0123456789') text FROM
employee_comment WHERE REGEXP_LIKE(text,
'0123456789..-...0123456789')
30RegEx matching
- Or can specify a range of chars
- Or, finally, can state how many copies to match
SELECT emp_id, REGEXP_SUBSTR(text,
'0-9..-....') text FROM employee_comment WHERE
REGEXP_LIKE(text,'...-....')
SELECT emp_id, REGEXP_SUBSTR(text, '0-93-0-9
4') text FROM employee_comment WHERE
REGEXP_LIKE(text,'0-93-0-94')
31RegExp matching
- Other operators
- - 0 or more matches
- - 1 or more matches
- ? - 0 or 1 match
- Also, can OR options together with op
- Here some phone nums have area codes, some not,
so want to match both
SELECT emp_id, REGEXP_SUBSTR(text, '0-93-0-9
3-0-940-93-0-94') text FROM
employee_comment WHERE REGEXP_LIKE(text,'0-93-
0-93-0-940-93-0-94')
32RegExp matching
- Order of ORed together patterns matters
- First matching pattern wins
SELECT emp_id, REGEXP_SUBSTR(text, '0-93-0-9
40-93-0-93-0-94') text FROM
employee_comment WHERE REGEXP_LIKE(text,'0-93-
0-940-93-0-93-0-94')
33RegExp matching
- Theres a shared structure between the two, tho
- Area code is just optional
- Can use ? op
SELECT emp_id, REGEXP_SUBSTR(text, '(0-93-)?
0-93-0-94') text FROM employee_comment WHER
E REGEXP_LIKE(text,'(0-93-)?0-93-0-94'
)
34RegExp matching
- Also, different kinds of separators
- dash, dot, just blank
- Can OR together whole number patterns
- Better Just use set of choices of each sep.
SELECT emp_id, REGEXP_SUBSTR(text, '(0-93-.
)?0-93-. 0-94') text FROM
employee_comment WHERE REGEXP_LIKE(text,'(0-93
-. )?0-93-. 0-94')
35RegExp matching
- One other thing area codes in parentheses
- Of course, area codes are still optional
- Parentheses must be escaped - \( \)
SELECT emp_id, REGEXP_SUBSTR(text, '(0-93-.
\(0-93\) )?0-93-. 0-94')
text FROM employee_comment WHERE
REGEXP_LIKE(text,'(0-93-. \(0-93\)
)?0-93-. 0-94')
36And now for something completely different XML
- XML eXtensible Mark-up Language
- Very popular language for semi-structured data
- Mark-up language consists of elements composed
of tags, like HTML - Emerging lingua franca of the Internet, Web
Services, inter-vender comm
37Unstructured data
- At one end of continuum unstructured data
- Text files
- Stock market prices
- CIA intelligence intercepts
- Audio recordings
- Just one damn bit after another
- Henry Ford
- No (intentional, formal) patterns to the data
- Difficult to manage/make sense of
- Why we need data-mining
38Structured data
- At the other end structured data
- Tables in RDBMSs
- Data organized into semantic chunks
- entities
- Similar/related entities grouped together
- Relationships, classes
- Entities in same group have same structure
- Same fields/attributes/properties
- Easy to make sense of
- But sometimes too rigid a req.
- Difficult to sendconvert to tab-delimited
39Semi-structured data
- Not too random
- Data organized into entities
- Similar/related grouped to form other entities
- Not too structured
- Some attributes may be missing
- Size of attributes may vary
- Support of lists/sets
- Juuust Right
- Data is self-describing
40Semi-structured data
- Predominant examples
- HTML HyperText Mark-up Language
- XML eXtensible Mark-up Language
- NB both mark-up languages (use tags)
- Mark-up lends self of semi-structured data
- Demarcate boundaries for entities
- But freely allow other entities inside
41Data model for semi-structured data
- Usually represented as directed graphs
- Graph set of vertices (nodes) and edges
- Dots connected by lines not nec. a tree!
- In model,
- Nodes entities or fields/attributes
- Edges attribute-of/sub-entity-of
- Example publisher publishes gt0 books
- Each book has one title, one year, gt1 authors
- Draw publishers graph
42XML is a SSD language
- Standard published by W3C
- Officially announced/recommended in 1998
- XML ! HTML
- XML ! a replacement for HTML
- Both are mark-up languages
- Big diffs
- XML doesnt use predefined tags (!)
- But its extensible tags can be added
- HTML is about presentation ltIgt, ltBgt, ltPgt
- XML is about content ltbookgt, ltauthorgt
43XML syntax
- Like HTML in many respects but more strict
- All tags must be closed
- Cant have this is a lineltbrgt
- Every start tag has an end tag
- Although ltbr/gt style can replace both
- IS case-sensitive
- IS space-sensitive
- XML doc has a unique root element
44XML syntax
- Tags must be properly nested
- Not allowed ltbgtltigtIm not kiddinglt/bgtlt/igt
- Intuition file folders
- Elements may have quoted attributes
- ltMyelm myattmyvalgtlt/Myelmgt
- Comments same as in HTML
- lt!-- Pay no attention --gt
- Draw publishers XML
45Escape chars in XML
- Some chars must be escaped
- Distinguish content from syntax
- Can also declare value to be pure text
ltelmgt3 lt 5lt/elmgt
ltelmgtquotDonapost call me aposIshmaelapos!
quotlt/elmgt
ltaRealTaggt lt!CDATAltnotAtaggtjsdljsdltneitherAm
Ilt'gtltgtgtgtlt/aRealTaggt
46XML Namespaces
- Different schemas/DTDs may overlap
- XHTML and MathML share some tags
- Soln namespaces
- as in Java/C/C
ltbook xmlnsisbn"www.isbn-org.org/def"gt lttitlegt.
..lt/titlegt ltnumbergt15lt/numbergt ltisbnnumbergt...lt
/isbnnumbergt lt/bookgt
47From Relational Data to XML Data
persons
XML
persons
row
row
row
name
name
name
ssn
ssn
ssn
- ltpersonsgt
- ltrowgtltnamegtMichaellt/namegt
- ltssngt123lt/ssngtlt/rowgt
- ltrowgtltnamegtHilarylt/namegt
- ltssngt456lt/ssngtlt/rowgt
- ltrowgtltnamegtBilllt/namegt
- ltssngt789lt/ssngtlt/rowgt
- lt/personsgt
Michael
123
Hilary
Bill
456
789
48Semi-structured Data Explained
- List-valued attributes
- XML is not 1NF!
- Impossible in (single, BCNF) tables
ltpersonsgt ltrowgtltnamegtHilarylt/namegt
ltphonegt202-222-2222lt/phonegt
ltphonegt914-222-2222lt/phonegtlt/rowgt
ltrowgtltnamegtBilllt/namegt ltphonegt914-222-2222
lt/phonegt ltphonegt212-333-3333lt/phonegtlt/rowgt
lt/personsgt
? two phones!
???
49Object ids and References
ltmovieinfogt ltmovie id"o111"gt lttitlegtLost in
Translationlt/titlegt ltyeargt2003lt/yeargt ltstars
idref"o333 o444"/gt lt/moviegt ltmovie id"o222"gt
lttitlegtHamletlt/titlegt ltyeargt1999lt/yeargt
ltstars idref"o333"/gt lt/moviegt ltperson
id"o456"gt ltperson id"o111"gt ltnamegtBill
Murraylt/namegt ltmovies idref"o111
o222"/gt lt/persongt lt/movieinfogt
- SSD graph might not be trees!
- But XML docs must be
- Would cause much redundancy
- Soln same concept as pointers in C/C/J
- Object ids and references
- Graph example
- Movies Lost in Translation, Hamlet
- Stars Bill Murray, Scarlet Johansson
50What do we do with XML?
- Things done with XML
- Send to partners
- Parse XML received
- Convert to RDBMS rows
- Query for particular data
- Convert to other XML
- Convert to formats other than XML
- Lots of tools/standards for these
51DTDs understanding XML
- XML is extensible
- Advantage when creating, we can use any tags we
like - Disadv when reading, they can use any tags they
like - Using XML docs a priori is very difficult
- Solution impose some constraints
52DTDs
- DTD Document Type Definition
- You and partners/vertical industry/academic
discipline decide on a DTD/schema for your docs - Specify which entities you may use/must
understand - Specify legal relationships
- DTD specifies the grammar to be used
- DTD set of rules for creating valid entities
- DTD tells your software what to look for in doc
53DTD examples
- Well-formed XML v. valid XML
- Simple example
- http//pages.stern.nyu.edu/mjohnson/dbms/xml/note
.xml - http//pages.stern.nyu.edu/mjohnson/dbms/xml/badn
ote.xml - http//pages.stern.nyu.edu/mjohnson/dbms/xml/badn
ote2.xml - Copy from http//pages.stern.nyu.edu/mjohnson/db
ms/eg/xml.txt - Partial publisher example rules
- Root ? publisher
- Publisher ? name, book, author
- Book ? title, date, author
- Author ? firstname, middlename?, lastname
54Partial DTD example (typos!)
- lt?xml version1.0 encodingUTF-8 ?gt
- lt!DOCTYPE PUBLISHER
- lt!ELEMENT PUBLISHER (name, book, author)gt
- lt!ELEMENT name (PCDATA)gt
- lt!ELEMENT BOOK (title, date, author)gt
- lt!ELEMENT AUTHOR (firstname, middlename?,
lastnamegt - lt!ELEMENT firstname (PCDATA)gt
- lt!ELEMENT lastname (PCDATA)gt
- lt!ELEMENT middlename (PCDATA)gt
- DTD is not XML, but can be embedded in or ref.ed
from XML - Replacement for DTDs is XML Schema
55XML Applications/dialects
- MathML Mathematical Markup Language
- http//wwwasdoc.web.cern.ch/wwwasdoc/WWW/publicati
ons/ictp99/ictp99N8059.html - VoiceXML http//newmedia.purchase.edu/Jeanine/in
terfaces/rps.xml - ChemML Chemical Markup Language
- XHMTL HTML retrofitted as an XML application
56XML Applications/dialects
- VoiceXML
- http//newmedia.purchase.edu/Jeanine/interfaces/r
ps.xml - ATT Directory Assistance
- http//phone.yahoo.com/
57More XML Apps
- FIXML
- XML equiv. of FIX Financial Information eXchange
- swiftML
- XML equiv. of SWIFT Society for Worldwide
Interbank Financial Telecommunications message
format - Apaches Ant
- Scripting language for Java build management
- http//ant.apache.org/manual/using.html
- Many more
- http//www-106.ibm.com/developerworks/xml/library/
x-stand4/
58More XML Applications/Protocols
- RSS Rich Site Summary/Really Simple Syndication
- News sites, blogs
- http//slate.msn.com/rss/
- http//slashdot.org/index.rss
- Screenshot
- http//paulboutin.weblogger.com/pictures/viewer67
3 - More info http//slate.msn.com/id/2096660/
ltchannelgt lttitlegtmy channellt/titlegt ltitemgt
lttitlegtstory 1lt/titlegt ltlinkgtlt/linkgt lt/itemgt //
other items lt/channelgt
59More XML Applications/Protocols
- SOAP Simple Object Access Protocol
- XML-based messaging format
- Used by Google API http//www.google.com/apis/
- Amazon API http//amazon.com/gp/aws/landing.html
- Amazon light http//kokogiak.com/amazon/
- Other examples http//www.wired.com/wired/archive
/12.03/google.html?pg10topictopic_set - SOAP envelope with header and body
- Request sales tax for total
ltSOAPEnvelope xmlnsSOAP"urnschemas-xmlsoap-or
gsoap.v1"gt ltSOAPHeadergtlt/SOAPHeadergt
ltSOAPBodygt ltGetSalesTaxgt
ltSalesTotalgt100lt/SalesTotalgt
ltGetSalesTaxgt lt/SOAPBodygt lt/SOAPEnvelopegt
60More XML Applications/Protocols
- lt?xml version"1.0" encoding"UTF-8"?gt
- ltsoapEnvelope xmlnssoap"http//schemas.xmlsoap.
org/soap/envelope/"gt - ltsoapBodygt
- ltgsdoGoogleSearch xmlnsgs"urnGoogleSearch"
gt - ltkeygt(key)slt/keygt
- ltstartgt0lt/startgt
- ltmaxResultsgt10lt/maxResultsgt
- ltfiltergttruelt/filtergt
- ltrestrict/gt
- ltsafeSearchgtfalselt/safeSearchgt
- ltlr/gt
- lt/gsdoGoogleSearchgt
- lt/soapBodygt
- lt/soapEnvelopegt
61New topic XML in Oracle - purchase-order e.g
lt?xml version"1.0"?gt ltpurchase_ordergt
ltcustomer_namegtAlpha Techlt/customer_namegt
ltpo_numbergt11257gtlt/po_numbergt
ltpo_dategt2004-01-20lt/po_dategt ltpo_itemsgt
ltitemgt ltpart_numbergtAI5-4557lt/part_numbergt
ltquantitygt20lt/quantitygt lt/itemgt
ltitemgt ltpart_numbergtEI-T5-001lt/part_numbergt
ltquantitygt12lt/quantitygt lt/itemgt
lt/po_itemsgt lt/purchase_ordergt
62Storing XML data
- As of 9i, has XMLType data type
- By default, underlying storage is as CLOB
CREATE TABLE purchase_order( po_id number(5)
not null, customer_po_nbr varchar(20),
customer_inception_date date, order_nbr
number(5), purchase_order_doc xmltype,
constraint purchase_order_pk primary
key(po_id) )
63Loading XML into Oracle
- First, log in as sys
- Now scott can import
connect sys/junk as sysdba create directory
xml_data as '/xml' grant read, write on
directory xml_data to scott
connect scott/tiger declare bf1 bfile begin bf1
bfilename('XML_DATA', 'purch_ord.xml') insert
into purchase_order(po_id, purchase_order_doc)
values(1000, xmltype(bf1, nls_charset_id('we8mswi
n1252'))) end
64Loading XML into Oracle
- Not just loading raw text
- XMLType data must be well-formed
- Parsable as XML
- Try modifying customer_name open tag
65Accessing XML in Oracle
- Now can look at raw XML
- Can also use XPath to extract particular nodes
and values, with extract function
SQLgt SELECT purchase_order_doc FROM
purchase_order
SQLgt SELECT extract(purchase_order_doc,
'/purchase_order/customer_name') FROM
purchase_order
66XPath in Oracle
- Can also extract all nodes of one type,
underneath some node, with double-slash // - All purchase order items
- NB this is not valid XML
- No unique root
- Can request just one with bracket op
- Numbering starts at 1, not 0
- Wrong name/number ? no error, no results
SQLgt SELECT extract(purchase_order_doc,
'/purchase_order//item') FROM purchase_order
SQLgt SELECT extract(purchase_order_doc,
'/purchase_order/po_items/item2') FROM
purchase_order
67extract v. extractvalue
- extractvalue returns value, not whole node
- vs.
- extractvalue applies only to unique nodes
SQLgt SELECT extract(purchase_order_doc,
'/purchase_order/customer_name') FROM
purchase_order
SQLgt SELECT extractvalue(purchase_order_doc,
'/purchase_order/customer_name') FROM
purchase_order
SQLgt SELECT extractvalue(purchase_order_doc,
'/purchase_order/po_items') FROM purchase_order
68existsnode function
- Can check whether node/location exists with
existnode function - Returns 1 or 0
- Also applies to bracketed paths
SQLgt SELECT po_id FROM purchase_order WHERE
existsnode(purchase_order_doc, '/purchase_order/c
ustomer_name') 1
SQLgt SELECT po_id FROM purchase_order WHERE
existsnode(purchase_order_doc, '/purchase_order/p
o_items/item1') 1
69Moving data from XML to relations
- To move single values from XML to tables, can
simply use extractvalue in UPDATE statements
SQLgt UPDATE purchase_order SET order_nbr
7101, customer_po_nbr extractvalue(purchase_ord
er_doc, '/purchase_order/po_number'), customer_in
ception_date to_date(extractvalue(purchase_orde
r_doc, '/purchase_order/po_date'), 'yyyy-mm-dd')
70Moving data from XML to relations
- What about moving set of nodes
- The two item nodes
- Use xmlsequence to get a varray of items
- Use TABLE to convert to a relation
SQLgt SELECT extract(purchase_order_doc,
'/purchase_order//item') FROM purchase_order
SQLgt SELECT rownum, item. FROM TABLE( SELECT
xmlsequence(extract(purchase_order_doc,
'/purchase_order//item')) FROM purchase_order)
item
71Moving data from XML to relations
- Result is a two-row relation with XMLTypes
- Can use extractvalue to extract this data
- First, create destination table
CREATE TABLE LINE_ITEM( ORDER_NBR NUMBER(9)
NOT NULL, PART_NBR VARCHAR2(20) NOT NULL,
QTY NUMBER(5) NOT NULL, FILLED_QTY
NUMBER(5), CONSTRAINT line_item_pk PRIMARY
KEY (ORDER_NBR,PART_NBR))
72Moving data from XML to relations
SQLgt INSERT INTO line_item(order_nbr,part_nbr,qty)
SELECT 7109, extractvalue(column_value,
'/item/part_number'), extractvalue(column_value,
'/item/quantity') FROM TABLE( SELECT
xmlsequence(extract(purchase_order_doc,
'/purchase_order//item')) FROM
purchase_order )
73XML Schemas and Oracle
- By default, XML must be well-formed to be read
into the XMLType field - XML is valid if it conforms to a schema
- To use a schema with Oracle, must first register
it
declare bf1 bfile begin bf1
bfilename('XML_DATA', 'purch_ord.xsd') dbms_xmls
chema.registerschema( 'http//localhost8080/home/
xml/schemas/purch_ord.xsd', bf1) end
74XML Schemas and Oracle
- With schema registered, can apply it to an
XMLType field
CREATE TABLE purchase_order2 (po_id NUMBER(5)
NOT NULL, customer_po_nbr VARCHAR2(20),
customer_inception_date DATE, order_nbr
NUMBER(5), purchase_order_doc XMLTYPE,
CONSTRAINT purchase_order2_pk PRIMARY KEY
(po_id)) XMLTYPE COLUMN purchase_order_doc
XMLSCHEMA "http//localhost8080/home/xml/schemas/
purch_ord.xsd" ELEMENT "purchase_order"
75Importing to schema field
- Try to import xml file, get error
declare bf1 bfile begin bf1
bfilename('XML_DATA', 'purch_ord.xml') insert
into purchase_order2(po_id, purchase_order_doc)
values (2000, XMLTYPE(bf1, nls_charset_id('WE8MS
WIN1252'))) end
76Importing to schema field
- Root node of XML must specify the schema
- Change root to the following
- Now can import
- Also fails if extra or missing nodes
- Modify company_name node
- Add new comments node
ltpurchase_order xmlnsxsi"http//www.w3.org/2001
/XMLSchema-instance" xsinoNamespaceSchemaLocation
"http//localhost8080/home/xml/schemas/purch_ord
.xsd"gt
77Can check to see whether schema is used
- Can call isSchemaBased(), getSchemaURL() and
isSchemaValid() on XMLType fields
SQLgt select po.purchase_order_doc.isSchemaBased(),
po.purchase_order_doc.getSchemaURL(), po.purcha
se_order_doc.isSchemaValid() from purchase_order2
po
78Updating XMLType data
- Can update XMLType data with ordinary UPDATE
statements - Replaces whole XMLType object with new one
SQLgt UPDATE purchase_order po SET
po.purchase_order_doc XMLTYPE(BFILENAME('XML_
DATA', 'purch_ord_alt.xml'),
nls_charset_id('WE8MSWIN1252')) WHERE po.po_id
2000
79Updating XMLType data
- Can also modify the existing XMLType object
- By writing node values
- updateXML() function does search/replace
- But searches for node, not value
SQLgt SELECT extract(po.purchase_order_doc, '/purch
ase_order/customer_name') FROM purchase_order
po WHERE po_id 1000 SQLgt UPDATE
purchase_order po SET po.purchase_order_doc
updateXML(po.purchase_order_doc, '/purchase_order/
customer_name/text()', 'some other
company') WHERE po.po_id 1000
80Updating XMLType data
- Can also write whole node, using XMLType
- Validation/well-formedness is still checked
SQLgt UPDATE purchase_order po SET
po.purchase_order_doc updateXML(po.purchase_ord
er_doc, '/purchase_order/customer_name', XMLTYPE
('ltcustomer_namegtsome third companylt/customer_nam
egt')) WHERE po.po_id 1000 SQLgt SELECT
extract(po.purchase_order_doc, '/purchase_order/c
ustomer_name') FROM purchase_order po WHERE
po_id 1000
81Updating XMLType data
- And can update items in a collection
SQLgt SELECT extract(po.purchase_order_doc,
'/purchase_order//item') FROM purchase_order
po WHERE po.po_id 1000 SQLgt UPDATE
purchase_order po SET po.purchase_order_doc
updateXML(po.purchase_order_doc,
'/purchase_order/po_items/item1',
XMLTYPE('ltitemgtltpart_numbergtT-1000lt/part_numbergtltq
uantitygt33lt/quantitygtlt/itemgt')) WHERE po.po_id
1000
82Converting relational data to XML
- Saw how to put XML in a table
- Conversely, can convert ordinary relational data
to XML - XMLElement() generates an XML node
- First, create supplier table
CREATE TABLE SUPPLIER( SUPPLIER_ID NUMBER(5)
NOT NULL, NAME VARCHAR2(30) NOT NULL,
PRIMARY KEY (SUPPLIER_ID)) insert into supplier
values(1, 'Acme') insert into supplier values(2,
'Tilton') insert into supplier values(3,
'Eastern')
83Converting relational data to XML
- Now can call XMLElement function to wrap values
in tags - And can build it up
- Dont concatenate! Turns to strings, escapes lt gt
- Error in book
SELECT XMLElement("supplier_id", s.supplier_id)
XMLElement("name", s.name) xml_fragment FROM
supplier s
SELECT XMLElement("supplier", XMLElement("supplie
r_id", s.supplier_id), XMLElement("name",
s.name)) FROM supplier s
84XMLForest()
- More simply, can use XMLForest() function
SELECT XMLElement("supplier",
XMLForest(s.supplier_id, s.name)) FROM supplier s
85XMLAgg()
- Can use XMLAgg() to put nodes together inside
another node
SELECT XMLElement("supplier_list",
XMLAgg(XMLElement("supplier",
XMLElement("supplier_id", s.supplier_id),
XMLElement("name", s.name) )))
xml_document FROM supplier s
86New topic Data Warehousing
- Physical warehouse stores different kinds of
items - combined from different sources in supply chain
- access items as a combined package
- Synergy
- DW is the sys containing the data from many DBs
- OLAP is the system for easily querying the DW
- Online analytical processing
- front-end to DW stats
87Integrating Data
- Ad hoc combination of DBs from different sources
can be problematic - Data may be spread across many systems
- geographically
- by division
- different systems from before mergers
88Conversion/scrubbing/merging
- Lots of issues
- different types of data
- Varchar(255) v. char(30)
- Different values for data
- GREEN/GR/2
- Semantic differences
- Cars v. Automobiles
- Missing values
- Handle with nulls or XML
89Federated DBs
- Situ n different DBs must work together
- One idea write programs for each to talk to each
other one - How many programs required?
- Like ambassadors for each country
90Federated DBs
- Better idea introduce another DB
- write programs for it to talk to each other DB
- Now how many programs?
- English in business, French in diplomacy
- ? Warehousing
- Refreshed nightly
91OLTP v. OLAP
- DWs usually not updated in real-time
- data is usually not live
- but care about higher-level, longer-term patterns
- For knowledge workers/decision-makers
- Live data is in system used by OLTP
- online transaction processing
- E.g., airline reservations
- OLTP data loaded into DW periodically, say nightly
92Utilizing Data
- Situ each time manager has hunch
- ? requests custom reports
- ? direct programmers to write/modify SQL app to
produce these results - on higher or lower levels, for different
specifics - Problem too difficult/expensive/slow
- too great a time lag
93EISs
- Could just write queries at command-prompt
- But decision makes arent (all) SQL programmers
- Soln create an executive information system
- provides friendly front-end to common, important
queries - basically a simple DB front-end
- your project part 5
- GROUP BY queries are particularly applicable
94EISs v. OLAP
- Okay for fixed set of queries
- But what if queries are open-ended?
- Q Whats driving sales in the Northeast?
- Whats the source cause?
- Result from one query influences next query tried
- OLAP systems are interactive
- run query
- analyze results
- think of new query
- repeat
95Star Schemas
- Popular schema for DW data
- One central DB surrounded by specific DBs
- Center fact table
- Extremities data tables
- Fields in fact table are foreign keys to data
tables - Normalization ? Snowflake Schema
- May not be worthwhile
96Dates and star schemas
- OLAP behaves as though you had a Days table, with
every possible row - Dates(day, week, month, year, DID)
- (5, 27, 7, 2000)
- Can join on Days like any other table
97Dates and star schemas
- E.g. products x salesperson x region x date
- Products sold by salespeople in regions on dates
- Regular dim tables
- Product(PID, name, color)
- Emp(name, SSN, sal)
- Region(name, RID)
- Fact table
- Sales(PID, DID, SSN, RID)
- Interpret as a cube (cross product of all
dimensions) - Can have both data and stats
98Drill-down roll-up
- Imagine notice some regions sales way up
- Why? Good salesperson? Some popular product
there? - Maybe need to search by month, or month and
product, abstract back up to just product - slicing dicing
99OLAP and data warehousing
- Could write GROUP BY queries for each
- OLAP systems provide simpler, non-SQL interface
for this sort of thing - Vendors MicroStrategy, SAP, etc.
- Otoh DW-style operators have been added to SQL
and some DBMSs
100DW extensions in SQL ROLLUP (Oracle)
- Suppose have orders table (from two years), with
region and date info - Can select total sales
- Examples derived/from Mastering Oracle SQL, 2e
(OReilly) - Get data here http//examples.oreilly.com/mastora
sql2/mosql2_data.sql
SQLgt column month format a10 SQLgt
_at_mosql2_data SQLgt describe all_orders
SELECT sum(o.tot_sales) FROM all_orders o join
region r ON r.region_id o.region_id
101DW extensions in SQL ROLLUP (Oracle)
- Can write GROUP BY queries for year or region or
both
SELECT r.name region, o.year, sum(o.tot_sales) FRO
M all_orders o join region r ON r.region_id
o.region_id GROUP BY (r.name, o.year)
102DW extensions in SQL ROLLUP (Oracle)
- ROLLUP operator
- Extension of GROUP BY
- Does GROUP BY on several levels, simultaneously
- Order matters
- Get sales totals for each region/year pair each
region, and the grand total
SELECT r.name region, o.year, sum(o.tot_sales) FRO
M all_orders o join region r ON r.region_id
o.region_id GROUP BY ROLLUP (r.name, o.year)
103DW extensions in SQL ROLLUP (Oracle)
- Change the order of the group fields to get a
different sequence of groups - To get totals for each year/region pair, each
year, and the grand total, and just reverse
group-by order
SELECT o.year, r.name region, sum(o.tot_sales) FRO
M all_orders o join region r ON r.region_id
o.region_id GROUP BY ROLLUP (o.year, r.name)
104DW extensions in SQL ROLLUP (Oracle)
- Adding more dimensions, like month, is easy
(apart from formatting) - NB summing happens on each level
SELECT o.year, to_char(to_date(o.month,
'MM'),'Month') month, r.name region,
sum(o.tot_sales) FROM all_orders o join region
r ON r.region_id o.region_id GROUP BY ROLLUP
(o.year, o.month, r.name)
105DW extensions in SQL ROLLUP (Oracle)
- If desired, can combine fields for the sake of
grouping
SELECT o.year, to_char(to_date(o.month,
'MM'),'Month') month, r.name region,
sum(o.tot_sales) FROM all_orders o join region
r ON r.region_id o.region_id GROUP BY ROLLUP
((o.year, o.month), r.name)
106DW extensions in SQL CUBE (Oracle)
- Another GROUP BY extension CUBE
- Subtotals all possible combins of group-by fields
(powerset) - Syntax ROLLUP ? CUBE
- Order of fields doesnt matter (apart from
ordering) - To get subtotals for each region/month pair, each
region, each month, and the grand total
SELECT to_char(to_date(o.month, 'MM'),'Month')
month, r.name region, sum(o.tot_sales) FROM
all_orders o join region r ON r.region_id
o.region_id GROUP BY CUBE (o.month, r.name)
107DW extensions in SQL CUBE (Oracle)
- Again, can easily add more dimensions
SELECT o.year, to_char(to_date(o.month,
'MM'),'Month') month, r.name region,
sum(o.tot_sales) FROM all_orders o join region
r ON r.region_id o.region_id GROUP BY CUBE
(o.year, o.month, r.name)
108DW SQL exts GROUPING SETS (Oracle)
- Thats a lot of rows
- Instead of a cube of all combinations, maybe we
just want the totals for each individual field
SELECT o.year, to_char(to_date(o.month,
'MM'),'Month') month, r.name region,
sum(o.tot_sales) FROM all_orders o join region
r ON r.region_id o.region_id GROUP BY GROUPING
SETS (o.year, o.month, r.name)
109Thats all, folks!
- Selected solutions to exercises
- sqlzoo Answers on sqlzoo.net
- PL/SQL solutions http//pages.stern.nyu.edu/mjoh
nson/oracle/archive/fall04/plsql/ - Files from my machine
- http//pages.stern.nyu.edu/mjohnson/oracle/machin
e/ - mpjohnson-at-gmail.com
- Final evals!