Title: JDBC and Java Access to DBMS
1JDBC and Java Access to DBMS; Introduction to Data Warehouses
- University of California, Berkeley
- School of Information
- IS 257 Database Management
2Lecture Outline
- Review
- Object-Relational DBMS
- OR features in Oracle
- OR features in PostgreSQL
- Extending OR databases (examples from PostgreSQL)
- Java and JDBC
- Introduction to Data Warehouses
4Object Relational Data Model
- Class, instance, attribute, method, and integrity constraints
- OID per instance
- Encapsulation
- Multiple inheritance hierarchy of classes
- Class references via OID object references
- Set-Valued attributes
- Abstract Data Types
5Object Relational Extended SQL (Illustra)
- CREATE TABLE tablename {OF TYPE typename | OF NEW TYPE typename (attr1 type1, attr2 type2, ..., attrn typen)} [UNDER parent_table_name]
- CREATE TYPE typename (attribute_name type_desc, attribute2 type2, ..., attrn typen)
- CREATE FUNCTION functionname (type_name, type_name) RETURNS type_name AS sql_statement
6Object-Relational SQL in ORACLE
- CREATE [OR REPLACE] TYPE typename AS OBJECT (attr_name attr_type, ...)
- CREATE TABLE tablename OF typename
7Example
- CREATE TYPE ANIMAL_TY AS OBJECT (Breed VARCHAR2(25), Name VARCHAR2(25), Birthdate DATE);
- Creates a new type
- CREATE TABLE Animal OF ANIMAL_TY;
- Creates an object table
8Constructor Functions
- INSERT INTO Animal VALUES (ANIMAL_TY('Mule', 'Frances', TO_DATE('01-APR-1997', 'DD-MON-YYYY')));
- Inserts a new ANIMAL_TY object into the table
9PostgreSQL Classes
- The fundamental notion in Postgres is that of a
class, which is a named collection of object
instances. Each instance has the same collection
of named attributes, and each attribute is of a
specific type. Furthermore, each instance has a
permanent object identifier (OID) that is unique
throughout the installation. Because SQL syntax
refers to tables, we will use the terms table and
class interchangeably. Likewise, an SQL row is an
instance and SQL columns are attributes.
10Creating a Class
- You can create a new class by specifying the class name, along with all attribute names and their types:
    CREATE TABLE weather (
        city varchar(80),
        temp_lo int, -- low temperature
        temp_hi int, -- high temperature
        prcp real, -- precipitation
        date date
    );
11PostgreSQL
- Postgres can be customized with an arbitrary number of user-defined data types. Consequently, type names are not syntactical keywords, except where required to support special cases in the SQL92 standard.
- So far, the Postgres CREATE command looks exactly like the command used to create a table in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model.
12Inheritance
    CREATE TABLE cities (
        name text,
        population float,
        altitude int -- (in ft)
    );
    CREATE TABLE capitals (
        state char(2)
    ) INHERITS (cities);
13Inheritance
- In Postgres, a class can inherit from zero or more other classes.
- A query can reference either
- all instances of a class
- or all instances of a class plus all of its descendants
14Non-Atomic Values - Arrays
- CREATE TABLE SAL_EMP (name text, pay_by_quarter int4[], schedule text[][]);
- The preceding SQL command will create a class named SAL_EMP with a text string (name), a one-dimensional array of int4 (pay_by_quarter), which represents the employee's salary by quarter, and a two-dimensional array of text (schedule), which represents the employee's weekly schedule.
- Now we do some INSERTs; note that when appending to an array, we enclose the values within braces and separate them by commas.
15PostgreSQL Extensibility
- Postgres is extensible because its operation is catalog-driven
- RDBMSs store information about databases, tables, columns, etc., in what are commonly known as system catalogs. (Some systems call this the data dictionary.)
- One key difference between Postgres and standard RDBMSs is that Postgres stores much more information in its catalogs
- not only information about tables and columns, but also information about its types, functions, access methods, etc.
- These classes can be modified by the user, and since Postgres bases its internal operation on these classes, this means that Postgres can be extended by users
- By comparison, conventional database systems can only be extended by changing hardcoded procedures within the DBMS or by loading modules specially written by the DBMS vendor.
16Rules System
    CREATE RULE name AS ON event
        TO object [WHERE condition]
        DO [INSTEAD] {action | NOTHING}
- Rules can be triggered by any event (select, update, delete, etc.)
17Views as Rules
- Views in Postgres are implemented using the rule system. In fact there is absolutely no difference between a
    CREATE VIEW myview AS SELECT * FROM mytab;
- compared against the two commands
    CREATE TABLE myview (same attribute list as for mytab);
    CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
        SELECT * FROM mytab;
18Extensions to Indexing
- Access Method extensions in Postgres
- GiST: Generalized Search Trees
- Joe Hellerstein, UC Berkeley
19Indexing in OO/OR Systems
- Quick access to user-defined objects
- Support queries natural to the objects
- Two previous approaches
- Specialized Indices (ABCDEFG-trees)
- redundant code: most trees are very similar
- concurrency control, etc. are tricky!
- Extensible B-trees and R-trees (Postgres/Illustra)
- B-tree or R-tree lookups only!
- E.g. WHERE movie.video < 'Terminator 2'
20GiST Approach
- A generalized search tree. Must be
- Extensible in terms of queries
- General (B-tree, R-tree, etc.)
- Easy to extend
- Efficient (match specialized trees)
- Highly concurrent, recoverable, etc.
21GiST Applications
- New indexes needed for new apps...
- find all supersets of S
- find all molecules that bind to M
- your favorite query here (multimedia?)
- ...and for new queries over old domains
- find all points in region from 12 to 2 o'clock
- find all text elements estimated relevant to a
query string
23Java and JDBC
- Java is probably the high-level language most used in instruction and development today, and one of the earliest "enterprise" additions to Java was JDBC
- JDBC is an API that provides mid-level access to a DBMS from Java applications
- Intended to be an open, cross-platform standard for database access in Java
- Similar in intent to Microsoft's ODBC
24JDBC Architecture
- The goal of JDBC is to be a generic SQL database access framework that works for any database system with no changes to the interface code
[Diagram: Java Applications call the JDBC API; the JDBC Driver Manager dispatches to a Driver for each database (Oracle, MySQL, Postgres)]
25JDBC
- Provides a standard set of interfaces for any DBMS with a JDBC driver, using SQL to specify the database operations.
26JDBC Simple Java Implementation
import java.sql.*;
import oracle.jdbc.*;

public class JDBCSample {
    public static void main(java.lang.String[] args) {
        try {
            // this is where the driver is loaded
            // Class.forName("jdbc.oracle.thin");
            DriverManager.registerDriver(new OracleDriver());
        } catch (SQLException e) {
            System.out.println("Unable to load driver Class");
            return;
        }
27JDBC Simple Java Impl.
        try {
            // All DB access is within the try/catch block...
            // make a connection to ORACLE on Dream
            Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev",
                "mylogin", "myoraclePW");
            // Do an SQL statement...
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
28JDBC Simple Java Impl.
            // show the Results...
            while (rs.next()) {
                System.out.println(rs.getString("NAME"));
            }
            // Release the database resources...
            rs.close();
            stmt.close();
            con.close();
        } catch (SQLException se) {
            // inform user of errors...
            System.out.println("SQL Exception: " + se.getMessage());
            se.printStackTrace(System.out);
        }
    }
}
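The error-handling path of the example above can be tried without any database installed. The following is a minimal sketch, not part of the original slides: the class name ConnectFailureDemo and the URL scheme "nosuchdb" are made up for illustration, and it assumes no driver on the classpath accepts that URL, so DriverManager.getConnection() should throw an SQLException that the catch block reports. It also uses try-with-resources, a newer alternative to the explicit close() calls shown in the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical demo class: shows what the SQLException catch block sees
// when no registered driver accepts the connection URL.
class ConnectFailureDemo {

    // Tries to connect and returns a one-line description of the outcome.
    static String tryConnect(String url) {
        // try-with-resources closes the connection automatically on exit
        try (Connection con = DriverManager.getConnection(url)) {
            return "Connected";
        } catch (SQLException se) {
            // same reporting style as the slide example
            return "SQL Exception: " + se.getMessage();
        }
    }

    public static void main(String[] args) {
        // "nosuchdb" is an invented sub-protocol; no driver should claim it
        System.out.println(tryConnect("jdbc:nosuchdb://localhost/test"));
    }
}
```

With a real driver on the classpath and a valid URL, the same method would return "Connected" instead.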
29JDBC
- Once a connection has been made you can create three different types of statement objects
- Statement
- The basic SQL statement as in the example
- PreparedStatement
- A pre-compiled SQL statement
- CallableStatement
- Permits access to stored procedures in the Database
30JDBC Resultset methods
- next() to loop through rows in the resultset
- To access the attributes of each row you need to
know its type, or you can use the generic
getObject() which wraps the attribute as an
object
31JDBC GetXXX() methods
SQL data type    Java type               getXXX()
CHAR             String                  getString()
VARCHAR          String                  getString()
LONGVARCHAR      String                  getString()
NUMERIC          java.math.BigDecimal    getBigDecimal()
DECIMAL          java.math.BigDecimal    getBigDecimal()
BIT              boolean                 getBoolean()
TINYINT          byte                    getByte()
32JDBC GetXXX() Methods
SQL data type    Java type               getXXX()
SMALLINT         short                   getShort()
INTEGER          int                     getInt()
BIGINT           long                    getLong()
REAL             float                   getFloat()
FLOAT            double                  getDouble()
DOUBLE           double                  getDouble()
BINARY           byte[]                  getBytes()
VARBINARY        byte[]                  getBytes()
LONGVARBINARY    byte[]                  getBytes()
33JDBC GetXXX() Methods
SQL data type    Java type               getXXX()
DATE             java.sql.Date           getDate()
TIME             java.sql.Time           getTime()
TIMESTAMP        java.sql.Timestamp      getTimestamp()
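The mapping in the three tables above can be written down as a small lookup keyed on the type codes in java.sql.Types (the same codes that ResultSetMetaData.getColumnType() returns). This is only an illustrative sketch: GetterLookup and getterFor are hypothetical names, not part of JDBC, and the fallback to getObject() for unlisted types is an assumption that follows the generic getObject() advice in the slides.

```java
import java.sql.Types;

// Hypothetical helper (not part of JDBC): names the ResultSet getter
// conventionally used for a java.sql.Types code, following the tables above.
class GetterLookup {

    static String getterFor(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
                return "getString";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "getBigDecimal";
            case Types.BIT:
                return "getBoolean";
            case Types.TINYINT:
                return "getByte";
            case Types.SMALLINT:
                return "getShort";
            case Types.INTEGER:
                return "getInt";
            case Types.BIGINT:
                return "getLong";
            case Types.REAL:
                return "getFloat";
            case Types.FLOAT:
            case Types.DOUBLE:
                return "getDouble";
            case Types.BINARY:
            case Types.VARBINARY:
            case Types.LONGVARBINARY:
                return "getBytes";
            case Types.DATE:
                return "getDate";
            case Types.TIME:
                return "getTime";
            case Types.TIMESTAMP:
                return "getTimestamp";
            default:
                return "getObject"; // generic fallback for types not in the tables
        }
    }

    public static void main(String[] args) {
        System.out.println(getterFor(Types.VARCHAR));
        System.out.println(getterFor(Types.TIMESTAMP));
    }
}
```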
34Large Object Handling
- Large binary data can be read from a resultset as streams using
- getAsciiStream()
- getBinaryStream()
- getUnicodeStream()
ResultSet rs = stmt.executeQuery("SELECT IMAGE FROM PICTURES WHERE PID = 1223");
if (rs.next()) {
    BufferedInputStream gifData = new BufferedInputStream(rs.getBinaryStream("IMAGE"));
    byte[] buf = new byte[4 * 1024]; // 4K buffer
    int len;
    // out is an OutputStream opened elsewhere
    while ((len = gifData.read(buf, 0, buf.length)) != -1) {
        out.write(buf, 0, len);
    }
}
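The read/write loop above can be run on its own, without a database. In this sketch a ByteArrayInputStream stands in for the stream that rs.getBinaryStream("IMAGE") would return; StreamCopy and copy are hypothetical names, and the 10,000-byte payload is arbitrary test data.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: the 4K-buffer copy loop from the slide, isolated
// so it can be tried with in-memory streams.
class StreamCopy {

    // Copies everything from in to out; returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) {
        byte[] buf = new byte[4 * 1024]; // 4K buffer, as on the slide
        long total = 0;
        int len;
        try (BufferedInputStream data = new BufferedInputStream(in)) {
            // read() returns -1 at end of stream
            while ((len = data.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for rs.getBinaryStream("IMAGE"): in-memory bytes
        byte[] fakeImage = new byte[10_000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(fakeImage), out);
        System.out.println(copied + " bytes copied");
    }
}
```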
35JDBC Metadata
- There are also methods to access the metadata associated with a resultSet
- ResultSetMetaData rsmd = rs.getMetaData();
- Metadata methods include
- getColumnCount()
- getColumnLabel(col)
- getColumnTypeName(col)
36JDBC access to MySQL
- The basic JDBC interface is the same; the only differences are in how the drivers are loaded
import java.sql.*;

public class JDBCTestMysql {
    public static void main(java.lang.String[] args) {
        try {
            // this is where the driver is loaded
            Class.forName("com.mysql.jdbc.Driver").newInstance();
        } catch (InstantiationException i) {
            System.out.println("Unable to load driver Class");
            return;
        } catch (ClassNotFoundException e) {
            System.out.println("Unable to load driver Class");
            return;
        } catch (IllegalAccessException a) {
            // newInstance() also declares this checked exception
            System.out.println("Unable to load driver Class");
            return;
        }
37JDBC for MySQL
        try {
            // All DB access is within the try/catch block...
            // make a connection to MySQL on Dream
            // (the URL is really one line)
            Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/MyDatabase?user=MyLogin&password=MySQLPW");
            // Do an SQL statement...
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");
- Otherwise everything is the same as in the Oracle example
- For connecting to the machine you are running the program on, you can use "localhost" instead of the machine name
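The two connection URLs used in the Oracle and MySQL examples follow fixed patterns, which the sketch below spells out. JdbcUrls and its methods are hypothetical helpers written for illustration only; they do no escaping of special characters in the user name or password, and in practice credentials are often passed as separate getConnection() arguments (as in the Oracle example) rather than embedded in the URL.

```java
// Hypothetical helper: assembles the URL shapes used in the slide examples.
class JdbcUrls {

    // Oracle thin driver: jdbc:oracle:thin:@host:port:sid
    static String oracleThin(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    // MySQL: jdbc:mysql://host/database?user=...&password=...
    // (no URL-encoding is applied; an assumption that values are plain text)
    static String mysql(String host, String database, String user, String password) {
        return "jdbc:mysql://" + host + "/" + database
                + "?user=" + user + "&password=" + password;
    }

    public static void main(String[] args) {
        System.out.println(oracleThin("dream.sims.berkeley.edu", 1521, "dev"));
        System.out.println(mysql("localhost", "MyDatabase", "MyLogin", "MySQLPW"));
    }
}
```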
38Demo JDBC for MySQL
- Demo of JDBC code on Harbinger
- Code is available on class web site
40Overview
- Data Warehouses and Merging Information Resources
- What is a Data Warehouse?
- History of Data Warehousing
- Types of Data and Their Uses
41Problem: Heterogeneous Information Sources
- "Heterogeneities are everywhere"
[Diagram of sources: Personal Databases, World Wide Web, Scientific Databases, Digital Libraries]
- Different interfaces
- Different data representations
- Duplicate and inconsistent information
Slide credit J. Hammer
42Problem: Data Management in Large Enterprises
- Vertical fragmentation of informational systems (vertical stove pipes)
- Result of application (user)-driven development of operational systems
[Diagram: stove pipes such as Sales Administration, Finance, Manufacturing, ... built over operational systems such as Sales Planning, Suppliers, Num. Control, Stock Mngmt, Debt Mngmt, Inventory, ...]
Slide credit J. Hammer
43Goal: Unified Access to Data
[Diagram of sources: Personal Databases, Digital Libraries, Scientific Databases]
- Collects and combines information
- Provides integrated view, uniform user interface
- Supports sharing
Slide credit J. Hammer
44The Traditional Research Approach
- Query-driven (lazy, on-demand)
[Diagram: Clients query an Integration System (with Metadata), which reaches each Source through its own Wrapper]
Slide credit J. Hammer
45Disadvantages of Query-Driven Approach
- Delay in query processing
- Slow or unavailable information sources
- Complex filtering and integration
- Inefficient and potentially expensive for frequent queries
- Competes with local processing at sources
- Hasn't caught on in industry
Slide credit J. Hammer
46The Warehousing Approach
- Information integrated in advance
- Stored in WH for direct querying and analysis
Slide credit J. Hammer
47Advantages of Warehousing Approach
- High query performance
- But not necessarily most current information
- Doesn't interfere with local processing at sources
- Complex queries at warehouse
- OLTP at information sources
- Information copied at warehouse
- Can modify, annotate, summarize, restructure, etc.
- Can store historical information
- Security, no auditing
- Has caught on in industry
Slide credit J. Hammer
48Not Either-Or Decision
- Query-driven approach still better for
- Rapidly changing information
- Rapidly changing information sources
- Truly vast amounts of data from large numbers of sources
- Clients with unpredictable needs
Slide credit J. Hammer
49Data Warehouse Evolution
[Timeline figure, 1960 to 2000. Eras: Prehistoric Times, Middle Ages, Data Revolution, Information-Based Management. Milestones: Relational Databases, PCs and Spreadsheets, End-user Interfaces, Data Replication Tools, 1st DW Article, DW Confs., "Building the DW" (Inmon, 1992), Vendor DW Frameworks, Company DWs]
Slide credit J. Hammer
50What is a Data Warehouse?
- A Data Warehouse is a
- subject-oriented,
- integrated,
- time-variant,
- non-volatile
- collection of data used in support of management decision making processes.
- -- Inmon and Hackathorn, 1994 (viz. Hoffer, Chap. 11)
51DW Definition
- Subject-Oriented
- The data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects include:
- Customers
- Patients
- Students
- Products
- Etc.
52DW Definition
- Integrated
- The data housed in the data warehouse are defined using consistent:
- Naming conventions
- Formats
- Encoding Structures
- Related Characteristics
53DW Definition
- Time-variant
- The data in the warehouse contain a time
dimension so that they may be used as a
historical record of the business
54DW Definition
- Non-volatile
- Data in the data warehouse are loaded and
refreshed from operational systems, but cannot be
updated by end-users
55What is a Data Warehouse? A Practitioner's Viewpoint
- "A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context."
- -- Barry Devlin, IBM Consultant
Slide credit J. Hammer
56A Data Warehouse is...
- Stored collection of diverse data
- A solution to data integration problem
- Single repository of information
- Subject-oriented
- Organized by subject, not by application
- Used for analysis, data mining, etc.
- Optimized differently from transaction-oriented db
- User interface aimed at executive decision makers and analysts
57A Data Warehouse is... (Cont'd)
- Large volume of data (Gb, Tb)
- Non-volatile
- Historical
- Time attributes are important
- Updates infrequent
- May be append-only
- Examples
- All transactions ever at WalMart
- Complete client histories at insurance firm
- Stockbroker financial information and portfolios
Slide credit J. Hammer
58Warehouse is a Specialized DB
- Standard DB
- Mostly updates
- Many small transactions
- Mb - Gb of data
- Current snapshot
- Index/hash on p.k.
- Raw data
- Thousands of users (e.g., clerical users)
- Warehouse
- Mostly reads
- Queries are long and complex
- Gb - Tb of data
- History
- Lots of scans
- Summarized, reconciled data
- Hundreds of users (e.g., decision-makers,
analysts)
Slide credit J. Hammer
59Summary
[Diagram components: Operational Systems, Data Warehouse Population, Data Warehouse, Data Warehouse Catalog, Enterprise Modeling, Business Information Guide, Business Information Interface]
Slide credit J. Hammer
60Warehousing and Industry
- Warehousing is big business
- $2 billion in 1995
- $3.5 billion in early 1997
- Predicted: $8 billion in 1998 (Metagroup)
- Wal-Mart is said to have the largest warehouse
- 1000-CPU, 583 Terabyte, Teradata system (InformationWeek, Jan 9, 2006)
- Half a Petabyte in warehouse (Ziff Davis Internet, October 13, 2004)
- 1 billion rows of data or more are updated every day (InformationWeek, Jan 9, 2006)
- Some Government and Scientific databases are larger, however
Slide credit J. Hammer
61Other Large Data Warehouses
- Not including Wal-Mart and Ebay
(InformationWeek, Jan 9, 2006)
62Types of Data
- Business Data - represents meaning
- Real-time data (ultimate source of all business data)
- Reconciled data
- Derived data
- Metadata - describes meaning
- Build-time metadata
- Control metadata
- Usage metadata
- Data as a product - intrinsic meaning
- Produced and stored for its own intrinsic value
- e.g., the contents of a text-book
Slide credit J. Hammer
63Next Time
- More on Data Warehouses
- Introduction to data mining