Title: Information Resources Management
1Information Resources Management
2Agenda
- Administrivia
- Object-Oriented Databases
- Data Warehousing
- Data Mining
- SQL Extensions
- XML
3Administrivia
- Homework 8
- Homework 9
- Current Scores
- Final Review Session?
4OODBMS vs. ORDBMS
- OODBMS - Object-Oriented
- ORDBMS - Object-Relational
- Appendix A
5OODBMS
- Persistent Objects
- By class
- By creation
- By marking
- By reference
- Storage/Retrieval Methods
6OODBMS - Benefits
- Match
- Programming
- Methodology
- Data types and structures
- Ease of programming
- Inheritance
7OODBMS - Challenges
- Standards
- ODMG - Object Database Management Group
- Performance
- Database vs. persistent language
- Loss of integrity, queries
- Storage Space
- Maturity
8ORDBMS
- Extensions to relational model
- Complex data types
- Inheritance
- References
- Migration path
- Use existing applications and knowledge base
9ORDBMS - Benefits
- SQL
- Existing Systems
- Vendors
10ORDBMS - Challenges
- Standards
- Fit with the development language
- Programming Complexity
11Using a relational database to store data from an
object-oriented system has been likened to
parking your car in your garage. With an OODBMS
you park the car in the garage. If an (O)RDBMS is
used, to park your car in the garage, you must
first completely disassemble it and put each part
in its specific location on a shelf. This
process must then be reversed the next time you
want to go for a drive.
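The disassemble-and-reassemble step in the analogy can be sketched as follows. This is a minimal illustration, not part of the original slides: the `Car` and `Wheel` classes, the `park` and `drive` helpers, and the two-table layout are all hypothetical names, and SQLite stands in for a generic relational database.

```python
from dataclasses import dataclass, field
import sqlite3


@dataclass
class Wheel:
    position: str


@dataclass
class Car:
    car_id: int
    model: str
    wheels: list = field(default_factory=list)


def park(conn, car):
    """Disassemble the object: one row for the car, one row per wheel."""
    conn.execute("INSERT INTO car (car_id, model) VALUES (?, ?)",
                 (car.car_id, car.model))
    conn.executemany(
        "INSERT INTO wheel (car_id, position) VALUES (?, ?)",
        [(car.car_id, w.position) for w in car.wheels],
    )


def drive(conn, car_id):
    """Reassemble the object from its rows before it can be used again."""
    (model,) = conn.execute(
        "SELECT model FROM car WHERE car_id = ?", (car_id,)).fetchone()
    rows = conn.execute(
        "SELECT position FROM wheel WHERE car_id = ? ORDER BY position",
        (car_id,))
    return Car(car_id, model, [Wheel(p) for (p,) in rows])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute(
    "CREATE TABLE wheel (car_id INTEGER REFERENCES car(car_id), position TEXT)")

original = Car(1, "sedan", [Wheel("front-left"), Wheel("front-right")])
park(conn, original)
restored = drive(conn, 1)
```

Every additional nested part would need its own table and its own mapping code in both directions, which is the overhead the analogy points at.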
12OODBMS/ORDBMS Products
14Other Links
- Object Database Management Group
- www.odmg.org
- Object Database Newsgroup
- comp.databases.object
15Data Mining
- Corporations have colossal amounts of data
- Usually only used for very specific purposes (operations)
- Automated attempt to learn from the data
- Find statistical rules and patterns in the data
- Example: Giant Eagle Advantage Card
16Goals of Data Mining
- Explanatory - Why?
- Confirmatory - Is it?
- Exploratory - ???
17Approaches to Data Mining
- Classification
- identify rules that create groups
- Association
- find related conditions or events
- Correlation
- relationships between values
- User Guided
- hypothesis driven
- Automatic
- data driven - AI based
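As a rough illustration of the association approach, the sketch below counts which items appear together in shopping baskets and derives support and confidence for a rule. The basket contents and the `support` and `confidence` helpers are made up for this example; real tools use far more efficient algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loyalty-card baskets; each set is one shopping trip.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

# Count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    has_antecedent = sum(1 for b in baskets if antecedent in b)
    return both / has_antecedent
```

Here `confidence("bread", "milk")` is higher than `confidence("milk", "bread")`, which shows that an association rule is directional.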
18Data Warehouse
- A subject-oriented, integrated, time-variant, nonvolatile collection of data
- Usually all data for a corporation
- Multidimensional database
19Data Warehousing
- Single location
- Long-term storage
- Greater availability
- Separate data processing from day-to-day operations (performance)
- All data is historical
- Support data mining, et al.
20Data Warehousing Questions
- What data needs to be kept?
- Where is it from?
- How good is it?
- How long should it be kept?
- Can it be summarized? When?
- Will it make sense? What is the schema?
- When is it updated?
21Data Warehousing - Benefits
- Support for decision making tools
- DSS, EIS, Data Mining
- Separation of information and day-to-day processing
- Unification - Centralization
- Improved quality and consistency
22Data Warehousing - Challenges
- Costs: Storage, Setup, Maintenance
- Historical data issues
- Defining the warehouse schema
- Doing the conversion
- Implementation every time
- Keeping up with operational system changes
- Answering the questions
23Multidimensional Databases
- Two views
- Multidimensional tables
- Star schema
- Multidimensional table
- each cell is an attribute
- dimensions are interesting categories
24Multidimensional Table
- Cell - sales
- Dimensions
- day
- person
- store
- item
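One way to picture such a table is as a mapping from a (day, person, store, item) coordinate to a sales value. The sketch below assumes invented sample data and a hypothetical `roll_up` helper that totals the cells along one dimension.

```python
from collections import defaultdict

# Order of the coordinates in each cell key.
DIMENSIONS = ("day", "person", "store", "item")

# Each cell holds the sales figure for one combination of dimension values.
# The values here are invented sample data.
cells = {
    ("Mon", "Ann", "Downtown", "milk"): 4.0,
    ("Mon", "Bob", "Downtown", "bread"): 2.5,
    ("Tue", "Ann", "Uptown", "milk"): 3.0,
}


def roll_up(cells, dimension):
    """Total the sales cells for each value of a single dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for key, sales in cells.items():
        totals[key[axis]] += sales
    return dict(totals)


sales_by_item = roll_up(cells, "item")
sales_by_day = roll_up(cells, "day")
```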
25Star Schema
- Multiple tables
- Central table - data item (cell)
- Surrounding tables - information about each category (dimensions)
26Star Schema
[Diagram: central Sales table linked to the Person, Day, Store, and Item tables]
27Star Schema
- Sales (Day, Person, Store, Item, sales)
- Day (Day, day info)
- Person (Person, person info)
- Store (Store, store info)
- Item (Item, item info)
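The five tables above could be declared roughly as follows. This is a sketch using SQLite from Python; the "info" columns (`weekday`, `name`, `city`, `category`) and all sample rows are assumptions chosen for illustration, since the slide only says "info".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per category value, plus descriptive info.
CREATE TABLE day    (day TEXT PRIMARY KEY, weekday TEXT);
CREATE TABLE person (person TEXT PRIMARY KEY, name TEXT);
CREATE TABLE store  (store TEXT PRIMARY KEY, city TEXT);
CREATE TABLE item   (item TEXT PRIMARY KEY, category TEXT);

-- Central (fact) table: one row per cell of the multidimensional table.
CREATE TABLE sales (
    day    TEXT REFERENCES day(day),
    person TEXT REFERENCES person(person),
    store  TEXT REFERENCES store(store),
    item   TEXT REFERENCES item(item),
    sales  REAL,
    PRIMARY KEY (day, person, store, item)
);
""")

conn.executemany("INSERT INTO day VALUES (?, ?)",
                 [("2001-04-23", "Mon"), ("2001-04-24", "Tue")])
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [("P1", "Ann"), ("P2", "Bob")])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [("S1", "Pittsburgh"), ("S2", "Erie")])
conn.executemany("INSERT INTO item VALUES (?, ?)",
                 [("I1", "dairy"), ("I2", "bakery")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("2001-04-23", "P1", "S1", "I1", 10.0),
    ("2001-04-23", "P2", "S1", "I2", 5.0),
    ("2001-04-24", "P1", "S2", "I1", 7.5),
])

# Join the central table to one dimension table to total sales by city.
sales_by_city = conn.execute("""
    SELECT store.city, SUM(sales.sales)
    FROM sales JOIN store ON sales.store = store.store
    GROUP BY store.city
    ORDER BY store.city
""").fetchall()
```

Any other dimension can be analyzed the same way by joining the central table to that dimension's table.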
28Building/Maintaining a Data Warehouse
- 1. Capture
- 2. Scrub
- 3. Transform
- 4. Load and Index
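The four steps can be sketched as a small pipeline. Everything below is illustrative: the raw rows are hard-coded in place of a real operational source, and the function names and cleaning rules are assumptions rather than a prescribed method.

```python
import sqlite3


def capture():
    """Capture: pull raw rows from an operational source (hard-coded here)."""
    return [
        {"store": " s1 ", "item": "Milk", "amount": "4.00"},
        {"store": "S2", "item": "bread", "amount": "n/a"},  # unusable amount
        {"store": "S1", "item": "MILK", "amount": "3.50"},
    ]


def scrub(rows):
    """Scrub: drop rows whose amount cannot be read as a number."""
    clean = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        clean.append(row)
    return clean


def transform(rows):
    """Transform: convert to the warehouse's format (consistent case, types)."""
    return [
        (r["store"].strip().upper(), r["item"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load_and_index(conn, rows):
    """Load and index: store the rows, then index them for later queries."""
    conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX sales_by_store ON sales (store)")


conn = sqlite3.connect(":memory:")
load_and_index(conn, transform(scrub(capture())))
total_s1 = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE store = 'S1'").fetchone()[0]
```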
29Data Marts
- Making specific data available
- Different ones for different needs
[Diagram: Operational Systems, DW (data warehouse), and data marts DM1 and DM2]
30Data Mining
- Corporations have colossal amounts of data
- Usually only used for very specific purposes (operations)
- Automated attempt to learn from the data
- Find statistical rules and patterns in the data
- Example: Giant Eagle Advantage Card
31Goals of Data Mining
- Explanatory - Why?
- Confirmatory - Is it?
- Exploratory - ???
32Approaches to Data Mining
- Classification
- identify rules that create groups
- Association
- find related conditions or events
- Correlation
- relationships between values
- User Guided
- hypothesis driven
- Automatic
- data driven - AI based
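For the correlation approach, a minimal sketch is the Pearson coefficient between two columns of values. The `pearson` function and the sample figures are assumptions made for illustration; a value near 1 or -1 suggests a strong linear relationship, and a value near 0 suggests none.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented sample data: store visits and spending for five customers.
visits_per_month = [1, 2, 3, 4, 5]
spend_per_month = [20, 40, 60, 80, 100]
r = pearson(visits_per_month, spend_per_month)
```

Because the sample spending is an exact multiple of the visits, `r` comes out at 1 here. Real data would give something weaker, and a strong correlation on its own does not explain why the values move together.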
33Data Mining - Benefits
- Use data
- Learn new things
- Improve decision making
34Data Mining - Challenges
- Time (human and/or computer)
- Spurious results
- Separating the wheat from the chaff
- Availability of data
- Amount of data
- Changes in tools and technologies
- Validity over time
35Enhanced Data Analysis
- Beyond SUM, COUNT, and AVG
- SQL extensions (suggested)
- GROUP BY AS PERCENTILE
- Specific percentiles
- GROUP BY WITH CUBE
- Cross-tabulations
- Statistical package interface
- SAS, S, others
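The two suggested extensions can be approximated outside the database. The sketch below is an assumption about what they would compute, not the proposed SQL syntax: a hypothetical `cube` function produces every cross-tabulation subtotal (using "ALL" as the rolled-up value), and `statistics.quantiles` supplies specific percentiles.

```python
from collections import defaultdict
from itertools import product
from statistics import quantiles

# Invented sample rows: (store, item, sales).
rows = [
    ("S1", "milk", 4.0),
    ("S1", "bread", 2.0),
    ("S2", "milk", 3.0),
    ("S2", "bread", 1.0),
]


def cube(rows):
    """Total sales for every combination of store and item, where either
    one may be rolled up to 'ALL' (a cross-tabulation with subtotals)."""
    totals = defaultdict(float)
    for store, item, sales in rows:
        for s, i in product((store, "ALL"), (item, "ALL")):
            totals[(s, i)] += sales
    return dict(totals)


totals = cube(rows)

# Quartile cut points (25th, 50th, 75th percentiles) of the sales values.
quartiles = quantiles([sales for _, _, sales in rows], n=4)
```

For example, `totals[("S1", "ALL")]` is the subtotal for store S1 across all items, and `totals[("ALL", "ALL")]` is the grand total.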
36Enhanced Data Analysis - Benefits
- Greater functionality
- Improved decision making
37Enhanced Data Analysis - Challenges
- Lack of standards
- Understandability
- Processing requirements
- Cost of poorly written queries
- ad hoc queries aren't reviewed
38Extending Relational DBs
- Spatial and Geographic Databases
- Multimedia Databases
- Changing the data stored while retaining the
benefits of relational databases
39Spatial and Geographic DBs
- Spatial - CAD
- Geographic - GIS
- Similar issue
- How to store and retrieve such data
40Spatial Databases
- Geometric objects (2 or 3 dimensions)
- Locations
- Connections
- Nonspatial information about each object
- Substructures
- Spatial integrity constraints
- Two things can't occupy the same space
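A spatial integrity constraint of this kind can be sketched as a check that runs before each insert. The example assumes 2-D axis-aligned rectangles and hypothetical `Box` and `Layout` classes; a real spatial database would use a spatial index instead of comparing against every stored object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A 2-D axis-aligned rectangle with a name (nonspatial information)."""
    name: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overlaps(a, b):
    """True if the interiors of the two boxes intersect (shared edges are OK)."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


class Layout:
    """Collection of boxes that rejects any insert violating the constraint."""

    def __init__(self):
        self.boxes = []

    def insert(self, box):
        for existing in self.boxes:
            if overlaps(existing, box):
                raise ValueError(f"{box.name} would overlap {existing.name}")
        self.boxes.append(box)


layout = Layout()
layout.insert(Box("desk", 0, 0, 2, 1))
layout.insert(Box("chair", 2, 0, 3, 1))  # only touches the desk's edge
try:
    layout.insert(Box("lamp", 1, 0, 2.5, 1))  # overlaps desk and chair
    rejected = False
except ValueError:
    rejected = True
```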
41GIS Databases
- Raster Data (fractal data)
- Pictures - possibly over time
- Maps
- Vector Data
- Locations
- Connections
- Nongeographic information
42Spatial and Geographic DB - Benefits
- DBMS
- Specialized queries
- Spatial and Geographic Data
- Standard Data
- Mix of the two
- Integrity constraints
43Spatial and Geographic DB - Challenges
- Space requirements
- Level of detail
- Understandability - Complexity
- Processing requirements
- Compatibility between systems
- Lack of standards
44Multimedia Databases
- Images, Audio, Video
- Nonmultimedia data (text) about each
- Database Enhancements
- BLOBs (Binary Large Objects)
- Similarity-based queries
- Guaranteed steady rate
- Synchronization of audio and video
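A similarity-based query returns the stored items closest to a query item, not the exact matches. The sketch below assumes each image has already been reduced to a short feature vector; the file names, the vectors, and the `most_similar` helper are all invented for illustration.

```python
from math import dist

# Hypothetical feature vectors (for example, average red, green, blue levels).
features = {
    "sunset.jpg": (0.9, 0.4, 0.1),
    "forest.jpg": (0.1, 0.8, 0.2),
    "ocean.jpg": (0.1, 0.3, 0.9),
}


def most_similar(query, k=1):
    """Return the names of the k stored items nearest to the query vector,
    ordered from most to least similar by Euclidean distance."""
    return sorted(features, key=lambda name: dist(features[name], query))[:k]


closest = most_similar((0.8, 0.5, 0.2))
closest_two = most_similar((0.8, 0.5, 0.2), k=2)
```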
45Multimedia Databases - Benefits
- DBMS
- Greater compression may be possible
- Paperless office - document imaging
- Workflow redesign - improvements
- Greater availability
46Multimedia Databases - Challenges
- S T O R A G E
- Specialized DBMS
- Unity of database and network
- Usually requires ATM
- Specialized hardware
- juke boxes
- optical disks
47XML
- What is it?
- What isn't it?
- What are the goals?
- Who controls it?
- Who's using it?
- Beyond XML
48What is XML?
- eXtensible Markup Language
- Markup language for structured information
- structured - content and the role of that content
- markup - identify structures
- meta language for describing markup languages
49Huh?
- Storing structured data in a text file
- spreadsheet, address book, transactions (think EDI)
- Looks like HTML, <tags>, but isn't
- Text is universal, but not efficient
- Does disk space matter?
- What about network capacity?
- XML is license-free and platform-independent
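To make "structured data in a text file" concrete, here is a small address book as XML, read and extended with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`addressBook`, `contact`, `name`, `phone`) are invented for this sketch; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# The whole document is plain text; the tags mark the role of each value.
document = """<addressBook>
  <contact id="1">
    <name>Ann Example</name>
    <phone type="work">555-0100</phone>
  </contact>
  <contact id="2">
    <name>Bob Example</name>
    <phone type="home">555-0101</phone>
  </contact>
</addressBook>"""

root = ET.fromstring(document)

# Read structured values back out by tag name.
names = [contact.findtext("name") for contact in root.findall("contact")]

# Add a new record and turn the tree back into text.
new_contact = ET.SubElement(root, "contact", id="3")
ET.SubElement(new_contact, "name").text = "Cy Example"
serialized = ET.tostring(root, encoding="unicode")
```

The tags repeat for every record, which is the cost in disk space and network capacity that the slide asks about.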
50What XML isn't
- HTML
- SGML - Standard Generalized Markup Language - printing
- Limited to current definitions (tags)
- XML is the way to add new definitions
- A relational database management system
- A database, or is it?
51Goals of XML
- Easy to use over Internet
- Wide variety of applications
- Compatible with SGML (subset)
- Easy to write programs that use XML documents
- No (or few) optional features
- Human-legible if necessary
52Goals of XML (2)
- Standards developed quickly
- Formal and concise
- Easy to create documents
- No need for shortcuts
53Who Controls XML?
- W3 Consortium
- www.w3.org/XML
- XML 1.0 specification
54Who's Using XML?
- Financial Products Markup Language
- FpML
- FpML.org
- A standard for financial derivatives business-to-business e-Commerce
- Others?
55Beyond XML
- XLink - hyperlinks in XML
- XPointer and XFragments - point to parts of an XML document
- CSS - style sheet language
- XML and HTML
- XSL - advanced language for style sheets
- XSLT - XSL transformation language
56Beyond XML (2)
- DOM - standard function calls for manipulating XML (and HTML) from programs
- XML Namespaces - link a URL with every tag and attribute
- XML Schemas 1 and 2 - help in precisely defining your own XML-based formats
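The DOM and namespace ideas can be tried with Python's standard `xml.dom.minidom` module. In this sketch the namespace URL `http://example.com/trade` and the `trade` and `amount` element names are hypothetical; they are not taken from FpML or any real vocabulary.

```python
from xml.dom import minidom

# Hypothetical namespace URL that qualifies every tag carrying the t: prefix.
NS = "http://example.com/trade"

doc = minidom.parseString(
    '<t:trade xmlns:t="http://example.com/trade">'
    '<t:amount currency="USD">100</t:amount>'
    '</t:trade>'
)

# DOM calls: find elements by namespace and local name, then read them.
amount = doc.getElementsByTagNameNS(NS, "amount")[0]
currency = amount.getAttribute("currency")
value = amount.firstChild.data

# DOM calls can also change the document in place.
amount.setAttribute("currency", "EUR")
updated = doc.toxml()
```

Because the lookup uses the namespace URL and not the `t:` prefix, it would still find the element if another document chose a different prefix for the same URL.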
57Homework 10
- Last One! (No HW 11)
- Research and evaluate products
- 100 points
58Final
- Next Tuesday, 5/1
- Approximately 1/3 from 4/3 - 4/24
- Remainder - comprehensive