Title: Style Report Analytic Edition Product Demo
1. Data Mashups Defined and the Differences from Traditional Data Integration Approaches
Byron Igoe, Product Manager, InetSoft Technology
for the Minnesota Chapter of The Data Management Association
2. Presentation Outline
- Traditional Data Integration
- ETL, EII
- Spreadmarts
- Meaning and Origins of Data Mashup
- In-Memory Data Federation
- Combining Formal and Informal Data Sources
- Differences from Traditional Techniques
- Data Management and Data Mashup
- Data Warehousing
- Meta Data
- Data Governance
- Enterprise Content Management
- Data Modeling
3. Traditional Data Integration: ETL
- Extract, Transform and Load
- a well-understood convention for preparing data for analysis
- reasons for being
- reorganization
- conversion
- cleansing
- mapping
- pre-calculations of business metrics
- transformations
- aggregations
- save processing resources during analyses
- ensure data quality
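
The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any particular ETL product: the sample rows, the `REGION_MAP` lookup, and the `sales_by_region` target table are invented, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented sample data).
RAW_CSV = """order_id,region,amount
1,MN,100.50
2,mn ,20
3,WI,
4,WI,79.50
"""

# Hypothetical mapping from source codes to warehouse region names.
REGION_MAP = {"MN": "Minnesota", "WI": "Wisconsin"}


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse, convert and map each row."""
    clean = []
    for row in rows:
        if not row["amount"]:                  # cleansing: drop rows with no amount
            continue
        code = row["region"].strip().upper()   # cleansing: normalize region codes
        clean.append({
            "order_id": int(row["order_id"]),  # conversion: text to integer
            "region": REGION_MAP[code],        # mapping: code to region name
            "amount": float(row["amount"]),    # conversion: text to float
        })
    return clean


def load(rows, conn):
    """Load: store a pre-calculated business metric (sales per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)",
                     sorted(totals.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(result)
```

Because the aggregation happens at load time, later analyses read the small `sales_by_region` table instead of rescanning the detail rows, which is the "save processing resources" point above.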
4. ETL (continued)
- Data warehousing trends
- growth in number of data sources
- range of 3 to 30 official data sources currently
- users desire to use data sources discovered via the Web
- using reports or feeds from vendors and partners
- growth in data¹
- Annual global data production: 5 exabytes
- 5,000,000,000,000,000,000 bytes (18 zeroes)
- Equivalent of 37K US Libraries of Congress
- Almost 1 GB per person on earth
- Growing at 30% per year
- 1 zettabyte by 2010 (21 zeroes)
- what are the data sizes and growth rates at your enterprise?
¹Source: UC Berkeley study, 2003
5. ETL (continued)
- Limitations and challenges of traditional ETL data warehousing
- cumbersome to add data sources
- bottleneck for ever-increasing user demands
- overkill for some data sources, especially transient ones
- rigidity of business metric definitions
- inflexibility to process changes
- lag in data availability
6. Traditional Data Integration: EII
- Enterprise Information Integration
- same principle as ETL, creating a single data source from many
- arose from the data warehouse's limitation of data timeliness
- difference from data warehousing: a virtual data warehouse
- benefits
- data is "real-time"
- more adaptable to changes in definitions/processes
- limitations
- bottlenecks and slow turnaround time to incorporate changes to definitions and processes
- still relies on IT efforts to respond to demands
7. Spreadmarts
- The bane of the business intelligence specialist!
- the use of spreadsheets to store copies of enterprise data
- arose from users' frustrations with
- lack of any business intelligence front-end application, or
- too-hard-to-use versions of early (and some current) applications
- graphical charting limitations of a BI app
- tedious change request form processes
- slow turnaround times to change requests
- not having a way to bring in external data
8. Spreadmarts (continued)
- The current position in business intelligence
- now BI vendors and enterprises are learning to accept the spreadsheet as a very user-friendly tool
- but still aim to rein in the use of spreadmarts per se because they are
- error prone
- institutionalizing labor inefficiency
- can become corrupted
- have data size limitations
- are not ideal for sharing
- knowledge is locked up
- don't have governance controls
- violate Sarbanes-Oxley requirements
- in search of the right solution
9. Meaning and Origins of Data Mashup
- "A mashup is the creation of a new work from two sources that were not initially designed to be combined"
- first used in music in the early '00s, especially rap music
- next used in Web 2.0 environment, especially Web portals, like My Yahoo
- next entered enterprise application space, limited to screen scraping
- now we define data mashup as data transformation and integration that can be done by users with minimal skills
- examples
- joining two datasets that weren't previously combined
- creating a new business metric on the fly
- importing external or user-created data
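
To make the three examples above concrete, here is a minimal Python sketch, assuming an invented enterprise query result (`enterprise_sales`) and an invented user-maintained CSV of targets. The names `mash_up` and `pct_of_target` are illustrative only and do not come from any product.

```python
import csv
import io

# Formal source: rows as they might come back from an enterprise
# database query (invented sample data).
enterprise_sales = [
    {"region": "East", "revenue": 1200.0},
    {"region": "West", "revenue": 900.0},
]

# Informal source: a user-created CSV of sales targets (invented).
USER_TARGETS_CSV = "region,target\nEast,1000\nWest,1000\n"


def mash_up(sales, targets_csv):
    """Join the two datasets on region and derive a new metric."""
    # Import the user-created data and index it by the join key.
    targets = {
        row["region"]: float(row["target"])
        for row in csv.DictReader(io.StringIO(targets_csv))
    }
    combined = []
    for row in sales:
        target = targets.get(row["region"])
        if target is None:
            continue  # inner join: skip regions without a target
        combined.append({
            "region": row["region"],
            "revenue": row["revenue"],
            "target": target,
            # New business metric created on the fly.
            "pct_of_target": round(100 * row["revenue"] / target, 1),
        })
    return combined


mashup = mash_up(enterprise_sales, USER_TARGETS_CSV)
for row in mashup:
    print(row)
```

Neither source was designed with the other in mind; the only thing they share is the `region` key, which is all the join needs.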
10. The Differences from Traditional Techniques
- it's the middle ground between "IT controlled" and "User defined"
- "collaboration" is born
- in the traditional models, IT defines how multiple sources are connected
- painstaking process, especially for mergers, process changes, etc.
- with data mashup, the connections are created on the fly
11. The Business Case: Benefits of Mashups
- Higher ROI on BI investment
- higher success rate of deployment due to higher
- end-user satisfaction
- usage rates
- adoption rates
- greater number of actionable learnings leading to
- more sales and/or
- greater efficiency
- increased speed of
- decisions
- competitive responses
- reactions to customer feedback
12. The Business Case: Benefits of Mashups
- Lower TCO
- reduced personnel needed to support a BI solution
- end-user self-service
- save on change request processes
- save on manpower to code requests
- reduce report request backlog
- reduced number of highly-skilled analysts or DBAs needed to satisfy business demands
- end-users meet their own needs more often
13. The Advent of In-Memory Data Federation
- Moore's law: increasing power and lower costs of CPU and memory allow in-memory transformation, pre-aggregation and caching
- Enables data mashup as well
14. The Trade-offs of these Techniques
Technique        Development Time  Development Skill  Latency  Performance  Adaptability
ETL              high              high               high     high         low
Data Federation  high              high               low      medium       low
Spreadsheet      low               low                high     low          high
Data Mashup      low               low                low      medium       high
15. Combining Formal and Informal Data Sources
- how a data mashup works
- similar to what a user is doing in Excel
- creating new formulas
- bringing in external data
- doing what-if scenarios
- live connections to the enterprise sources are maintained
- data mashup "refreshes" automatically on each use
- can save it to a shared folder for re-use and collaboration
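
One way to picture the "live connection" and automatic refresh is a mashup that stores only its recipe (a query plus a user formula) and re-runs it on every use. The sketch below is a simplified assumption of how that could look, not a description of any vendor's engine: the `Mashup` class, the `orders` table, and the 5.0 price in the what-if formula are all invented, and an in-memory SQLite database stands in for the enterprise source.

```python
import sqlite3

# Stand-in for a live enterprise source (invented schema and data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (product TEXT, units INTEGER)")
source.execute("INSERT INTO orders VALUES ('widget', 10)")


class Mashup:
    """Stores the recipe (query plus user formula), not a copy of the data."""

    def __init__(self, conn, query, formula):
        self.conn = conn
        self.query = query
        self.formula = formula

    def view(self):
        # Re-run the query on every use, so results reflect the live source.
        return [self.formula(row) for row in self.conn.execute(self.query)]


# User-defined what-if formula: assume a price of 5.0 per unit.
what_if = Mashup(
    source,
    "SELECT product, SUM(units) FROM orders GROUP BY product",
    lambda row: (row[0], row[1] * 5.0),
)

first = what_if.view()
source.execute("INSERT INTO orders VALUES ('widget', 4)")  # source changes
second = what_if.view()                                    # mashup follows
print(first, second)
```

Unlike a spreadsheet copy, nothing has to be re-exported or pasted after the source changes; sharing the mashup means sharing the recipe.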
16. Data Management and Data Mashup
- Relative to Data Warehousing
- data mashups can be seen as an expedient alternative to data warehousing in some cases
- data mashup can be a precursor to data warehousing
- allows quick and inexpensive experimentation
- when satisfied, codify the mashup into a data warehouse for performance benefits
17. Data Management and Data Mashup
- Relative to Impact on Pre-Aggregation
- pre-aggregation improves downstream processing
- with many traditional techniques
- pre-aggregations are designed before reports and dashboards
- usage of pre-aggregated data is explicit
- in the data mashup model, pre-aggregation can be built into the engine
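
As a rough sketch of what "built into the engine" could mean, the toy engine below computes an aggregate the first time it is requested and silently reuses it afterwards, so the report author never names a pre-aggregated table. The `Engine` class and its sample rows are hypothetical, and cache invalidation when the source changes is deliberately left out.

```python
from collections import defaultdict


class Engine:
    """Toy query engine that builds and reuses pre-aggregates transparently."""

    def __init__(self, rows):
        self.rows = rows
        self._preagg = {}   # cache: (group column, measure) -> totals
        self.scans = 0      # number of times the detail rows were scanned

    def total_by(self, column, measure="amount"):
        key = (column, measure)
        if key not in self._preagg:
            # First request: scan the detail rows once and keep the result.
            self.scans += 1
            totals = defaultdict(float)
            for row in self.rows:
                totals[row[column]] += row[measure]
            self._preagg[key] = dict(totals)
        # Later requests are served from the in-memory pre-aggregate.
        return self._preagg[key]


engine = Engine([
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
    {"region": "East", "amount": 50.0},
])
by_region = engine.total_by("region")
again = engine.total_by("region")   # no second scan of the detail rows
print(by_region, engine.scans)
```

In the traditional approach the equivalent of `_preagg` is a summary table that someone designs in advance and that reports must reference by name.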
18. Data Management and Data Mashup
- Importance of Meta Data
- creation of mashups depends on meta data: data type compatibility
- transformation options, like grouping and aggregation, differ based on the field type
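
The following sketch shows one way field metadata might drive both points above. The field names, type labels, and option lists are assumptions chosen for illustration; a real tool would read them from the source systems' catalogs and would likely offer different options.

```python
# Hypothetical field metadata for two sources (names and types invented).
CRM_FIELDS = {"customer_id": "number", "signup_date": "date", "region": "string"}
ORDER_FIELDS = {"customer_id": "number", "order_date": "date", "amount": "number"}

# Assumed aggregation and grouping options per field type.
AGGREGATES_BY_TYPE = {
    "number": ["sum", "average", "min", "max", "count"],
    "date": ["min", "max", "count"],
    "string": ["count", "distinct count"],
}
GROUPINGS_BY_TYPE = {
    "number": ["value"],
    "date": ["year", "quarter", "month", "day"],
    "string": ["value"],
}


def can_join(left_fields, left_name, right_fields, right_name):
    """Two fields are join candidates only if their data types match."""
    return left_fields[left_name] == right_fields[right_name]


def transformation_options(fields, name):
    """Offer only the groupings and aggregations valid for the field's type."""
    field_type = fields[name]
    return {
        "group_by": GROUPINGS_BY_TYPE[field_type],
        "aggregate": AGGREGATES_BY_TYPE[field_type],
    }


print(can_join(CRM_FIELDS, "customer_id", ORDER_FIELDS, "customer_id"))
print(can_join(CRM_FIELDS, "region", ORDER_FIELDS, "amount"))
print(transformation_options(ORDER_FIELDS, "order_date"))
```

With checks like these in place, a user with minimal skills is only shown joins and transformations that make sense for the fields selected.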
19. Data Management and Data Mashup
- Relative to Data Governance
- data mashups are a major improvement over spreadmarts
- data quality is enhanced
- live data is used
- no copying and pasting
- changes to master data mappings take effect immediately
- data security is enhanced
- security defined at source system level
- all derived mashups automatically secured
- overcome limitations of Excel's security
- concern: is it giving too much power to users?
- no different than what users will do inevitably in Excel
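
The idea that derived mashups are secured automatically can be illustrated with a small sketch. It assumes row-level permissions are enforced in a single source-access function, so anything built on top of it only ever sees permitted rows. The users, the `PERMISSIONS` table, and the function names are all hypothetical.

```python
# Invented source data and row-level permissions by user.
SALES = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 250.0},
]
PERMISSIONS = {"alice": {"East", "West"}, "bob": {"East"}}


def secured_source(user):
    """Security is enforced here, once, at the source level."""
    allowed = PERMISSIONS.get(user, set())
    return [row for row in SALES if row["region"] in allowed]


def total_sales_mashup(user):
    """A derived mashup: it can only aggregate rows the source releases."""
    return sum(row["amount"] for row in secured_source(user))


print(total_sales_mashup("alice"))
print(total_sales_mashup("bob"))
print(total_sales_mashup("mallory"))  # unknown user sees no rows
```

The mashup itself contains no security logic, which is the contrast with a spreadmart: once data has been copied into a spreadsheet, the source's access rules no longer apply to the copy.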
20. Data Management and Data Mashup
- Relative to Enterprise Content Management
- data mashups are re-usable and shareable
- data integrity is always maintained
- more easily embedded in other applications, portals
21. Data Management and Data Mashup
- Relative to Data Modeling
- data mashups are situated on top of various data sources
- data mashups can use
- physical tables
- pre-defined SQL, or
- logical models
22. Questions and Discussion