Title: Edward Trettel
1Contact Info
- Edward Trettel
- Northwest Airlines, Inc.
- edd.trettel_at_nwa.com
- edd.trettel_at_comcast.net
- 612-726-7434 (w)
- 763-780-3941 (h)
- 763-438-6244 (m)
2Click to Add Title
FORECASTING DATABASE DISK SPACE REQUIREMENTS A
POOR MANS APPROACH
- THE UNSUCCESSFUL
- SELF-TREATMENT OF
- A CASE OF
- "WRITER'S BLOCK"'
3 Personality Inventory
- Which Shape do you prefer?
Linear thinker. Precise. Accurate. Analytical.
Tactical.
Holistic thinker. Creative. Artistic.
Strategic.
Obsessed with Queuing Theory
4- Northwest Airlines is the world's fourth largest
airline - With its global travel partners, NWA serves over
- 750 destinations
- In 120 countries
- On 6 continents.
- The U.S. system spans 49 states and DC
- Hub Cities
- Amsterdam
- Detroit
- Memphis
- Minneapolis/St. Paul
- Tokyo
5 Necessity, who is the mother of
invention.Plato, The Republic
- NWAs Distributed Database Environment
- Sybase, Oracle, UDB, MS SQL Server
- 200 Database Instances
- 400 Databases
- Hosting Operating Systems
- Sun Solaris
- IBM Aix
- Windows Server
6The Problem
- 1,660 pager events per year on distributed
database issues. - 1,041 (62) of these were for databases exceeding
their 95 disk space full limits. - At 100 full the database stops processing.
- Management of these 400 databases disk space
needs was being done in a reactive, day-to-day
manner by staff looking at the individual values
inside each databases, using DBMS-specific
interfaces.
7 Databases, Tablespaces, andFile Systems
File System
Tablespace
8Scope of the Problem
- 200 Database Instances
- Hosting 400 Databases
- Consisting of 2,800 tablespaces
- Made up of 5,100 OS files
- These resources were managed reactively using
good ol IEB-eyeball
9The SolutionOctober 2000 to Present
- Gather a small number of database disk space size
metrics from each of these databases on an
automatic, daily, unattended basis and put them
into a database. - Apply regression analysis techniques to see if
there were any consistent growth (or decline)
rates over the course of a year. - Create forecasts on a per data holder basis,
beaming out six months in the future. - Leverage other descriptive statistics as well.
- Provide for multi-dimensional analyses.
10 The Data Holder Concept
- Introduced as a a common construct across the
disparate Sybase, Oracle, and UDB architectures. - Refers to
- Tablespaces (and datafiles, containers) in
Oracle and UDB - Database devices and databases in Sybase.
11The Collected Data
- The date and time of the collection
- The instance name
- The DBMS type (Sybase, Oracle, UDB)
- The tablespace or database name
- The number of bytes_allocated to this tablespace
or database - The number of bytes_free in this tablespace or
database - The number of bytes_used (derived as the
difference between the number allocated and
number free) - (All gathered into a single table in a database)
12The Collectors
- Straightforward SELECT statement
- Gather size information from an instances data
dictionary or system catalog - Zero-maintenance. Have run daily and unattended
for 6 years - Simplicity
13Regression Analysis
- Used MS OLAP Services
- Vetted time against Bytes Used on a per data
holder, per instance basis. - Used a years worth of daily observations.
- y mx b
- Bytes Used slope(date) constant
- Where will we be six months from now?
- y' m(current date180) b
- Will we have a surplus or a deficit in disk space
then? - Current Date Bytes Allocated - y'
- 100GB - 87GB 13GB (surplus)
- 78GB - 87GB -9GB (deficit)
14 Benefits of Having the Data
- A Number of Derived Measures
- Bytes Used(bytes_allocated bytes_free)
- Percent Used(Bytes_used/bytes_allocated)100
- Percent Free(bytes_free/bytes_allocated)100
15Correlation
- The Pearson product moment correlation
coefficient (R2) - Values ranged all over the place from 0 to 1.
- Since this number equates to the percent of the
variance observed in the dependent variable
(bytes used) thats accounted for by the
independent variable (time), we were able assess
the reliability (and usability) of our forecasts.
16Beyond ForecastingAdditional Insights Provided
by the Data
- Pivot tables of Time (along the x-axis) vs. Bytes
Used (along the y-axis) were constructed along
these dimensions - DBMS Name (Sybase, Oracle, UDB)
- Instance Name
- Data Holder Name
- This permitted dicing-and-slicing the data in a
number of ways.
17Questions Askedand Answered
- Whats the pattern of bytes_used over the past
year for - All Oracle instances?
- All Sybase instances?
- All UBD instances?
- Oracle and Sybase combined?
- Oracle and UDB combined?
- Sybase and UDB combined?
- Sybase and Oracle and UDB combined?
18Questions Askedand Answered
- Whats the pattern of bytes_used over the past
year for - Any individual instance?
- Any combination of instances? (Note this also
permits any combinations of instances of
interest, regardless of the DBMS thats hosting
them.)
19Questions Askedand Answered
- Whats the pattern of bytes_used over the past
year for - Any individual data holder? (Note that one must
enter an instance name for this to be meaningful.
Otherwise it would show the total value for all
data holders that have that name, regardless of
the instance name.) - Any combination of data holders?
20 Pivot Chart of Bytes Used
21 Pivot Chart of Percent Used
22Benefits
- With this data now published on a regular monthly
basis to the intranet, the consumers of it have
gained considerable insights into the seasonal
and other variations in their data usage
patterns. - The work group which is responsible for acquiring
disk space for the entire IS organization can now
set realistic budget values for next years disk
space requirements, based upon the higher level
rollups of the bytes_used data. - Pager call reduction the 1,041 pages that were
previously issued per year for database disk
space problems dropped to only a handful.
23 Benefits The Sequel
- The rates of growth of the various applications
or business systems at the organization were now
quantified and published. This allowed the IT
organization to compare those rates between
applications, year-over-year, etc. - The organization can now identify any anomalous
rates that might indicate that an application
change (intended or not) or business driver
variation was having a significant impact on the
rate at which data was being accrued in a
database. - Descriptive statistics can be compared between
data holders to better understand their central
tendencies and dispersion characteristics.
24- Statistics is the grammar of science.
- Karl Pearson
- British mathematician and statistician
- (1857 - 1936)
25Questions ?
- Thank you !!!
- Pizza !!!!!!!!!