Title: An Outlier Detection Methodology with Consideration for an Inefficient Frontier
1An Outlier Detection Methodology with
Consideration for an Inefficient Frontier
By Andy Johnson and Leon McGinnis
2Outline
- Background and motivation
- Current two-stage DEA method
- Proposed changes to two-stage DEA
- Definitions Relative to an Inefficient Frontier
- Leave-one-out method / threshold value
- Iterative Outlier Detection
- Second Stage Bootstrap
- Summary
3Background
- Outlier detection in a non-parametric framework is important because many of these methods do not consider measurement error or random fluctuations when constructing a frontier
- Thus, allowing overstated data into the reference set can bias not only one efficiency estimate but several, if the overstated observation is used to construct the frontier
4Motivation
- The iDEAs project for warehouse performance benchmarking
- On-line data collection requires more scrutiny than data collected and analyzed by a single person
- What could cause a negatively skewed efficiency distribution?
5Motivation
- To investigate the impact of environmental characteristics on efficiency, the two-stage DEA method has been developed
- A data set is identified as using a similar technology
- In the first stage, efficiency estimates are calculated
- In the second stage, the estimates are regressed against environmental characteristics
- In this setting, overstated observations are a problem in the first stage, but both overstated and understated observations are a problem in the second stage
6Current Two-stage DEA Method when Outlier
Detection is Considered
- Consider the problem of understanding sources of inefficiency
- Identify outliers relative to the efficient frontier and remove them
- Calculate efficiency estimates
- Regress efficiency estimates against environmental variables
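The last two steps above (calculate efficiency estimates, then regress them against an environmental variable) can be sketched in a deliberately simplified setting. This is an illustration only, not the authors' implementation: it assumes a single input, a single output and constant returns to scale, so that efficiency reduces to each unit's output/input ratio divided by the best ratio, and it assumes ordinary least squares for the second stage. The data and the names `crs_efficiency` and `ols_slope_intercept` are hypothetical.

```python
def crs_efficiency(inputs, outputs):
    """First stage (simplified): efficiency relative to the best output/input ratio."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def ols_slope_intercept(z, e):
    """Second stage (assumed OLS): regress efficiency e on one environmental variable z."""
    n = len(z)
    z_bar = sum(z) / n
    e_bar = sum(e) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (ei - e_bar) for zi, ei in zip(z, e))
    slope = sxy / sxx
    return slope, e_bar - slope * z_bar


# Hypothetical data: four units, one input, one output, one environmental variable.
inputs = [2.0, 4.0, 5.0, 8.0]
outputs = [2.0, 3.0, 5.0, 4.0]
environment = [1.0, 2.0, 1.0, 3.0]

efficiencies = crs_efficiency(inputs, outputs)
slope, intercept = ols_slope_intercept(environment, efficiencies)
print(efficiencies, slope, intercept)
```

A negative slope in this toy data would be read as the environmental variable being associated with lower efficiency; a multi-input, multi-output analysis would replace `crs_efficiency` with a DEA linear program.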
7An Outlier Detection Methodology with
Consideration for an Inefficient Frontier
[Figure: plot of output against input]
8An Outlier Detection Methodology with
Consideration for an Inefficient Frontier
- A proposed improvement on the current method requires an outlier detection methodology
- First, identify outliers relative to both the efficient and the inefficient frontier
- Use a two-stage DEA method in which DEA estimates are calculated in the first stage and regressed against environmental variables in the second stage
- Use bootstrapping in the second stage to deal with the problem of correlation among the error terms
9An Outlier Detection Methodology with
Consideration for an Inefficient Frontier
[Figure: plot of output against input]
10Definitions Relative to an Inefficient Frontier
- The production possibility set when the inefficient frontier is included
- Shephard's input inefficient distance function can be defined as
- Shephard's output inefficient distance function can be defined as
11Inefficiency Measures Relative to an Inefficient
Frontier
- The Single-Output Inefficient Production Frontiers and the Measure of Technical Efficiency
- An input-oriented measure of inefficiency relative to an inefficient frontier is given by
- An output-oriented measure of inefficiency relative to an inefficient frontier is given by the function
12Definition of an Inefficient Frontier
- The Multiple-Output Inefficient Production
Frontiers and the Measure of Technical Efficiency
- The inefficient frontier with respect to the
subset X(y) can be found by
13Linear Program for Calculating Inefficiency
- The Multiple-Output Inefficient Production
Frontiers and the Measure of Technical Efficiency
- The inefficiency estimate calculated from the
input perspective can be found by solving the
following linear program
14Definition of an Inefficient Frontier
- The Multiple-Output Inefficient Production
Frontiers and the Measure of Technical Efficiency
- The inefficient frontier with respect to the
subset Y(x) can be found by
15Linear Program for Calculating Inefficiency
- The Multiple-Output Inefficient Production
Frontiers and the Measure of Technical Efficiency
- The inefficiency estimate calculated from the
output perspective can be found by solving the
following linear program
16Outlier Detection
- As suggested by Simar (2003), an outlier needs to be identified by both an input-oriented and an output-oriented detection method
17Leave-One-Out DEA Linear Program
- The leave-one-out DEA inefficiency estimate is
18Threshold Value for Identifying Outliers
- Input-oriented threshold value: the worst observation, or convex combination of bad observations, in the reference set excluding the observation under evaluation can produce the same level of output as the given observation using half the inputs
- Output-oriented threshold value: the worst observation, or convex combination of bad observations, in the reference set excluding the observation under evaluation can use the same level of input as the given observation and produce twice the output
- The output-oriented threshold is the reciprocal of the input-oriented threshold
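The threshold idea above can be illustrated with a minimal sketch under strong simplifying assumptions that are not taken from the slides: one input, one output and constant returns to scale, so that the leave-one-out inefficient frontier is just the lowest output/input ratio among the other observations. The direction of the comparisons (flag at or beyond the threshold) is one reading of the slide, and all names and data are hypothetical. In the general multi-input, multi-output case the two measures would come from linear programs and need not agree.

```python
INPUT_THRESHOLD = 0.5                        # "half the inputs"
OUTPUT_THRESHOLD = 1.0 / INPUT_THRESHOLD     # "twice the output"


def worst_ratio_excluding(inputs, outputs, k):
    """Lowest output/input ratio among all observations except k."""
    return min(y / x for j, (x, y) in enumerate(zip(inputs, outputs)) if j != k)


def input_measure(inputs, outputs, k):
    """Share of k's input the worst other observation needs to match k's output."""
    worst = worst_ratio_excluding(inputs, outputs, k)
    return (outputs[k] / worst) / inputs[k]


def output_measure(inputs, outputs, k):
    """Multiple of k's output the worst other observation gets from k's input."""
    worst = worst_ratio_excluding(inputs, outputs, k)
    return (worst * inputs[k]) / outputs[k]


def is_flagged(inputs, outputs, k):
    """Flag k only when both orientations reach their thresholds."""
    return (input_measure(inputs, outputs, k) <= INPUT_THRESHOLD
            and output_measure(inputs, outputs, k) >= OUTPUT_THRESHOLD)


# Hypothetical data: the last unit is far worse than every other unit.
inputs = [2.0, 4.0, 5.0, 8.0, 10.0]
outputs = [2.0, 3.0, 5.0, 4.0, 1.0]
flags = [is_flagged(inputs, outputs, k) for k in range(len(inputs))]
print(flags)
```

With a single input and output under constant returns to scale the two measures are reciprocals of each other, so the agreement check is trivially satisfied; it only becomes informative in richer settings.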
19Iterative Outlier Detection
- Identify outliers based on agreement between the input-oriented and output-oriented detection methods
- Remove identified outliers
- Rerun the outlier detection method
- Continue until the number of outliers detected is below some limit or for a set number of iterations
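The loop described above can be sketched as follows. This is a hedged illustration, not the authors' code: the detectors are passed in as functions, and the toy detectors used in the example assume one input, one output and constant returns to scale, with thresholds of 0.5 and 2. The names `iterative_outlier_detection`, `detect_input`, `detect_output`, `min_flagged` and `max_iterations` are hypothetical, as is the choice to stop without removing anything once fewer than `min_flagged` observations are flagged.

```python
def iterative_outlier_detection(units, detect_input, detect_output,
                                min_flagged=1, max_iterations=10):
    """Repeatedly remove units flagged by BOTH detectors."""
    remaining = list(units)
    removed = []
    for _ in range(max_iterations):
        flagged = detect_input(remaining) & detect_output(remaining)
        if len(flagged) < min_flagged:
            break  # too few outliers detected: stop iterating
        removed.extend(u for i, u in enumerate(remaining) if i in flagged)
        remaining = [u for i, u in enumerate(remaining) if i not in flagged]
    return remaining, removed


def _worst_ratio_excluding(units, k):
    # Leave-one-out worst output/input ratio (toy inefficient frontier).
    return min(y / x for j, (x, y) in enumerate(units) if j != k)


def detect_input(units):
    """Toy input-oriented detector: indices whose measure is at most 0.5."""
    if len(units) < 2:
        return set()
    return {k for k, (x, y) in enumerate(units)
            if (y / _worst_ratio_excluding(units, k)) / x <= 0.5}


def detect_output(units):
    """Toy output-oriented detector: indices whose measure is at least 2."""
    if len(units) < 2:
        return set()
    return {k for k, (x, y) in enumerate(units)
            if _worst_ratio_excluding(units, k) * x / y >= 2.0}


# Hypothetical (input, output) pairs; the last two are very poor performers.
units = [(2.0, 2.0), (4.0, 3.0), (5.0, 5.0), (8.0, 4.0), (10.0, 1.0), (10.0, 0.4)]
remaining, removed = iterative_outlier_detection(units, detect_input, detect_output)
print(removed)
```

In this made-up data the unit `(10.0, 1.0)` is only flagged on the second pass, after `(10.0, 0.4)` has been removed from the reference set, which is the kind of masking that motivates rerunning the detection.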
20Bootstrapping method for the second stage of the
two-stage DEA method
- Necessary because of the correlation among error terms in the second stage regression
- Sample n observations, call this set b, with replacement from the set of input/output data
- Calculate efficiency estimates for each of the original n observations relative to the set b
- Repeat these two steps 2000 times
- Construct confidence intervals and bias estimates for each of the original n efficiency estimates
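The resampling steps above can be sketched as follows, again under assumptions that go beyond the slides: one input, one output and constant returns to scale, so each efficiency estimate is a ratio of ratios, with percentile confidence intervals and the usual bootstrap bias estimate (mean of the bootstrap estimates minus the original estimate). The function names, the seed and the data are hypothetical.

```python
import random
import statistics


def efficiency_relative_to(unit, reference):
    """Toy efficiency of one unit against the best ratio in a reference set."""
    x, y = unit
    best = max(ry / rx for rx, ry in reference)
    return (y / x) / best


def bootstrap_efficiency(units, replications=2000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = len(units)
    draws = [[] for _ in units]
    for _ in range(replications):
        # Sample n observations with replacement: the set b.
        b = [units[rng.randrange(n)] for _ in range(n)]
        # Evaluate every ORIGINAL observation relative to b.
        for i, unit in enumerate(units):
            draws[i].append(efficiency_relative_to(unit, b))
    results = []
    for unit, d in zip(units, draws):
        estimate = efficiency_relative_to(unit, units)
        d.sort()
        results.append({
            "estimate": estimate,
            "bias": statistics.fmean(d) - estimate,
            # Approximate percentile interval from the sorted bootstrap draws.
            "lower": d[int((alpha / 2) * replications)],
            "upper": d[int((1 - alpha / 2) * replications) - 1],
        })
    return results


# Hypothetical (input, output) pairs.
units = [(2.0, 2.0), (4.0, 3.0), (5.0, 5.0), (8.0, 4.0)]
results = bootstrap_efficiency(units)
for row in results:
    print(row)
```

An original observation that is not drawn into b can score above 1 relative to b in this sketch, which is why the bootstrap estimates are never below the original estimate and the bias estimate is non-negative.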
21Second Stage Bootstrap
- Bootstrapping method for the second stage of the two-stage DEA method
- Because bias is present, the efficiency estimate is bias-corrected
- Confidence interval estimates
22Banker and Morey Data
- When outliers are determined based only on the efficient frontier, the finding is that population has no impact on efficiency
- When outliers are determined based on both an efficient and an inefficient frontier, the finding is that population is negatively correlated with efficiency at the 95% confidence level
23Summary
- By not considering an inefficient frontier, the second stage results of the two-stage method are biased. This bias is not corrected for by the bootstrapping method.
- Developed a more comprehensive outlier detection methodology for non-parametric efficiency methods
24Thank You