Title: SEED Center for Data Farming Overview
1. SEED Center for Data Farming Overview
- Tom Lucas and Susan Sanchez
- Operations Research Department
- Naval Postgraduate School
- Monterey, CA.
Mission: Advance the collaborative development
and use of simulation experiments and efficient
designs to provide decision makers with timely
insights on complex systems and operations.
2. Simulation studies underpin many DoD decisions
- DoD uses complex, high-dimensional simulation models as an important tool in its decision-making process.
  - Used when it is too difficult or costly to experiment on real systems
  - Needed for future systems: we shouldn't wait until they're operational to decide on appropriate capabilities and operational tactics, or to evaluate their potential performance
  - Used to investigate the impact of randomness and other uncertainties
- Many complex simulations involve hundreds or thousands of factors that can be set to different levels.
3. Design of experiments
- An experimental design is the complete specification of input settings and runs
- The choice of design constrains the information we can extract from the model
- A table (matrix) of factor levels describes the design
  - Each column corresponds to a factor (input variable)
  - Each row corresponds to a design point (a combination of factor settings)
"2^100 is forever." (General Jasper Welch)
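As a minimal sketch of the design-matrix idea on this slide, the snippet below builds a full factorial (gridded) design in plain Python: one column per factor, one row per design point. The factor names and the two coded levels are hypothetical, chosen only for illustration. The final loop shows why gridding every factor stops being feasible as the number of factors grows.

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Return the design matrix: one row per design point, one column per factor."""
    return [list(point) for point in product(*levels_per_factor)]

# Hypothetical factors and coded levels, for illustration only.
factors = ["stealth", "range", "speed"]
levels = [[0, 1], [0, 1], [0, 1]]

design = full_factorial(levels)
for row in design:
    print(dict(zip(factors, row)))

# The number of design points doubles with every added two-level factor.
for k in (3, 10, 100):
    print(k, "factors ->", 2 ** k, "design points")
```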
4. A simple example
- Without examining multiple factors simultaneously, we
  - limit the insights possible (we can't look for interactions: places where interesting things happen for specific combinations of factors), so we only tell part of the story
  - have less chance for surprises
Ex.: Which is more important, stealth or range?
Ex.: Suppose your factors include fuel, air, and spark. You'll NEVER find fire by examining only two at a time.
Excursions from the base case wouldn't show anything.
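The fuel, air, and spark example can be sketched in a few lines. The response function and the all-absent base case below are assumptions made for illustration: fire occurs only when all three factors are present. One-at-a-time and two-at-a-time excursions from the base case are compared with the full factorial.

```python
from itertools import product

def fire(fuel, air, spark):
    """Toy response: fire occurs only when all three factors are present."""
    return int(fuel and air and spark)

base_case = (0, 0, 0)  # assumed base case: nothing present

# One-at-a-time excursions: switch on a single factor, leave the rest at base.
excursions = []
for i in range(3):
    point = list(base_case)
    point[i] = 1
    excursions.append(tuple(point))

# Two-at-a-time: vary each pair over all four combinations, third factor at base.
pairwise = set()
for i, j in ((0, 1), (0, 2), (1, 2)):
    for a, b in product((0, 1), repeat=2):
        point = list(base_case)
        point[i], point[j] = a, b
        pairwise.add(tuple(point))

# Full factorial: every combination of the three factors.
full = list(product((0, 1), repeat=3))

print("fires found, one at a time:  ", sum(fire(*p) for p in excursions))
print("fires found, two at a time:  ", sum(fire(*p) for p in pairwise))
print("fires found, full factorial: ", sum(fire(*p) for p in full))
```

Only the design that varies all three factors together contains the single combination that produces fire.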
5. The traditional view
Philosophy: "The three primary objectives of computer experiments are (i) Predicting the response at untried inputs, (ii) Optimizing a function of the input factors, or (iii) Calibrating the computer code to physical data." (Sacks, Welch, Mitchell, and Wynn, 1989)
For many (military) applications, these can be problematic!
- Approach
  - Limit yourself to just a few factors or scenario alternatives
  - Fix all other factors in the simulation to specified values
  - At each design point, run the experiment a small number of times (once for deterministic simulations)
"The purpose of computing is insight, not numbers." (Hamming)
6. The new view
- We contend that appropriate goals are
  - (i) Developing a basic understanding of a particular model or system
    - seeking insights into high-dimensional space
    - identifying significant factors and interactions
    - finding regions, ranges, and thresholds where interesting things happen
  - (ii) Finding robust decisions, tactics, or strategies
  - (iii) Comparing the merits of various decisions or policies
  (Kleijnen, Sanchez, Lucas, and Cioppa, 2005)
"Models are for thinking." (Sir Maurice Kendall)
Once you have invested the effort to build (and perhaps verify, validate, and accredit) a simulation model, it's time to let the model work for you!
7. These goals mean fewer assumptions...
- Traditional DOE assumptions
  - Small/moderate number of factors
  - Univariate response
  - Homogeneous error
  - Linear
  - Sparse effects
  - Higher order interactions negligible
  - Normal errors
  - Black box model
- Assumptions for defense and homeland security simulations
  - Large number of factors
  - Many output measures of interest
  - Heterogeneous error
  - Non-linear
  - Many significant effects
  - Significant higher order interactions
  - Varied error structure
  - Substantial expertise exists
"The idea behind Monte Carlo simulation is to replace theory by experiment whenever the former falters." (Hammersley and Handscomb)
"We use simulations to avoid making Type III errors: working on the wrong model." (W. David Kelton)
8. ...that, in turn, call for different designs
We have focused on Latin hypercubes and sequential approaches
Efficient resolution V fractional factorials (R5 FF) and central composite designs (CCD)
Factorial (gridded) designs are most familiar
9. Choosing an experimental design
- Plant the seeds for successful data farming
  - Explore landscapes by running experiments while varying factors
  - Sequential process, with a human in the loop to interpret results and plan new experiments
- Where you should plant depends on what you want to harvest!
- For developing an understanding, you may wish to identify
  - important factors, e.g., fractional factorial or sequential screening
    - what factors matter? (when interactions are NOT sizeable)
  - interactions, e.g., higher resolution fractional factorial
    - are stealth and range synergistic?
  - quadratic effects, e.g., central composite
    - does increasing USVs have diminishing returns?
  - thresholds, change points, and robust regions, e.g., Latin hypercubes
    - What does the landscape look like? What decision factors, interactions, and higher-order terms matter?
"We seek designs that allow one to fit a variety of models and provide information about all portions of the experimental region." (Santner, Williams, and Notz, 2003)
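Of the design families named on this slide, the central composite design is easy to write down explicitly. The sketch below is illustrative only: it works in coded units and assumes the common "rotatable" choice of axial distance, which the slides do not specify. It combines the 2^k factorial corners with 2k axial points and a center point, so each factor is sampled at five levels and quadratic effects become estimable.

```python
from itertools import product

def central_composite(k, alpha=None):
    """Coded design points of a central composite design for k factors.

    Combines 2^k factorial corners, 2k axial points, and one center point.
    """
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # assumed: rotatable axial distance
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * k]
    return corners + axial + center

design = central_composite(2)
for row in design:
    print([round(x, 3) for x in row])
```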
10. An all-purpose experimental design
- We have found Latin hypercubes a very good all-purpose design, particularly when factors are quantitative and there is considerable a priori uncertainty about the response, because of
  - Efficiency
  - Space-filling (if we look at any group of factors, we'll find a variety of combinations of levels)
  - Design flexibility
    - few restrictions on factors, levels, sampling budget
  - Analysis flexibility
    - good at screening
    - many cameras on the landscape
    - accidental VV&A
    - allow you to fit many different types of complex metamodels to multiple MOEs
- Orthogonal, nearly-orthogonal, and space-filling Latin hypercubes have advantages in fitting landscapes to the data
11. So, what is a Latin hypercube?
A 6-run, 2-factor design
- In its basic form, each column in an n-run, k-factor LH is a permutation of the integers 1, 2, ..., n
- The n integers correspond to levels across the range of the factor
  - For exploratory purposes, we use a uniform spread over the range (but may round to integer values)
  - Slightly different designs arise if you force sampling at the low and high values
[Figure: pairwise projection of Factor 1 against Factor 2, with factor range Low = 0 to High = 60; level sets shown: 5, 15, 25, 35, 45, 55 and 0, 12, 24, 36, 48, 60]
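The basic construction described on this slide can be sketched as follows. This is a plain random Latin hypercube, not the nearly orthogonal designs discussed later; the function name and its parameters are hypothetical. Each column is a random permutation of 1..n mapped to evenly spaced levels. With n = 6 and a range of 0 to 60, the uniform spread gives levels 5, 15, ..., 55, and forcing the end points gives 0, 12, ..., 60.

```python
import random

def latin_hypercube(n, k, low, high, force_endpoints=False, seed=None):
    """Basic n-run, k-factor Latin hypercube on [low, high] for every factor.

    Each column is an independent random permutation of the integers 1..n,
    mapped to n evenly spaced levels.
    """
    rng = random.Random(seed)
    if force_endpoints:
        # Levels include the low and high values themselves.
        levels = [low + (high - low) * i / (n - 1) for i in range(n)]
    else:
        # Levels are the midpoints of n equal-width bins.
        levels = [low + (high - low) * (i + 0.5) / n for i in range(n)]
    columns = []
    for _ in range(k):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        columns.append([levels[p - 1] for p in perm])
    # Transpose so that each row is a design point.
    return [list(row) for row in zip(*columns)]

design = latin_hypercube(6, 2, 0, 60, seed=1)
for row in design:
    print(row)
```

Every level of every factor appears exactly once, so n runs cover n levels per factor regardless of the number of factors.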
12. Nearly orthogonal and space-filling Latin hypercubes
- The pairwise projections for a 17-run, 7-factor orthogonal LH show
  - orthogonality (no pairwise correlations)
  - space-filling behavior (points fill the sub-plots)
- 17 total runs!
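To make "no pairwise correlations" concrete, the sketch below measures the largest absolute pairwise correlation between columns of a 17-run, 7-factor Latin hypercube. The search shown is a naive best-of-many random search, used here only as an assumption-laden stand-in; it is not the construction behind the published orthogonal and nearly orthogonal designs.

```python
import random
from itertools import combinations

def random_lh(n, k, rng):
    """Columns are independent random permutations of 1..n (stored column-wise)."""
    cols = []
    for _ in range(k):
        col = list(range(1, n + 1))
        rng.shuffle(col)
        cols.append(col)
    return cols

def correlation(x, y):
    """Pearson correlation of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def max_abs_correlation(cols):
    """Largest absolute correlation over all pairs of columns."""
    return max(abs(correlation(a, b)) for a, b in combinations(cols, 2))

rng = random.Random(0)
first = random_lh(17, 7, rng)
best = first
for _ in range(2000):
    candidate = random_lh(17, 7, rng)
    if max_abs_correlation(candidate) < max_abs_correlation(best):
        best = candidate

print("first random LH:   ", round(max_abs_correlation(first), 3))
print("best of the search:", round(max_abs_correlation(best), 3))
```

A purely random Latin hypercube usually has noticeable column correlations; an orthogonal design drives this measure to zero.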
13. Other possibilities
- Very large resolution V fractional factorials and central composite designs
  - Standard DOE literature: 2^(11-3)
  - New: an easy way to catalogue and generate up to 2^(443-423)
- Two-phase adaptive sequential procedure for factor screening
  - New procedure that requires fewer assumptions, improves efficiency
- Frequency domain experiments
  - Naturally samples factors at coarser/finer levels
- Crossed/combined designs to identify robust decision factor settings
14. Our portfolio of designs
- Kleijnen, J. P. C., S. M. Sanchez, T. W. Lucas, and T. M. Cioppa, "A User's Guide to the Brave New World of Designing Simulation Experiments," INFORMS Journal on Computing, Vol. 17, No. 3, 2005, pp. 263-289.
- Cioppa, T. M. and T. W. Lucas, "Efficient Nearly Orthogonal and Space-filling Latin Hypercubes," Technometrics, forthcoming.
- Sanchez, S. M. and P. J. Sanchez, "Very Large Fractional Factorials and Central Composite Designs," ACM Transactions on Modeling and Computer Simulation, Vol. 15, No. 4, 2005, pp. 362-377.
- Sanchez, S. M., H. Wan, and T. W. Lucas, "A Two-phase Screening Procedure for Simulation Experiments," Proc. 2005 Winter Simulation Conference, eds. M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, Institute of Electrical and Electronic Engineers, Piscataway, New Jersey, 2005, pp. 223-230.
- Sanchez, S. M., F. Moeeni, and P. J. Sanchez, "So Many Factors, So Little Time... Simulation Experiments in the Frequency Domain," International Journal of Production Economics, Vol. 103, 2006, pp. 149-165.
15. Other publications
- Sanchez, Lucas, "Agent-based Simulations: Simple Models, Complex Analyses," Invited paper, Proc. 2002 Winter Simulation Conference, 116-126.
- Lucas, Sanchez, Brown, Vinyard, "Better Designs for High-Dimensional Explorations of Distillations," Maneuver Warfare Science 2002, Marine Corps Combat Development Command, 2002, 17-46.
- Vinyard, Lucas, "Exploring Combat Models for Non-monotonicities and Remedies," PHALANX, 35, No. 1, March 2002, 19, 36-38.
- Lucas, McGunnigle, "When is Model Complexity Too Much? Illustrating the Benefits of Simple Models with Hughes' Salvo Equations," Naval Research Logistics, Vol. 50, April 2003, 197-217.
- Lucas, Sanchez, Cioppa, Ipekci, "Generating Hypotheses on Fighting the Global War on Terrorism," Maneuver Warfare Science 2003, Marine Corps Combat Development Command, 2003, 117-137.
- Lucas, Sanchez, "Smart Experimental Designs Provide Military Decision-Makers With New Insights From Agent-Based Simulations," Naval Postgraduate School RESEARCH, 13, 2, 20-21, 57-59, 63.
- Lucas, Sanchez, "NPS Hosts the Marine Corps Warfighting Laboratory's Sixth Project Albert International Workshop," Naval Postgraduate School RESEARCH, 13, 2, 45-46.
- Sanchez, Wu, "Frequency-Based Designs for Terminating Simulation Experiments: A Peace-enforcement Example," Proc. 2003 Winter Simulation Conference, 952-959.
- Brown, Cioppa, "Objective Force Urban Operations Agent Based Simulation Experiment," Technical Report TRAC-M-TR-03-021, Monterey, CA, June 2003.
- Cioppa, Brown, Jackson, Muller, Allison, "Military Operations in Urban Terrain Excursions and Analysis With Agent-Based Models," Maneuver Warfare Science 2003, Quantico, VA, 2003.
- Cioppa, "Advanced Experimental Designs for Military Simulations," Technical Report TRAC-M-TR-03-011, Monterey, CA, February 2003.
- Brown, Cioppa, Lucas, "Agent-based Simulation Supporting Military Analysis," PHALANX, Vol. 37, No. 3, Sept 2004.
- Cioppa, Lucas, Sanchez, "Military Applications of Agent-Based Simulations," Proceedings of the 2004 Winter Simulation Conference, 171-179.
- Allen, Buss, Sanchez, "Assessing Obstacle Location Accuracy in the REMUS Unmanned Underwater Vehicle," Proceedings of the 2004 Winter Simulation Conference, 940-948.
- Cioppa, "An Efficient Screening Methodology For a Priori Assessed Non-Influential Factors," Proc. 2004 Winter Simulation Conference, 171-180.
- Sanchez, "Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments," Proc. of the 2005 Winter Simulation Conference, forthcoming.
- Wolf, Sanchez, Goerger, Brown, "Using Agents to Model Logistics," under revision for Military Operations Research.
- Baird, Paulo, Sanchez, Crowder, "Measuring Information Gain in the Objective Force," under revision for Military Operations Research.
16. Student theses: note the breadth of applications
- 2000 Brown (Captain, USMC)
  - Human Dimension of Combat
- 2001 Vinyard (Major, USMC)
  - Reducing Non-monotonicities in Combat Models, MORS/Tisdale Winner, MORS Walker Award
- 2002 Erlenbruch (Captain, German Army)
  - German Peacekeeping Operations, MORS/Tisdale Finalist
- 2002 Pee (Singapore DSTA)
  - Information Superiority and Battle Outcomes, MORS/Tisdale Finalist
- 2002 Wan (Major, Singapore Army)
  - Effects of Human Factors on Combat Outcomes
- Dickie (Major, Australian Army)
  - Swarming Unmanned Vehicles, MORS/Tisdale Finalist
- 2002 Ipekci (1st Lieutenant, Turkish Army)
  - Guerrilla Warfare, MORS/Tisdale Winner
- 2002 Wu (Lieutenant, USN)
  - Spectral Analysis and Sonification of Simulation Data
- 2002 Cioppa (Lieutenant Colonel, US Army, PhD)
  - Experimental Designs for High-dimensional Complex Models, ASA 3rd Annual Prize for Best Student Paper Applying Stat. to Defense
- 2004 Berner (LCDR, US Navy)
  - Multiple UAVs in Maritime Search and Control
- 2004 Tan (Singapore ST)
  - Checkpoint Security
- 2005 Babilot (USMC)
  - DO versus Traditional Force in Urban Terrain
- 2005 Bain (USMC)
  - Logistics Support for Distributed Ops, MORS/Tisdale Finalist
- 2005 Gun (Turkish Army)
  - Sunni Participation in Iraqi Elections
- 2005 McMindes (USMC)
  - UAV Survivability
- 2005 Sanders (USMC)
  - Marine Expeditionary Rifle Squad
- 2005 Ang (Singapore Technologies Engineering)
  - Increasing Participation and Decreasing Escalation in Elections
- 2005 Chang (Singapore DSTA)
  - Edge vs. Hierarchical Organizations for Collaborative Tasks
- 2005 Liang (Singapore DSTA)
17. An environment for exploration requires
- Flexible models or tools to build them
- High-performance computing
- Experimental design (already talked about)
- Data analysis and visualization
18. Agent-based distillations from Project Albert
- Distillations: fast, robust, easy-to-use, transparent agent-based simulations that focus on specific aspects of operational scenarios (they only strive to capture the essence)
- Project Albert collaborators have developed six (agent-based) distillation models which are currently implemented in a data farming environment
  - ISAAC/EINSTein, Socrates, NetLogo
  - Pythagoras, Mana, PAX
19. The Big Iron
- Hardware
  - Maui High Performance Computing Center (MHPCC): an Air Force Research Laboratory center managed by the University of Hawai'i
- Enabling Software
  - Web-based Maui High Performance Computing Center
  - Local Resources: OldMcData
20. Interpreting the results
- Standard statistical graphics tools (regression trees, 3-D scatter plots, contour plots, plots of average results for a single factor, interaction profiles) can be used to gain insights from the data
- Step-wise regression and regression trees identify important factors, interactions, and thresholds
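How a regression tree surfaces a threshold can be illustrated with its first step: an exhaustive search for the single binary split that most reduces the squared error. The data below are synthetic and purely an assumption for illustration (the response jumps when one factor exceeds 0.6). A full regression tree would apply the same search recursively to each resulting subset.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, response):
    """Return (factor index, threshold, SSE) of the best single binary split."""
    best = None
    for j in range(len(rows[0])):
        levels = sorted(set(r[j] for r in rows))
        # Candidate thresholds lie midway between adjacent observed levels.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, response) if r[j] <= t]
            right = [y for r, y in zip(rows, response) if r[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, t, total)
    return best

# Synthetic data (assumed for illustration): the response jumps when
# factor 0 exceeds 0.6, factor 1 has no effect, and there is small noise.
rng = random.Random(42)
rows = [[rng.random(), rng.random()] for _ in range(200)]
response = [(1.0 if x0 > 0.6 else 0.0) + rng.gauss(0, 0.05) for x0, _ in rows]

factor, threshold, _ = best_split(rows, response)
print("split on factor", factor, "at threshold", round(threshold, 3))
```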
21. Decision Tree: Time and routing (Raffetto, 2004)
MOE: Proportion of enemy classified
Most Important Factor: Needs to fly over 7 hours
Think like the enemy! Rt. 2 planned with intel
In either case, throw more forces/capabilities at it next if available
22. Example: Regression analysis (Raffetto, 2004)
- Across the noise factors, the regression models produce R-Square values from .906 to .921 with seven to nine terms for 1-3 UAVs
- Provides a means to compare expected effects of different configurations
- Parameter estimates are put into a simple Excel spreadsheet GUI to allow decision makers to view relative effects of configurations within this scenario
23. Example: Interactions (Steele, 2004): camera range and speed
- At low speeds, camera range is unimportant
- At higher speeds, camera range has a big impact
- One of several technological challenges for systems design
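The pattern described here, where one factor matters only at certain levels of another, is what an interaction contrast measures. The sketch below uses made-up response values on a 2x2 grid of coded levels. The numbers are hypothetical and are not taken from the cited study; they only mimic the qualitative pattern in the bullets.

```python
# Hypothetical mean responses (not from the study) on a 2x2 grid of coded
# levels: keys are (speed, camera_range), with -1 = low and +1 = high.
response = {
    (-1, -1): 0.30,
    (-1, +1): 0.31,
    (+1, -1): 0.35,
    (+1, +1): 0.70,
}

def effect(contrast):
    """Average response where the contrast is +1 minus average where it is -1."""
    plus = [y for x, y in response.items() if contrast(x) > 0]
    minus = [y for x, y in response.items() if contrast(x) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

speed_effect = effect(lambda x: x[0])
range_effect = effect(lambda x: x[1])
interaction = effect(lambda x: x[0] * x[1])

# Effect of camera range at each speed separately (the interaction profile).
range_at_low_speed = response[(-1, +1)] - response[(-1, -1)]
range_at_high_speed = response[(+1, +1)] - response[(+1, -1)]

print("main effect of speed:       ", round(speed_effect, 3))
print("main effect of camera range:", round(range_effect, 3))
print("speed x range interaction:  ", round(interaction, 3))
print("range effect at low speed:  ", round(range_at_low_speed, 3))
print("range effect at high speed: ", round(range_at_high_speed, 3))
```

The interaction contrast equals half the difference between the two conditional range effects, so a large value signals that the main effect of camera range alone would be misleading.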
24. Example: One-way analysis (Hakola, 2004)
25. Example: MART (Ipekci, 2002)
[Figure panels: Relative Variable Importance, Blue Casualties; Relative Variable Importance, Red Casualties]
26. Example: Contour plot (Allen, 2004)
27. Resources: SEED Center for Data Farming
- http://harvest.nps.navy.mil
- Check here for
  - lists of student theses (available online)
  - spreadsheets and software
  - pdf files for several of our publications, publication info for the rest
  - links to other resources
  - updates
"All models are wrong, but some are useful." (George Box)
28. Questions?
SEED Center for Data Farming
Mission: Advance the collaborative development and use of simulation experiments and efficient designs to provide decision makers with timely insights on complex systems and operations.
Primary Sponsors
International Collaborators
Applications include: peacekeeping operations, convoy protection, networked future forces, unmanned vehicles, anti-terror emergency response, urban operations, humanitarian relief, and more
Products include: new downloadable experimental designs, plus over 40 student theses and a dozen articles
http://diana.gl.nps.navy.mil/SeedLab/