Title: Optimal Decision Making in Football (MSE 339 Project)
1 Optimal Decision Making in Football
MSE 339 Project
Rick Johnston, Brad Null, Mark Peters
2 Presentation Overview
- Project Objectives
- Football Primer
- Literature Review
- Problem Formulation
- Approximate Approaches
- Conclusions
3 Project Objectives
- Use dynamic programming techniques to answer two primary questions about decision-making in football.
- What is the optimal policy for deciding whether to run an offensive play, punt, or kick a field goal in each situation that could arise in the course of a football game?
- If you implemented such a policy, how much of a performance improvement would you realize against an opponent playing a standard strategy?
4 Football Primer
- Key rules
  - 2 teams, 60-minute game (2 halves), highest score wins
  - Basic scoring plays: touchdown (7 points), field goal (3 points)
  - Field is 100 yards long
  - Advancing the ball: 4 plays (downs) to gain 10 yards
    - If successful, the down resets to 1st down
    - If unsuccessful, the other team gains possession of the ball
  - Teams have the option of punting the ball to the other team (typically reserved for 4th down), which gives the other team possession but in a worse position on the field
  - Teams can attempt to kick a field goal at any point
- Common Strategies
  - Coaches typically rely on common rules of thumb to make these decisions
- Motivating Situation
  - 4th down and 2 yards to go from the opponent's 35-yard line
  - Chance of successfully kicking a field goal is 40%
  - Chance of gaining the 2 yards is 60%
  - Expected punt distance would be 20 yards
  - Which is the right decision? And when? (See the sketch below.)
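As a quick illustration of why the answer is not obvious, here is a minimal back-of-the-envelope comparison using only the probabilities on the slide. Only the field-goal term is a pure immediate-points calculation; the value of converting (and of punting) depends on later plays, which is exactly what the full model captures. The continuation value in the "go for it" branch is an illustrative assumption, not a number from the model.

```python
# Back-of-the-envelope look at the motivating 4th-and-2 situation.
# The 0.40 and 0.60 probabilities come from the slide; the value of a
# continued drive after converting is an illustrative ASSUMPTION.

p_field_goal = 0.40      # chance the field goal attempt is good
p_convert = 0.60         # chance of gaining the 2 yards

ev_kick = p_field_goal * 3.0    # immediate expected points from kicking
assumed_drive_value = 3.5       # ASSUMED expected points if the drive continues
ev_go = p_convert * assumed_drive_value
ev_punt = 0.0                   # no immediate points; the value is field position

print(f"kick field goal: {ev_kick:.2f} expected points")
print(f"go for it:       {ev_go:.2f} expected points (under the assumed drive value)")
print(f"punt:            {ev_punt:.2f} immediate points (field-position value not shown)")
# The full model compares win probabilities at a given time and score,
# not just expected points, which is why the answer depends on "when".
```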
5 Brief Literature Review
- Sackrowitz (2000), "Refining the Point(s)-After-Touchdown Decision"
  - Backwards induction (based on the number of possessions remaining) to find the optimal policy
  - No quantitative assessment of the difference between the optimal strategy and the decisions actually implemented by NFL coaches
- Romer (2003), "It's Fourth Down and What Does the Bellman Equation Say?"
  - Uses play-by-play data for 3 years of NFL play to solve a simplified version of the problem to determine what to do on fourth down
  - Key assumption is that the decision is made in the first quarter
  - Results are that NFL coaches should generally go for the first down more frequently
- Others
  - Carter and Machol (1978)
  - Bertsekas and Tsitsiklis (1996)
  - Carroll, Palmer, and Thorn (1998)
6 Problem Formulation
- Model setup
  - Model one half of a game
  - Approximately 500,000 states, one for each combination of:
    - Score differential
    - Team in possession of the ball
    - Ball position on the field
    - Down
    - Distance to go for a first down
    - Time remaining
  - The half was modeled as 60 time periods (equivalent to 60 plays)
  - A reward value was created for each state, representing the probability that team 1 will win the game
- Transition probabilities
  - We estimated all probabilities required for the model
- Solution approach
  - Backwards induction to find the optimal decision at each state (a sketch of the state encoding follows below)
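A minimal sketch of how a state and its end-of-half reward might be encoded, given the six components listed above. The field names, ranges, and the treatment of a tied score at the end of the half are assumptions for illustration, not details taken from the model.

```python
from dataclasses import dataclass

# One state per combination of the six components listed above.
# Field names, ranges, and encodings are assumptions for illustration.
@dataclass(frozen=True)
class State:
    score_diff: int   # team 1 score minus team 2 score
    possession: int   # 1 if team 1 has the ball, 2 if team 2 does
    yardline: int     # yards from the opponent's goal line (1..99)
    down: int         # 1..4
    to_go: int        # yards needed for a first down
    time_left: int    # time periods remaining in the half (0..60)

def terminal_reward(state: State) -> float:
    """Reward at the end of the half: probability that team 1 wins.
    Non-terminal rewards are filled in by backward induction.
    Treating a tied half as 0.5 is an assumption, not a model detail."""
    if state.time_left != 0:
        raise ValueError("non-terminal rewards come from backward induction")
    if state.score_diff > 0:
        return 1.0
    if state.score_diff < 0:
        return 0.0
    return 0.5
```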
7 Solution Technique
The diagram below illustrates how the decisions of the two teams are determined at each state.
[Diagram: For Team 1 (optimal policy), the reward of state x at time t is chosen as the maximum of the expectations of the three actions (run, punt, kick) over the successor states x_{r,t-1} at time t-1, whose rewards are already known. For Team 2 (heuristic policy), the heuristic policy instructs the team which action to take in state x, and the reward is chosen as the expectation under that action. A sketch follows.]
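A sketch of one backward-induction step matching the diagram: team 1's reward in a state is the maximum expected reward over its three actions, while team 2's reward is the expectation under the action its heuristic prescribes. The `actions`, `transitions`, and `heuristic` interfaces are assumptions about how the model might be coded, not the authors' implementation.

```python
# One backward-induction step, matching the diagram above. Assumed interfaces:
#   actions(s)        -> feasible actions in state s ("run", "punt", "kick")
#   transitions(s, a) -> list of (probability, next_state) pairs
#   heuristic(s)      -> the action the standard strategy would take in s
#   value_next        -> dict of already-computed rewards for the states
#                        reachable after this play (one fewer period left)

def expected_value(s, a, value_next, transitions):
    """Expectation of the known successor rewards under action a."""
    return sum(p * value_next[s2] for p, s2 in transitions(s, a))

def backward_step(states_t, value_next, actions, transitions, heuristic):
    """Compute rewards (and decisions) for every state with t periods left."""
    value_t, policy_t = {}, {}
    for s in states_t:
        if s.possession == 1:
            # Team 1 (optimal policy): reward is the maximum of the
            # expectations of the three actions.
            best = max(actions(s), key=lambda a: expected_value(s, a, value_next, transitions))
            value_t[s] = expected_value(s, best, value_next, transitions)
            policy_t[s] = best
        else:
            # Team 2 (heuristic policy): reward is the expectation of the
            # action the heuristic instructs the team to take.
            a = heuristic(s)
            value_t[s] = expected_value(s, a, value_next, transitions)
            policy_t[s] = a
    return value_t, policy_t
```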
8 Optimal vs. Heuristic
Under our assumptions, use of the optimal strategy increases the probability of winning by about 6.5% across the range of likely starting field positions.
[Chart: probability of winning vs. starting field position. Starting position: time left 60, score even, 1st down, 10 yards to go.]
9 Optimal vs. Heuristic
The longer a team has to implement the optimal policy, the larger the increase in winning probability that can be expected.
[Chart: probability of winning vs. time left. Starting position: 1st down, 10 yards to go, 50-yard line, score even.]
10 Comparison of Play Selection
Overall, the optimal policy calls for the team to be more aggressive than the heuristic, kicking substantially fewer field goals.

4th Down Play Selection    Optimal Policy    Heuristic Policy
Runs                       325,113           258,327
Punts                      386,457           354,945
Field Goals                73,500            171,798
Total                      785,070           785,070
11 Results
4th Down Decisions
Let's compare the range of fourth-down decisions for a typical situation in the game. In this instance, the score is tied and the time left is 50.
[Charts: 4th down decision maps (run, kick, or punt) for the optimal policy and the heuristic policy, plotted by distance to first down and distance to the TD.]
12 Near Goal Results
On fourth down with 50 time periods left and a
tied score, the optimal strategy is more
aggressive when close to the goal line.
13 Model Limitations
Simplifying Restrictions
- Limited outcomes of running plays
- All plays set to a duration of 20 seconds
- Kickoffs excluded
- Extra points assumed to be successful
Possible Enhancements
- Which play to run (offensive, defensive, special teams)
- Probabilities conditional on specific teams or players
- Real-time applications
The model might be made significantly more powerful by expanding the state space, but at some point backwards induction becomes extremely cumbersome. This motivates exploring approximate DP approaches.
14 Approximate DP Approach
Estimate state reward values by finding a linear combination of basis functions to calculate Q values.
- Estimating reward values
  - State sampling
    - For each time period, sample 1,000 states according to a series of distributions intended to represent the states most commonly reached at that point in an actual game
  - Outcome sampling
    - For each feasible action in each state, sample one possible outcome and set the Q value for that action equal to the sampled outcome's Q value
    - The state's Q value is set to the maximum Q value returned (see the sketch below)
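A sketch of the state- and outcome-sampling step just described. The `sample_state`, `sample_outcome`, and `q_estimate` helpers are hypothetical placeholders for the sampling distributions and the current reward approximation; only the structure (one sampled outcome per feasible action, state value set to the best action's value) follows the slide.

```python
# Sampling step for the approximate DP. Assumed helper interfaces:
#   sample_state(t)      -> one random state with t time periods left,
#                           drawn from the distribution for that period
#   actions(s)           -> feasible actions in state s
#   sample_outcome(s, a) -> one randomly sampled successor state of (s, a)
#   q_estimate(s2)       -> current approximate reward of successor s2

def sample_q_values(t, actions, sample_state, sample_outcome, q_estimate, n_samples=1000):
    """Return (state, Q value) pairs for n_samples states at time period t."""
    samples = []
    for _ in range(n_samples):
        s = sample_state(t)
        # One sampled outcome per feasible action; that action's Q value is
        # the estimated reward of the single sampled successor state.
        action_q = {a: q_estimate(sample_outcome(s, a)) for a in actions(s)}
        # The state's Q value is the maximum Q value returned.
        samples.append((s, max(action_q.values())))
    return samples
```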
15 Approximate DP Approach
- Estimating reward values (continued)
  - Fitting basis functions
    - Given our sample of 1,000 states with Q values, we fit linear coefficients to our basis functions by solving the least-squares problem (a sketch follows this list)
    - The basis functions that we employed were:
      - Team in possession of the ball
      - Position of the ball
      - Point differential
      - Score indicators
        - Winning by more than 7
        - Winning by less than 7
        - Score tied
        - Losing by less than 7
      - Down indicators
        - 3rd down for us
        - 3rd down for them
        - 4th down for us
        - 4th down for them
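A sketch of the least-squares fit over the basis functions listed above, using numpy. The exact indicator definitions (team 1's perspective, whether "less than 7" includes 7, the presence of an intercept) are assumptions; the `samples` input is the list of (state, Q value) pairs produced by the sampling step.

```python
import numpy as np

def features(s) -> np.ndarray:
    """Basis functions from the list above. The exact indicator definitions
    (perspective of team 1, boundary at 7, the intercept) are assumptions."""
    return np.array([
        1.0,                                                  # intercept (assumed)
        1.0 if s.possession == 1 else 0.0,                    # team in possession of ball
        float(s.yardline),                                    # position of ball
        float(s.score_diff),                                  # point differential
        1.0 if s.score_diff > 7 else 0.0,                     # winning by more than 7
        1.0 if 0 < s.score_diff <= 7 else 0.0,                # winning by less than 7
        1.0 if s.score_diff == 0 else 0.0,                    # score tied
        1.0 if -7 <= s.score_diff < 0 else 0.0,               # losing by less than 7
        1.0 if s.down == 3 and s.possession == 1 else 0.0,    # 3rd down for us
        1.0 if s.down == 3 and s.possession == 2 else 0.0,    # 3rd down for them
        1.0 if s.down == 4 and s.possession == 1 else 0.0,    # 4th down for us
        1.0 if s.down == 4 and s.possession == 2 else 0.0,    # 4th down for them
    ])

def fit_coefficients(samples):
    """Least-squares fit of linear coefficients to the sampled Q values,
    where samples is a list of (state, Q value) pairs."""
    X = np.vstack([features(s) for s, _ in samples])
    y = np.array([q for _, q in samples])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def approx_q(s, coeffs) -> float:
    """Approximate reward of a state under the fitted coefficients."""
    return float(features(s) @ coeffs)
```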
16 Basis Functions
[Chart: coefficient value of each basis function for different times to the game's end.]
17 ADP vs. Exact Solution
Comparing the approximate policy to the exact solution.
- Determining the approximate policy
  - Using the basis functions, we can calculate Q values for all states
  - Iterate through all states and determine the best action at each state based on the Q values of the states we could transition to (see the sketch after this list)
- Comparison to the heuristic policy
  - Employ backwards induction to solve for the exact reward values of all states, given that team 1 plays the approximate policy and team 2 plays the heuristic policy
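A sketch of how the approximate policy could be read off the fitted Q values: pick the action whose expected approximate reward over the reachable next states is largest. The `actions`, `transitions`, and `q_estimate` interfaces are the same hypothetical ones used in the earlier sketches.

```python
# Reading the approximate policy off the fitted Q values. Assumed interfaces,
# matching the earlier sketches:
#   actions(s)        -> feasible actions in state s
#   transitions(s, a) -> list of (probability, next_state) pairs
#   q_estimate(s2)    -> approximate reward of s2 from the basis-function fit

def approximate_action(s, actions, transitions, q_estimate):
    """Pick the action with the largest expected approximate reward over the
    states we could transition to."""
    def expected_q(a):
        return sum(p * q_estimate(s2) for p, s2 in transitions(s, a))
    return max(actions(s), key=expected_q)
```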
18 ADP vs. Exact Results
The approximate dynamic program captures 55% of the improvement over the heuristic that the optimal policy achieves.
[Chart: improvement in winning probability over the heuristic vs. starting field position. Starting position: time left 60, score even, 1st down, 10 yards to go.]
19 Comparison of Play Selection
The approximated policy calls for more runs than the heuristic but also punts more frequently.

4th Down Play Selection    Optimal Policy    Approximated Policy    Heuristic Policy
Runs                       325,113           280,540                258,327
Punts                      386,457           374,981                354,945
Field Goals                73,500            129,585                171,798
Total                      785,070           785,070                785,070
20 Comparison of Performance
- The approximate dynamic program runs about 15x faster
- Potential applications
  - Similar simple models may have some real-time applications
  - More complex models could become significantly more manageable
[Chart: minutes to complete for the exact and approximate solutions.]
21 Conclusions
- Optimal Policy
  - Implementing the optimal policy increased the winning percentage by an average of 6.5% across the initial states we considered representative
  - The algorithm ran on a PC in 32 minutes (with some restrictions on the state space to achieve this performance)
- Approximate Policy
  - Implementing the approximate policy increased the winning percentage by an average of 3.5% across the representative initial states
  - The algorithm ran in 2.3 minutes
- Next Steps
  - Get transition probabilities from real data
  - Incorporate more decisions
  - Improve the heuristic and basis functions