Title: Optimal Decision Making in Football (MSE 339 Project)
1 Optimal Decision Making in Football
MSE 339 Project
Rick Johnston, Brad Null, Mark Peters
2 Presentation Overview
- Project Objectives
- Football Primer
- Literature Review
- Problem Formulation
- Approximate Approaches
- Conclusions
3 Project Objectives
- Use dynamic programming techniques to answer two primary questions about decision-making in football.
- What is the optimal policy for deciding whether to run an offensive play, punt, or kick a field goal in each situation that could arise in the course of a football game?
- If you implemented such a policy, how much of a performance improvement would you realize against an opponent playing a standard strategy?
4 Football Primer
- Key rules
  - 2 teams, 60-minute game (2 halves), highest score wins
  - Basic scoring plays: touchdown (7 points), field goal (3 points)
  - Field is 100 yards long
  - Advancing the ball: 4 plays (downs) to gain 10 yards
    - If successful, the down resets to 1st down
    - If unsuccessful, the other team gains possession of the ball
  - Teams have the option of punting the ball to the other team (typically reserved for 4th down), which gives the other team possession but in a worse position on the field
  - Teams can attempt to kick a field goal at any point
- Common Strategies
  - Coaches typically rely on common rules of thumb to make these decisions
- Motivating Situation
  - 4th down and 2 yards to go from the opponent's 35-yard line
  - Chance of successfully kicking a field goal is 40%
  - Chance of gaining the 2 yards is 60%
  - Expected punt distance would be 20 yards
  - Which is the right decision? And when? (See the sketch below.)
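As a quick illustration of why the answer is not obvious, here is a minimal back-of-the-envelope comparison using only the probabilities on the slide. Only the field-goal term is a pure immediate-points calculation; the value of converting (and of punting) depends on later plays, which is exactly what the full model captures. The continuation value in the "go for it" branch is an illustrative assumption, not a number from the model.

```python
# Back-of-the-envelope look at the motivating 4th-and-2 situation.
# The 0.40 and 0.60 probabilities come from the slide; the value of a
# continued drive after converting is an illustrative ASSUMPTION.

p_field_goal = 0.40      # chance the field goal attempt is good
p_convert = 0.60         # chance of gaining the 2 yards

ev_kick = p_field_goal * 3.0    # immediate expected points from kicking
assumed_drive_value = 3.5       # ASSUMED expected points if the drive continues
ev_go = p_convert * assumed_drive_value
ev_punt = 0.0                   # no immediate points; the value is field position

print(f"kick field goal: {ev_kick:.2f} expected points")
print(f"go for it:       {ev_go:.2f} expected points (under the assumed drive value)")
print(f"punt:            {ev_punt:.2f} immediate points (field-position value not shown)")
# The full model compares win probabilities at a given time and score,
# not just expected points, which is why the answer depends on "when".
```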
5 Brief Literature Review
- Sackrowitz (2000), "Refining the Point(s)-After-Touchdown Decision"
  - Backwards induction (based on the number of possessions remaining) to find the optimal policy
  - No quantitative assessment of the difference between the optimal strategy and the decisions actually implemented by NFL coaches
- Romer (2003), "It's Fourth Down and What Does the Bellman Equation Say?"
  - Uses play-by-play data for 3 years of NFL play to solve a simplified version of the problem to determine what to do on fourth down
  - Key assumption is that the decision is made in the first quarter
  - Results are that NFL coaches should generally go for the first down more frequently
- Others
  - Carter and Machol (1978)
  - Bertsekas and Tsitsiklis (1996)
  - Carroll, Palmer, and Thorn (1998)
6 Problem Formulation
- Model setup
  - Model one half of a game
  - Approximately 500,000 states, one for each combination of:
    - Score differential
    - Team in possession of the ball
    - Ball position on the field
    - Down
    - Distance to go for a first down
    - Time remaining
  - The half was modeled as 60 time periods (equivalent to 60 plays)
  - A reward value was created for each state, representing the probability that team 1 will win the game
- Transition probabilities
  - We estimated all probabilities required for the model
- Solution approach
  - Backwards induction to find the optimal decision at each state (a sketch of the state encoding follows below)
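A minimal sketch of how a state and its end-of-half reward might be encoded, given the six components listed above. The field names, ranges, and the treatment of a tied score at the end of the half are assumptions for illustration, not details taken from the model.

```python
from dataclasses import dataclass

# One state per combination of the six components listed above.
# Field names, ranges, and encodings are assumptions for illustration.
@dataclass(frozen=True)
class State:
    score_diff: int   # team 1 score minus team 2 score
    possession: int   # 1 if team 1 has the ball, 2 if team 2 does
    yardline: int     # yards from the opponent's goal line (1..99)
    down: int         # 1..4
    to_go: int        # yards needed for a first down
    time_left: int    # time periods remaining in the half (0..60)

def terminal_reward(state: State) -> float:
    """Reward at the end of the half: probability that team 1 wins.
    Non-terminal rewards are filled in by backward induction.
    Treating a tied half as 0.5 is an assumption, not a model detail."""
    if state.time_left != 0:
        raise ValueError("non-terminal rewards come from backward induction")
    if state.score_diff > 0:
        return 1.0
    if state.score_diff < 0:
        return 0.0
    return 0.5
```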
7 Solution Technique
The diagram below illustrates how the decisions of the two teams are determined at each state.
[Diagram: For Team 1 (optimal policy), the reward of state x at time t is chosen as the maximum of the expectations of the three actions (run, punt, kick) over the successor states x_{r,t-1} at time t-1, whose rewards are already known. For Team 2 (heuristic policy), the heuristic policy instructs the team which action to take in state x, and the reward is chosen as the expectation under that action. A sketch follows.]
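A sketch of one backward-induction step matching the diagram: team 1's reward in a state is the maximum expected reward over its three actions, while team 2's reward is the expectation under the action its heuristic prescribes. The `actions`, `transitions`, and `heuristic` interfaces are assumptions about how the model might be coded, not the authors' implementation.

```python
# One backward-induction step, matching the diagram above. Assumed interfaces:
#   actions(s)        -> feasible actions in state s ("run", "punt", "kick")
#   transitions(s, a) -> list of (probability, next_state) pairs
#   heuristic(s)      -> the action the standard strategy would take in s
#   value_next        -> dict of already-computed rewards for the states
#                        reachable after this play (one fewer period left)

def expected_value(s, a, value_next, transitions):
    """Expectation of the known successor rewards under action a."""
    return sum(p * value_next[s2] for p, s2 in transitions(s, a))

def backward_step(states_t, value_next, actions, transitions, heuristic):
    """Compute rewards (and decisions) for every state with t periods left."""
    value_t, policy_t = {}, {}
    for s in states_t:
        if s.possession == 1:
            # Team 1 (optimal policy): reward is the maximum of the
            # expectations of the three actions.
            best = max(actions(s), key=lambda a: expected_value(s, a, value_next, transitions))
            value_t[s] = expected_value(s, best, value_next, transitions)
            policy_t[s] = best
        else:
            # Team 2 (heuristic policy): reward is the expectation of the
            # action the heuristic instructs the team to take.
            a = heuristic(s)
            value_t[s] = expected_value(s, a, value_next, transitions)
            policy_t[s] = a
    return value_t, policy_t
```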
8 Optimal vs. Heuristic
Under our assumptions, use of the optimal strategy increases the probability of winning by about 6.5% across the range of likely starting field positions.
[Chart: probability of winning vs. starting field position. Starting position: time left 60, score even, 1st down, 10 yards to go.]
9 Optimal vs. Heuristic
The longer a team has to implement the optimal policy, the larger the increase in winning probability that can be expected.
[Chart: probability of winning vs. time left. Starting position: 1st down, 10 yards to go, 50-yard line, score even.]
10 Comparison of Play Selection
Overall, the optimal policy calls for the team to be more aggressive than the heuristic, kicking substantially fewer field goals.

4th Down Play Selection    Optimal Policy    Heuristic Policy
Runs                       325,113           258,327
Punts                      386,457           354,945
Field Goals                73,500            171,798
Total                      785,070           785,070
11 Results
4th Down Decisions
Let's compare the range of fourth-down decisions for a typical situation in the game. In this instance, the score is tied and the time left is 50.
[Charts: 4th down decision maps (run, kick, or punt) for the optimal policy and the heuristic policy, plotted by distance to first down and distance to the TD.]
12 Near Goal Results
On fourth down with 50 time periods left and a
tied score, the optimal strategy is more
aggressive when close to the goal line.
13 Model Limitations
Simplifying Restrictions
- Limited outcomes of running plays
- All plays set to a duration of 20 seconds
- Kickoffs excluded
- Extra points assumed to be successful
Possible Enhancements
- Which play to run (offensive, defensive, special teams)
- Probabilities conditional on specific teams or players
- Real-time applications
The model might be made significantly more powerful by expanding the state space, but at some point backwards induction becomes extremely cumbersome. This motivates exploring approximate DP approaches.
14 Approximate DP Approach
Estimate state reward values by finding a linear combination of basis functions to calculate Q values.
- Estimating reward values
  - State sampling
    - For each time period, sample 1,000 states according to a series of distributions intended to represent the states most commonly reached at that point in an actual game
  - Outcome sampling
    - For each feasible action in each state, sample one possible outcome and set the Q value for that action equal to the sampled outcome's Q value
    - The state's Q value is set to the maximum Q value returned (see the sketch below)
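A sketch of the state- and outcome-sampling step just described. The `sample_state`, `sample_outcome`, and `q_estimate` helpers are hypothetical placeholders for the sampling distributions and the current reward approximation; only the structure (one sampled outcome per feasible action, state value set to the best action's value) follows the slide.

```python
# Sampling step for the approximate DP. Assumed helper interfaces:
#   sample_state(t)      -> one random state with t time periods left,
#                           drawn from the distribution for that period
#   actions(s)           -> feasible actions in state s
#   sample_outcome(s, a) -> one randomly sampled successor state of (s, a)
#   q_estimate(s2)       -> current approximate reward of successor s2

def sample_q_values(t, actions, sample_state, sample_outcome, q_estimate, n_samples=1000):
    """Return (state, Q value) pairs for n_samples states at time period t."""
    samples = []
    for _ in range(n_samples):
        s = sample_state(t)
        # One sampled outcome per feasible action; that action's Q value is
        # the estimated reward of the single sampled successor state.
        action_q = {a: q_estimate(sample_outcome(s, a)) for a in actions(s)}
        # The state's Q value is the maximum Q value returned.
        samples.append((s, max(action_q.values())))
    return samples
```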
15 Approximate DP Approach
- Estimating reward values (continued)
  - Fitting basis functions
    - Given our sample of 1,000 states with Q values, we fit linear coefficients to our basis functions by solving the least-squares problem (a sketch follows this list)
    - The basis functions that we employed were:
      - Team in possession of the ball
      - Position of the ball
      - Point differential
      - Score indicators
        - Winning by more than 7
        - Winning by less than 7
        - Score tied
        - Losing by less than 7
      - Down indicators
        - 3rd down for us
        - 3rd down for them
        - 4th down for us
        - 4th down for them
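A sketch of the least-squares fit over the basis functions listed above, using numpy. The exact indicator definitions (team 1's perspective, whether "less than 7" includes 7, the presence of an intercept) are assumptions; the `samples` input is the list of (state, Q value) pairs produced by the sampling step.

```python
import numpy as np

def features(s) -> np.ndarray:
    """Basis functions from the list above. The exact indicator definitions
    (perspective of team 1, boundary at 7, the intercept) are assumptions."""
    return np.array([
        1.0,                                                  # intercept (assumed)
        1.0 if s.possession == 1 else 0.0,                    # team in possession of ball
        float(s.yardline),                                    # position of ball
        float(s.score_diff),                                  # point differential
        1.0 if s.score_diff > 7 else 0.0,                     # winning by more than 7
        1.0 if 0 < s.score_diff <= 7 else 0.0,                # winning by less than 7
        1.0 if s.score_diff == 0 else 0.0,                    # score tied
        1.0 if -7 <= s.score_diff < 0 else 0.0,               # losing by less than 7
        1.0 if s.down == 3 and s.possession == 1 else 0.0,    # 3rd down for us
        1.0 if s.down == 3 and s.possession == 2 else 0.0,    # 3rd down for them
        1.0 if s.down == 4 and s.possession == 1 else 0.0,    # 4th down for us
        1.0 if s.down == 4 and s.possession == 2 else 0.0,    # 4th down for them
    ])

def fit_coefficients(samples):
    """Least-squares fit of linear coefficients to the sampled Q values,
    where samples is a list of (state, Q value) pairs."""
    X = np.vstack([features(s) for s, _ in samples])
    y = np.array([q for _, q in samples])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def approx_q(s, coeffs) -> float:
    """Approximate reward of a state under the fitted coefficients."""
    return float(features(s) @ coeffs)
```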
16 Basis Functions
[Chart: coefficient value of each basis function for different times to the game's end.]
17 ADP vs. Exact Solution
Comparing the approximate policy to the exact solution.
- Determining the approximate policy
  - Using the basis functions, we can calculate Q values for all states
  - Iterate through all states and determine the best action at each state based on the Q values of the states we could transition to (see the sketch after this list)
- Comparison to the heuristic policy
  - Employ backwards induction to solve for the exact reward values of all states, given that team 1 plays the approximate policy and team 2 plays the heuristic policy
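A sketch of how the approximate policy could be read off the fitted Q values: pick the action whose expected approximate reward over the reachable next states is largest. The `actions`, `transitions`, and `q_estimate` interfaces are the same hypothetical ones used in the earlier sketches.

```python
# Reading the approximate policy off the fitted Q values. Assumed interfaces,
# matching the earlier sketches:
#   actions(s)        -> feasible actions in state s
#   transitions(s, a) -> list of (probability, next_state) pairs
#   q_estimate(s2)    -> approximate reward of s2 from the basis-function fit

def approximate_action(s, actions, transitions, q_estimate):
    """Pick the action with the largest expected approximate reward over the
    states we could transition to."""
    def expected_q(a):
        return sum(p * q_estimate(s2) for p, s2 in transitions(s, a))
    return max(actions(s), key=expected_q)
```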
18 ADP vs. Exact Results
The approximate dynamic program captures 55% of the improvement over the heuristic that the optimal policy achieves.
[Chart: improvement in winning probability over the heuristic vs. starting field position. Starting position: time left 60, score even, 1st down, 10 yards to go.]
19 Comparison of Play Selection
The approximated policy calls for more runs than the heuristic but also punts more frequently.

4th Down Play Selection    Optimal Policy    Approximated Policy    Heuristic Policy
Runs                       325,113           280,540                258,327
Punts                      386,457           374,981                354,945
Field Goals                73,500            129,585                171,798
Total                      785,070           785,070                785,070
20 Comparison of Performance
- The approximate dynamic program runs about 15x faster
- Potential applications
  - Similar simple models may have some real-time applications
  - More complex models could become significantly more manageable
[Chart: minutes to complete for the exact and approximate solutions.]
21 Conclusions
- Optimal Policy
  - Implementing the optimal policy increased the winning percentage by an average of 6.5% across the initial states we considered representative
  - The algorithm ran on a PC in 32 minutes (with some restrictions on the state space to achieve this performance)
- Approximate Policy
  - Implementing the approximate policy increased the winning percentage by an average of 3.5% across the representative initial states
  - The algorithm ran in 2.3 minutes
- Next Steps
  - Get transition probabilities from real data
  - Incorporate more decisions
  - Improve the heuristic and basis functions