Using Adalines to Approximate Q-functions in Reinforcement Learning - PowerPoint PPT Presentation

Provided by: Steven1091
Learn more at: https://www.cs.hmc.edu

1
Using Adalines to Approximate Q-functions in
Reinforcement Learning
  • Steven Wyckoff
  • December 6, 2006

2
The Problem
  • Timing traffic lights for optimal traffic flow is
    hard
  • It would be really nice if there were a good way
    to have the traffic lights learn the best timing

3
Green Light District
  • Intelligent Traffic Light Control
  • Wiering, van Veenen, Vreeken, Koopman
  • www.cs.uu.nl
  • Built a test-bed for traffic light controller
    algorithms
  • Based on Reinforcement Learning

4
Green Light District
  • TLController fills out a table with the gains
    for each lane
  • SimModel picks the best legal light configuration
  • Cars are allowed to move (or not) and the
    TLController gets to listen in on their movement
  • Repeat
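The control cycle on this slide can be sketched in Python as below. This is a minimal, hypothetical illustration, not Green Light District code. The lane names, the legal configurations, the queue-length gain rule in `fill_gains`, and the movement probability `p_move` are all assumptions made for the example. It shows the division of labour: the controller only fills in per-lane gains, and the simulator picks the legal configuration whose green lanes add up to the largest total gain.

```python
import random

# Hypothetical intersection: each legal configuration is a set of lanes that
# can be green at the same time. Lane names and configurations are made up.
LEGAL_CONFIGS = [frozenset({"N", "S"}), frozenset({"E", "W"})]


def fill_gains(queues):
    """TLController step: fill a table with one gain per lane.

    Placeholder rule (assumption): the gain is the number of queued cars.
    """
    return {lane: float(n) for lane, n in queues.items()}


def pick_configuration(gains, legal_configs):
    """SimModel step: choose the legal configuration with the largest total gain."""
    return max(legal_configs, key=lambda cfg: sum(gains[lane] for lane in cfg))


def move_cars(queues, green, rng, p_move=0.8):
    """Cars in green lanes may move (or not): at most one car leaves each lane."""
    moved = {}
    for lane, n in list(queues.items()):
        moves = lane in green and n > 0 and rng.random() < p_move
        moved[lane] = 1 if moves else 0
        queues[lane] = n - moved[lane]
    return moved


def run(steps, queues, seed=0):
    """Repeat the fill-gains / pick / move / listen cycle for a number of steps."""
    rng = random.Random(seed)
    history = []  # what the controller "listens in" on after each step
    for _ in range(steps):
        gains = fill_gains(queues)
        green = pick_configuration(gains, LEGAL_CONFIGS)
        moved = move_cars(queues, green, rng)
        history.append((sorted(green), moved))
    return history
```

For example, `run(3, {"N": 2, "S": 1, "E": 0, "W": 0})` returns three (green lanes, cars moved) records; a learning controller would use those records to update whatever produces its gains.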

5
Existing Algorithms
  • Random
    • Totally random gains
  • Most Cars
    • Based on presence of at least one car
  • TC-1
    • Real-Time Dynamic Programming
    • Based on probabilities of progress / reward
  • GenNeural
    • Genetically evolve a 3-layer network
    • Uses only traffic densities
  • (And more)

6
My Algorithm
  • Use a neural network instead of dynamic
    programming
  • Good:
    • Network can deal with continuous input
    • Might be able to recognize traffic patterns that
      are not available using a table lookup
  • Bad:
    • Hard to tell what the network will learn
    • Hard to figure out useful input
    • Hard to tell what the right output is for
      training
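An Adaline (ADAptive LINear Element) is a single linear unit trained with the Widrow-Hoff least-mean-squares (delta) rule, which is what lets it take continuous inputs directly instead of indexing a table. A minimal sketch, assuming list-of-floats inputs and a fixed learning rate (both illustrative choices, not taken from the talk):

```python
class Adaline:
    """ADAptive LINear Element: a linear unit trained with the LMS (delta) rule."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Linear output w . x + b (no threshold is applied)."""
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def train(self, x, target):
        """One Widrow-Hoff step: move each weight by rate * error * input."""
        error = target - self.predict(x)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.learning_rate * error
        return error


if __name__ == "__main__":
    # Toy check: learn y = 2*x0 - x1 + 0.5 from a small grid of continuous inputs.
    grid = [(a, b) for a in (0.0, 0.5, 1.0) for b in (0.0, 0.5, 1.0)]
    unit = Adaline(n_inputs=2)
    for _ in range(2000):
        for x in grid:
            unit.train(x, 2 * x[0] - x[1] + 0.5)
    print(round(unit.predict((0.25, 0.75)), 3))
```

Because the output is a weighted sum of the inputs, an Adaline can only represent linear relationships, which bears on the question of whether a multi-layer network would do better.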

7
Pitfalls / Solutions
  • Don't know whether the light will be red or green
    • Two adalines predict the reward if the light is
      red or green; the gain is the difference
  • Input
    • (for each lane) number of cars, traffic density,
      whether the lane is full
  • Rewards
    • Reward for cars moving and passing through
      intersections
    • Shared reward for other lanes in the intersection
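The two-adaline approach on this slide can be sketched as follows. The slide only says that one adaline predicts the reward for red, one for green, and that the gain is their difference. The class names, the green-minus-red sign convention, the learning rate, and the reward weights in `lane_reward` (including the 0.1 shared fraction) are hypothetical choices for illustration.

```python
class Adaline:
    """Linear unit trained with the Widrow-Hoff (LMS) rule."""

    def __init__(self, n_inputs, learning_rate=0.01):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def train(self, x, target):
        error = target - self.predict(x)
        self.weights = [w + self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.learning_rate * error


def lane_features(num_cars, density, is_full):
    """Per-lane input listed on the slide: car count, traffic density, full flag."""
    return [float(num_cars), float(density), 1.0 if is_full else 0.0]


def lane_reward(cars_moved, cars_through, other_lane_rewards, share=0.1):
    """Hypothetical reward: 1 per car that moved, 1 per car that passed through
    the intersection, plus a fraction `share` of the other lanes' rewards."""
    return cars_moved + cars_through + share * sum(other_lane_rewards)


class LaneGain:
    """Two adalines per lane: one predicts the reward if the light is red,
    the other if it is green. The gain reported to the simulator is
    green minus red (this sign convention is an assumption)."""

    def __init__(self, n_inputs=3, learning_rate=0.01):
        self.red = Adaline(n_inputs, learning_rate)
        self.green = Adaline(n_inputs, learning_rate)

    def gain(self, x):
        return self.green.predict(x) - self.red.predict(x)

    def update(self, x, was_green, reward):
        # Only the adaline for the light that was actually shown sees this reward.
        unit = self.green if was_green else self.red
        unit.train(x, reward)
```

Training only the adaline for the light that was actually shown sidesteps the red/green uncertainty: when filling in gains, the controller does not need to know the outcome in advance, it just compares the two predictions.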

8
Results: Split
  • Adaline did slightly better than Most Cars
  • TC-1 did the best

9
Results: Complex
  • Adaline did the worst
  • TC-1 did the best

10
What I Wish Was Different
  • Infrastructure
    • Inputs and rewards are all discrete
    • Seems like the network would do better with
      access to the light configurations
  • Rewards
    • It would be nice to give rewards for no waiting
  • Network
    • Arguably a multi-layer network could perform
      better

11
Demo Time