Title: Agents
1Agents Background
2An Agent in its Environment
[Figure: AGENT and ENVIRONMENT, linked by sensor input and action output]
3- An agent enjoys the following properties:
  - autonomy: agents operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state
  - social ability: agents interact with other agents (and possibly humans) via some kind of agent-communication language
  - reactivity: agents perceive their environment and respond in a timely fashion to changes that occur in it
  - pro-activeness: agents do not simply act in response to their environment, they are able to exhibit goal-directed behaviour by taking initiative.
(Wooldridge and Jennings, 1995)
4Agents
- Need for computer systems to act in our best interests
- "The issues addressed in Multiagent systems have profound implications for our understanding of ourselves." (Wooldridge)
- Example: how do you make a decision about buying a car?
5Agent Environments
- Agents do not have complete control over their environment (influence only)
  - (Ex: elevators in Old Main)
- deterministic vs. non-deterministic effect
- accessible (get complete state info) vs. inaccessible environment (Ex: stock market)
- episodic (single episode, independent of others) vs. non-episodic (history sensitive) (Ex: grades in class)
6Exercise
- There are three blue hats and two brown hats.
- Three men are lined up such that the man at the back can see the backs of the other two, the middle man can see the back of the front man, and the front man can't see anybody.
- One of the five hats is placed on each man's head. The remaining two hats are hidden away.
- The men are asked what color of hat they are wearing. Time passes.
- The front man correctly guesses the color of his hat.
- What color was it, and how did he guess correctly?
7Concept
- Everyone else is as smart as you
8Game of Chicken
- Consider another type of encounter: the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = cooperate, driving straight = defect.)
- Difference from the prisoner's dilemma: mutual defection is the most feared outcome.
9Question
- How do we communicate our desires to an agent?
- May be muddy: you want to graduate with a 4.0, have a job making $100K a year, have opportunities for growth, and have quality of life.
- If you can't have it all, what is most valued?
10Answer: Utilities
- Assume we have just two agents: Ag = {i, j}
- Agents are assumed to be self-interested: they have preferences over how the environment is
- Assume W = {w1, w2, ...} is the set of outcomes that agents have preferences over
- We capture preferences by utility functions, which map an outcome to a rational number.
- Utility functions lead to preference orderings over outcomes.
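The step from utility functions to preference orderings can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the outcome names and every utility value below are hypothetical, chosen only to show that sorting outcomes by utility yields each agent's ordering.

```python
from fractions import Fraction

# Hypothetical outcomes and utility values (not from the slides).
OUTCOMES = ["w1", "w2", "w3"]

UTILITY = {
    "i": {"w1": Fraction(3), "w2": Fraction(1), "w3": Fraction(2)},
    "j": {"w1": Fraction(0), "w2": Fraction(5, 2), "w3": Fraction(2)},
}


def preference_order(agent):
    """Return the outcomes sorted from most to least preferred by `agent`."""
    return sorted(OUTCOMES, key=lambda w: UTILITY[agent][w], reverse=True)


def weakly_prefers(agent, w, w_other):
    """True if `agent` likes outcome `w` at least as much as `w_other`."""
    return UTILITY[agent][w] >= UTILITY[agent][w_other]


print(preference_order("i"))
print(preference_order("j"))
```

With these made-up numbers the two agents rank the outcomes differently, which is exactly the situation in which their strategic interaction becomes interesting.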
11What is Utility?
- Utility is not money (but it is a useful analogy)
- Typical relationship between utility and money
12Dominant Strategies
- Recall that:
  - Agents' utilities depend on what strategies other agents are playing
  - Agents are expected utility maximizers
- A dominant strategy for player i is a best response no matter what strategies the other agents play
- They do not always exist
- Inferior strategies are called dominated
13Dominant Strategy Equilibrium
- A dominant strategy equilibrium is a strategy profile where the strategy for each player is dominant (so neither wants to change)
- Known as DUH strategy.
- Nice: agents do not need to counterspeculate (reciprocally reason about what others will do)!
14Prisoner's dilemma
Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
[Payoff table: Kelly (Confess, Don't Confess) vs. Ned (Confess, Don't Confess); cell values not preserved]
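15Prisoner's dilemma
Whatever Ned does, Kelly is better off confessing, so Kelly will confess. The same holds for Ned.
[Payoff table: Kelly (Confess, Don't Confess) vs. Ned (Confess, Don't Confess); cell values not preserved]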
16Prisoner's dilemma
So the only outcome that involves each player choosing their dominant strategy is where they both confess. Solve by iterative elimination of dominated strategies.
[Payoff table: Kelly (Confess, Don't Confess) vs. Ned (Confess, Don't Confess); cell values not preserved]
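The dominant-strategy argument can be checked mechanically. The sketch below is only an illustration: the payoff numbers are assumptions (negated years in jail), since the slide's table values were not preserved, and only their relative order matters. The helper names `utility` and `dominant_strategy` are likewise hypothetical.

```python
ACTIONS = ["confess", "dont_confess"]

# ASSUMED payoffs (negated years in jail); only the relative order matters.
# PAYOFF[(kelly_action, ned_action)] = (kelly_utility, ned_utility)
PAYOFF = {
    ("confess", "confess"): (-5, -5),
    ("confess", "dont_confess"): (0, -10),
    ("dont_confess", "confess"): (-10, 0),
    ("dont_confess", "dont_confess"): (-1, -1),
}


def utility(player, own, other):
    """Utility of `player` (0 = Kelly, 1 = Ned) playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFF[profile][player]


def dominant_strategy(player):
    """Return an action that is a best response to every opposing action, or None."""
    for s in ACTIONS:
        if all(utility(player, s, o) >= utility(player, t, o)
               for o in ACTIONS for t in ACTIONS):
            return s
    return None


print(dominant_strategy(0), dominant_strategy(1))
```

Under these assumed numbers both players' dominant strategy is to confess, even though both would be better off if neither confessed.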
17Example: Prisoner's Dilemma
- Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
- (Actual numbers vary in different versions of the problem, but relative values are the same)
[Payoff table: Confess / Don't Confess for each player, with an outcome marked "Pareto optimal"; cell values not preserved]
18Example: Bach or Stravinsky
- A couple likes going to concerts together. One loves Bach but not Stravinsky. The other loves Stravinsky but not Bach. However, they prefer being together to being apart.
[Payoff table: each player chooses B or S; cell values not preserved]
No dominant strategy equilibrium
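A small enumeration shows what "no dominant strategy equilibrium" means here. This is a sketch under assumed payoffs (the slide's table values were lost): being together at one's favourite concert is worth 2, together at the other concert 1, and being apart 0. The function names are hypothetical.

```python
from itertools import product

ACTIONS = ["B", "S"]

# ASSUMED payoffs: PAYOFF[(player0_action, player1_action)] = (u0, u1).
# Player 0 loves Bach, player 1 loves Stravinsky.
PAYOFF = {
    ("B", "B"): (2, 1),
    ("B", "S"): (0, 0),
    ("S", "B"): (0, 0),
    ("S", "S"): (1, 2),
}


def payoff(player, own, other):
    """Payoff to `player` when playing `own` against the other's `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFF[profile][player]


def dominant_strategy(player):
    """Return an action that is a best response to everything, or None."""
    for s in ACTIONS:
        if all(payoff(player, s, o) >= payoff(player, t, o)
               for o in ACTIONS for t in ACTIONS):
            return s
    return None


def pure_equilibria():
    """Profiles where neither player gains by deviating alone."""
    stable = []
    for a0, a1 in product(ACTIONS, repeat=2):
        ok0 = all(payoff(0, a0, a1) >= payoff(0, t, a1) for t in ACTIONS)
        ok1 = all(payoff(1, a1, a0) >= payoff(1, t, a0) for t in ACTIONS)
        if ok0 and ok1:
            stable.append((a0, a1))
    return stable


print(dominant_strategy(0), dominant_strategy(1))
print(pure_equilibria())
```

Neither player has a dominant strategy, yet two stable outcomes exist (both at Bach, both at Stravinsky), so each player has to reason about what the other will do.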
19Example: Paying for Bus Fare
- Getting back to Gatwick airport: Steve had planned to pay for all of us, but left to find his son. When they came for the fare, do I pay, or say my husband will?
[Payoff table: one player chooses Pay for 2 or Pay for 4, the other Pay for 2 or Not Pay; cell values not preserved]
No dominant strategy equilibrium
20Research Questions
- Can we apply game theory to solve seemingly unrelated problems?
  - Ex: traffic control
  - Ex: sharing operating system resources
21Exercise
- You participate in a game show in which prizes of varying values occur at equal frequency. Two of you win a prize.
- There are 10 types of prizes of varying values. Assume a prize of type 10 is the best and a prize of type 1 is the worst.
- Without knowing the other's prize, both of you are asked if you want to exchange the prizes you were given.
- If both want to exchange, the two exchange prizes.
- What is your strategy?
22Employee Monitoring
- Employees can work hard or shirk
- Salary: $100K unless caught shirking
- Cost of effort: $50K
- Managers can monitor or not
- Value of employee output: $200K
- Profit if employee doesn't work: 0
- Cost of monitoring: $10K
23What is your strategy?
24Employee Monitoring
[Payoff table: Employee (work, shirk) vs. Manager (monitor, do not monitor); cell values not preserved]
- No equilibrium in pure strategies
- What do the players do?
25Mixed Strategies
- Randomize: surprise the rival
- Mixed strategy: specifies that an actual move be chosen randomly from the set of pure strategies with some specific probabilities.
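To make the employee-monitoring game concrete, the sketch below rebuilds a payoff table from the numbers on the Employee Monitoring slide and solves for the mixing probabilities. It rests on assumptions the slides do not state explicitly: a monitored shirker is always caught and paid nothing, an unmonitored shirker still collects the salary, and all amounts are in $K. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

SALARY, EFFORT, OUTPUT, MONITOR_COST = 100, 50, 200, 10  # $K, from the slide

E_ACTIONS = ["work", "shirk"]
M_ACTIONS = ["monitor", "no_monitor"]

# PAYOFF[(employee_action, manager_action)] = (employee, manager)
# ASSUMPTION: a monitored shirker is always caught and gets no salary.
PAYOFF = {
    ("work", "monitor"): (SALARY - EFFORT, OUTPUT - SALARY - MONITOR_COST),
    ("work", "no_monitor"): (SALARY - EFFORT, OUTPUT - SALARY),
    ("shirk", "monitor"): (0, -MONITOR_COST),
    ("shirk", "no_monitor"): (SALARY, -SALARY),
}


def pure_equilibria():
    """Profiles where neither side gains by changing its own action."""
    stable = []
    for e, m in product(E_ACTIONS, M_ACTIONS):
        e_ok = all(PAYOFF[(e, m)][0] >= PAYOFF[(t, m)][0] for t in E_ACTIONS)
        m_ok = all(PAYOFF[(e, m)][1] >= PAYOFF[(e, t)][1] for t in M_ACTIONS)
        if e_ok and m_ok:
            stable.append((e, m))
    return stable


def monitor_probability():
    """Probability of monitoring that makes the employee indifferent."""
    gain_if_unwatched = PAYOFF[("shirk", "no_monitor")][0] - PAYOFF[("work", "no_monitor")][0]
    loss_if_watched = PAYOFF[("work", "monitor")][0] - PAYOFF[("shirk", "monitor")][0]
    return Fraction(gain_if_unwatched, gain_if_unwatched + loss_if_watched)


def shirk_probability():
    """Probability of shirking that makes the manager indifferent."""
    saving_if_works = PAYOFF[("work", "no_monitor")][1] - PAYOFF[("work", "monitor")][1]
    saving_if_shirks = PAYOFF[("shirk", "monitor")][1] - PAYOFF[("shirk", "no_monitor")][1]
    return Fraction(saving_if_works, saving_if_works + saving_if_shirks)


print(pure_equilibria())
print(monitor_probability(), shirk_probability())
```

Under these assumptions no pure profile is stable, the manager monitors half the time, and the employee shirks one time in ten: each side randomizes just enough to leave the other indifferent.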
26Research question
- What features does a good solution have?
27Pareto Efficient Solutions
- f represents possible solutions for two players
[Figure: solutions f1, f2, f3, f4 plotted against axes U1 and U2]
28Pareto Efficient Solutions
- f2 Pareto dominates f3
[Figure: solutions f1, f2, f3, f4 plotted against axes U1 and U2]
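Pareto dominance is easy to state in code. The coordinates below are hypothetical (the figure's actual positions were not preserved); they are chosen only so that f2 dominates f3, as the slide says. One solution dominates another if it is at least as good for every player and strictly better for at least one.

```python
# HYPOTHETICAL (U1, U2) utilities for the four solutions in the figure.
SOLUTIONS = {
    "f1": (1, 4),
    "f2": (3, 3),
    "f3": (2, 2),
    "f4": (4, 1),
}


def dominates(x, y):
    """True if x is at least as good as y for everyone and better for someone."""
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))


def pareto_efficient(solutions):
    """Names of the solutions that no other solution Pareto dominates."""
    return [name for name, u in solutions.items()
            if not any(dominates(v, u) for v in solutions.values())]


print(dominates(SOLUTIONS["f2"], SOLUTIONS["f3"]))
print(pareto_efficient(SOLUTIONS))
```

With these assumed points f1, f2 and f4 are all Pareto efficient: moving between them helps one player only by hurting the other.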
29Auctions
- Dutch
- English
- First Price Sealed Bid
- Second Price Sealed Bid
30Auction Parameters
- Goods can have
  - private value (Aunt Bessie's brooch)
  - public/common value (oil field to oil companies)
  - correlated value (partially private, partially values of others): consider the resale value
- Winner pays
  - first price (highest bidder wins, pays highest price)
  - second price (highest bidder wins, but pays the second-highest bid)
- Bids may be
  - open cry
  - sealed bid
- Bidding may be
  - one shot
  - ascending
  - descending
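The difference between the first-price and second-price payment rules can be sketched for a one-shot sealed-bid auction. The bidder names and amounts are made up for illustration, and the function `sealed_bid_outcome` is a hypothetical helper; ties are broken in favour of the bidder listed first, which is an arbitrary choice.

```python
def sealed_bid_outcome(bids, rule):
    """Return (winner, price) for a one-shot sealed-bid auction.

    bids: dict mapping bidder name to bid amount (at least two bidders).
    rule: "first" (winner pays own bid) or "second" (winner pays the
          second-highest bid).
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Stable sort: among equal bids, the bidder listed first ranks higher.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, highest = ranked[0]
    if rule == "first":
        price = highest
    elif rule == "second":
        price = ranked[1][1]
    else:
        raise ValueError("rule must be 'first' or 'second'")
    return winner, price


# Made-up bids for illustration.
bids = {"ann": 120, "bob": 150, "cy": 90}
print(sealed_bid_outcome(bids, "first"))
print(sealed_bid_outcome(bids, "second"))
```

The same bidder wins under both rules; only the price changes, which is why the two rules give bidders different incentives when choosing how much to bid.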
31Dutch (Aalsmeer) flower auction
33Research Questions
- How can we design an agent to function in the electronic marketplace?
- Given the new possibilities opened up by electronic auctions, what mechanisms can be designed to elicit desirable properties?
34How do you counter speculate?
- Consider a Dutch auction
- While you don't know what the other's valuation is, you know a range and can guess at a distribution (uniform, normal, etc.)
- For example, suppose there is a single other bidder whose valuation lies in the range [a, b] with a uniform distribution. If your valuation of the item is v, what price should you bid?
- Thinking about this logically: if you win with a bid above your valuation, you lose money. The further you bid below your valuation, the more profit you make if you win.
- If you bid very low, you lower the probability that you will ever get it.
35What is expected profit (Dutch auction)?
- Try to maximize your expected profit.
- Expected profit (as a function of a specific bid) is the probability that you will win the bid times the amount of your profit at that price.
- Let p be the price you bid for an item, v be your valuation, and [a, b] be the uniform range of the other's bid.
- The probability that you win the bid at this price is the fraction of the time that the other person bids lower than p: (p - a)/(b - a)
- The profit you make at p is v - p
- Expected profit as a function of p is (v - p)(p - a)/(b - a) + 0 (1 - (p - a)/(b - a))
36Finding maximum profit is a simple calculus problem
- Expected profit as a function of p is f(p) = (v - p)(p - a)/(b - a)
- Take the derivative with respect to p and set it to zero. Where the slope is zero is the maximum (as the second derivative is negative).
- f(p) = 1/(b - a) (vp - va - p^2 + pa)
- f'(p) = 1/(b - a) (v - 2p + a) = 0
- p = (a + v)/2 (the midpoint between your valuation and the minimum of the range)
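The closed-form bid p = (a + v)/2 can be checked numerically. The sketch below evaluates the expected profit (v - p)(p - a)/(b - a) on a fine grid of candidate bids and compares the best grid point with the formula. The values of v, a and b are arbitrary illustrations, and the check assumes a <= v <= b so that the optimum lies inside the range.

```python
def expected_profit(p, v, a, b):
    """Expected profit of bidding p when the other bid is uniform on [a, b]."""
    if p <= a:
        return 0.0  # never wins
    win_probability = min((p - a) / (b - a), 1.0)
    return (v - p) * win_probability


def best_bid_on_grid(v, a, b, steps=10_000):
    """Brute-force search over evenly spaced bids in [a, b]."""
    candidates = [a + (b - a) * k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda p: expected_profit(p, v, a, b))


# Arbitrary illustrative numbers (assumption: a <= v <= b).
v, a, b = 80.0, 20.0, 100.0
closed_form = (a + v) / 2
numeric = best_bid_on_grid(v, a, b)
print(closed_form, numeric, expected_profit(closed_form, v, a, b))
```

The grid search and the formula agree to within the grid spacing, which is a quick sanity check on the derivative calculation.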
37Ultimatum Bargaining with Incomplete Information
38Ultimatum Bargaining with Incomplete Information
- Player 1 begins the game by drawing a chip from the bag. Inside the bag are 30 chips ranging in value from $1.00 to $30.00.
- Both must agree to split the amount. Player 2 does not see the chip.
- Player 1 then makes an offer to Player 2. The offer can be any amount in the range from $0.00 up to the value of the chip.
- Player 2 can either accept or reject the offer. If accepted, Player 1 pays Player 2 the amount of the offer and keeps the rest. If rejected, both players get nothing.
39Experimental Results
- Questions
- 1) How much should Player 1 offer Player 2?
  - Does the amount of the offer depend on the size of the chip?
- 2) What should Player 2 do?
  - Should Player 2 accept all offers or only offers above a specified amount?
- Explain.
40Coalition Formation
- Tasks need the skills of several workers
- Tasks have various worth
- Agents have various costs
- How do you decide who works together?
- What do you pay each one?
41Research Questions
- Computing the optimal coalition is NP-hard. How do you form good coalitions in an efficient manner?
- How do you form coalitions when the information is incomplete?
- How do you form coalitions in a dynamic environment with agents entering/leaving?
42Voting Mechanisms
- How do we make decisions that respond to various individuals' preference functions?
- Ex: selecting new faculty based on various different evaluations
- Want to decide what to serve for refreshments the last day of class. How do we decide?
43Borda Paradox: remove loser, winner changes (notice, c is always ahead of removed item)
- With all four candidates:
  - a > b > c > d
  - b > c > d > a
  - c > d > a > b
  - a > b > c > d
  - b > c > d > a
  - c > d > a > b
  - a > b > c > d
- Borda count: a = 18, b = 19, c = 20, d = 13 (c wins, d loses)
- With d removed:
  - a > b > c
  - b > c > a
  - c > a > b
  - a > b > c
  - b > c > a
  - c > a > b
  - a > b > c
- Borda count: a = 15, b = 14, c = 13 (a wins, c loses)
When loser is removed, next loser becomes winner!
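The Borda totals can be reproduced with a short script. It assumes the scoring that matches the slide's totals: with n candidates, a voter's top choice gets n points and the last choice gets 1. It also takes the seventh ballot to be a > b > c > d, the only ordering consistent with the stated counts. The function names are hypothetical.

```python
def borda_count(ballots):
    """Score each candidate: top of a ballot earns n points, bottom earns 1."""
    n = len(ballots[0])
    scores = {candidate: 0 for candidate in ballots[0]}
    for ballot in ballots:
        for rank, candidate in enumerate(ballot):
            scores[candidate] += n - rank
    return scores


def remove_candidate(ballots, removed):
    """Drop one candidate from every ballot, keeping the remaining order."""
    return [[c for c in ballot if c != removed] for ballot in ballots]


# Seven ballots, each listed from most to least preferred.
ballots = [list("abcd"), list("bcda"), list("cdab")] * 2 + [list("abcd")]

full = borda_count(ballots)
reduced = borda_count(remove_candidate(ballots, "d"))
print(full)
print(reduced)
print(max(full, key=full.get), max(reduced, key=reduced.get))
```

Removing the last-place candidate d flips the ranking of the remaining three: c goes from first to last and a from third to first.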
44Research Question
- Do individuals always act the way the theory says they should?
- If not, why not? Is the theory wrong?
45Allais Paradox
- In 1953, Maurice Allais published a paper regarding a survey he had conducted in 1952, with a hypothetical game.
- Subjects "with good training in and knowledge of the theory of probability, so that they could be considered to behave rationally", routinely violated the expected utility axioms.
- The game itself and its results have now become famous as the "Allais Paradox".
46The most famous structure is the following
- Subjects are asked to choose between the following 2 gambles, i.e. which one they would like to participate in if they could:
  - Gamble A: a 100% chance of receiving $1 million.
  - Gamble B: a 10% chance of receiving $5 million, an 89% chance of receiving $1 million, and a 1% chance of receiving nothing.
- After they have made their choice, they are presented with another 2 gambles and asked to choose between them:
  - Gamble C: an 11% chance of receiving $1 million, and an 89% chance of receiving nothing.
  - Gamble D: a 10% chance of receiving $5 million, and a 90% chance of receiving nothing.
47- This experiment has been conducted many, many times, and most people prefer A to B, and D to C.
- So why is this a paradox?
48- The expected value of A is $1 million, while the expected value of B is $1.39 million. By preferring A to B, people are presumably maximizing expected utility, not expected value.
- By preferring A to B, we have the following expected utility relationship (amounts in millions):
  u(1) > 0.1 u(5) + 0.89 u(1) + 0.01 u(0), i.e.
  0.11 u(1) > 0.1 u(5) + 0.01 u(0)
- Adding 0.89 u(0) to each side, we get
  0.11 u(1) + 0.89 u(0) > 0.1 u(5) + 0.90 u(0),
  implying that an expected utility maximizer consistent with the first choice must prefer C to D.
- The expected value of C is $110,000, while the expected value of D is $500,000, so if people were maximizing expected value, they should in fact prefer D to C. However, their choice in the first stage is inconsistent with their choice in the second stage, and herein lies the paradox.