Agents - PowerPoint PPT Presentation

About This Presentation

Title:

Agents

Provided by: adinamag

Transcript and Presenter's Notes

Title: Agents

1
Agents Background

Vicki H. Allan

2
An Agent in its Environment
[Figure: the AGENT receives sensor input from the ENVIRONMENT and sends action output back to it]
3

An agent enjoys the following properties:
autonomy - agents operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state
social ability - agents interact with other agents (and possibly humans) via some kind of agent-communication language
reactivity - agents perceive their environment and respond in a timely fashion to changes that occur in it
pro-activeness - agents do not simply act in response to their environment; they are able to exhibit goal-directed behaviour by taking the initiative (Wooldridge and Jennings, 1995)

4
Agents

Need for computer systems to act in our best
interests
"The issues addressed in multiagent systems have profound implications for our understanding of ourselves." (Wooldridge)
Example: how do you make a decision about buying a car?

5
Agent Environments

Agents do not have complete control over the environment (influence only) (Ex.: elevators in Old Main)
deterministic vs. non-deterministic effects
accessible (the agent can get complete state info) vs. inaccessible environment (Ex.: stock market)
episodic (single episode, independent of others) vs. non-episodic (history sensitive) (Ex.: grades in class)

6
Exercise

There are three blue hats and two brown hats.
Three men are lined up such that the back man can see the backs of the other two, the middle man can see the back of the front man, and the front man can't see anybody.
One of the five hats is placed on each man's head. The remaining two hats are hidden away.
The men are asked what color of hat they are wearing. Time passes.
The front man correctly guesses the color of his hat.
What color was it, and how did he guess correctly?
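
One way to check an answer to this exercise is to enumerate the possible hat assignments and discard the ones in which a man further back would have spoken up. The sketch below assumes every man reasons perfectly and that "time passes" means the back man and then the middle man stay silent because they cannot deduce their own color; the function and variable names are illustrative.

```python
from itertools import product

# Hats available: three blue, two brown (from the slide).
SUPPLY = {"blue": 3, "brown": 2}

def feasible(front, middle, back):
    """An assignment is feasible if it uses no more hats of a color than exist."""
    hats = (front, middle, back)
    return all(hats.count(color) <= n for color, n in SUPPLY.items())

# Every possible world, written as (front, middle, back).
worlds = [w for w in product(SUPPLY, repeat=3) if feasible(*w)]

def back_knows(world):
    """The back man sees front and middle; he knows his color if only one fits."""
    front, middle, _ = world
    return len({w[2] for w in worlds if w[0] == front and w[1] == middle}) == 1

# The back man stays silent, so drop the worlds in which he would have known.
after_back = [w for w in worlds if not back_knows(w)]

def middle_knows(world):
    """The middle man sees the front hat and has heard the back man's silence."""
    front = world[0]
    return len({w[1] for w in after_back if w[0] == front}) == 1

# The middle man stays silent too.
after_middle = [w for w in after_back if not middle_knows(w)]

# Colors the front man can still be wearing after both silences.
front_colors = {w[0] for w in after_middle}
print(front_colors)
```

If a single color survives, the front man can name it without seeing any hat, purely from what the other two failed to say.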

7
Concept

Everyone else is as smart as you

8
Game of Chicken

Consider another type of encounter: the game of chicken. (Think of James Dean in Rebel Without a Cause: swerving = coop, driving straight = defect.)
Difference from the prisoner's dilemma: mutual defection is the most feared outcome.

9
Question

How do we communicate our desires to an agent?
May be muddy: You want to graduate with a 4.0, have a job making 100K a year, have opportunities for growth, and have quality of life.
If you can't have it all, what is most valued?

10
Answer: Utilities

Assume we have just two agents: Ag = {i, j}
Agents are assumed to be self-interested: they have preferences over how the environment is
Assume W = {w1, w2, ...} is the set of outcomes that agents have preferences over
We capture preferences by utility functions, which map an outcome to a rational number.
Utility functions lead to preference orderings over outcomes.
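
As a minimal sketch of how a utility function induces a preference ordering, the snippet below gives agents i and j made-up utilities over three outcomes and sorts the outcomes by utility. The outcome names and all numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical outcomes and utility values (not from the slides).
outcomes = ["w1", "w2", "w3"]
utility = {
    "i": {"w1": 3, "w2": 1, "w3": 2},
    "j": {"w1": 1, "w2": 3, "w3": 2},
}

def preference_ordering(agent):
    """Outcomes sorted from most to least preferred by the given agent."""
    return sorted(outcomes, key=lambda w: utility[agent][w], reverse=True)

def weakly_prefers(agent, w, w_other):
    """True if the agent likes outcome w at least as much as w_other."""
    return utility[agent][w] >= utility[agent][w_other]

print(preference_ordering("i"))
print(preference_ordering("j"))
```

The two agents rank the same outcomes differently, which is what makes their encounter a game rather than a shared optimization problem.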

11
What is Utility?

Utility is not money (but it is a useful analogy)
Typical relationship between utility and money

12
Dominant Strategies

Recall that:
Agents' utilities depend on what strategies other agents are playing
Agents are expected utility maximizers
A dominant strategy is a best response for player i no matter what strategies the other agents play
Dominant strategies do not always exist
Inferior strategies are called dominated

13
Dominant Strategy Equilibrium

A dominant strategy equilibrium is a strategy profile where the strategy for each player is dominant (so neither wants to change)
Known as the "DUH" strategy.
Nice: agents do not need to counterspeculate (reciprocally reason about what others will do)!

14
Prisoner's dilemma
Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
[Payoff table: rows are Kelly's choices (Confess, Don't Confess), columns are Ned's choices (Don't Confess, Confess); cell values not in transcript]
15
Prisoner's dilemma
Kelly will confess. Same holds for Ned.
[Payoff table: rows are Kelly's choices (Confess, Don't Confess), columns are Ned's choices (Don't Confess, Confess); cell values not in transcript]
16
Prisoner's dilemma
So the only outcome that involves each player choosing their dominant strategy is where they both confess. Solve by iterative elimination of dominated strategies.
[Payoff table: rows are Kelly's choices (Confess, Don't Confess), columns are Ned's choices (Don't Confess, Confess); cell values not in transcript]
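
The dominant-strategy argument can be checked mechanically. The sketch below uses assumed payoffs (minus the years in jail), since the numbers on the slide's table did not survive in the transcript; only their relative order matters. `dominant_strategy` is an illustrative helper, not something from the slides.

```python
ACTIONS = ["Confess", "Don't Confess"]

# payoff[(kelly_action, ned_action)] = (kelly_utility, ned_utility)
# Utilities are minus the years in jail; the numbers are assumed.
payoff = {
    ("Confess", "Confess"): (-5, -5),
    ("Confess", "Don't Confess"): (0, -10),
    ("Don't Confess", "Confess"): (-10, 0),
    ("Don't Confess", "Don't Confess"): (-1, -1),
}

def utility(player, own, other):
    """Utility of player (0 = Kelly, 1 = Ned) for own action against the other's."""
    profile = (own, other) if player == 0 else (other, own)
    return payoff[profile][player]

def dominant_strategy(player):
    """Action that is strictly best against every opposing action, else None."""
    for a in ACTIONS:
        if all(utility(player, a, o) > utility(player, b, o)
               for o in ACTIONS for b in ACTIONS if b != a):
            return a
    return None

kelly, ned = dominant_strategy(0), dominant_strategy(1)
print(kelly, ned)

# Both players would still be better off if neither confessed.
both_confess = payoff[("Confess", "Confess")]
both_silent = payoff[("Don't Confess", "Don't Confess")]
print(both_confess, both_silent)
```

With these numbers each player's dominant strategy is to confess, even though mutual silence leaves both better off than mutual confession.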
17
Example: Prisoner's Dilemma

Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
(Actual numbers vary in different versions of the problem, but relative values are the same)

[Payoff table: Confess / Don't Confess for each player, with a "Pareto optimal" annotation; cell values not in transcript]
18
Example: Bach or Stravinsky

A couple likes going to concerts together. One loves Bach but not Stravinsky. The other loves Stravinsky but not Bach. However, they prefer being together to being apart.

[Payoff table: each player chooses B or S; cell values not in transcript]
No dominant strategy equilibrium
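
To see why there is no dominant strategy here, it helps to compute each player's best response to each choice by the other. The payoffs below are assumed (the transcript lost the table); they only encode that each player has a favorite composer and that being apart is worst for both.

```python
CHOICES = ["B", "S"]

# payoff[(p1_choice, p2_choice)] = (p1_utility, p2_utility); values are assumed.
# Player 1 loves Bach, player 2 loves Stravinsky; being apart is worth 0 to both.
payoff = {
    ("B", "B"): (2, 1),
    ("B", "S"): (0, 0),
    ("S", "B"): (0, 0),
    ("S", "S"): (1, 2),
}

def best_response(player, other_choice):
    """Choice that maximizes the player's utility given the other's choice."""
    def u(own):
        profile = (own, other_choice) if player == 0 else (other_choice, own)
        return payoff[profile][player]
    return max(CHOICES, key=u)

responses = {(p, o): best_response(p, o) for p in (0, 1) for o in CHOICES}

# A player has a dominant strategy only if the best response never changes.
has_dominant = {p: len({responses[(p, o)] for o in CHOICES}) == 1 for p in (0, 1)}
print(responses)
print(has_dominant)
```

Each player's best response is simply to match the other, so what is best depends on what the other does, and counterspeculation cannot be avoided.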
19
Example: Paying for bus fare

Getting back to Gatwick airport. Steve had planned to pay for all of us, but left to find son. Came for funds. Do I pay, or say my husband will?

[Payoff table: one player chooses Pay for 2 or Pay for 4, the other chooses Pay for 2 or Not Pay; cell values not in transcript]
No dominant strategy equilibrium
20
Research Questions

Can we apply game theory to solve seemingly
unrelated problems?
Ex.: traffic control
Ex.: sharing operating system resources

21
Exercise

You participate in a game show in which prizes of varying values occur at equal frequency. Two of you win a prize.
There are 10 types of prizes of varying values. Assume a prize of type 10 is the best and a prize of type 1 is the worst.
Without knowing the other's prize, both of you are asked whether you want to exchange the prizes you were given.
If both want to exchange, the two exchange prizes.
What is your strategy?
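
One way to explore this exercise is to iterate best responses over threshold strategies ("offer to swap any prize of type t or lower"). The sketch assumes both players are risk neutral, value a prize at its type number, and only swap when strictly better off in expectation; these are modelling assumptions, not part of the slide.

```python
PRIZES = range(1, 11)  # prize types 1 (worst) to 10 (best), equally likely

def best_response_threshold(other_threshold):
    """Highest prize worth offering to swap if the other swaps prizes <= other_threshold."""
    if other_threshold == 0:
        return 0  # the other never swaps, so offering changes nothing
    # A swap only happens when the other agrees, so the prize received
    # is uniform on 1..other_threshold.
    expected_received = (1 + other_threshold) / 2
    return max((x for x in PRIZES if x < expected_received), default=0)

thresholds = [10]  # start from "always offer to swap"
while True:
    nxt = best_response_threshold(thresholds[-1])
    if nxt == thresholds[-1]:
        break
    thresholds.append(nxt)

print(thresholds)  # [10, 5, 2, 1, 0]
```

Each round of "the other is as smart as I am" lowers the threshold, until offering to exchange is no longer worthwhile for any prize.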

22
Employee Monitoring

Employees can work hard or shirk
Salary: $100K unless caught shirking
Cost of effort: $50K
Managers can monitor or not
Value of employee output: $200K
Profit if employee doesn't work: $0
Cost of monitoring: $10K

23
What is your strategy?

Work hard?
Shirk?

24
Employee Monitoring
[Payoff table: Employee (Work, Shirk) vs. Manager (Monitor, Don't Monitor); cell values not in transcript]

No equilibrium in pure strategies
What do the players do?
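
The claim can be checked by rebuilding the payoff table from the figures on the Employee Monitoring slide and testing every pair of pure strategies. The sketch assumes that an employee caught shirking is paid nothing and that shirking is only detected when the manager monitors; the transcript does not spell this out.

```python
SALARY, EFFORT, OUTPUT, MONITOR_COST = 100, 50, 200, 10  # in $K, from the slide

EMPLOYEE = ["work", "shirk"]
MANAGER = ["monitor", "no monitor"]

def payoffs(employee, manager):
    """Return (employee_payoff, manager_payoff) for one pair of pure strategies."""
    caught = employee == "shirk" and manager == "monitor"
    wage = 0 if caught else SALARY  # assumption: a caught shirker is paid nothing
    employee_payoff = wage - (EFFORT if employee == "work" else 0)
    output = OUTPUT if employee == "work" else 0
    manager_payoff = output - wage - (MONITOR_COST if manager == "monitor" else 0)
    return employee_payoff, manager_payoff

def is_pure_equilibrium(e, m):
    """True if neither player gains by changing strategy unilaterally."""
    e_pay, m_pay = payoffs(e, m)
    return (all(payoffs(e2, m)[0] <= e_pay for e2 in EMPLOYEE)
            and all(payoffs(e, m2)[1] <= m_pay for m2 in MANAGER))

for e in EMPLOYEE:
    for m in MANAGER:
        print(e, m, payoffs(e, m))

pure_equilibria = [(e, m) for e in EMPLOYEE for m in MANAGER if is_pure_equilibrium(e, m)]
print(pure_equilibria)
```

In every cell one of the two players wants to switch: a hard worker makes monitoring wasteful, an unmonitored employee prefers to shirk, a shirker makes monitoring worthwhile, and a monitored employee prefers to work.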

25
Mixed Strategies

Randomize: surprise the rival
Mixed strategy: specifies that an actual move be chosen randomly from the set of pure strategies with some specific probabilities.
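
For the employee monitoring game, the mixing probabilities can be found from the indifference conditions: each player randomizes so that the other is indifferent between their two pure strategies. The payoffs below are derived from the slide's figures under the assumption that a caught shirker is paid nothing; `indifference_probability` is an illustrative helper.

```python
from fractions import Fraction

# payoff[(employee, manager)] = (employee_payoff, manager_payoff) in $K,
# assuming a caught shirker is paid nothing.
payoff = {
    ("work", "monitor"): (50, 90),
    ("work", "no monitor"): (50, 100),
    ("shirk", "monitor"): (0, -10),
    ("shirk", "no monitor"): (100, -100),
}

def indifference_probability(a, b, c, d):
    """Probability x on the first option solving x*a + (1-x)*b == x*c + (1-x)*d."""
    return Fraction(d - b, (a - b) - (c - d))

# Manager monitors with probability q so that the employee is indifferent
# between working (payoffs a, b) and shirking (payoffs c, d).
p_monitor = indifference_probability(
    payoff[("work", "monitor")][0], payoff[("work", "no monitor")][0],
    payoff[("shirk", "monitor")][0], payoff[("shirk", "no monitor")][0],
)

# Employee works with probability p so that the manager is indifferent
# between monitoring (payoffs a, b) and not monitoring (payoffs c, d).
p_work = indifference_probability(
    payoff[("work", "monitor")][1], payoff[("shirk", "monitor")][1],
    payoff[("work", "no monitor")][1], payoff[("shirk", "no monitor")][1],
)

print(p_monitor, p_work)  # 1/2 9/10
```

Under these assumptions the manager monitors half the time and the employee works nine times out of ten, so neither can be exploited by a predictable rival.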

26
Research question

What features does a good solution have?

27
Pareto Efficient Solutions: f represents possible solutions for two players
[Figure: solutions f1, f2, f3, f4 plotted against the two players' utilities, axes U1 and U2]
28
Pareto Efficient Solutions
[Figure: the same plot of f1, f2, f3, f4 against axes U1 and U2, annotated "f2 Pareto dominates f3"]
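
Pareto dominance is easy to test in code. In the sketch below the (U1, U2) coordinates are hypothetical, since the plot is not in the transcript; they are chosen only so that f2 Pareto dominates f3, as the slide states.

```python
# Hypothetical (U1, U2) utilities for the four solutions; only the fact that
# f2 Pareto dominates f3 is taken from the slide.
solutions = {"f1": (1, 4), "f2": (3, 3), "f3": (2, 2), "f4": (4, 1)}

def pareto_dominates(x, y):
    """x dominates y if no player is worse off and at least one is better off."""
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))

# A solution is Pareto efficient if no other solution dominates it.
pareto_efficient = [
    name for name, u in solutions.items()
    if not any(pareto_dominates(v, u) for v in solutions.values())
]

print(pareto_dominates(solutions["f2"], solutions["f3"]))
print(pareto_efficient)
```

Note that Pareto efficiency does not pick a single winner: with these numbers f1, f2 and f4 are all efficient, and they differ in which player they favor.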
29
Auctions

Dutch
English
First Price Sealed Bid
Second Price Sealed Bid

30
Auction Parameters

Goods can have:
private value (Aunt Bessie's brooch)
public/common value (oil field to oil companies)
correlated value (partially private, partially values of others): consider the resale value
Winner pays:
first price (highest bidder wins, pays highest price)
second price (highest bidder wins, but pays the amount of the second-highest bid)
Bids may be:
open cry
sealed bid
Bidding may be:
one shot
ascending
descending
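
The difference between the first-price and second-price rules can be sketched for a one-shot sealed-bid auction as follows. The bidder names and bid amounts are made up, and ties are simply broken in favor of whichever tied bid sorts first.

```python
def sealed_bid_outcome(bids, rule):
    """Return (winner, price) for a one-shot sealed-bid auction.

    bids maps bidder name to bid; rule is "first" or "second".
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, highest = ranked[0]
    second_highest = ranked[1][1]
    if rule == "first":
        return winner, highest          # winner pays their own bid
    if rule == "second":
        return winner, second_highest   # winner pays the runner-up's bid
    raise ValueError("rule must be 'first' or 'second'")

bids = {"ann": 70, "bob": 90, "cat": 80}  # hypothetical bids
print(sealed_bid_outcome(bids, "first"))   # ('bob', 90)
print(sealed_bid_outcome(bids, "second"))  # ('bob', 80)
```

The winner is the same under both rules for a fixed set of bids; what changes is the price, and therefore how bidders should choose their bids in the first place.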

31
Dutch (Aalsmeer) flower auction
33
Research Questions

How can we design an agent to function in the
electronic marketplace?
Given the new possibilities opened up by electronic auctions, what mechanisms can be designed to elicit desirable properties?

34
How do you counter speculate?

Consider a Dutch auction
While you don't know what the other's valuation is, you know a range and guess at a distribution (uniform, normal, etc.)
For example, suppose there is a single other bidder whose valuation lies in the range [a, b] with a uniform distribution. If your valuation of the item is v, what price should you bid?
Thinking about this logically, if you bid above
your valuation, you lose. If you bid lower than
your valuation, you increase profit.
If you bid very low, you lower the probability
that you will ever get it.

35
What is expected profit (Dutch auction)?

Try to maximize your expected profit.
Expected profit (as a function of a specific bid)
is the probability that you will win the bid
times the amount of your profit at that price.
Let p be the price you bid for an item, v be your valuation, and [a, b] be the uniform range of the other's bid.
The probability that you win the bid at this price is the fraction of the time that the other person bids lower than p: (p-a)/(b-a)
The profit you make at p is v-p
Expected profit as a function of p is the function
$(v-p)\,\frac{p-a}{b-a} + 0\cdot\left(1 - \frac{p-a}{b-a}\right)$

36
Finding maximum profit is a simple calculus
problem

Expected profit as a function of p is the function (v-p)(p-a)/(b-a)
Take the derivative with respect to p and set it to zero. Where the slope is zero is the maximum value (as the second derivative is negative).
$f(p) = \frac{1}{b-a}\,(vp - va - p^2 + pa)$
$f'(p) = \frac{1}{b-a}\,(v - 2p + a) = 0$
$p = \frac{a+v}{2}$ (halfway between your valuation and the minimum of the range)
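
The closed-form answer can be sanity-checked numerically. The sketch below evaluates the expected profit on a grid of candidate bids and compares the best one with (a+v)/2; the values of v, a and b are hypothetical, and the win probability is clamped to [0, 1] for bids outside the rival's range.

```python
def expected_profit(p, v, a, b):
    """(v - p) times the probability that a uniform[a, b] rival bids below p."""
    win_probability = min(max((p - a) / (b - a), 0.0), 1.0)
    return (v - p) * win_probability

v, a, b = 80.0, 20.0, 100.0  # hypothetical valuation and range of the rival's bid

# Candidate bids from a to b in steps of 0.5.
step = 0.5
candidates = [a + k * step for k in range(int((b - a) / step) + 1)]

best_bid = max(candidates, key=lambda p: expected_profit(p, v, a, b))
print(best_bid, (a + v) / 2)
```

The grid search and the calculus agree for these numbers; trying other values of v between a and b is a quick way to build confidence in the formula.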

37
Ultimatum Bargaining with Incomplete Information
38
Ultimatum Bargaining with Incomplete Information

Player 1 begins the game by drawing a chip from
the bag. Inside the bag are 30 chips ranging in
value from 1.00 to 30.00.
Both must agree to split the amount. Player 2
does not see the chip.
Player 1 then makes an offer to Player 2. The
offer can be any amount in the range from 0.00
up to the value of the chip.
Player 2 can either accept or reject the offer.
If accepted, Player 1 pays Player 2 the amount of
the offer and keeps the rest. If rejected, both
players get nothing.
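
The rules of one round can be written down as a small payoff function. This is only a sketch of the game as described above; `threshold_responder` is a hypothetical Player 2 strategy (accept anything at or above a fixed minimum), included to make the function usable.

```python
def ultimatum_payoffs(chip, offer, accepted):
    """Return (player1_payoff, player2_payoff) for one round of the game."""
    if not 0 <= offer <= chip:
        raise ValueError("offer must be between 0 and the chip value")
    return (chip - offer, offer) if accepted else (0, 0)

def threshold_responder(minimum):
    """Hypothetical Player 2 strategy: accept only offers of at least minimum."""
    return lambda offer: offer >= minimum

accept = threshold_responder(5)
print(ultimatum_payoffs(20, 6, accept(6)))  # (14, 6)
print(ultimatum_payoffs(20, 4, accept(4)))  # (0, 0)
```

Because Player 2 never sees the chip, any acceptance rule can depend only on the offer, which is what makes the information incomplete.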

39
Experimental Results

Questions
1) How much should Player 1 offer Player 2? Does the amount of the offer depend on the size of the chip?
2) What should Player 2 do? Should Player 2 accept all offers or only offers above a specified amount?
Explain.

40
Coalition Formation

Tasks need the skills of several workers
Tasks have various worth
Agents have various costs
How do you decide who works together?
What do you pay each one?

41
Research Questions

Computing the optimal coalition is NP-hard. How
do you form good coalitions in an efficient
manner?
How do you form coalitions when the information
is incomplete?
How do you form coalitions in a dynamic
environment with agents entering/leaving?

42
Voting Mechanisms

How do we make decisions that respond to various individuals' preference functions?
Ex.: selecting new faculty based on various different evaluations
Want to decide what to serve for refreshments the
last day of class. How do we decide?

43
Borda Paradox: remove loser, winner changes (notice, c is always ahead of removed item)

a > b > c > d
b > c > d > a
c > d > a > b
a > b > c > d
b > c > d > a
c > d > a > b
a > b > c > d
a=18, b=19, c=20, d=13

a > b > c
b > c > a
c > a > b
a > b > c
b > c > a
c > a > b
a > b > c
a=15, b=14, c=13

When loser is removed, next loser becomes winner!
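
The tallies can be reproduced with a short Borda count. The sketch gives a candidate n points for first place down to 1 point for last place on a ballot of n candidates, and takes the seventh ballot to rank a first, which is the reading under which the totals on the slide come out.

```python
def borda(ballots):
    """Borda scores: on a ballot of n candidates, position k (0-based) earns n - k points."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + (n - position)
    return scores

# Each string lists one voter's candidates from most to least preferred.
ballots = ["abcd", "bcda", "cdab", "abcd", "bcda", "cdab", "abcd"]

full = borda(ballots)
without_d = borda([ballot.replace("d", "") for ballot in ballots])

print(full)       # {'a': 18, 'b': 19, 'c': 20, 'd': 13}
print(without_d)  # {'a': 15, 'b': 14, 'c': 13}
```

Removing d, which no voter changed their mind about, reverses the order of the remaining three candidates.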

44
Research Question

Do individuals always act the way the theory says
they should?
If not, why not? Is the theory wrong?

45
Allais Paradox

In 1953, Maurice Allais published a paper
regarding a survey he had conducted in 1952, with
a hypothetical game.
Subjects "with good training in and knowledge of
the theory of probability, so that they could be
considered to behave rationally", routinely
violated the expected utility axioms.
The game itself and its results have now become
famous as the "Allais Paradox".

46
The most famous structure is the following

Subjects are asked to choose between the following 2 gambles, i.e. which one they would like to participate in if they could:
Gamble A: A 100% chance of receiving 1 million.
Gamble B: A 10% chance of receiving 5 million, an 89% chance of receiving 1 million, and a 1% chance of receiving nothing.
After they have made their choice, they are presented with another 2 gambles and asked to choose between them:
Gamble C: An 11% chance of receiving 1 million, and an 89% chance of receiving nothing.
Gamble D: A 10% chance of receiving 5 million, and a 90% chance of receiving nothing.

This experiment has been conducted many, many
times, and most people invariably prefer A to B,
and D to C.
So why is this a paradox?

The expected value of A is 1 million, while the expected value of B is 1.39 million. By preferring A to B, people are presumably maximizing expected utility, not expected value.
By preferring A to B, we have the following expected utility relationship:
$u(1) > 0.1\,u(5) + 0.89\,u(1) + 0.01\,u(0)$, i.e.
$0.11\,u(1) > 0.1\,u(5) + 0.01\,u(0)$
Adding $0.89\,u(0)$ to each side, we get
$0.11\,u(1) + 0.89\,u(0) > 0.1\,u(5) + 0.90\,u(0)$,
implying that an expected utility maximizer consistent with the first choice must prefer C to D.
The expected value of C is 110,000, while the expected value of D is 500,000, so if people were maximizing expected value, they should in fact prefer D to C. However, their choice in the first stage is inconsistent with their choice in the second stage, and herein lies the paradox.
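
The arithmetic behind the paradox can be checked directly. The sketch below computes the expected value of each gamble and then shows, for one made-up utility function, that the expected utility gap between A and B equals the gap between C and D, so an expected utility maximizer cannot prefer A to B and also D to C. The utility numbers are hypothetical; exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F

# Each gamble is a list of (probability, prize) pairs, as on the slide.
gambles = {
    "A": [(F(1), 1_000_000)],
    "B": [(F(10, 100), 5_000_000), (F(89, 100), 1_000_000), (F(1, 100), 0)],
    "C": [(F(11, 100), 1_000_000), (F(89, 100), 0)],
    "D": [(F(10, 100), 5_000_000), (F(90, 100), 0)],
}

def expected(name, u=lambda prize: prize):
    """Expected utility of a gamble; with the default u this is the expected value."""
    return sum(p * u(prize) for p, prize in gambles[name])

for name in gambles:
    print(name, expected(name))

# A hypothetical risk-averse utility: the step from 1 to 5 million adds little.
utility = {0: 0, 1_000_000: 100, 5_000_000: 105}
u = utility.__getitem__

gap_ab = expected("A", u) - expected("B", u)
gap_cd = expected("C", u) - expected("D", u)
print(gap_ab, gap_cd)
```

The two gaps are identical for any choice of utility values, not just this one, because both reduce to 0.11 u(1) - 0.1 u(5) - 0.01 u(0).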