Applying Genetic Programming to Economic Mechanism
Design: Evolving a Pricing Rule for a Continuous Double Auction
Steve Phelps
Peter McBurney
Dept of Computer Science
University of Liverpool
Chadwick Building
Liverpool L69 7ZF, UK.
sphelps,[email protected]
Simon Parsons
Elizabeth Sklar
Dept of Computer and
Information Science
Brooklyn College
2900 Bedford Avenue
Brooklyn, NY 11210, USA.
[email protected]
Dept of Computer Science
Columbia University
1214 Amsterdam Avenue
New York, NY 10027, USA.
[email protected]
Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence: Multi-agent systems
General Terms
Design, Economics
Keywords
Cognitive game theory, continuous double auction, genetic programming, Pareto optimisation, mechanism design, multi-agent systems (MAS), reinforcement learning, trading strategies
The auction mechanism design problem has attracted much interest in recent years, and economists have had considerable success in applying techniques from game theory to the design of auction-based markets for deregulated commodity markets (e.g., California's deregulated electricity market) and the sale of government assets (e.g., auctions of electromagnetic spectrum for mobile phones). Alvin Roth has suggested that this is akin to an engineering process in which economists design the rules of a market mechanism in order to meet particular socio-economic requirements (e.g., maximising the efficiency of allocating commodities in a market).

The engineering of auction mechanisms is of particular importance to agent-based electronic commerce and multi-agent systems in general. E-commerce has enabled consumers to act as price-makers instead of just price-takers in large auction-based markets, and has stimulated the use of personalised bidding agents to empower those consumers even more. In addition, auction mechanisms are seen as a promising means of solving many distributed resource-allocation problems in multi-agent systems and grid technology.

One approach to mechanism design is to use techniques from machine learning to explore the space of possible ways in which agents might act in particular markets. For example, reinforcement learning has been used to explore bidding patterns in auctions and to establish the ways in which price-setting behaviour can affect consumer markets. Another approach is to use techniques from evolutionary computing, e.g., co-evolutionary machine learning. Our earlier work has explored the use of co-evolutionary GP to determine auction mechanism rules automatically.

In that work, mechanism rules and bidding strategies were co-evolved in ways that sought to maximise, on the one hand, overall market efficiency, and on the other hand, the profits of individual agents. This approach lends itself to studying the dynamics of the evolution of negotiation mechanisms in a setting where mechanisms change incrementally in a real environment, but it is problematic when we wish to derive optimal mechanism designs.

In our later work we view mechanism design as a multi-objective optimisation problem. The key issue that we address is determining the fitness of individual points in the mechanism design-space. This is non-trivial in the general case, since assessing the fitness of an individual mechanism involves reasoning about how agents might actually behave under the proposed negotiation rules. In the following section we describe our approach to predicting agents' behaviour for arbitrary mechanisms, and in the final section we present a summary of our results where we apply this method to mapping the fitness landscape for a k-CDA.
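The k-CDA is a continuous double auction whose pricing rule is parameterised by k. As a rough sketch of one common k-pricing convention (the function name and the exact interpolation direction are assumptions for illustration; the abstract itself does not spell the rule out), the transaction price for a matched bid/ask pair is a linear blend of the two quotes:

```python
def kcda_price(bid: float, ask: float, k: float) -> float:
    """Transaction price in a k-double auction: a linear
    interpolation between the seller's ask (k = 0) and the
    buyer's bid (k = 1)."""
    if bid < ask:
        raise ValueError("no trade: bid must be at least the ask")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return k * bid + (1.0 - k) * ask

# k = 0.5 splits the bid-ask surplus equally between the parties
price = kcda_price(bid=10.0, ask=6.0, k=0.5)  # -> 8.0
```

Under this convention, varying k shifts surplus between buyers and sellers, which is why the fitness landscape over k (Section below) is the object of study.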
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
AAMAS’03, July 14–18, 2003, Melbourne, Australia.
Copyright 2003 ACM 1-58113-683-8/03/0007 ...$5.00.
When evaluating a mechanism design, the designer must take into account the set of trading strategies that are likely to be played by agents trading in the mechanism under consideration. Deriving the set of strategies likely to be played for a particular market game, that is "solving" the game, is a non-trivial problem in the general case. This is because there is often no clear dominant strategy which constitutes best play; rather, the best strategy to play depends entirely on the strategies played by other agents. Nash defined a solution concept in which the strategy adopted by any given agent is a best-response to the best-response strategies adopted by all other agents, and proved that all n-player, non-zero-sum games admit solutions so defined.

Nash's solution concept is widely adopted in theoretical economics. Thus when evaluating an economic mechanism, the designer computes the Nash equilibria of strategies for the given mechanism, and this forms the basis of predictions about how people will actually behave under the rules of this mechanism. The designer can then analyse market outcomes in equilibria and quantitatively assess, for example, the likely effect on overall market efficiency that a given change in the mechanism rules will yield. Thus the role of the designer is to ensure that the Nash equilibria correspond to situations in which high market efficiency is obtained.

We can view mechanism design as a multi-objective optimisation problem. We consider as a separate dimension each problem variable we are interested in maximising (for example, market efficiency, seller revenue and so on), and the difficulty lies in simultaneously maximising as many dimensions as possible. The designer's task is to choose mechanism rules which Pareto-optimise different market variables when traders play Nash-equilibrium strategies. However, there are a number of problems, beginning with computing the Nash equilibria themselves:

1. Agents with limited computational power (i.e., "bounded rationality" constraints) may be unable to compute their Nash-equilibrium strategy;

2. Even with vast amounts of computational and analytic power, many games defy solution; e.g., in the case of the k-double-auction, analytical techniques have yet to yield a solution;

3. Empirical evidence shows that human agents often fail to coordinate on Nash equilibria even for very simple games whose solution is easily derivable under bounded-rationality assumptions; and

4. Often a given game will yield a multitude of Nash solutions, and there is little guidance for practitioners on choosing plausible subsets thereof as predicted outcomes.

These difficulties with the standard theory of games have led to the development of a field known as cognitive game theory, in which models of learning play a central role in explaining and predicting strategic behaviour. Erev and Roth show how simulations of agents equipped with a simple reinforcement learning algorithm can explain and predict the experimental data observed when human agents play a diverse range of trading games. Such multi-agent reinforcement learning models form the basis of our solution concept for optimising mechanism designs. Rather than computing the theoretical equilibria for a given point in the mechanism search space, we run a number of multi-agent simulations using agents equipped with a learning algorithm that determines their bidding strategies. The stationary points in these simulations (the states where the learning algorithms of all agents have converged) correspond to the equilibria of classical game theory, and the market outcomes in these stable states can be viewed as predictions.

Figure 1: Fitness plotted against k for a discriminatory-price k-CDA with 6 buyers and 6 sellers (mean fitness with standard deviation vs k).
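The multi-objective framing of mechanism design can be made concrete with a Pareto-dominance test over vectors of market variables. A minimal sketch, assuming each candidate mechanism is summarised by a tuple of objectives to maximise (the particular objectives and numbers below are illustrative, not experimental data):

```python
def dominates(a, b):
    """True if outcome vector `a` Pareto-dominates `b`: at least
    as good in every objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(outcomes):
    """Keep only the non-dominated outcome vectors."""
    return [a for a in outcomes
            if not any(dominates(b, a) for b in outcomes if b is not a)]

# Each tuple: (market efficiency, seller revenue) for one candidate mechanism
candidates = [(0.95, 0.40), (0.90, 0.55), (0.80, 0.50)]
front = pareto_front(candidates)  # (0.80, 0.50) is dominated by (0.90, 0.55)
```

The designer's task described above amounts to searching the mechanism design-space for points whose equilibrium outcomes sit on this front.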
Note that we are not attempting to find theoretically optimal strategies for our agents¹. Rather, we are attempting to predict how boundedly-rational agents, who have no prior knowledge of an equilibrium solution nor the means to calculate one, might actually play against the mechanism we are (automatically) designing. For this reason, we chose to use the Roth-Erev algorithm, since it forms the basis of a cognitive model of how people actually behave in strategic environments. In particular, it models two important principles of learning psychology:

- Thorndike's law of effect: choices that have led to good outcomes in the past are more likely to be repeated in the future; and

- The power law of practice: learning curves tend to be steep initially, and then flatter.
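The two principles above can be sketched with the basic Roth-Erev update. This is a minimal illustration following the standard presentation of the algorithm, not the paper's implementation; the recency parameter `phi` and experimentation parameter `epsilon` and their default values here are assumptions:

```python
import random

def roth_erev_update(propensities, chosen, reward, phi=0.1, epsilon=0.2):
    """One Roth-Erev learning step over discrete actions.

    phi: recency (forgetting) parameter -- old propensities decay,
         producing the steep-then-flat power-law-of-practice shape.
    epsilon: experimentation -- a share of the reward spills over
         to the other actions, preserving exploration.
    The chosen action's propensity is reinforced by its reward
    (the law of effect)."""
    n = len(propensities)
    return [
        (1 - phi) * q + (reward * (1 - epsilon) if j == chosen
                         else reward * epsilon / (n - 1))
        for j, q in enumerate(propensities)
    ]

def choose_action(propensities):
    """Sample an action with probability proportional to its propensity."""
    return random.choices(range(len(propensities)), weights=propensities)[0]
```

After a rewarded choice, that action's propensity (and hence its selection probability) rises relative to the alternatives, so repeated play converges toward the better-performing bids.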
Figure 1 shows the mean fitness of a set of k-CDA mechanisms for 100 values of k in the interval [0, 1] at intervals of 0.01. Each sample consisted of 100,000 runs of the auction simulation with different seeds for the random number generator. In each simulation we pit 6 buyers against 6 sellers over a period of 1,000 rounds of trading. Fitness is defined as a linear sum of buyer market-power, seller market-power and overall market efficiency, normalised to lie in the range [0, 1], where higher values indicate better mechanisms. The software used to run this experiment is available for download at:

Also available at the same URL is the full version of this paper, in which we demonstrate the evolution of a k=0.5 CDA using genetic programming.
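The fitness measure described above can be sketched as a normalised linear combination. The equal weights and the treatment of market power as a deviation to be minimised are assumptions for illustration; the abstract states only that fitness is a linear sum normalised to [0, 1]:

```python
def fitness(buyer_mp: float, seller_mp: float, efficiency: float) -> float:
    """Combine buyer market power, seller market power and allocative
    efficiency into a single score in [0, 1]; higher is better.

    Market power is treated here as a deviation from
    competitive-equilibrium profits, so low absolute market power is
    good; efficiency is a fraction of the maximum theoretical surplus."""
    w_b, w_s, w_e = 1.0 / 3, 1.0 / 3, 1.0 / 3  # equal weights (assumption)
    score = (w_b * (1 - min(abs(buyer_mp), 1.0))
             + w_s * (1 - min(abs(seller_mp), 1.0))
             + w_e * efficiency)
    return max(0.0, min(1.0, score))

# A perfectly efficient market with no market power scores 1.0
best = fitness(buyer_mp=0.0, seller_mp=0.0, efficiency=1.0)
```

Averaging this score over many seeded simulation runs for each value of k then yields a fitness landscape like the one in Figure 1.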
¹ In other words, Nash-equilibrium strategies.