Applying Genetic Programming to Economic Mechanism Design: Evolving a Pricing Rule for a Continuous Double Auction

Steve Phelps, Peter McBurney
Dept of Computer Science, University of Liverpool, Chadwick Building, Liverpool L69 7ZF, UK.
sphelps,[email protected]

Simon Parsons
Dept of Computer and Information Science, Brooklyn College, 2900 Bedford Avenue, Brooklyn, NY 11210, USA.
[email protected]

Elizabeth Sklar
Dept of Computer Science, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA.
[email protected]

Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence: Multi-agent systems

General Terms
Design, Economics

Keywords
cognitive game theory, continuous double auction, genetic programming, pareto optimisation, mechanism design, multiagent systems (MAS), reinforcement learning, trading strategies

1. INTRODUCTION

The auction mechanism design problem has attracted much interest in recent years, and economists have had considerable success in applying techniques from game theory to the design of auction-based markets for deregulated commodity markets (e.g., California's deregulated electricity market) and the sale of government assets (e.g., auctions of electromagnetic spectrum for mobile phones). Alvin Roth has suggested that this is akin to an engineering process in which economists design the rules of a market mechanism in order to meet particular socio-economic requirements (e.g., maximising the efficiency of allocating commodities in a market).

The engineering of auction mechanisms is of particular importance to agent-based electronic commerce and multiagent systems in general. E-commerce has enabled consumers to act as price-makers instead of just price-takers in large auction-based markets, and has stimulated the use of personalised bidding agents to empower those consumers even more. In addition, auction mechanisms are seen as a promising means of solving many distributed resource-allocation problems in multi-agent systems and grid technology.
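The mechanism family studied in this paper, the k-CDA, sets the price of each matched trade as a weighted average of the bid and the ask, with the weight k as the design parameter. The following is a minimal sketch in Python; it is our own illustrative code, not the authors' implementation, and we assume the convention price = k*bid + (1-k)*ask, since the direction of the weighting varies between treatments.

```python
def k_price(bid: float, ask: float, k: float) -> float:
    """Transaction price for a matched bid/ask under a k-pricing rule.

    Assumes the convention price = k*bid + (1-k)*ask; a match requires
    bid >= ask, so the price always lies in the bid-ask spread.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    if bid < ask:
        raise ValueError("no trade: bid is below ask")
    return k * bid + (1.0 - k) * ask
```

With k = 0.5, the rule splits the bid-ask spread evenly, e.g. k_price(12.0, 10.0, 0.5) returns 11.0; k = 1 gives the whole spread to the seller and k = 0 to the buyer (under the assumed convention).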
One approach to mechanism design is to use techniques from machine learning to explore the space of possible ways in which agents might act in particular markets. For example, reinforcement learning has been used to explore bidding patterns in auctions and to establish the ways in which price-setting behaviour can affect consumer markets. Another approach is to use techniques from evolutionary computing, e.g., co-evolutionary machine-learning. Our earlier work has explored the use of co-evolutionary genetic programming (GP) to determine auction mechanism rules automatically. In that work, mechanism rules and bidding strategies were co-evolved in ways that sought to maximise, on the one hand, overall market efficiency, and on the other, the profits of individual agents. This approach lends itself to studying the dynamics of the evolution of negotiation mechanisms in a setting where mechanisms change incrementally in a real environment, but it is problematic when we wish to derive optimal mechanism designs.

In our later work we view mechanism design as a multi-objective optimisation problem. The key issue that we address is determining the fitness of individual points in the mechanism design-space. This is non-trivial in the general case, since assessing the fitness of an individual mechanism involves reasoning about how agents might actually behave under the proposed negotiation rules. In the following section we describe our approach to predicting agents' behaviour for arbitrary mechanisms, and in the final section we present a summary of our results, where we apply this method to mapping the fitness landscape for a k-CDA.

2. EQUILIBRIA FOR N-PLAYER GAMES

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. AAMAS'03, July 14–18, 2003, Melbourne, Australia. Copyright 2003 ACM 1-58113-683-8/03/0007 ...$5.00.

When evaluating a mechanism design, the designer must take into account the set of trading strategies that are likely to be played by agents trading in the mechanism under consideration. Deriving the set of strategies likely to be played for a particular market game, that is, "solving" the game, is a non-trivial problem in the general case. This is because there is often no clear dominant strategy which constitutes best play; rather, the best strategy to play depends entirely on the strategies played by other agents. Nash defined a solution concept in which the strategy adopted by any given agent is a best response to the best-response strategies adopted by all other agents, and proved that all n-player, non-zero-sum games admit solutions so defined. Nash's solution concept is widely adopted in theoretical economics. Thus when evaluating an economic mechanism, the designer computes the Nash equilibria of strategies for the given mechanism, and this forms the basis of predictions about how people will actually behave under the rules of this mechanism. The designer can then analyse market outcomes in equilibrium and quantitatively assess, for example, the likely effect on overall market efficiency that a given change in the mechanism rules will yield. Thus the role of the designer is to ensure that the Nash equilibria correspond to situations in which high market efficiency is obtained.

We can view mechanism design as a multi-objective optimisation problem. We consider as a separate dimension each problem variable we are interested in maximising (for example, market efficiency, seller revenue and so on), and the difficulty lies in simultaneously maximising as many dimensions as possible. The designer's task is to choose mechanism rules which pareto-optimise the different market variables when traders play Nash-equilibrium strategies.

However, there are a number of problems, beginning with computing the Nash equilibria:

1. Agents with limited computational power (i.e., "bounded rationality" constraints) may be unable to compute their Nash-equilibrium strategy;

2. Even with vast amounts of computational and analytic power, many games defy solution; e.g., in the case of the k-double-auction, analytical techniques have yet to yield a solution;

3. Empirical evidence shows that human agents often fail to coordinate on Nash equilibria for very simple games whose solution is easily derivable under bounded-rationality assumptions; and

4. Often a given game will yield a multitude of Nash solutions, and there is little guidance for practitioners on choosing plausible subsets thereof as predicted outcomes.

These difficulties with the standard theory of games have led to the development of a field known as cognitive game theory, in which models of learning play a central role in explaining and predicting strategic behaviour. Erev and Roth show how simulations of agents equipped with a simple reinforcement-learning algorithm can explain and predict the experimental data observed when human agents play a diverse range of trading games. Such multi-agent reinforcement-learning models form the basis of our solution concept for optimising mechanism designs.

Rather than computing the theoretical equilibria for a given point in the mechanism search space, we run a number of multi-agent simulations using agents equipped with a learning algorithm that determines their bidding strategies. The stationary points in these simulations (the states where the learning algorithms of all agents have converged) correspond to the equilibria of classical game theory, and the market outcomes in these stable states can be viewed as predictions.

Note that we are not attempting to find theoretically optimal strategies (that is, Nash-equilibrium strategies) for our agents. Rather, we are attempting to predict how boundedly-rational agents, who have no prior knowledge of an equilibrium solution nor the means to calculate one, might actually play against the mechanism we are (automatically) designing. For this reason, we chose to use the Roth-Erev algorithm, since it forms the basis of a cognitive model of how people actually behave in strategic environments. In particular, it models two important principles of learning psychology: Thorndike's law of effect (choices that have led to good outcomes in the past are more likely to be repeated in the future) and the power law of practice (learning curves tend to be steep initially, and then flatten).

3. RESULTS

[Figure 1: Fitness plotted against k for a discriminatory-price k-CDA with 6 buyers (NB = 6) and 6 sellers (NS = 6). The plot shows mean fitness with standard deviation against k.]

Figure 1 shows the mean fitness of a set of k-CDA mechanisms for 100 values of k in the interval [0, 1] at intervals of 0.01. Each sample consisted of 100,000 runs of the auction simulation with different seeds for the random number generator. In each simulation we pit 6 buyers against 6 sellers over a period of 1,000 rounds of trading. Fitness is defined as a linear sum of buyer market-power, seller market-power and overall market efficiency, normalised to lie in the range [0, 1], where higher values indicate better mechanisms.

The software used to run this experiment is available for download at http://www.cs.liv.ac.uk/sphelps/jasa. Also available at the same URL is the full version of this paper, in which we demonstrate the evolution of a k = 0.5 CDA using genetic programming.
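The Roth-Erev algorithm used for the trading agents can be sketched as follows. This is a minimal illustration of the basic update rule, with parameter names and values of our own choosing rather than those used in the experiments: each agent keeps a propensity for each action; the chosen action's propensity is reinforced by the reward received (the law of effect); all propensities decay at a recency rate, which flattens learning over time in the spirit of the power law of practice; and actions are chosen with probability proportional to propensity.

```python
import random


class RothErevLearner:
    """Minimal Roth-Erev reinforcement learner (illustrative parameters)."""

    def __init__(self, n_actions, recency=0.1, experimentation=0.2,
                 initial=1.0, rng=None):
        self.n = n_actions
        self.phi = recency           # forgetting rate: flattens learning over time
        self.eps = experimentation   # fraction of reward spilled over to other actions
        self.q = [initial] * n_actions  # one propensity per action
        self.rng = rng or random.Random()

    def choose(self):
        # Sample an action with probability proportional to its propensity.
        total = sum(self.q)
        r = self.rng.uniform(0.0, total)
        acc = 0.0
        for a, qa in enumerate(self.q):
            acc += qa
            if r <= acc:
                return a
        return self.n - 1

    def update(self, action, reward):
        # Law of effect: the chosen action's propensity grows with its reward,
        # while a fraction eps of the reward spills over to the other actions.
        for a in range(self.n):
            if a == action:
                gain = reward * (1.0 - self.eps)
            else:
                gain = reward * self.eps / (self.n - 1)
            self.q[a] = (1.0 - self.phi) * self.q[a] + gain
```

In a two-action bandit where action 1 pays ten times more than action 0, repeated play drives the propensity (and hence the choice probability) for action 1 well above that for action 0, which is the convergence behaviour the simulations rely on.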

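The multi-objective view of mechanism design described in Section 2 turns on Pareto dominance between candidate mechanisms: one design pareto-dominates another if it is at least as good in every market variable and strictly better in at least one. A minimal sketch of that comparison (our own helper functions, purely illustrative):

```python
def dominates(a, b):
    """True if outcome vector a Pareto-dominates b.

    a and b are sequences of objective values (e.g. market efficiency,
    buyer market-power, seller market-power), all to be maximised:
    a dominates b if it is at least as good in every objective and
    strictly better in at least one.
    """
    assert len(a) == len(b)
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points not dominated by any other point."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]
```

For example, among the outcome vectors (0.9, 0.5), (0.8, 0.6), (0.7, 0.4) and (0.9, 0.4), the first two form the Pareto front: neither dominates the other, and each of the remaining two is dominated by (0.9, 0.5).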