|
|
Csaji, B.C. & Monostori, L., 2008. Value Function Based Reinforcement Learning in Changing Markovian Environments, Journal of Machine Learning Research, 9, p. 1679–1709.
Abstract: The paper investigates the possibility of applying value function based reinforcement learning (RL) methods in cases when the environment may change over time. First, theorems are presented which show that the optimal value function of a discounted Markov decision process (MDP) Lipschitz continuously depends on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which is a generalization of MDPs and ε-MDPs. In this model the environment may change over time, more precisely, the transition function and the cost function may vary from time to time, but the changes must be bounded in the limit. Then, learning algorithms in changing environments are analyzed. A general relaxed convergence theorem for stochastic iterative algorithms is presented. We also demonstrate the results through three classical RL methods: asynchronous value iteration, Q-learning and temporal difference learning. Finally, some numerical experiments concerning changing environments are presented.
Keywords: ReinforcementLearning
|
|
|
|
Dempster, M.A.H. & Leemans, V., 2004. An Automated FX Trading System Using Adaptive Reinforcement Learning. Cambridge University Press.
Abstract: This paper introduces adaptive reinforcement learning (ARL) as the basis for a fully automated trading system application. The system is designed to trade FX markets and relies on a layered structure consisting of a machine learning algorithm, a risk management overlay and a dynamic utility optimization layer. An existing machine-learning method called recurrent reinforcement learning (RRL) was chosen as the underlying algorithm for ARL. One of the strengths of our approach is that the dynamic optimization layer makes a fixed choice of model tuning parameters unnecessary. It also allows for a risk-return trade-off to be made by the user within the system. The trading system is able to make consistent gains out-of-sample while avoiding large draw-downs.
Keywords: ReinforcementLearning
|
|
|
|
Gold, C., 2003. FX Trading via Recurrent Reinforcement Learning.
Abstract: This study investigates high frequency currency trading with neural networks trained via Recurrent Reinforcement Learning (RRL). We compare the performance of single layer networks with networks having a hidden layer, and examine the impact of the fixed system parameters on performance. In general, we conclude that the trading systems may be effective, but the performance varies widely for different currency markets and this variability cannot be explained by simple statistics of the markets. Also we find that the single layer network outperforms the two layer network in this application.
Keywords: ReinforcementLearning
|
|
|
|
Kassahun, Y. & Sommer, G. Efficient Reinforcement Learning Through Evolutionary Acquisition of Neural Topologies.
Abstract: In this paper we present a novel method, called Evolutionary Acquisition of Neural Topologies (EANT), of evolving the structure and weights of neural networks. The method introduces an efficient and compact genetic encoding of a neural network onto a linear genome that enables one to evaluate the network without decoding it. The method explores new structures whenever it is not possible to further exploit the structures found so far. This enables it to find minimal neural structures for solving a given learning task. We tested the algorithm on a benchmark control task and found it to perform very well.
Keywords: ReinforcementLearning
|
|
|
|
Moody, J., Wu, L., Liao, Y. & Saffell, M., 1998. Performance Functions and Reinforcement Learning for Trading Systems and Portfolios, Journal of Forecasting, 17, p. 441–470.
Abstract: We propose to train trading systems and portfolios by optimizing objective functions that directly measure trading and investment performance. Rather than basing a trading system on forecasts or training via a supervised learning algorithm using labelled trading data, we train our systems using recurrent reinforcement learning (RRL) algorithms. The performance functions that we consider for reinforcement learning are profit or wealth, economic utility, the Sharpe ratio and our proposed differential Sharpe ratio. The trading and portfolio management systems require prior decisions as input in order to properly take into account the effects of transactions costs, market impact and taxes. This temporal dependence on system state requires the use of reinforcement versions of standard recurrent learning algorithms. We present empirical results in controlled experiments that demonstrate the efficacy of some of our methods for optimizing trading systems and portfolios. For a long/short trader, we find that maximizing the differential Sharpe ratio yields more consistent results than maximizing profits, and that both methods outperform a trading system based on forecasts that minimize MSE. We find that portfolio traders trained to maximize the differential Sharpe ratio achieve better risk-adjusted returns than those trained to maximize profit. Finally, we provide simulation results for an S&P 500 / TBill asset allocation system that demonstrate the presence of out-of-sample predictability in the monthly S&P 500 stock index for the 25 year period 1970 through 1994.
Keywords: ReinforcementLearning
|
|
|
|
Polani, D. & Miikkulainen, R., 1999. Fast Reinforcement Learning through Eugenic Neuro-Evolution. University of Mainz.
Keywords: NeuralNets; ReinforcementLearning
|
|
|
|
Whiteson, S. & Stone, P., 2006. Evolutionary Function Approximation for Reinforcement Learning, Journal of Machine Learning Research, 7, p. 877–917.
Abstract: Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice.
Keywords: ReinforcementLearning
|
|