Towards Inverse Reinforcement Learning for Limit Order Book Dynamics
---
Authors:
Jacobo Roa-Vicens, Cyrine Chtourou, Angelos Filos, Francisco Rullan,
Yarin Gal, Ricardo Silva
---
Latest submission year:
2019
---
Abstract:
Multi-agent learning is a promising method to simulate aggregate competitive behaviour in finance. Learning expert agents' reward functions through their external demonstrations is hence particularly relevant for the subsequent design of realistic agent-based simulations. Inverse Reinforcement Learning (IRL) aims at acquiring such reward functions through inference, allowing the resulting policy to generalize to states not observed in the past. This paper investigates whether IRL can infer such rewards from agents within real financial stochastic environments: limit order books (LOB). We introduce a simple one-level LOB, where the interactions of a number of stochastic agents and an expert trading agent are modelled as a Markov decision process. We consider two cases for the expert's reward: either a simple linear function of state features, or a complex, more realistic non-linear function. Given the expert agent's demonstrations, we attempt to discover their strategy by modelling their latent reward function using linear and Gaussian process (GP) regressors from previous literature, and our own approach through Bayesian neural networks (BNN). While all three methods can learn the linear case, only the GP-based and our proposed BNN methods are able to discover the non-linear reward case. Our BNN IRL algorithm outperforms the other two approaches as the number of samples increases. These results illustrate that complex behaviours, induced by non-linear reward functions amid agent-based stochastic scenarios, can be deduced through inference, encouraging the use of inverse reinforcement learning for opponent-modelling in multi-agent systems.
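To make the setup concrete, the sketch below contrasts the two reward hypotheses from the abstract on toy one-level LOB state features, and checks that a plain linear fit captures only the linear reward while a GP regressor also recovers the non-linear one. All details here (the feature set, the functional forms, and the use of direct state-to-reward supervision rather than demonstrations) are illustrative assumptions, not the paper's implementation; in the IRL setting the reward targets are latent.

```python
# Illustrative sketch only: the features, reward forms and direct supervision
# are assumptions for exposition; the paper infers rewards from demonstrations.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def lob_state_features(n):
    """Toy one-level LOB states: (queue imbalance, spread in ticks, inventory)."""
    imbalance = rng.uniform(-1.0, 1.0, size=n)          # (bid_vol - ask_vol) / total
    spread = rng.integers(1, 5, size=n).astype(float)   # bid-ask spread in ticks
    inventory = rng.uniform(-1.0, 1.0, size=n)          # normalised expert inventory
    return np.column_stack([imbalance, spread, inventory])

def linear_reward(s):
    """Case 1: a simple linear function of the state features."""
    return s @ np.array([1.0, -0.5, -0.2])

def nonlinear_reward(s):
    """Case 2: a non-linear reward, here saturating in imbalance and
    penalising squared inventory (a hypothetical functional form)."""
    return np.tanh(2.0 * s[:, 0]) - 0.5 * s[:, 1] - s[:, 2] ** 2

X = lob_state_features(500)
for name, r in [("linear", linear_reward(X)), ("non-linear", nonlinear_reward(X))]:
    lin = LinearRegression().fit(X, r)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4).fit(X, r)
    print(f"{name:>10} reward | linear R^2: {r2_score(r, lin.predict(X)):.3f}"
          f" | GP R^2: {r2_score(r, gp.predict(X)):.3f}")
```

On this toy data the linear fit matches the linear reward exactly but leaves the curvature of the non-linear reward unexplained, while the GP recovers both, mirroring the capacity gap the abstract describes.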
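The paper's third reward model is a Bayesian neural network. One common construction for an approximate posterior over network outputs is Monte-Carlo dropout (Gal & Ghahramani, 2016); the minimal PyTorch sketch below uses that construction as a stand-in, though the paper's exact architecture and training objective are not reproduced here, and fitting directly to known rewards again only substitutes for IRL from demonstrations.

```python
# MC-dropout BNN reward model: a sketch under assumptions, not the authors'
# architecture; supervision on (state, reward) pairs replaces IRL here.
import torch
import torch.nn as nn

class BNNReward(nn.Module):
    """Small MLP over state features; keeping dropout active at prediction
    time yields Monte-Carlo samples from an approximate reward posterior."""
    def __init__(self, n_features=3, hidden=64, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def mc_reward(model, states, n_samples=50):
    """Posterior mean and per-state uncertainty of the reward via MC dropout."""
    model.train()  # keep dropout stochastic at prediction time
    with torch.no_grad():
        draws = torch.stack([model(states) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)

# Fit on the same toy non-linear reward form as the previous sketch.
torch.manual_seed(0)
X = torch.rand(512, 3) * 2 - 1
y = torch.tanh(2.0 * X[:, 0]) - 0.5 * X[:, 1] - X[:, 2] ** 2
model = BNNReward()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
mean, std = mc_reward(model, X)  # std estimates per-state epistemic uncertainty
```

The per-state uncertainty is what the BNN and GP models offer over a plain linear regressor. One standard motivation for swapping a GP for a BNN is computational: exact GP regression scales cubically with the number of training points, whereas network training scales roughly linearly, which is consistent with the abstract's observation that the BNN approach benefits as the number of samples increases.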
---
Categories:
Primary category: Computer Science
Secondary category: Machine Learning
Category description: Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
--
Primary category: Quantitative Finance
Secondary category: Trading and Market Microstructure
Category description: Market microstructure, liquidity, exchange and auction design, automated trading, agent-based modeling and market-making
--
Primary category: Statistics
Secondary category: Machine Learning
Category description: Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
--