摘要翻译:
本文研究了有限销售季节下的动态品种优化问题。在每个时间段内,销售者在基数约束下向到达的顾客提供一系列可替代产品,顾客根据一个离散选择模型在所提供的产品中进行购买。现有的工作大多将每个产品与一个实值固定均值效用相关联,并假设一个多项式logit选择(MNL)模型。在许多实际应用中,产品的特征/上下文信息是很容易获得的。在本文中,我们通过假设均值效用与特征之间的线性关系来引入特征信息。另外,我们允许产品的特征信息随时间变化,使得底层的选择模型也可以是非平稳的。为了解决这种变化的MNL模型下的动态优化问题,我们需要同时学习潜在的未知系数并做出优化决策。为此,我们提出了一个基于上置信限(UCB)的策略,并建立了以$\wideTilde O(d\sqrt{T})$为序的遗憾界,其中$d$是特征的维数,$\wideTilde O$抑制对数依赖性。我们进一步建立了下限$\omega(d\sqrt{T}/k)$,其中$k$是提供的分类的基数约束,通常很小。当$k$是一个常数时,我们的策略是最优的,直到对数因子。在UCB算法的开发阶段,我们需要根据学习到的信息求解一个组合优化的分类优化问题。我们进一步发展了一个近似算法和一个高效的贪婪启发式。数值研究进一步证明了该策略的有效性。
---
英文标题:
《Dynamic Assortment Optimization with Changing Contextual Information》
---
作者:
Xi Chen, Yining Wang, Yuan Zhou
---
最新提交年份:
2019
---
分类信息:
一级分类:Economics 经济学
二级分类:Econometrics 计量经济学
分类描述:Econometric Theory, Micro-Econometrics, Macro-Econometrics, Empirical Content of Economic Relations discovered via New Methods, Methodological Aspects of the Application of Statistical Inference to Economic Data.
计量经济学理论,微观计量经济学,宏观计量经济学,通过新方法发现的经济关系的实证内容,统计推论应用于经济数据的方法论方面。
--
一级分类:Computer Science 计算机科学
二级分类:Machine Learning 机器学习
分类描述:Papers on all aspects of machine learning research (supervised, unsupervised, reinforcement learning, bandit problems, and so on) including also robustness, explanation, fairness, and methodology. cs.LG is also an appropriate primary category for applications of machine learning methods.
关于机器学习研究的所有方面的论文(有监督的,无监督的,强化学习,强盗问题,等等),包括健壮性,解释性,公平性和方法论。对于机器学习方法的应用,CS.LG也是一个合适的主要类别。
--
一级分类:Statistics 统计学
二级分类:Machine Learning 机器学习
分类描述:Covers machine learning papers (supervised, unsupervised, semi-supervised learning, graphical models, reinforcement learning, bandits, high dimensional inference, etc.) with a statistical or theoretical grounding
覆盖机器学习论文(监督,无监督,半监督学习,图形模型,强化学习,强盗,高维推理等)与统计或理论基础
--
---
英文摘要:
In this paper, we study the dynamic assortment optimization problem under a finite selling season of length $T$. At each time period, the seller offers an arriving customer an assortment of substitutable products under a cardinality constraint, and the customer makes the purchase among offered products according to a discrete choice model. Most existing work associates each product with a real-valued fixed mean utility and assumes a multinomial logit choice (MNL) model. In many practical applications, feature/contexutal information of products is readily available. In this paper, we incorporate the feature information by assuming a linear relationship between the mean utility and the feature. In addition, we allow the feature information of products to change over time so that the underlying choice model can also be non-stationary. To solve the dynamic assortment optimization under this changing contextual MNL model, we need to simultaneously learn the underlying unknown coefficient and makes the decision on the assortment. To this end, we develop an upper confidence bound (UCB) based policy and establish the regret bound on the order of $\widetilde O(d\sqrt{T})$, where $d$ is the dimension of the feature and $\widetilde O$ suppresses logarithmic dependence. We further established the lower bound $\Omega(d\sqrt{T}/K)$ where $K$ is the cardinality constraint of an offered assortment, which is usually small. When $K$ is a constant, our policy is optimal up to logarithmic factors. In the exploitation phase of the UCB algorithm, we need to solve a combinatorial optimization for assortment optimization based on the learned information. We further develop an approximation algorithm and an efficient greedy heuristic. The effectiveness of the proposed policy is further demonstrated by our numerical studies.
---
PDF链接:
https://arxiv.org/pdf/1810.13069


雷达卡



京公网安备 11010802022788号







