Optimal Off-Policy Evaluation from Multiple Logging Policies
Nathan Kallus *1, Yuta Saito *1, Masatoshi Uehara *1
Abstract

We study off-policy evaluation (OPE) from multiple logging policie ...

In most of the above studies, the observations used to evaluate a new policy are assumed generated by a single logging policy. Often, however, we have the opportunity to leverage













