Leveraging Factored Action Spaces for Off-Policy Evaluation


In high-stakes decision-making domains such as healthcare and self-driving cars, off-policy evaluation (OPE) can help practitioners understand the performance of a new policy before deployment by using observational data. However, for problems with large, combinatorial action spaces, existing OPE estimators often suffer from substantial bias and/or variance. In this work, we investigate the role of factored action spaces in improving OPE. Specifically, we propose and study a new family of decomposed importance sampling (IS) estimators that leverage the inherent factorisation structure of actions. We prove that, subject to certain assumptions about the underlying problem structure, our proposed estimator remains unbiased while achieving lower variance. Empirically, we show that our estimator outperforms standard IS in terms of mean squared error, and we conduct sensitivity analyses probing the validity of various assumptions. Future work should investigate how to design or derive the factorisation for practical problems so as to adhere as closely as possible to the theoretical assumptions.
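To illustrate why factorised actions can help IS-based OPE, here is a minimal sketch in a one-step (bandit) setting. It is not the estimator from this work. It assumes, for illustration only, that both the behaviour and evaluation policies factorise across action dimensions and that the reward is a sum of per-dimension parts. Under those assumptions, standard IS weights every sample by the product of all per-factor ratios, whereas a per-factor variant reweights each reward part by its own factor's ratio only. The policies, rewards, and function names (`true_value`, `collect`, `per_sample_terms`) are hypothetical.

```python
import math
import random
import statistics

# Hypothetical toy problem (illustration only): a one-step decision with two
# action factors, each offering three discrete choices. We ASSUME both policies
# factorise across factors and the reward is a sum of per-factor parts r_d(a_d).
BEHAVIOUR = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]  # pi_b,d(a_d), behaviour policy
TARGET = [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]     # pi_e,d(a_d), evaluation policy
REWARD = [[1.0, 0.0, 2.0], [0.5, 1.5, 3.0]]     # r_d(a_d), per-factor reward


def true_value(target, reward):
    """Exact expected reward under the target policy (known only in this toy model)."""
    return sum(
        sum(p * r for p, r in zip(probs, rewards))
        for probs, rewards in zip(target, reward)
    )


def collect(behaviour, n, rng):
    """Sample n factored actions from the behaviour policy, one choice per factor."""
    return [
        [rng.choices(range(len(probs)), weights=probs)[0] for probs in behaviour]
        for _ in range(n)
    ]


def per_sample_terms(actions, behaviour, target, reward):
    """Per-sample terms of standard IS and of a per-factor (decomposed) variant."""
    standard, decomposed = [], []
    for action in actions:
        ratios = [target[d][a] / behaviour[d][a] for d, a in enumerate(action)]
        rewards = [reward[d][a] for d, a in enumerate(action)]
        # Standard IS: the joint ratio is the product of the per-factor ratios.
        standard.append(math.prod(ratios) * sum(rewards))
        # Per-factor sketch: each reward part is reweighted by its own factor's ratio.
        decomposed.append(sum(rho * r for rho, r in zip(ratios, rewards)))
    return standard, decomposed


rng = random.Random(0)
actions = collect(BEHAVIOUR, 20_000, rng)
standard, decomposed = per_sample_terms(actions, BEHAVIOUR, TARGET, REWARD)
truth = true_value(TARGET, REWARD)
print(f"true value:  {truth:.3f}")
print(f"standard IS: {statistics.fmean(standard):.3f} "
      f"(variance {statistics.variance(standard):.1f})")
print(f"per-factor:  {statistics.fmean(decomposed):.3f} "
      f"(variance {statistics.variance(decomposed):.1f})")
```

With these hypothetical numbers the exact target value is 3.85. Both sample means should land near it, while the per-factor variant's sample variance is several times smaller because it never multiplies ratios across factors. Whether such a decomposition stays unbiased in a real problem depends on structural assumptions like those mentioned above; the factorised-policy, additive-reward setup here is only one illustrative case.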

ICML 2023 Workshop on Counterfactuals in Minds and Machines; ICML 2023 Workshop on New Frontiers in Learning, Control, and Dynamical Systems
Shengpu Tang
PhD candidate