
Approximate information for efficient exploration-exploitation strategies

  • cyrilrenassia
  • May 22, 2024
  • 1 min read

PHYSICAL REVIEW E


Alex Barbier-Chebbah, Christian L. Vestergaard, and Jean-Baptiste Masson


Abstract


This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multiarmed bandit problems. In these problems, an agent must decide whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. Here we introduce a class of algorithms, approximate information maximization (AIM), which employs a carefully chosen analytical approximation to the gradient of the entropy to choose which arm to pull at each point in time. AIM matches the performance of Thompson sampling, which is known to be asymptotically optimal, as well as that of Infomax, from which it derives. AIM thus retains the advantages of Infomax while also offering enhanced computational speed, tractability, and ease of implementation. In particular, we demonstrate how to apply it to a 50-armed bandit game. Its expression is tunable, which allows for specific optimization in various settings, making it possible to surpass the performance of Thompson sampling at short and intermediate times.
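
To make the setting concrete, here is a minimal sketch of the Thompson sampling baseline that the abstract compares against, run on a 50-armed Bernoulli bandit. It is not the AIM algorithm and is not taken from the paper: the Bernoulli rewards, the uniform Beta(1, 1) prior, the evenly spaced arm means, and the name `thompson_sampling` are all assumptions made for illustration.

```python
import numpy as np


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_means: hypothetical success probability of each arm.
    horizon:    number of pulls.
    Returns the cumulative expected regret and the pull count per arm.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size

    # Beta(1, 1) prior on every arm (an assumption, not from the paper).
    successes = np.ones(n_arms)
    failures = np.ones(n_arms)

    best_mean = true_means.max()
    pulls = np.zeros(n_arms, dtype=int)
    regret = 0.0

    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior, then
        # pull the arm whose draw is largest.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))

        # Simulate a Bernoulli reward and update that arm's posterior.
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward

        pulls[arm] += 1
        regret += best_mean - true_means[arm]

    return regret, pulls


if __name__ == "__main__":
    # Hypothetical 50-armed game with evenly spaced arm means.
    means = np.linspace(0.1, 0.9, 50)
    regret, pulls = thompson_sampling(means, horizon=5000)
    print("cumulative regret:", round(regret, 1))
    print("most pulled arm:", int(np.argmax(pulls)))
```

Each step samples from the posterior, so exploration is driven by remaining uncertainty rather than by an explicit schedule. According to the abstract, AIM replaces this sampling step with an analytical expression based on the entropy gradient; see the paper for that expression.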


More information at DOI: 10.1103/PhysRevE.109.L052105
