Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent
Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner's policy approaches optimality, where the optimal informed policy is the one that maximizes reward. Unfortunately, we show that if an agent is guaranteed to be asymptotically optimal in any (stochastically computable) environment, then, subject to an assumption about the true environment, this agent will be either destroyed or incapacitated with probability 1.