Asymptotically Unambitious Artificial General Intelligence


Algorithmic Information Theory has inspired intractable constructions of general intelligence (AGI), and undiscovered tractable approximations are likely feasible. Reinforcement Learning (RL), the dominant paradigm by which an agent might learn to solve arbitrary solvable problems, gives an agent a dangerous incentive: to gain arbitrary "power" in order to intervene in the provision of their own reward. We review the arguments that generally intelligent algorithmic-information-theoretic reinforcement learners such as Hutter's [2] AIXI would seek arbitrary power, including over us. Then, using an information-theoretic exploration schedule, and a setup inspired by causal influence theory, we present a variant of AIXI which learns to not seek arbitrary power; we call it "unambitious". We show that our agent learns to accrue reward at least as well as a human mentor, while relying on that mentor with diminishing probability. And given a formal assumption that we probe empirically, we show that eventually, the agent's world-model incorporates the following true fact: intervening in the "outside world" will have no effect on reward acquisition; hence, it has no incentive to shape the outside world.
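To make the mentor-deferral dynamic concrete, here is a minimal toy sketch, not the paper's construction: a two-armed bandit where the agent defers to a mentor with probability tied to its remaining posterior uncertainty, a crude stand-in for the information-theoretic exploration schedule described above. The reward probabilities `TRUE_P`, the Beta-posterior model, and the `4 * posterior_entropy()` schedule are all illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: two arms with unknown Bernoulli reward rates.
# The mentor knows the better arm; the agent keeps a Beta posterior
# per arm and defers to the mentor with probability proportional to
# its residual posterior uncertainty (an illustrative proxy for the
# paper's information-theoretic exploration schedule).
TRUE_P = np.array([0.3, 0.7])  # assumed ground truth, for the demo only
alpha = np.ones(2)             # Beta posterior: observed successes + 1
beta = np.ones(2)              # Beta posterior: observed failures + 1

def posterior_uncertainty():
    """Total variance of the per-arm Beta posteriors."""
    mean = alpha / (alpha + beta)
    var = mean * (1 - mean) / (alpha + beta + 1)
    return var.sum()

def mentor_action():
    """The mentor simply plays the truly better arm."""
    return int(np.argmax(TRUE_P))

defer_count = 0
for t in range(2000):
    # Deferral probability shrinks as the posterior concentrates.
    p_defer = min(1.0, 4 * posterior_uncertainty())
    if rng.random() < p_defer:
        a = mentor_action()  # explore by deferring to the mentor
        defer_count += 1
    else:
        a = int(np.argmax(alpha / (alpha + beta)))  # exploit own model
    r = rng.random() < TRUE_P[a]
    alpha[a] += r
    beta[a] += 1 - r

print(f"deferred on {defer_count} of 2000 steps; "
      f"posterior means = {alpha / (alpha + beta)}")
```

Running this, deferrals cluster early and become rare as the posterior concentrates, mirroring the abstract's claim that the agent relies on the mentor with diminishing probability while matching the mentor's reward accrual.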

Michael K. Cohen
Badri Vellambi
Marcus Hutter