Abstract

The human intrinsic desire to pursue knowledge, also known as curiosity, is considered essential in the process of skill acquisition. With the aid of artificial curiosity, we could equip current techniques for control, such as Reinforcement Learning, with more natural exploration capabilities. A promising approach in this respect uses Bayesian surprise on model parameters, i.e. a metric for the difference between prior and posterior beliefs, to favour exploration. In this contribution, we propose to apply Bayesian surprise in a latent space representing the agent’s current understanding of the dynamics of the system, drastically reducing the computational costs. We extensively evaluate our method, measuring the agent’s performance in terms of environment exploration for continuous control tasks and in terms of game scores for video games. Our model is computationally cheap and compares positively with current state-of-the-art methods on several problems. We also investigate the effects caused by stochasticity in the environment, which is often a failure case for curiosity-driven agents. In this regime, the results suggest that our approach is resilient to stochastic transitions.
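
To make the idea concrete, below is a minimal sketch of a latent Bayesian surprise signal, assuming the latent dynamics model outputs diagonal-Gaussian prior and posterior beliefs over the latent state. The distributions here are hand-made toys and the model interfaces are omitted; this is an illustration of the reward computation, not the paper's implementation.

```python
import torch
import torch.distributions as td

def latent_bayesian_surprise(posterior: td.Normal, prior: td.Normal) -> torch.Tensor:
    """Intrinsic reward: KL divergence between the posterior belief over the
    latent state (updated after observing the outcome of a transition) and the
    prior belief (before observing it), summed over latent dimensions."""
    return td.kl_divergence(posterior, prior).sum(dim=-1)

# Toy diagonal Gaussians standing in for the outputs of a latent dynamics model
# (batch of 4 transitions, 8-dimensional latent state).
prior = td.Normal(torch.zeros(4, 8), torch.ones(4, 8))
posterior = td.Normal(0.3 * torch.ones(4, 8), 0.8 * torch.ones(4, 8))
intrinsic_reward = latent_bayesian_surprise(posterior, prior)  # shape: (4,)
print(intrinsic_reward)
```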

Visual Control Zero-shot Benchmark

This experiment evaluates performance in a zero-shot learning setting (see Sekar et al. 2020 for details).

The model and the agent are trained without rewards, collecting data through exploration. Some of the curiosity-driven behaviours learned in this phase with Latent Bayesian Surprise are shown in the GIFs below, under the Exploration column.

Throughout the exploration process, snapshots of the agent’s model are used to train a task policy on the final task, and its zero-shot performance is plotted. These behaviours are shown under the Task column.
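
The structure of this protocol can be summarised with the sketch below. Every component is a placeholder stub (the world model, the curiosity-driven exploration policy, and the task-policy training are only hinted at); it shows how the benchmark is organised, not the actual training code.

```python
import random

class WorldModel:
    """Stand-in for the agent's dynamics model, trained on exploration data."""
    def __init__(self):
        self.data = []
    def update(self, batch):
        self.data.extend(batch)

def explore(model, steps):
    # Stand-in for reward-free, curiosity-driven data collection.
    return [random.random() for _ in range(steps)]

def train_task_policy(model, task):
    # Stand-in for training a task policy using only the current model snapshot.
    return f"policy_for_{task}_from_{len(model.data)}_transitions"

def evaluate_zero_shot(policy):
    # Stand-in for running the task policy in the environment without fine-tuning.
    return random.random()

model = WorldModel()
scores = []
for snapshot in range(5):
    model.update(explore(model, steps=1000))           # reward-free exploration phase
    policy = train_task_policy(model, task="cheetah_run")
    scores.append(evaluate_zero_shot(policy))           # zero-shot performance curve
print(scores)
```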

[GIFs: curiosity-driven behaviours (Exploration column) and zero-shot behaviours (Task column) for Acrobot Swingup; Cartpole Balance and Cartpole Swingup; Cheetah Run; Cup Catch; Finger Spin, Finger Turn Easy, and Finger Turn Hard; Hopper Stand and Hopper Hop; Pendulum Swingup; Quadruped Walk and Quadruped Run; Reacher Easy and Reacher Hard; Walker Stand, Walker Walk, and Walker Run.]

Arcade Games Experiments

The curves show the game score achieved per episode over the course of training. The agents learn using only the intrinsic motivation signal, with no extrinsic rewards.
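
As a rough illustration of this setup, the stub loop below separates the logged game score from the learning signal: the score accumulates the extrinsic game reward for plotting only, while the (hypothetical) learner receives nothing but the intrinsic reward. Every component is a stand-in, not the actual Atari training code.

```python
import random

def game_step(action):
    # Stand-in for an emulator step: returns (next_obs, game_reward, done).
    return random.random(), random.choice([0.0, 1.0]), random.random() < 0.01

def intrinsic_reward(obs, action, next_obs):
    # Stand-in for the Latent Bayesian Surprise signal.
    return abs(next_obs - obs)

episode_scores, score, obs = [], 0.0, 0.0
for step in range(10_000):
    action = random.choice(range(4))               # stand-in for the agent's policy
    next_obs, game_reward, done = game_step(action)
    score += game_reward                           # logged for the plotted curves only
    r_int = intrinsic_reward(obs, action, next_obs)
    # agent.update(obs, action, r_int, next_obs)   # the learner sees only r_int
    obs = next_obs
    if done:
        episode_scores.append(score)
        score, obs = 0.0, 0.0
print(episode_scores[:5])
```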

Arcade Results

To encourage comparisons with our results, we make the data used in the plots publicly available; it can easily be integrated with the original Large-Scale Study of Curiosity-Driven Learning open-source implementation.

Download Data [.zip]

Below are videos of the agents playing the games, driven only by their curiosity.

BeamRider

Breakout

Montezuma Revenge

Pong

Qbert

Riverraid

Seaquest

Space Invaders

Super Mario Bros. (1)

Super Mario Bros. (2)