Learning World Models by Self-Supervised Exploration

Project for Advanced Deep Learning for Robotics

Abstract

This project presents an adapted version of Plan2Explore, in which an agent builds a world model trained in a self-supervised manner. We apply this framework to a new task, Stacker, from the DeepMind Control Suite. In addition to visual observations, we add proprioceptive information to Plan2Explore's input to enhance the agent's performance. In particular, we feed contact-sensor readings from the gripper fingers to the agent so that it can fulfill subtasks such as grabbing and stacking. These subtasks serve as exploration rewards intended to steer the agent toward a more targeted exploration of the environment and to improve performance on the desired stacking task. In addition, we introduce a simplified version of the Stacker task called Push2Target, in which only the x-axis is considered.
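
As a rough illustration of the observation setup described above, the following sketch (not the project's actual code) loads the Stacker task from the DeepMind Control Suite and combines a rendered image with flat proprioceptive features; the observation keys `arm_pos`, `arm_vel`, and `touch`, the task name `stack_2`, and the helper `build_agent_input` are assumptions for illustration and depend on the concrete task definition.

```python
# Minimal sketch of augmenting image observations with proprioceptive /
# contact-sensor information, assuming dm_control's Stacker task exposes
# entries such as 'arm_pos', 'arm_vel', and 'touch' in its observation dict.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="stacker", task_name="stack_2")
time_step = env.reset()

def build_agent_input(time_step, physics, size=(64, 64)):
    """Combine the rendered camera image with flat proprioceptive features."""
    image = physics.render(height=size[0], width=size[1], camera_id=0)
    # Hypothetical selection of proprioceptive keys; only keys actually
    # present in the observation dict are used.
    keys = [k for k in ("arm_pos", "arm_vel", "touch") if k in time_step.observation]
    proprio = np.concatenate([np.ravel(time_step.observation[k]) for k in keys])
    return {"image": image, "proprio": proprio}

obs = build_agent_input(time_step, env.physics)
print(obs["image"].shape, obs["proprio"].shape)
```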