Recent advances in on-policy reinforcement learning (RL) have enabled learning agents in virtual environments to master complex tasks with high-dimensional, continuous observation and action spaces. However, leveraging this family of algorithms for multi-fingered robotic grasping remains a challenge due to the large sim-to-real fidelity gap and the high sample complexity of on-policy RL. This work aims to bridge these gaps by first learning, via RL in simulation, a multi-fingered grasping policy that operates in the pixel space of the input: a single depth image. Using the depth map to convert pixel coordinates into Cartesian coordinates, the method transfers to the real world with high fidelity, and it introduces a novel attention mechanism that substantially improves the grasp success rate in cluttered environments. Finally, the direct-generative nature of this method allows learning multi-fingered grasps with flexible end-effector positions and orientations, as well as all degrees of freedom of the hand.
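The pixel-to-Cartesian mapping the abstract relies on is, in the standard setting, a pinhole-camera deprojection: a pixel $(u, v)$ with depth reading $d$ is lifted to a 3D point using the camera intrinsics. The sketch below is illustrative only; the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name are assumptions, not the paper's actual implementation.

```python
import numpy as np

def pixel_to_cartesian(u, v, depth, fx, fy, cx, cy):
    """Deproject a depth pixel to a 3D point in the camera frame.

    Assumes a standard pinhole camera model:
      x = (u - cx) * z / fx,  y = (v - cy) * z / fy,  z = depth.
    (fx, fy) are focal lengths in pixels; (cx, cy) is the principal point.
    """
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# A pixel at the principal point maps straight down the optical axis:
p = pixel_to_cartesian(320, 240, 1.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# p == [0.0, 0.0, 1.0]
```

Applied per-pixel across the depth image, this mapping lets a policy that selects a grasp point in pixel space be executed at the corresponding Cartesian location, which is what makes the sim-trained policy transferable to a real camera with known intrinsics.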