Abstract: With extensive pretrained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects, such as ...
This paper proposes an exploration-efficient deep reinforcement learning with reference (DRLR) policy framework for learning robotics tasks incorporating demonstrations. The DRLR framework is ...
An overview of our research on agentic RL. In this work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal: Real end-to-end ...
The path planning capability of autonomous robots in complex environments is crucial for their widespread application in the real world. However, long-term decision-making and sparse reward signals ...
Abstract: Safe reinforcement learning (Safe RL) aims to learn policies capable of learning and adapting within complex environments while ensuring actions remain free from catastrophic consequences.
verl is a flexible, efficient and production-ready RL training library for large language models (LLMs). verl is the open-source version of HybridFlow: A Flexible and Efficient RLHF Framework paper.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results