Reinforcement Learning Code in Python

I tried vibe coding an app as a beginner - here's what Cursor and Replit taught me

I tried four vibe-coding tools, including Cursor and Replit, with no coding background. Here's what worked (and what didn't).

IEEE

Preference-Based Multi-Objective Reinforcement Learning

Abstract: Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be ...

GitHub

Demystifying Reinforcement Learning in Agentic Reasoning

An overview of our research on agentic RL. In this work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal: Real end-to-end ...

GitHub

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

We are excited to release the CapRL 2.0 series: CapRL-Qwen3VL-2B and CapRL-Qwen3VL-4B. These models feature fewer parameters while delivering even more powerful captioning performance. Notably, ...

IEEE

Generalizable Offline Multiobjective Reinforcement Learning via Preference-Conditioned Diffuser

Abstract: Multiobjective reinforcement learning (MORL) addresses sequential decision-making problems with multiple objectives by learning policies optimized for diverse preferences. While traditional ...
