I tried four vibe-coding tools, including Cursor and Replit, with no coding background. Here's what worked (and what didn't).
Abstract: Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be ...
An overview of our research on agentic RL. In this work, we systematically investigate three dimensions of agentic RL: data, algorithms, and reasoning modes. Our findings reveal: Real end-to-end ...
We are excited to release the CapRL 2.0 series: CapRL-Qwen3VL-2B and CapRL-Qwen3VL-4B. These models feature fewer parameters while delivering even more powerful captioning performance. Notably, ...
Abstract: Multiobjective reinforcement learning (MORL) addresses sequential decision-making problems with multiple objectives by learning policies optimized for diverse preferences. While traditional ...