arxiv/pessimism-s-paradox-conservative-offline-training-amplifies-reward-hacki
Pessimism's Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models
Hugging Bay hosted release.
- License: arxiv-metadata
- Scan status: pending
- Hosting status: external
- Upstream: 2606.30627v1