STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability Paper • 2606.19236 • Published 2 days ago • 8
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models Paper • 2603.01571 • Published Mar 2 • 34
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published Mar 2 • 64
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct Paper • 2308.09583 • Published Aug 18, 2023 • 8
WizardLM: Empowering Large Language Models to Follow Complex Instructions Paper • 2304.12244 • Published Apr 24, 2023 • 14
WizardCoder: Empowering Code Large Language Models with Evol-Instruct Paper • 2306.08568 • Published Jun 14, 2023 • 34