XYX

xuyd16

AI & ML interests

None yet

Recent Activity

authored a paper 15 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

upvoted a paper 15 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

submitted a paper 15 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Organizations

None yet

authored a paper 15 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Paper • 2605.12483 • Published 16 days ago • 10

submitted a paper to Daily Papers 15 days ago

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

Paper • 2605.12483 • Published 16 days ago • 10

submitted a paper to Daily Papers about 1 month ago

TIP: Token Importance in On-Policy Distillation

Paper • 2604.14084 • Published Apr 15 • 15

submitted a paper to Daily Papers 3 months ago

PACED: Distillation at the Frontier of Student Competence

Paper • 2603.11178 • Published Mar 11 • 4

authored 4 papers 3 months ago

Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning

Paper • 2602.21420 • Published Feb 24 • 6

On-Policy Self-Distillation for Reasoning Compression

Paper • 2603.05433 • Published Mar 5 • 9

Not all tokens are needed(NAT): token efficient reinforcement learning

Paper • 2603.06619 • Published Feb 20 • 1

PACED: Distillation at the Frontier of Student Competence

Paper • 2603.11178 • Published Mar 11 • 4