🔄 In a Training Loop

Urro

urroxyz

https://urro.xyz/

AI & ML interests

computational linguistics major 🤖🔎🔠 i am autistic. if i come off rude, i probably didn't mean to. please feel free to ask me for clarification.

Recent Activity

liked a model about 20 hours ago

Nanbeige/Nanbeige4.2-3B

upvoted a paper about 23 hours ago

Sample-Efficient Learning from Agent Experience

updated a collection 2 days ago

WTF GENIUS PAPERS

upvoted a paper about 23 hours ago

Sample-Efficient Learning from Agent Experience

Paper • 2607.21051 • Published 4 days ago • 12

upvoted 4 papers 2 days ago

Beyond Relevance-Centric Retrieval: Rubric-Oriented Document Set Selection and Ranking

Paper • 2607.19747 • Published 5 days ago • 29

Predictive Divergence Masks for LLM RL

Paper • 2607.10848 • Published 15 days ago • 9

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Paper • 2605.09635 • Published 4 days ago • 57

Multi-Turn On-Policy Distillation with Prefix Replay

Paper • 2607.04763 • Published 11 days ago • 9

upvoted 5 papers 3 days ago

AutoIndex: Learning Representation Programs for Retrieval

Paper • 2607.18603 • Published 6 days ago • 9

Train the Model, Not the Reader: Decodability Supervision for Verifiable Activation Explanations

Paper • 2607.20379 • Published 5 days ago • 5

Beyond Euclidean Clipping: Overcoming Exploration Collapse in LLM RL via Riemannian Isometric Policy Optimization

Paper • 2607.10169 • Published 16 days ago • 13

Scaling Laws for Hypernetwork-Based Knowledge Injection in Large Language Models

Paper • 2607.19604 • Published 6 days ago • 16

SLPO: Scaling Latent Reasoning via a Surrogate Policy

Paper • 2607.19691 • Published 5 days ago • 5

upvoted 5 papers 4 days ago

Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation

Paper • 2605.29502 • Published May 28 • 1

MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Paper • 2509.24945 • Published Sep 29, 2025 • 7

Where Should Optimizer State Live? Tiered State Allocation for Memory-Efficient Mixture-of-Experts Training

Paper • 2607.19058 • Published 6 days ago • 6

H^2SD: Hybrid Hindsight Self-Distillation

Paper • 2607.18955 • Published 6 days ago • 6

ISO: An RLVR-Native Optimization Stack

Paper • 2607.19331 • Published 5 days ago • 8

upvoted 4 papers 5 days ago

Group Entropy-Controlled Policy Optimization

Paper • 2607.16850 • Published 9 days ago • 29

LLM-as-a-Coach: Experiential Learning for Non-Verifiable Tasks

Paper • 2607.18110 • Published 7 days ago • 14

Token-Level Off-Policy Learning for Faithful Generation Under Distribution Shift

Paper • 2607.17524 • Published 7 days ago • 6

Distilled Reinforcement Learning for LLM Post-training

Paper • 2607.17247 • Published 8 days ago • 9

upvoted a paper 6 days ago

Smarter and Cheaper at Once: Byte-Exact KV-Cache Grafting Turns a Frozen Small Model into a Verified-Knowledge Flywheel

Paper • 2607.14431 • Published 12 days ago • 12
