GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts Paper • 2604.12978 • Published 3 days ago • 5
Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness Paper • 2604.12373 • Published 3 days ago • 7
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation Paper • 2604.09497 • Published 7 days ago • 26
TradingAgents: Multi-Agents LLM Financial Trading Framework Paper • 2412.20138 • Published Dec 28, 2024 • 45
Kronos: A Foundation Model for the Language of Financial Markets Paper • 2508.02739 • Published Aug 2, 2025 • 19
The Depth Ceiling: On the Limits of Large Language Models in Discovering Latent Planning Paper • 2604.06427 • Published 10 days ago • 11
Article How I contributed a new model to the Transformers library using Codex 17 days ago • 46
Reasoning Shift: How Context Silently Shortens LLM Reasoning Paper • 2604.01161 • Published 15 days ago • 31
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 68
gpt-oss-safeguard Collection gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated Oct 29, 2025 • 67
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 513
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 88 items • Updated Mar 2 • 117