SciEval Leaderboard 🥇 Open, science-focused leaderboards benchmarking LLMs and VLMs
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence • Paper 2512.22334 • Published Dec 26, 2025