FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions Paper • 2509.17177 • Published Sep 21, 2025
CCI4.0: A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models Paper • 2506.07463 • Published Jun 9, 2025