QIMMA Leaderboard
QIMMA ⛰️: a quality-first leaderboard that evaluates and compares the performance of Arabic large language models (LLMs).
About
QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on 14 carefully chosen benchmarks spanning STEM, legal reasoning, medical knowledge, poetry, cultural understanding, and code generation. QIMMA includes over 52,000 quality-validated samples across multiple-choice, generative, and code evaluation tracks. Over 99% of QIMMA's content is native Arabic, ensuring authentic linguistic and cultural assessment rather than relying on translated materials. The leaderboard is powered by a fully automated GPU evaluation pipeline, generously supported by TII (Technology Innovation Institute), running on H100 infrastructure. Every submitted model goes through the same pipeline under the same conditions.
Submit Your Model
Submissions are open to everyone. To submit a model, go to the QIMMA leaderboard Space.
Set the model type to `base` for base models or `instruct` for chat/instruction-tuned models.
Evaluation Queue & Delays
We run a shared GPU evaluation queue. Jobs are picked up automatically every few minutes, but evaluation times vary depending on model size and current queue load — please expect delays of several hours during busy periods.
You can track your submission status at any time by checking the status field in the leaderboard-requests dataset:
| Status | Meaning |
|---|---|
| `submitted` | Request received, waiting to be picked up |
| `pending` | Job dispatched, queued on GPU cluster |
| `running` | Evaluation actively in progress |
| `finished` | Results published to the leaderboard |
| `failed` | Something went wrong; will retry automatically |
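If you want to check a request from a script rather than by eye, the status values in the table can be modeled as a small enum. The sketch below is illustrative only: it assumes (hypothetically) that a request record is a JSON object with a top-level `status` key. The field name comes from the text above, but the JSON layout, the `"model"` key, and the helper names `read_status` and `describe` are assumptions, not part of the QIMMA tooling.

```python
import json
from enum import Enum


class RequestStatus(Enum):
    """Status values listed in the leaderboard-requests table."""
    SUBMITTED = "submitted"
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


# Human-readable meaning of each status, copied from the table.
MEANINGS = {
    RequestStatus.SUBMITTED: "Request received, waiting to be picked up",
    RequestStatus.PENDING: "Job dispatched, queued on GPU cluster",
    RequestStatus.RUNNING: "Evaluation actively in progress",
    RequestStatus.FINISHED: "Results published to the leaderboard",
    RequestStatus.FAILED: "Something went wrong; will retry automatically",
}


def read_status(record_text: str) -> RequestStatus:
    """Parse a request record and return its status.

    Hypothetical layout: a JSON object with a top-level "status" key.
    Raises ValueError if the value is not one of the known statuses.
    """
    record = json.loads(record_text)
    return RequestStatus(record["status"].strip().lower())


def describe(record_text: str) -> str:
    """Return "<status>: <meaning>" for a request record."""
    status = read_status(record_text)
    return f"{status.value}: {MEANINGS[status]}"
```

For example, `describe('{"model": "org/my-model", "status": "running"}')` (with a hypothetical model id) returns `"running: Evaluation actively in progress"`. Using an enum means an unexpected status string fails loudly instead of being silently misreported.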
If your submission has been stuck for more than 24 hours or you have any questions, open a thread in the Discussion tab and we'll look into it.
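The 24-hour guideline can be expressed as a simple check. This is a minimal sketch that assumes you have a timezone-aware submission timestamp for the request. The `submitted_at` parameter and the `is_stuck` helper are hypothetical, and the requests dataset may store time under a different field or format. A `failed` request counts as stuck once 24 hours pass, since it is supposed to be retried automatically; only `finished` is treated as done.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Guideline from the text: ask for help after more than 24 hours.
STUCK_AFTER = timedelta(hours=24)


def is_stuck(status: str, submitted_at: datetime,
             now: Optional[datetime] = None) -> bool:
    """Return True if a request has gone more than 24 hours without finishing.

    `submitted_at` is a hypothetical, timezone-aware submission timestamp.
    `now` defaults to the current UTC time and can be passed in for testing.
    """
    if status == "finished":
        return False  # results are already published
    now = now or datetime.now(timezone.utc)
    return now - submitted_at > STUCK_AFTER
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to test or to evaluate against a logged timestamp.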