QIMMA Leaderboard

QIMMA ⛰️ — A quality-first Arabic LLM Leaderboard that evaluates and compares the performance of Arabic Large Language Models.

About

QIMMA قمّة (Summit in Arabic) is a quality-assured Arabic LLM evaluation leaderboard built on 14 carefully chosen benchmarks spanning STEM, legal reasoning, medical knowledge, poetry, cultural understanding, and code generation. QIMMA includes over 52,000 quality-validated samples across multiple-choice, generative, and code evaluation tracks. Over 99% of QIMMA's content is native Arabic, ensuring authentic linguistic and cultural assessment rather than relying on translated materials. The leaderboard is powered by a fully automated GPU evaluation pipeline, generously supported by TII (Technology Innovation Institute), running on H100 infrastructure. Every submitted model goes through the same pipeline under the same conditions.

Submit Your Model

Submissions are open to everyone. To submit a model, use the QIMMA leaderboard space. Set the model type to base for base models or instruct for chat/instruction-tuned models.

Evaluation Queue & Delays

We run a shared GPU evaluation queue. Jobs are picked up automatically every few minutes, but evaluation times vary depending on model size and current queue load — please expect delays of several hours during busy periods. You can track your submission status at any time by checking the status field in the leaderboard-requests dataset:

Status	Meaning
`submitted`	Request received, waiting to be picked up
`pending`	Job dispatched, queued on GPU cluster
`running`	Evaluation actively in progress
`finished`	Results published to the leaderboard
`failed`	Something went wrong
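
As a rough illustration of how the status values above could be handled in a script, here is a minimal sketch. The status names and meanings come from the table; the record layout (a `model` field and a `status` field) and the helper names are assumptions made for this example, not the documented schema of the leaderboard-requests dataset.

```python
from typing import Iterable, Mapping, Optional

# Status values and meanings, copied from the table above.
STATUS_MEANINGS = {
    "submitted": "Request received, waiting to be picked up",
    "pending": "Job dispatched, queued on GPU cluster",
    "running": "Evaluation actively in progress",
    "finished": "Results published to the leaderboard",
    "failed": "Something went wrong",
}

# Statuses after which no further change is expected.
TERMINAL_STATUSES = {"finished", "failed"}


def describe_status(status: str) -> str:
    """Return the meaning of a status value, or a note if it is unknown."""
    key = status.strip().lower()
    return STATUS_MEANINGS.get(key, f"Unknown status: {status!r}")


def is_terminal(status: str) -> bool:
    """True once the evaluation has either finished or failed."""
    return status.strip().lower() in TERMINAL_STATUSES


def find_request(
    records: Iterable[Mapping[str, str]], model_id: str
) -> Optional[Mapping[str, str]]:
    """Return the last record for model_id, or None if there is none.

    Assumption: each record is a mapping with hypothetical 'model' and
    'status' keys; the real dataset may use different field names.
    """
    match = None
    for record in records:
        if record.get("model") == model_id:
            match = record
    return match


if __name__ == "__main__":
    # Hypothetical records, made up purely for illustration.
    sample = [
        {"model": "example-org/example-model", "status": "submitted"},
        {"model": "example-org/example-model", "status": "running"},
    ]
    request = find_request(sample, "example-org/example-model")
    if request is not None:
        print(request["status"], "-", describe_status(request["status"]))
        print("terminal:", is_terminal(request["status"]))
```

In practice the records would come from the leaderboard-requests dataset itself; how they are loaded and which fields they contain should be checked against the dataset before relying on this sketch.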

If your submission has been stuck for more than 24 hours or you have any questions, open a thread in the Discussion tab and we'll look into it.

Collections 1

qimma/MCQ_ArabCulture

qimma/MCQ_AraDiCE-Culture

qimma/MCQ_ArabicMMLU

qimma/QA_MedArabiQ

datasets 20

qimma/MCQ_ArabicMMLU_org

qimma/MCQ_MedAraBench_org

qimma/MCQ_PalmX_org

qimma/MCQ_MizanQA_org

qimma/leaderboard-requests

qimma/leaderboard-details

qimma/leaderboard-results

qimma/MCQ_AraDiCE-Culture

qimma/MCQ_3LM-STEM

qimma/MCQ_MizanQA
