Quark-50m-Instruct

Quark-50m-Instruct is a small (≈56M parameters) decoder-only language model fine-tuned for instruction following. It uses the same architecture as the SmolLM family and was fully pretrained on 5 billion tokens from HuggingFaceTB/smollm-corpus.

  • Model type: Causal Language Model (LLaMA‑style decoder)
  • Architecture: GQA · SwiGLU · RMSNorm · RoPE · Weight‑tying
  • Pretraining tokens: 5 B
  • Fine‑tuning: Instruction‑tuned (details below)
  • Creators: OvercastLab (research & development lab for ML/AI)
  • Release date: 22 April 2026

Model Summary

Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is not competitive with large models on knowledge-intensive tasks; it is best suited to:

  • Simple conversational tasks
  • Code generation and explanation (Python)
  • Short text rewriting and summarisation
  • On‑device / edge inference

The architecture closely follows the efficient‑small‑LM blueprint popularised by SmolLM:

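| Component     | Details                                |
| ------------- | -------------------------------------- |
| Vocab size    | 49,152                                 |
| Hidden size   | 384                                    |
| Layers        | 24                                     |
| Attention     | Grouped Query (6 Q heads, 2 KV heads)  |
| FFN           | SwiGLU with 1,024 intermediate         |
| Position      | RoPE (θ = 10,000)                      |
| Normalisation | RMSNorm (pre-block)                    |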

Total trainable parameters: ≈56.6 M (with weight tying).
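
The parameter total can be cross-checked directly from the values in the table. The sketch below is a back-of-the-envelope count, not the authors' own accounting. It assumes a standard LLaMA-style layout that the card does not spell out: no bias terms, a head dimension of hidden size divided by query heads, two RMSNorm layers per block plus one final RMSNorm, and an output head that shares the embedding matrix.

```python
# Parameter count for a LLaMA-style decoder, using the values from the table above.
# Assumptions (not stated in the card): no bias terms, head_dim = hidden / query heads,
# two RMSNorm layers per block plus one final RMSNorm, and a tied LM head.

VOCAB = 49_152
HIDDEN = 384
LAYERS = 24
Q_HEADS = 6
KV_HEADS = 2
FFN = 1_024

head_dim = HIDDEN // Q_HEADS      # 64
kv_dim = KV_HEADS * head_dim      # 128

embedding = VOCAB * HIDDEN        # shared with the output head (weight tying)
attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # Q and O, then K and V
ffn = 3 * HIDDEN * FFN            # gate, up, and down projections (SwiGLU)
norms = 2 * HIDDEN                # pre-attention and pre-FFN RMSNorm weights

per_layer = attention + ffn + norms
total = embedding + LAYERS * per_layer + HIDDEN  # + final RMSNorm

print(f"embedding: {embedding:,}")
print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Under these assumptions the count comes to 56,641,920, in line with the ≈56M figure quoted in the introduction. Roughly a third of that is the embedding matrix, so an untied output head would add about another 18.9M parameters.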

Benchmark Evaluation Metrics

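| Category                | Benchmark     | Metric              | Score / Value | Status  |
| ----------------------- | ------------- | ------------------- | ------------- | ------- |
| Linguistics & Grammar   | BLiMP         | Accuracy            | 68.12%        | Success |
| Commonsense & Reasoning | PIQA          | Normalized Accuracy | 57.83%        | Success |
| Commonsense & Reasoning | COPA          | Accuracy            | 57.00%        | Success |
| Commonsense & Reasoning | BoolQ         | Accuracy            | 52.17%        | Success |
| Commonsense & Reasoning | WinoGrande    | Accuracy            | 47.36%        | Success |
| Commonsense & Reasoning | HellaSwag     | Normalized Accuracy | 28.49%        | Success |
| Commonsense & Reasoning | RACE          | Accuracy            | 26.41%        | Success |
| Commonsense & Reasoning | CommonsenseQA | Accuracy            | 20.31%        | Success |
| Academic & Knowledge    | SciQ          | Normalized Accuracy | 49.00%        | Success |
| Academic & Knowledge    | ARC-Easy      | Normalized Accuracy | 36.49%        | Success |
| Academic & Knowledge    | MMLU          | Accuracy            | 25.64%        | Success |
| Academic & Knowledge    | ARC-Challenge | Normalized Accuracy | 25.17%        | Success |
| Academic & Knowledge    | OpenBookQA    | Normalized Accuracy | 25.40%        | Success |
| Language Modeling       | LAMBADA       | Accuracy            | 15.87%        | Success |
| Language Modeling       | WikiText-2    | Word Perplexity     | 251.76        | Success |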

Note: Two tasks did not run. The Arithmetic benchmark failed because of outdated script support (arithmetic.py), and SocialIQA failed because of a registration tag error (siqa). The other 15 tasks completed successfully.
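
To put the accuracy figures in context, it can help to compare each one with the accuracy of uniform random guessing for that task format. The sketch below does this for a subset of the benchmarks. The number of answer options per benchmark is an assumption of this example (the usual format of each task), not something reported by the evaluation run, and the helper `margin_over_chance` is a hypothetical name introduced here.

```python
# Scores from the table above (percent), paired with the assumed number of
# answer options per question. The option counts are assumptions of this sketch.
SCORES = {
    "BLiMP": (68.12, 2),
    "PIQA": (57.83, 2),
    "COPA": (57.00, 2),
    "WinoGrande": (47.36, 2),
    "HellaSwag": (28.49, 4),
    "RACE": (26.41, 4),
    "CommonsenseQA": (20.31, 5),
    "SciQ": (49.00, 4),
    "MMLU": (25.64, 4),
    "OpenBookQA": (25.40, 4),
}


def margin_over_chance(score, n_options):
    """Return the score minus the uniform-guessing accuracy, in percentage points."""
    return round(score - 100.0 / n_options, 2)


margins = {name: margin_over_chance(s, n) for name, (s, n) in SCORES.items()}

# Print the benchmarks ordered from largest to smallest margin.
for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {margin:+.2f} pp")
```

A margin near zero means the score is indistinguishable from guessing under the assumed format, so those rows say little about model capability.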

Uses

Direct Use

The model can be used via the 🤗 Transformers library for standard text generation. It expects chat‑formatted input (see example below).

Downstream Use

Because the model is released under the Apache-2.0 license, you can fine-tune Quark-50m-Instruct on your own data for domain-specific tasks such as a customer-support bot, a code reviewer, or a story writer.

Limitations

  • Limited world knowledge (pretraining data ends in mid-2025).
  • Short context window (2,048 tokens).
  • Small size means it can make more factual mistakes than larger models.
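
The short context window also keeps inference memory small. The sketch below gives a rough estimate at the full 2,048-token context with batch size 1. It assumes BF16 storage (2 bytes per value) for both weights and the key/value cache, a head dimension of hidden size divided by query heads, and a parameter count of about 56.7M. Activations and framework overhead are ignored, so real usage will be higher.

```python
# Rough memory estimate at the full 2,048-token context, batch size 1.
# Assumptions: BF16 storage (2 bytes per value) for weights and KV cache,
# head_dim = hidden / query heads; activations and framework overhead ignored.

BYTES_PER_VALUE = 2      # BF16
PARAMS = 56_700_000      # approximate parameter count of the checkpoint
LAYERS = 24
HIDDEN = 384
Q_HEADS = 6
KV_HEADS = 2
CONTEXT = 2_048

head_dim = HIDDEN // Q_HEADS


def kv_cache_bytes(kv_heads, tokens=CONTEXT):
    """Bytes needed to cache keys and values for every layer."""
    return 2 * LAYERS * kv_heads * head_dim * tokens * BYTES_PER_VALUE


weights_mib = PARAMS * BYTES_PER_VALUE / 2**20
gqa_mib = kv_cache_bytes(KV_HEADS) / 2**20
mha_mib = kv_cache_bytes(Q_HEADS) / 2**20  # hypothetical: one KV head per query head

print(f"weights:        {weights_mib:.1f} MiB")
print(f"KV cache (GQA): {gqa_mib:.1f} MiB")
print(f"KV cache (MHA): {mha_mib:.1f} MiB")
```

With 2 KV heads the cache at full context is 24 MiB, one third of what it would be if every query head had its own key/value head, and the weights themselves take a little over 100 MiB.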

How to Get Started

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ThingAI/Quark-50m-Instruct"

# device_map="auto" requires the accelerate package to be installed.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Apply the chat template and tokenize; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

Safetensors model size: 56.7M params (tensor type BF16)