CRITICAL FIX (2026-03-19): Fixed eos_token_id — previous versions caused infinite thinking loops. You MUST re-download this model if you downloaded before today.
Update (2026-03-18): Models updated to v2.1 with VLM support and fixed configs. Re-download if you got this before today.
MLX Studio — the only app that natively supports JANG models.
Early adoption: LM Studio, Ollama, oMLX, and Inferencer do not support JANG yet. Use MLX Studio or `pip install "jang[mlx]"`.
Qwen3.5-122B-A10B — JANG_3L (3-bit, 8-bit attention) — VLM
JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon
JANG is fully open-source. Quantization engine, research, and full commit history: github.com/jjang-ai/jangq. Created by Jinho Jang.
Results (200-question MMLU)
| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_4K | 86% | 69 GB | ~50 tok/s |
| JANG_3L | 81.5% | 49 GB | 49.6 tok/s |
| JANG_2S | 79% | 35 GB | 54 tok/s |
| MLX 4-bit | 85% | 64 GB | — |
| MLX 2-bit | 56.5% | 36 GB | — |
JANG_3L: 81.5% at 49 GB — fits on 64 GB Macs. 8-bit attention preserves quality at 3-bit compression.
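The package size follows almost directly from the bit budget. A back-of-the-envelope check (a sketch: the 122 B parameter count and ~3.08 average bits come from the spec table; embedding tables, quantization scales, and runtime overhead are ignored, which accounts for the gap to the 49 GB download):

```python
# Estimate raw weight storage for a 122B-parameter model at ~3.08 bits/weight.
# Figures taken from the spec table; scales/metadata overhead is ignored.
params = 122e9
avg_bits = 3.08

weight_gb = params * avg_bits / 8 / 1e9  # decimal gigabytes
print(f"~{weight_gb:.1f} GB of quantized weights")  # → ~47.0 GB
```

The remaining ~2 GB in the shipped artifact would be embeddings and per-group quantization metadata.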
Per-Subject Scores
| Subject | JANG_3L |
|---|---|
| Abstract Algebra | 12/20 |
| Anatomy | 18/20 |
| Astronomy | 19/20 |
| College CS | 15/20 |
| College Physics | 15/20 |
| HS Biology | 19/20 |
| HS Chemistry | 18/20 |
| HS Mathematics | 11/20 |
| Logical Fallacies | 18/20 |
| World Religions | 18/20 |
| Total | 163/200 = 81.5% |
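The total row can be reproduced from the per-subject scores above (a quick consistency check; the ten subjects listed are the full 200-question set used here):

```python
# Per-subject correct answers out of 20, copied from the table above.
scores = {
    "Abstract Algebra": 12, "Anatomy": 18, "Astronomy": 19,
    "College CS": 15, "College Physics": 15, "HS Biology": 19,
    "HS Chemistry": 18, "HS Mathematics": 11,
    "Logical Fallacies": 18, "World Religions": 18,
}
correct = sum(scores.values())
total = 20 * len(scores)
print(f"{correct}/{total} = {correct / total:.1%}")  # → 163/200 = 81.5%
```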
Specs
| Metric | Value |
|---|---|
| Source | Qwen3.5-122B-A10B |
| Architecture | MoE (256 experts, 8 active) + GatedDeltaNet SSM |
| Profile | JANG_3L (CRITICAL=8, IMPORTANT=4, COMPRESS=3) |
| Average bits | ~3.08 |
| GPU Memory | 48.5 GB |
| Speed | 49.6 tok/s |
| VLM | Yes (vision encoder preserved) |
| Format | v2 (MLX-native, instant load) |
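The ~3.08 average comes from mixing precisions across layer grades. A minimal sketch with illustrative weight fractions (the actual per-layer breakdown lives in the jangq repo and is not reproduced here): if roughly 1% of weights are graded CRITICAL (8-bit), 3% IMPORTANT (4-bit), and the remaining 96% COMPRESS (3-bit), the weighted average lands at 3.08 bits.

```python
# Hypothetical weight fractions per JANG_3L grade (illustrative only, not the
# real layer assignment): CRITICAL=8-bit, IMPORTANT=4-bit, COMPRESS=3-bit.
fractions = {8: 0.01, 4: 0.03, 3: 0.96}
assert abs(sum(fractions.values()) - 1.0) < 1e-9  # fractions must cover all weights

avg_bits = sum(bits * frac for bits, frac in fractions.items())
print(f"average ≈ {avg_bits:.2f} bits/weight")  # → average ≈ 3.08 bits/weight
```

Because attention is a small share of total weights in a 256-expert MoE, keeping it at 8-bit barely moves the average while protecting the most quality-sensitive path.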
Install
pip install "jang[mlx]"
Quick Start
```python
from jang_tools.loader import load_jang_model
from mlx_lm.sample_utils import make_sampler
from mlx_lm.generate import generate_step
import mlx.core as mx

model, tokenizer = load_jang_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_3L")
sampler = make_sampler(temp=0.7)

tokens = tokenizer.encode("What is photosynthesis?")
for tok, _ in generate_step(prompt=mx.array(tokens), model=model, max_tokens=200, sampler=sampler):
    t = tok.item() if hasattr(tok, "item") else int(tok)
    if t == tokenizer.eos_token_id:  # stop before printing the EOS token
        break
    print(tokenizer.decode([t]), end="", flush=True)
```
VLM Inference
```python
from jang_tools.loader import load_jang_vlm_model
from mlx_vlm import generate

model, processor = load_jang_vlm_model("JANGQ-AI/Qwen3.5-122B-A10B-JANG_3L")
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image."},
    ]}],
    add_generation_prompt=True, tokenize=False, enable_thinking=False,
)
result = generate(model, processor, prompt, ["photo.jpg"], max_tokens=200)
print(result.text)
```
Links
- GitHub | HuggingFace | MLX Studio | PyPI
Korean
Qwen3.5-122B — JANG_3L
JANG is a mixed-precision quantization format for Apple Silicon.
| Model | MMLU | Size | Speed |
|---|---|---|---|
| JANG_3L | 81.5% | 49 GB | 49.6 tok/s |
pip install "jang[mlx]"
GitHub · HuggingFace · MLX Studio
Created by Jinho Jang — jangq.ai · @dealignai