# Qwen3.5-27B — IQ4_XS GGUF

Quantized GGUF version of Qwen/Qwen3.5-27B, converted and quantized to IQ4_XS format (~13 GB) for CPU inference.

Quantized by @merileijona (GitHub: juhanimerilehto)


## Quantization

| Property | Value |
|---|---|
| Format | IQ4_XS |
| Approx. size | ~13 GB |
| Base model | Qwen/Qwen3.5-27B |
| Converter | llama.cpp |

IQ4_XS is an importance-matrix (imatrix) quantization. It uses calibration data to allocate bits where they matter most, giving better quality than standard K-quants at the same file size.
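For reference, an IQ4_XS imatrix quant can be produced with llama.cpp roughly as follows. This is a sketch, not the exact commands used for this file: the file names are placeholders, and the calibration text is whatever corpus you choose.

```shell
# Compute an importance matrix from calibration text (placeholder file names)
./llama-imatrix -m model-F16.gguf -f calibration.txt -o imatrix.dat

# Quantize to IQ4_XS using that importance matrix
./llama-quantize --imatrix imatrix.dat model-F16.gguf model_IQ4_XS.gguf IQ4_XS
```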


## Intended use

This quantization is intended for local, CPU-only inference on high-RAM workstations where GPU VRAM is insufficient to run the full model. It has not been formally benchmarked. The settings and usage notes below reflect the actual configuration used during testing.
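As a rough sizing check, total RAM needed is approximately the model file size (~13 GB here) plus the KV cache, which grows linearly with context length. A minimal sketch of that arithmetic, using hypothetical layer and head dimensions for illustration (not Qwen3.5-27B's actual config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Size of the K and V caches at fp16 (2 bytes per element)."""
    # One K tensor and one V tensor per layer, per context position
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical shapes, for illustration only:
gib = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=4096) / 2**30
print(f"~{gib:.2f} GiB KV cache at n_ctx=4096")  # ~1.00 GiB under these assumptions
```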


## Usage with llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model_IQ4_XS.gguf",
    n_gpu_layers=0,     # 0 = CPU-only
    n_ctx=4096,
    n_threads=16,
    verbose=False,
)

response = llm(
    "Your prompt here",
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9,
    min_p=0.01,
)
print(response["choices"][0]["text"])
```

## Tested configuration

| Setting | Value |
|---|---|
| `n_gpu_layers` | 0 (CPU-only) |
| `n_ctx` | 4096 |
| `n_threads` | 16 |
| `temperature` | 0.7 |
| `top_p` | 0.9 |
| `min_p` | 0.01 |
| `max_tokens` | 2048 |

Test hardware:

- CPU: AMD Ryzen 9 5950X (16 cores)
- RAM: 128 GB
- OS: Windows 11
- GPU: not used for inference

Token generation speed was not formally measured. The model ran stably at the settings above with no observed repetition or loop issues.
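`n_threads=16` matches the 5950X's physical core count; llama.cpp generally gains little from SMT (hyper-threaded) threads. A hedged heuristic for picking a default, assuming `os.cpu_count()` reports logical cores at twice the physical count on SMT CPUs:

```python
import os

# Heuristic only: llama.cpp tends to scale up to physical cores, not SMT threads.
# Assumes logical count = 2x physical on SMT CPUs; adjust for your machine.
logical = os.cpu_count() or 2
n_threads = max(1, logical // 2)
```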


## Notes

- `min_p=0.01` is recommended to prevent token loops in longer outputs
- The F16 intermediate GGUF (~54 GB) is not included; only the final quantized file
- For GPU-assisted inference, increase `n_gpu_layers` to offload layers to VRAM
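The `min_p` cutoff works by discarding candidate tokens whose probability falls below a fraction of the most likely token's probability, which is what helps break repetition loops. A minimal sketch of the filter (illustrative, not llama.cpp's actual implementation):

```python
import math

def min_p_filter(logits, min_p=0.01):
    """Return indices of tokens with probability >= min_p * max probability."""
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]   # shifted softmax numerators
    total = sum(probs)
    probs = [p / total for p in probs]
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# Token 2 is far below the top candidate, so it is filtered out:
print(min_p_filter([10.0, 9.0, -5.0]))          # [0, 1]
# A stricter cutoff keeps only near-top tokens:
print(min_p_filter([10.0, 9.0, -5.0], min_p=0.5))  # [0]
```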