HIGGS-per-tensor
meta-llama/Llama-3.2-1B-Instruct
Text Generation • 1B • 8.29M • 1.46k
inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic
inference-optimization/Llama-3.2-1B-Instruct-NVFP4
0.8B • 86
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-5.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-6.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-1B-Instruct-7-bits-mode-noise-per-tensor
meta-llama/Llama-3.2-3B-Instruct
Text Generation • 3B • 1.75M • 2.2k
inference-optimization/Llama-3.2-3B-Instruct-FP8-Dynamic
inference-optimization/Llama-3.2-3B-Instruct-NVFP4
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-5.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-6.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.2-3B-Instruct-7-bits-mode-noise-per-tensor
meta-llama/Llama-3.1-8B-Instruct
Text Generation • 8B • 11.3M • 6.01k
RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
Text Generation • 8B • 46k • 9
RedHatAI/Llama-3.1-8B-Instruct-NVFP4
Text Generation • 5B • 16.1k • 1
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-5.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6-bits-mode-noise-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-6.5-bits-mode-noise-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-heuristic-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-hybrid-per-tensor
inference-optimization/Llama-3.1-8B-Instruct-7-bits-mode-noise-per-tensor
Text Generation • 8B • 12.1M • 1.13k
RedHatAI/Qwen3-8B-FP8-dynamic
Text Generation • 8B • 53.5k • 12
Text Generation • 5B • 3.08k • 2
inference-optimization/Qwen3-8B-5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-8B-5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-8B-5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-8B-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-8B-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-8B-5.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-8B-6-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-8B-6-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-8B-6-bits-mode-noise-per-tensor
inference-optimization/Qwen3-8B-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-8B-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-8B-6.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-8B-7-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-8B-7-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-8B-7-bits-mode-noise-per-tensor
Text Generation • 31B • 2.12M • 898
RedHatAI/Qwen3-30B-A3B-FP8-dynamic
Text Generation • 31B • 128k • 3
RedHatAI/Qwen3-30B-A3B-NVFP4
Text Generation • 17B • 73.4k • 2
inference-optimization/Qwen3-30B-A3B-5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-5.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-6-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-6-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-6-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-6.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-7-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-7-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-7-bits-mode-noise-per-tensor
Qwen/Qwen3-30B-A3B-Instruct-2507
Text Generation • 31B • 930k • 814
inference-optimization/Qwen3-30B-A3B-Instruct-2507-FP8-Dynamic
inference-optimization/Qwen3-30B-A3B-Instruct-2507-NVFP4
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-noise-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-heuristic-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-hybrid-per-tensor
inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-noise-per-tensor