All HF Hub posts

SeaWolf-AI posted an update 1 day ago
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training

We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.

Try the Demo: FINAL-Bench/Darwin-TTS-1.7B-Cross

Model Weights: FINAL-Bench/Darwin-TTS-1.7B-Cross

Full Research Article: https://huggingface.co/blog/FINAL-Bench/darwin-tts

Qwen3-1.7B (the LLM) and Qwen3-TTS-1.7B's talker share an identical architecture: same hidden_size (2048), same layer count (28), same head count (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation. At a 3% blend, emotion appears. At 5%, it intensifies. At 10%, the model breaks, producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.

To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work either requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.

The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
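The actual three-liner uses torch's in-place `p.lerp_`; as a minimal self-contained sketch of the same blend (NumPy standing in for torch, toy arrays standing in for the real 84 FFN tensors), it looks like this:

```python
import numpy as np

def blend_ffn(tts_w: np.ndarray, llm_w: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation of one FFN tensor: (1 - alpha) * TTS + alpha * LLM.

    Equivalent to torch's p.lerp_(llm_weight, alpha). The 1:1 blend only
    works because both models share hidden_size=2048, 28 layers, 16 heads.
    """
    assert tts_w.shape == llm_w.shape, "identical architectures required"
    return tts_w + alpha * (llm_w - tts_w)

# Toy stand-ins for one of the 84 FFN tensors
tts = np.zeros((4, 4))
llm = np.ones((4, 4))

blended = blend_ffn(tts, llm, 0.03)   # 3% blend: emotion appears
broken  = blend_ffn(tts, llm, 0.10)   # 10% blend: past the breaking point
print(blended[0, 0], broken[0, 0])
```

Applied across all 84 FFN tensors, this is the whole method; no gradients, no data, and it runs on CPU in minutes.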

We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0.
victor posted an update 3 days ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). Built a Three.js racing game to eval it, and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding-order bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, a curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned the parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
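The vector math in that last bullet can be sketched. This is a hypothetical NumPy reconstruction (the function names and the sample triangle are mine, not from the game's code): a road triangle's normal comes from the cross product of two edge vectors, and a negative y component exposes the winding-order bug, while discrete curvature is turn angle divided by arc length.

```python
import numpy as np

def surface_normal(a, b, c):
    """Unit normal of triangle (a, b, c) via the cross product of two edges.

    The winding order of the vertices determines the sign: a negative
    y component means the road face points DOWN.
    """
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def curvature(p0, p1, p2):
    """Discrete track curvature: turn angle normalized by arc length."""
    v1, v2 = p1 - p0, p2 - p1
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.arccos(np.clip(cos_t, -1.0, 1.0))
    arc = 0.5 * (np.linalg.norm(v1) + np.linalg.norm(v2))
    return angle / arc

# A triangle wound so its normal points down (-y): the bug's signature
a, b, c = np.array([0., 0, 0]), np.array([1., 0, 0]), np.array([0., 0, 1])
print(surface_normal(a, b, c))  # y component is -1

# A straight segment has ~zero curvature; a 90-degree corner does not
print(curvature(np.array([0., 0]), np.array([1., 0]), np.array([1., 1])))
```

Higher curvature per unit arc length means the AI should slow down, which is exactly the quantity you'd feed into a cornering-speed lookup.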

You are going to hear a lot about this model in the coming months - open source, let's go - and thanks z-ai🚀🚀
SeaWolf-AI posted an update 3 days ago
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond — World #5, Zero Training

We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the Hugging Face leaderboard, without a single gradient update.

How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios — no human tuning required.

The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming a 744B one, with zero training, zero data, one GPU, and ~2 hours.

We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of FFN from the Korean-specialized mother while preserving 93% of attention from the reasoning-specialized father — autonomously validating our core principle: FFN carries knowledge, Attention carries reasoning.
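As a hedged sketch of the per-layer idea (the real framework runs CMA-ES against benchmark scores; the dict layout is illustrative, and the 0.93 ratios below just mirror the numbers reported above):

```python
import numpy as np

def merge_layer(father, mother, alpha_ffn, alpha_attn):
    """Blend one transformer layer with separate ratios for FFN and attention.

    alpha_* is the fraction taken from the mother. CMA-ES would search the
    per-layer (alpha_ffn, alpha_attn) vector to maximize a benchmark score;
    here the ratios are fixed to the ones the optimizer reportedly found.
    """
    lerp = lambda f, m, a: (1 - a) * f + a * m
    return {
        "ffn":  lerp(father["ffn"],  mother["ffn"],  alpha_ffn),
        "attn": lerp(father["attn"], mother["attn"], alpha_attn),
    }

father = {"ffn": np.zeros(4), "attn": np.zeros(4)}  # reasoning-specialized
mother = {"ffn": np.ones(4),  "attn": np.ones(4)}   # knowledge-specialized

# 93% of FFN from the mother, 93% of attention kept from the father
child = merge_layer(father, mother, alpha_ffn=0.93, alpha_attn=0.07)
print(child["ffn"][0], child["attn"][0])
```

The point of the split is that the optimizer can assign FFN and attention independently per layer, which is how the "FFN carries knowledge, attention carries reasoning" pattern can emerge without being hand-coded.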

📊 Public release: 10 days → 300+ community derivatives, 120K+ downloads.

🔗 Links:
Darwin-27B-Opus: FINAL-Bench/Darwin-27B-Opus
Article: https://huggingface.co/blog/FINAL-Bench/darwin-gpqa
Darwin Family Collection: https://huggingface.co/collections/FINAL-Bench/darwin-family

If foundation models are raw ore, Darwin is the forge. We are just getting started. 🔥
imnotkitty posted an update about 8 hours ago
Just tried tencent/HY-World-2.0 — a multimodal world model that takes in text or a single image and generates editable 3D scenes.

Unlike Google's Genie and HY-World 1.5, v2.0 generates engine-ready 3D content:
🎮 Direct import into Unreal Engine and Unity — no format wrangling
🧊 Supports multiple 3D asset formats: Mesh, 3DGS, point cloud, etc.
✏️ Fully editable — not a baked video, but actual geometry you can modify
🤖 Also usable for embodied simulation environments

Basically: from "AI generates a world you can look at" → "AI generates a world you can ship."
omarkamali posted an update 4 days ago
We got Qwen 3.5 to count Rs in Strawberry correctly! 🚨

Building on Sawtone, we’ve been testing a different way to feed language into an LLM to build the next generation of multilingual AI.

The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn’t. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.

So we tried adding a second path.

In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.

Same LLM. Better interface.

In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.

Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
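To illustrate why a byte-level view helps with form-sensitive tasks, here is a toy sketch of the written-form part only (Sawtone's actual representation also encodes phonetics and word structure, which this does not attempt):

```python
def byte_view(word: str) -> list[int]:
    """Byte-level view of a word: UTF-8 bytes preserve exact spelling,
    character order, and diacritics that subword tokenization can blur."""
    return list(word.encode("utf-8"))

# Counting Rs in "strawberry" is trivial when every letter is visible
rs = sum(1 for b in byte_view("strawberry") if b == ord("r"))
print(rs)  # 3

# Diacritics survive too: 'é' keeps its two-byte identity
print(byte_view("café"))
```

A subword tokenizer might split "strawberry" into pieces whose letter content the model never directly sees; the byte path hands the model the exact character sequence alongside the usual tokens.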

Still early, but promising!

cahlen posted an update 1 day ago
Hugging Face just enabled CUDA kernel repos!! This is crazy cool!

Expect a ton more portable number-theory CUDA kernels in the near future. I'm going to have a hell of a lot of fun with this new feature.

Appreciate it, Hugging Face!

https://huggingface.co/kernels

Benedictat posted an update about 8 hours ago
Hunyuan HY-World 2.0 Open-Sourced | Unified SOTA for 3D Generation / Reconstruction / Simulation

HY-World 2.0 is a unified 3D world model supporting multimodal inputs including text and images.

Its end-to-end framework simultaneously performs 3D understanding, scene generation, and geometric reconstruction.

Based on HY-Pano-2.0, the model enables panorama generation without camera parameters.

It ensures geometric consistency via spatial agents and trajectory planning, and achieves a joint 3DGS & Mesh representation with WorldMirror 2.0, reaching SOTA performance in novel view synthesis and 3D reconstruction.

Unlike Genie 3 and HY-World 1.5, which only output videos, HY-World 2.0 directly generates editable 3D assets, better meeting real-world research and simulation demands.
DedeProGames posted an update about 16 hours ago
🔥 GRM-2.5 - The most POWERFUL model for local inference

GRM-2.5 is the newest model from Orion LLM Labs. It features consistent raw reasoning and generates very precise responses, similar to large models, while staying at a 4B parameter size.

The GRM-2.5 family consists of these models:
OrionLLM/GRM-2.5 (4b)
OrionLLM/GRM-2.5-Air (0.8b)

Furthermore, GRM-2.5 is the best option for local agentic environments, performing very well at code, terminal-agent tasks, etc. It can generate 1,000 lines of consistent code and program like much larger models.
GRM-2.5 is also the best base for fine-tuning to date, and it has vision: it can interpret images and videos.
wangbuer999 posted an update about 2 hours ago
Hands-on testing of HY-World 2.0 shows a significant improvement in end-to-end engineering maturity compared to version 1.5.

The model supports direct multimodal input from text, single-frame images, and video. Inference can be launched without camera intrinsic/extrinsic calibration or additional preprocessing.

After panorama generation, the built-in Spatial Agent automatically performs semantic navigation path planning. Combined with spatial-consistency constraints from HY-WorldStereo, it ensures artifact-free multi-view generation and stable geometric alignment.

Outputs include standard 3D asset formats such as Mesh, 3DGS, and point clouds, which can be directly imported into Unity/UE.

It is suitable for engineering scenarios including game level prototyping, digital twins, and embodied simulation.
kelsend posted an update about 8 hours ago
Tencent Open-Sources Hunyuan 3D World Model 2.0: Generate Editable 3D Game Worlds with One Sentence, Compatible with Unity/UE

Tencent has officially released and open-sourced Hunyuan 3D World Model 2.0 (HY-World 2.0), enabling AI to evolve from video generation to creating playable, editable 3D worlds.

Core Highlights

Text / Image / Video → Directly generate exportable 3D assets (Mesh / 3DGS / Point Cloud)

Seamlessly integrates with Unity / Unreal Engine for game maps and level prototyping

One-click reconstruction of digital twin scenes from single images/videos, no camera parameters required

Spatial Agent for intelligent navigation trajectories: no wall penetration, consistent spatial height

All-new HY-Pano-2.0 + WorldMirror 2.0 architecture, achieving SOTA in 3D reconstruction and novel view synthesis

Key Breakthrough
Unlike Genie 3 and Hunyuan 1.5, which only output videos, HY-World 2.0 generates re-editable 3D worlds that support collision, interaction, and engine import.

Application Scenarios
Game Development, Indoor Preview, Urban Planning, Digitalization of Cultural Heritage, Embodied AI Simulation