Community Blog & Articles

Community Articles

GPU Management: Why Idle GPUs Are the New Grounded Aircraft

The OlmoEarth Platform: Geospatial inference at planetary scale

LFM2.5-Encoders for Fast Long-Context Inference on CPU

NVIDIA Cosmos-H-Dreams: Bringing Real-Time Generative Simulation to Surgical Robotics

Anatomy of a Frontier Lab Agent Intrusion: A Technical Timeline of the July 2026 Incident

Bringing Nunchaku 4-bit Diffusion Inference to Diffusers

Grabette: an open system to record robot-manipulation data

Newer Models, Same Advantage

Security incident disclosure — July 2026

Model Routing Is Simple. Until It Isn’t.

Welcome Inkling by Thinking Machines

Introducing Real World VoiceEQ: Measuring the human quality of voice AI

Profiling in PyTorch (Part 3): Attention is all you profile

Native-speed vLLM transformers modeling backend

New: Articles from Team or Enterprise organizations are promoted to the main section.

Community Blog & Articles

Kimi K3 Model Overview: 2.8T Parameters, MXFP4 Quantization, and What the Open Weights Mean for the Community

The State of Simulation for Physical AI: An Overview

mDenseOn with the mLateOn: Open Multilingual, Long-Context, and Code Retrieval Models

VisionPsy-Nano: State-of-the-Art On-Device Vision-Language Models

FLUX 3 Model Overview: Multimodal Flow Models for Image, Video, Audio, and Action Prediction

Accelerating Qwen3.6 on Intel® Core™ Ultra Series 3 with DFlash

ECMWF's AI forecasting model is open source: now let's make it easy to run.

KV Caching Explained: Optimizing Transformer Inference Efficiency

Introducing Cosmos 3 Edge

Be Ready Before the Attack: A Practical Guide to Self-Hosting an Open Model for Cyber Defense

LettucePrevent - Real-Time Prevention of Factual Hallucinations in RAG

Uncensor any LLM with abliteration

Hugging Face on AMD Instinct MI455X: First Transformers Results

What building Shippy taught us about building agents

FeyNoBg: A SOTA Model For Background Removal

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Code a simple RAG from scratch

AI, Physical AI, World Models, VLA, VLM, and Other Terms We Should Stop Mixing Together

Kimi K3, previewed: inside the first open 3T-class model

POCKET: a 35-billion-parameter model that runs on your iPhone — and on your PC with no GPU
