All HF Hub posts

SeaWolf-AI posted an update 1 day ago
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training

We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.

Try the Demo: FINAL-Bench/Darwin-TTS-1.7B-Cross

Model Weights: FINAL-Bench/Darwin-TTS-1.7B-Cross

Full Research Article: https://huggingface.co/blog/FINAL-Bench/darwin-tts

Qwen3-1.7B (the LLM) and Qwen3-TTS-1.7B's talker share an identical architecture: same hidden_size (2048), same layer count (28), same head count (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation. At a 3% blend, emotion appears. At 5%, it intensifies. At 10%, the model breaks, producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.

To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work either requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.

The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
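The actual three-liner uses torch's in-place `p.lerp_`; as a minimal self-contained sketch of the same blend (NumPy standing in for torch, toy arrays standing in for the real 84 FFN tensors), it looks like this:

```python
import numpy as np

def blend_ffn(tts_w: np.ndarray, llm_w: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation of one FFN tensor: (1 - alpha) * TTS + alpha * LLM.

    Equivalent to torch's p.lerp_(llm_weight, alpha). The 1:1 blend only
    works because both models share hidden_size=2048, 28 layers, 16 heads.
    """
    assert tts_w.shape == llm_w.shape, "identical architectures required"
    return tts_w + alpha * (llm_w - tts_w)

# Toy stand-ins for one of the 84 FFN tensors
tts = np.zeros((4, 4))
llm = np.ones((4, 4))

blended = blend_ffn(tts, llm, 0.03)   # 3% blend: emotion appears
broken  = blend_ffn(tts, llm, 0.10)   # 10% blend: past the breaking point
print(blended[0, 0], broken[0, 0])
```

Applied across all 84 FFN tensors, this is the whole method; no gradients, no data, and it runs on CPU in minutes.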

We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0.
victor posted an update 3 days ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). Built a Three.js racing game to eval it, and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding-order bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, a curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned the parameters

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
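The vector math in that last bullet can be sketched. This is a hypothetical NumPy reconstruction (the function names and the sample triangle are mine, not from the game's code): a road triangle's normal comes from the cross product of two edge vectors, and a negative y component exposes the winding-order bug, while discrete curvature is turn angle divided by arc length.

```python
import numpy as np

def surface_normal(a, b, c):
    """Unit normal of triangle (a, b, c) via the cross product of two edges.

    The winding order of the vertices determines the sign: a negative
    y component means the road face points DOWN.
    """
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

def curvature(p0, p1, p2):
    """Discrete track curvature: turn angle normalized by arc length."""
    v1, v2 = p1 - p0, p2 - p1
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.arccos(np.clip(cos_t, -1.0, 1.0))
    arc = 0.5 * (np.linalg.norm(v1) + np.linalg.norm(v2))
    return angle / arc

# A triangle wound so its normal points down (-y): the bug's signature
a, b, c = np.array([0., 0, 0]), np.array([1., 0, 0]), np.array([0., 0, 1])
print(surface_normal(a, b, c))  # y component is -1

# A straight segment has ~zero curvature; a 90-degree corner does not
print(curvature(np.array([0., 0]), np.array([1., 0]), np.array([1., 1])))
```

Higher curvature per unit arc length means the AI should slow down, which is exactly the quantity you'd feed into a cornering-speed lookup.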

You are going to hear a lot about this model in the coming months - open source, let's go - and thanks z-ai🚀🚀
SeaWolf-AI posted an update 3 days ago
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond — World #5, Zero Training

We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the Hugging Face leaderboard, without a single gradient update.

How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios — no human tuning required.

The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming a 744B one, with zero training, zero data, one GPU, and ~2 hours.

We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of FFN from the Korean-specialized mother while preserving 93% of attention from the reasoning-specialized father — autonomously validating our core principle: FFN carries knowledge, Attention carries reasoning.
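As a hedged sketch of the per-layer idea (the real framework runs CMA-ES against benchmark scores; the dict layout is illustrative, and the 0.93 ratios below just mirror the numbers reported above):

```python
import numpy as np

def merge_layer(father, mother, alpha_ffn, alpha_attn):
    """Blend one transformer layer with separate ratios for FFN and attention.

    alpha_* is the fraction taken from the mother. CMA-ES would search the
    per-layer (alpha_ffn, alpha_attn) vector to maximize a benchmark score;
    here the ratios are fixed to the ones the optimizer reportedly found.
    """
    lerp = lambda f, m, a: (1 - a) * f + a * m
    return {
        "ffn":  lerp(father["ffn"],  mother["ffn"],  alpha_ffn),
        "attn": lerp(father["attn"], mother["attn"], alpha_attn),
    }

father = {"ffn": np.zeros(4), "attn": np.zeros(4)}  # reasoning-specialized
mother = {"ffn": np.ones(4),  "attn": np.ones(4)}   # knowledge-specialized

# 93% of FFN from the mother, 93% of attention kept from the father
child = merge_layer(father, mother, alpha_ffn=0.93, alpha_attn=0.07)
print(child["ffn"][0], child["attn"][0])
```

The point of the split is that the optimizer can assign FFN and attention independently per layer, which is how the "FFN carries knowledge, attention carries reasoning" pattern can emerge without being hand-coded.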

📊 Public release: 10 days → 300+ community derivatives, 120K+ downloads.

🔗 Links:
Darwin-27B-Opus: FINAL-Bench/Darwin-27B-Opus
Article: https://huggingface.co/blog/FINAL-Bench/darwin-gpqa
Darwin Family Collection: https://huggingface.co/collections/FINAL-Bench/darwin-family

If foundation models are raw ore, Darwin is the forge. We are just getting started. 🔥
imnotkitty posted an update about 8 hours ago
Just tried tencent/HY-World-2.0 — a multimodal world model that takes in text or a single image and generates editable 3D scenes.

Unlike Google's Genie and HY-World 1.5, v2.0 generates engine-ready 3D content:
🎮 Direct import into Unreal Engine and Unity — no format wrangling
🧊 Supports multiple 3D asset formats: Mesh, 3DGS, point cloud, etc.
✏️ Fully editable — not a baked video, but actual geometry you can modify
🤖 Also usable for embodied simulation environments

Basically: from "AI generates a world you can look at" → "AI generates a world you can ship."
omarkamali posted an update 4 days ago
We got Qwen 3.5 to count Rs in Strawberry correctly! 🚨

Building on Sawtone, we’ve been testing a different way to feed language into an LLM to build the next generation of multilingual AI.

The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn’t. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.

So we tried adding a second path.

In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.

Same LLM. Better interface.

In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.

Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
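To illustrate why a byte-level view helps with form-sensitive tasks, here is a toy sketch of the written-form part only (Sawtone's actual representation also encodes phonetics and word structure, which this does not attempt):

```python
def byte_view(word: str) -> list[int]:
    """Byte-level view of a word: UTF-8 bytes preserve exact spelling,
    character order, and diacritics that subword tokenization can blur."""
    return list(word.encode("utf-8"))

# Counting Rs in "strawberry" is trivial when every letter is visible
rs = sum(1 for b in byte_view("strawberry") if b == ord("r"))
print(rs)  # 3

# Diacritics survive too: 'é' keeps its two-byte identity
print(byte_view("café"))
```

A subword tokenizer might split "strawberry" into pieces whose letter content the model never directly sees; the byte path hands the model the exact character sequence alongside the usual tokens.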

Still early, but promising!

cahlen posted an update 1 day ago
Hugging Face just enabled CUDA kernel repos!! This is crazy cool!

Expect a ton more portable number-theory CUDA kernels in the near future. I'm going to have a hell of a lot of fun with this new feature.

Appreciate it, Hugging Face!

https://huggingface.co/kernels

Benedictat posted an update about 8 hours ago
Hunyuan HY-World 2.0 Open-Sourced | Unified SOTA for 3D Generation / Reconstruction / Simulation

HY-World 2.0 is a unified 3D world model supporting multimodal inputs including text and images.

Its end-to-end framework simultaneously performs 3D understanding, scene generation, and geometric reconstruction.

Based on HY-Pano-2.0, the model enables panorama generation without camera parameters.

It ensures geometric consistency via spatial agents and trajectory planning, and achieves a joint 3DGS & Mesh representation with WorldMirror 2.0, reaching SOTA performance in novel view synthesis and 3D reconstruction.

Unlike Genie 3 and HY-World 1.5, which only output videos, HY-World 2.0 directly generates editable 3D assets, better meeting real-world research and simulation demands.
DedeProGames posted an update about 16 hours ago
🔥 GRM-2.5 - The most POWERFUL model for local inference

GRM-2.5 is the newest model from Orion LLM Labs. It features consistent raw reasoning and generates very precise responses, similar to large models, while staying at a 4B parameter size.

The GRM-2.5 family consists of these models:
OrionLLM/GRM-2.5 (4b)
OrionLLM/GRM-2.5-Air (0.8b)

Furthermore, GRM-2.5 is the best option for local agentic environments, performing very well at code, terminal-agent tasks, etc. It can generate 1,000 lines of consistent code and program like much larger models.
GRM-2.5 is also the best base for fine-tuning to date, and it has vision: it can interpret images and videos.
wangbuer999 posted an update about 2 hours ago
Hands-on testing of HY-World 2.0 shows a significant improvement in end-to-end engineering maturity compared to version 1.5.

The model supports direct multimodal input from text, single-frame images, and video. Inference can be launched without camera intrinsic/extrinsic calibration or additional preprocessing.

After panorama generation, the built-in Spatial Agent automatically performs semantic navigation path planning. Combined with spatial-consistency constraints from HY-WorldStereo, it ensures artifact-free multi-view generation and stable geometric alignment.

Outputs include standard 3D asset formats such as Mesh, 3DGS, and point clouds, which can be directly imported into Unity/UE.

It is suitable for engineering scenarios including game level prototyping, digital twins, and embodied simulation.
kelsend posted an update about 8 hours ago
Tencent Open-Sources Hunyuan 3D World Model 2.0: Generate Editable 3D Game Worlds with One Sentence, Compatible with Unity/UE

Tencent has officially released and open-sourced Hunyuan 3D World Model 2.0 (HY-World 2.0), enabling AI to evolve from video generation to creating playable, editable 3D worlds.

Core Highlights

Text / Image / Video → Directly generate exportable 3D assets (Mesh / 3DGS / Point Cloud)

Seamlessly integrates with Unity / Unreal Engine for game maps and level prototyping

One-click reconstruction of digital twin scenes from single images/videos, no camera parameters required

Spatial Agent for intelligent navigation trajectories: no wall penetration, consistent spatial height

All-new HY-Pano-2.0 + WorldMirror 2.0 architecture, achieving SOTA in 3D reconstruction and novel view synthesis

Key Breakthrough
Unlike Genie 3 and Hunyuan 1.5, which only output videos, HY-World 2.0 generates re-editable 3D worlds that support collision, interaction, and engine import.

Application Scenarios
Game Development, Indoor Preview, Urban Planning, Digitalization of Cultural Heritage, Embodied AI Simulation