Cahlen Humphreys PRO

cahlen

https://bigcompute.science

AI & ML interests

☠️💻

Recent Activity

liked a dataset about 21 hours ago

angrygiraffe/claude-opus-4.6-4.7-reasoning-8.7k

Viewer • Updated 21 days ago • 38.5k • 4.18k • 184

liked 2 models about 21 hours ago

Supertone/supertonic-3

Text-to-Speech • Updated 5 days ago • 37.5k • 581

bytedance-research/Lance

Any-to-Any • Updated about 8 hours ago • 1k • 648

liked a model 5 days ago

openbmb/MiniCPM-V-4.6

Image-Text-to-Text • 1B • Updated 3 days ago • 222k • 904

liked a model 14 days ago

Zyphra/ZAYA1-8B

9B • Updated 11 days ago • 155k • 546

replied to unmodeled-tyler's post 14 days ago

I constantly have the feeling that people haven't figured it out yet, but I also realize I'm in a highly niche area. Still, so many things are so much easier now because of the tools we use, and yet how many people even know these tools exist in the larger scheme of things?

For example, I noticed this headline today from CNBC:

Anthropic’s Mythos set off a cybersecurity ‘hysteria.’ Experts say the threat was already here

But anybody who knows anything realizes that you haven't had to wait for Mythos to build a decent harness around tons of different uncensored models to do even more. It's just that the layman is currently catching up, I feel.

We live in an AI bubble, I think: not the kind that is going to 'pop' and destroy the economy, but the kind where, if you're really good at what you already do in a research or academic sense, you're going to be unstoppable with the current AI tools. And they just keep getting better.

Excellent, thought-provoking post! Have a good weekend!

liked 2 models 14 days ago

SulphurAI/Sulphur-2-base

Text-to-Video • 9B • Updated about 24 hours ago • 1.25M • 1.27k

nvidia/Nemotron-Elastic-12B

Text Generation • 12B • Updated 14 days ago • 381 • 65

liked a model 15 days ago

Open-OSS/privacy-filter

Token Classification • 1B • Updated 16 days ago • 244k • 9

reacted to BlueNipples's post with 👀 16 days ago

Good news, llama.cpp seems to be close to supporting MTP on qwen models. Bad news, every single gguf will have to be redone when it is.

reacted to yuriyvnv's post with 🔥 16 days ago

📄 The WAVe paper is officially out in the Information Sciences Journal.

You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.

Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.

📦 Resources
- Paper: https://www.sciencedirect.com/science/article/pii/S0020025526005220
- PT model: yuriyvnv/WAVe-1B-Multimodal-PT
- NL model: yuriyvnv/WAVe-1B-Multimodal-NL
- Collection: https://huggingface.co/collections/yuriyvnv/multi-modal-embeddings-for-synthetic-transcript-filtering
- Code: https://github.com/yuriyvnv/WAVe

If you train ASR on synthetic or back-translated data, I would like to see WAVe benchmarked on other languages.

@reach-vb @ylacombe @hf-audio @BramVanroy

#speech #asr #multimodal #syntheticdata #lowresource

reacted to unmodeled-tyler's post with 🚀 16 days ago

Hey Hugging Face!

Repo: https://github.com/unmodeled-tyler/vessel-browser

I wanted to share a cool feature from my open source AI native web browser, Vessel: Persistent highlights!

You can highlight anything on the page and the context is provided to the agent. It's kind of a fun way to learn about new stuff, synthesize info, or just deepen your comprehension/understanding.

Since highlights are persistent, you can close the page, come back later - and your highlights will be exactly where you left them. I've found this particularly useful when reviewing technical blogs, model cards, etc.

Check it out!

posted an update 16 days ago

So I built a multimodal video annotation pipeline in my spare time, as you do.

corpus-mill turns any long-form video with people on camera into a time-aligned event corpus across audio, vision, OCR, faces, brand observations, music, and clip-worthy moments. Runs entirely on local GPU because — and I cannot stress this enough — your footage has no business being on someone else's servers.

The honest origin: I needed real multimodal supervision data, the public corpora are weirdly thin once you need per-frame / per-speaker / per-second labels with provenance, so I built one. Then it grew. Then I looked up and it was 30K LOC and ~30 stages and I thought, ok, maybe other people would want this.

Stack is the usual suspects: Whisper-large-v3 (faster-whisper), pyannote-3.1 (which secretly drags in 433 NeMo modules — surprise!), Qwen2.5-VL-7B for vision/OCR/shoppable detection, dlib + YuNet for faces, qwen2.5:7b / qwen3:14b via local Ollama for the LLM passes, chromaprint + PDQ for fingerprinting. Outputs as Parquet + SQLite. Apache 2.0.

There's a Docker compose that works, after I spent a day discovering that faster-whisper wants CUDA 12 cuBLAS while pyannote 4 wants CUDA 13, and the answer is "install both, point LD_LIBRARY_PATH at the cu12 wheels, ship it." That's now baked in. You're welcome.
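
The workaround described above (both CUDA runtimes installed, `LD_LIBRARY_PATH` pointed at the CUDA 12 wheels) could be sketched roughly as follows. This is a minimal sketch and not the corpus-mill implementation: it assumes the CUDA 12 libraries come from pip wheels that unpack under `site-packages/nvidia/<package>/lib`, and the helper names `find_nvidia_lib_dirs` and `build_ld_library_path` are hypothetical.

```python
import os
import site
from pathlib import Path


def find_nvidia_lib_dirs(site_packages_dirs):
    """Return lib directories found under <site-packages>/nvidia/*/lib.

    Assumption: the CUDA 12 pip wheels unpack their shared libraries there.
    """
    found = []
    for root in site_packages_dirs:
        nvidia_root = Path(root) / "nvidia"
        if not nvidia_root.is_dir():
            continue
        # Sort so the resulting search path is deterministic.
        for lib_dir in sorted(nvidia_root.glob("*/lib")):
            if lib_dir.is_dir():
                found.append(str(lib_dir))
    return found


def build_ld_library_path(lib_dirs, existing=""):
    """Prepend lib_dirs to an existing LD_LIBRARY_PATH value, skipping duplicates."""
    parts = []
    for entry in list(lib_dirs) + existing.split(os.pathsep):
        if entry and entry not in parts:
            parts.append(entry)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    dirs = find_nvidia_lib_dirs(site.getsitepackages())
    # The dynamic loader reads LD_LIBRARY_PATH at process start, so export the
    # printed value in the shell or container entrypoint before launching Python.
    print(build_ld_library_path(dirs, os.environ.get("LD_LIBRARY_PATH", "")))
```

Saved under a hypothetical name such as `cu12_paths.py`, the script would be used as `export LD_LIBRARY_PATH="$(python cu12_paths.py)"` before starting the pipeline. Printing the value rather than setting `os.environ` in-process is deliberate, because changing the variable after the interpreter has started generally does not affect how already-running code resolves shared libraries.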

Spare-time project, bugs are real, fixing them for your specific footage is on you. If you're training multimodal models and want a corpus pipeline you fully control on-prem, this might save you months. If not, the README is at least mildly entertaining.

https://github.com/cahlen/corpus-mill

liked 2 datasets 18 days ago

nvidia/Nemotron-Image-Training-v3

Viewer • Updated 25 days ago • 6.92M • 8.98k • 65

open-thoughts/AgentTrove

Viewer • Updated 15 days ago • 1.7M • 10.7k • 150

liked 3 models 18 days ago

reacted to prithivMLmods's post with 🔥 21 days ago

Multimodal-Edge Demo, a node-based inference canvas demo, is now live on Spaces. It features node-based Transformers for fast inference across 10+ edge-device multimodal models on the Hub, all within a single space. The series includes models from Qwen3.5, Qwen3-VL, Gemma 4, and the LFM 2.5 VL model series, with support for reasoning and grounding tasks.

🤗 Demo: prithivMLmods/Multimodal-Edge-Node
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Multimodal-Edge-Node
✅ Multimodal Apps Collections: https://huggingface.co/collections/prithivMLmods/hall-of-multimodal-apps

🤗 To learn more, visit the app page or the respective model pages.

liked a model 22 days ago

nvidia/Gemma-4-26B-A4B-NVFP4

Text Generation • 14B • Updated 11 days ago • 923k • 59
