Yuriy Perezhohin PRO

yuriyvnv

https://scholar.google.com/citations?user=I5uzFtwAAAAJ&hl=en

AI & ML interests

Automatic Speech Recognition, Embeddings, Code Generation, Synthetic Data Generation and Filtering

Recent Activity

liked a model 3 days ago

nvidia/nemotron-3.5-asr-streaming-0.6b

updated a model 4 days ago

yuriyvnv/parakeet-tdt-0.6b-EN-Medical

updated a model 4 days ago

yuriyvnv/Qwen3-ASR-1.7B-EN-Medical

Posts 6

Post

🏥 Two medical English ASR models are up
Hey, back from a long holiday. While I was out, the team kept working on this one, and the results are pretty interesting: medical English ASR, evaluated against the published MultiMed paper.

🩺 yuriyvnv/parakeet-tdt-0.6b-EN-Medical
🩺 yuriyvnv/Qwen3-ASR-1.7B-EN-Medical

Both were trained on MultiMed (leduckhai/MultiMed) mixed with the Common Voice 17 English train and validation splits. Mixing in CV prevents catastrophic forgetting of general English: medical-only training without CV cost us 5 absolute WER points on general English.

📊 Normalized WER on MultiMed-en test, same protocol as the paper:

Parakeet 0.6B zero-shot: 19.22
Parakeet 0.6B fine-tuned: 14.31 (25% relative reduction)

Qwen3-ASR 1.7B zero-shot: 16.41 (although here we had catastrophic forgetting on CV test set)
Qwen3-ASR 1.7B fine-tuned: 16.50
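
For readers who want to sanity-check the figures above, here is a minimal Python sketch of a normalized word error rate and of the relative reduction between two WER values. The normalization (lowercasing and stripping punctuation) is an assumption made for illustration; the exact protocol from the MultiMed paper is not reproduced here, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, split on whitespace.

    Assumed normalization for illustration only, not the paper's protocol.
    """
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction in percent; negative means the new WER is worse."""
    return 100.0 * (baseline - new) / baseline


# One substituted word out of four reference words.
print(wer("The patient has hypertension.", "the patient had hypertension"))
# Relative change for the two model pairs reported above.
print(round(relative_reduction(19.22, 14.31), 1))
print(round(relative_reduction(16.41, 16.50), 1))
```

This scores a single utterance. A corpus-level WER is usually computed by summing edit counts and reference lengths across all utterances before dividing, rather than averaging per-utterance values.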

@hf-audio @QwenLM thanks for the toolkits. Big thanks to @leduckhai and the MultiMed authors for the dataset.

#asr #speech #medical #healthcareai #parakeet #qwen #qwen3asr #nemo #medicalasr

Post

📄 The WAVe paper is officially out in the journal Information Sciences.

You saw the PT and NL model releases earlier this year. This is the peer-reviewed paper behind them, with the full method, ablations, and downstream ASR evaluation.

Quick recap: WAVe is a 1B multimodal embedding model that filters synthetic speech at the word level, not the sentence level. On Portuguese ASR it cuts training steps by 34%, improves cross-domain generalization by 50%, and matches WER with 30% less synthetic data.
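
To illustrate why the word-level versus sentence-level distinction matters, here is a toy Python sketch. It is not the WAVe implementation: the per-word scores, the 0.5 threshold, and both function names are hypothetical, and the paper should be consulted for the actual scoring and filtering rule.

```python
from statistics import mean


def keep_sentence_level(word_scores: list[float], threshold: float = 0.5) -> bool:
    """Keep the sample if the average score clears the threshold (hypothetical rule)."""
    return bool(word_scores) and mean(word_scores) >= threshold


def keep_word_level(word_scores: list[float], threshold: float = 0.5) -> bool:
    """Keep the sample only if every word clears the threshold (hypothetical rule)."""
    return bool(word_scores) and min(word_scores) >= threshold


# Made-up audio-text match scores for a four-word synthetic utterance
# in which the third word was mispronounced or dropped.
scores = [0.95, 0.92, 0.10, 0.94]

print(keep_sentence_level(scores))  # the average hides the one bad word
print(keep_word_level(scores))      # the single bad word is caught
```

In this toy case the sentence-level average masks a single badly synthesized word, while the word-level check rejects the sample.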

📦 Resources
- Paper: https://www.sciencedirect.com/science/article/pii/S0020025526005220
- PT model: yuriyvnv/WAVe-1B-Multimodal-PT
- NL model: yuriyvnv/WAVe-1B-Multimodal-NL
- Collection: https://huggingface.co/collections/yuriyvnv/multi-modal-embeddings-for-synthetic-transcript-filtering
- Code: https://github.com/yuriyvnv/WAVe

If you train ASR on synthetic or back-translated data, I would like to see WAVe benchmarked on other languages.

@reach-vb @ylacombe @hf-audio @BramVanroy

#speech #asr #multimodal #syntheticdata #lowresource