
Evan · Gtmix

AI & ML interests

None yet

Recent Activity

reacted to qgallouedec's post with 🔥 about 11 hours ago
TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL. The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

```python
from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()
```

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of `use_transformers_paged`, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0
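The sampling side of the recipe (temperature, then top-k, then top-p filtering) and the plain cross-entropy loss can be sketched in a few lines. This is a toy, dependency-free illustration of those mechanics on a single next-token distribution, not TRL's implementation; the function names and the five-logit example are made up for the sketch.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize (max-subtraction for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_with_filters(logits, temperature=0.6, top_k=20, top_p=0.95, rng=None):
    """Sample one token id: temperature scaling, then top-k, then top-p (nucleus)."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    # top-k: keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # top-p: keep the smallest prefix of those whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample from the renormalized surviving tokens.
    total = sum(probs[i] for i in kept)
    r = rng.random() * total
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]

def cross_entropy(logits, target_id):
    """Plain cross-entropy of a (self-sampled) target token under the model."""
    probs = softmax(logits, temperature=1.0)
    return -math.log(probs[target_id])
```

In SSD, tokens sampled this way from the model itself become the training targets, and the fine-tuning loss is nothing more than `cross_entropy` on them.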
liked a Space 17 days ago
tventurella/mr_chatterbox
liked a model 20 days ago
mistralai/Voxtral-4B-TTS-2603

Organizations

Hugging Face MCP Course

Gtmix's datasets

None public yet