Hugging Face
Models
Datasets
Spaces
Buckets
new
Docs
Enterprise
Pricing
Log In
Sign Up
Open to Work
33.1
TFLOPS
42
6
65
DedeProGames
PRO
DedeProGames
Follow
21world's profile picture
HANAKT19's profile picture
seeingterra's profile picture
24 followers
·
34 following
AI & ML interests
Thinking and Agentic Finetuning
Recent Activity
reacted
to
anakin87
's
post
with 🔥
about 5 hours ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models! 📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data. But what actually are these environments in practice❓ And how do you build them effectively❓ Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course. What you'll learn 🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain 🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts 🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments 🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master 🔸 Build the game Environment 🔸 Use it to generate synthetic data for SFT warm-up 🔸 Group-based Reinforcement Learning If you're interested in building "little worlds" where LLMs can learn, this course is for you. --- 🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe 📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
reacted
to
anakin87
's
post
with ❤️
about 5 hours ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models! 📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs. Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data. But what actually are these environments in practice❓ And how do you build them effectively❓ Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course. What you'll learn 🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain 🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts 🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments 🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master 🔸 Build the game Environment 🔸 Use it to generate synthetic data for SFT warm-up 🔸 Group-based Reinforcement Learning If you're interested in building "little worlds" where LLMs can learn, this course is for you. --- 🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe 📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
liked
a Space
about 15 hours ago
sam-motamed/VOID
View all activity
Organizations
DedeProGames
's datasets
5
Sort: Recently updated
DedeProGames/Medical-Reasoning-SFT-Nano
Viewer
•
Updated
Mar 8
•
20k
•
42
•
2
DedeProGames/Terminal-Reasoning-SFT
Viewer
•
Updated
Mar 5
•
15.2k
•
13
•
1
DedeProGames/OpenThoughts3-20k
Viewer
•
Updated
Feb 27
•
20k
•
15
•
1
DedeProGames/andy-4-mixture
Viewer
•
Updated
Feb 20
•
121k
•
30
•
1
DedeProGames/OpenThoughts-114k
Viewer
•
Updated
Feb 20
•
114k
•
117
•
1