AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data...
Contributing to Haystack LLM framework
Recent Activity
replied to their post about 4 hours ago
📣 I just published a free course on Reinforcement Learning Environments for Language Models!
COURSE: https://github.com/anakin87/llm-rl-environments-lil-course
Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
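To make the idea concrete, here is a minimal sketch of the two ingredients named above: a verifiable reward (a programmatic check instead of a human label) and the group-relative advantage used by GRPO-style methods. The function names and the toy arithmetic task are hypothetical and not taken from the course; a full GRPO implementation also includes a clipped policy-ratio objective and, optionally, a KL penalty, both omitted here.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last line of the completion equals the expected answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == expected else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of sampled completions for the same prompt ("What is 2 + 2?")
completions = ["2 + 2 is\n4", "5", "I think\n4", "four"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and are reinforced; the rest get a negative one. No curated answer text is needed, only a checker.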
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
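As a rough illustration of what a multi-turn game environment can look like, here is a plain-Python sketch of a Tic Tac Toe environment. It does not use the Verifiers API and is not the course's code: the class name, the "reply with one digit 0-8" move format, the first-empty-cell opponent, and the reward values are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: list[str]) -> Optional[str]:
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@dataclass
class TicTacToeEnv:
    """The model plays X; a scripted opponent plays O in the first empty cell."""
    board: list[str] = field(default_factory=lambda: [" "] * 9)
    done: bool = False

    def render(self) -> str:
        """Text observation that would be sent back to the model."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def _finish(self, reward: float) -> tuple[str, float, bool]:
        self.done = True
        return self.render(), reward, True

    def step(self, model_output: str) -> tuple[str, float, bool]:
        """Apply the model's move, then the opponent's; return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode is over; create a new environment")
        move = model_output.strip()
        # Assumed reward scheme: illegal -1.0, loss 0.0, draw 0.5, win 1.0
        if len(move) != 1 or move not in "012345678" or self.board[int(move)] != " ":
            return self._finish(-1.0)
        self.board[int(move)] = "X"
        if winner(self.board) == "X":
            return self._finish(1.0)
        if " " not in self.board:
            return self._finish(0.5)
        self.board[self.board.index(" ")] = "O"
        if winner(self.board) == "O":
            return self._finish(0.0)
        return self.render(), 0.0, False


env = TicTacToeEnv()
for move in ["4", "2", "6"]:
    observation, reward, done = env.step(move)
print(reward, done)  # 1.0 True
```

The same loop, run with a strong scripted player in place of the model, could in principle produce game transcripts for an SFT warm-up, while the final reward is the signal a group-based RL step would consume.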
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
---
🤖🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
anakin87/LFM2-2.6B-mr-tictactoe
Text Generation
• 3B • Updated • 279
anakin87/LFM2-2.6B-ttt-rl-2
Text Generation
• Updated • 9
anakin87/LFM2-2.6B-ttt-rl-merged
Text Generation
• 3B • Updated • 11
anakin87/LFM2-2.6B-ttt-rl
Text Generation
• Updated
anakin87/LFM2-2.6B-ttt-sft
Text Generation
• 3B • Updated • 8
anakin87/Phi-3.5-mini-ITA
Text Generation
• 4B • Updated • 5.26k
• 13
anakin87/Qwen3-0.6B-alphabet-sort-grpo
Text Generation
• 0.6B • Updated • 10
anakin87/gemma-2-2b-ita-sft
Text Generation
• 3B • Updated
anakin87/electra-italian-xxl-cased-squad-it
Question Answering
• 0.1B • Updated • 16
• 8
Text Generation
• 3B • Updated • 33
• 28
anakin87/qwen-scheduler-7b-grpo
Text Generation
• Updated • 6
anakin87/gemma-2-9b-neogenesis-ita
Text Generation
• 9B • Updated • 1.28k
• 11
anakin87/gemma-2-2b-neogenesis-ita
Text Generation
• 3B • Updated • 1.33k
• 6
anakin87/yo-Llama-3-8B-Instruct
Text Generation
• 8B • Updated • 10
• 7
anakin87/Llama-3-8b-ita-ties
Text Generation
• 8B • Updated • 1.29k
• 3
anakin87/Llama-3-8b-ita-slerp
Text Generation
• 8B • Updated • 2.47k
• 1
anakin87/Llama-3-8b-ita-ties-pro
Text Generation
• 8B • Updated • 2.46k
• 1
anakin87/gemma-2b-orpo-GGUF
3B • Updated • 7
• 7
anakin87/gorilla-openfunctions-v2-sharded
Text Generation
• 7B • Updated • 3
anakin87/gorilla-openfunctions-v0-sharded
Text Generation
• 7B • Updated • 9
• 1
anakin87/zephyr-7b-alpha-sharded
Text Generation
• 7B • Updated • 25
• 16