qwen3-0.6B-rlvr-grpo
Overview
This model is a fine-tuned version of Qwen3-0.6B.
It was trained with reinforcement learning using GRPO (Group Relative Policy Optimization), with the aim of improving reasoning and response quality.
Training Details
- Base model: Qwen3-0.6B
- Method: reinforcement learning fine-tuning with GRPO (Group Relative Policy Optimization)
- Upload date: 2026-02-21
- Framework: PyTorch
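This card does not publish the training configuration. The snippet below is therefore only a generic sketch of the central idea of GRPO, not the code used to train this model. For each prompt, GRPO samples a group of completions and scores each one with a reward. It then uses each reward, normalized against the rest of its group, as that completion's advantage. This removes the need for a separate learned value model. The function name `group_relative_advantages` is hypothetical. The binary rewards in the example (1 for a correct answer, 0 otherwise) are an illustrative assumption. The choice of population standard deviation is also an assumption, since some implementations use the sample standard deviation instead.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: group-normalized advantages for one prompt's completions.

    Each advantage is (reward - group mean) / (group std + eps). This sketch uses the
    population standard deviation; eps keeps the division finite when every
    completion in the group received the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative (assumed) binary rewards for four completions of the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Here `advantages` comes out at roughly `[1, -1, -1, 1]`. Completions that score above the group average are reinforced, and those below it are discouraged. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.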
Usage
# Load the GRPO fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("Saminx22/qwen3-0.6B-rlvr-grpo")
tokenizer = AutoTokenizer.from_pretrained("Saminx22/qwen3-0.6B-rlvr-grpo")