qwen3-0.6B-rlvr-grpo

Overview

This model is a reinforcement learning fine-tuned version of Qwen3-0.6B.

It was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement learning method, to improve reasoning and response quality.
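In GRPO (Group Relative Policy Optimization), the trainer samples several completions for each prompt. Each completion's advantage is its reward measured against the rest of its group, which replaces the learned value baseline (critic) used in methods like PPO. The sketch below shows only that normalization step, in plain Python. It is not the training code for this model. The card does not state the reward function, group size, or normalization details, so the following are all illustrative assumptions: the binary rewards, the group of four, the use of population standard deviation, and `eps`. `group_relative_advantages` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Assumption: population standard deviation and eps=1e-4; real trainers may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Each completion is scored relative to its group instead of a learned critic
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary verifier rewards (1 = correct, 0 = incorrect) for 4 completions
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # → [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group receives the same reward, all advantages are zero. That prompt then contributes no reward-driven gradient signal in this formulation.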

Training Details

  • Base model: Qwen3-0.6B
  • Method: GRPO RL fine-tuning
  • Upload date: 2026-02-21
  • Framework: PyTorch

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub repository for this checkpoint
model_id = "Saminx22/qwen3-0.6B-rlvr-grpo"

# Load the fine-tuned weights and the matching tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)