SFT, RL, Preference Training and more of LLMs
- AdamLucek/Qwen3-4B-Instruct-2507-PII-RL
  Text Generation • 4B • Updated • 5 • 2
- AdamLucek/DeepSeek-V3.1-Truthlessness-1e
  Text Generation • Updated
- AdamLucek/Orpo-Llama-3.2-1B-40k
  Text Generation • 1B • Updated • 20
- AdamLucek/Orpo-Llama-3.2-1B-15k
  Text Generation • 1B • Updated • 66