liu zh

morphism42

AI & ML interests

None yet

Recent Activity

liked a dataset 20 days ago

PRIME-RL/Eurus-2-RL-Data

upvoted a paper about 2 months ago

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

upvoted a paper 3 months ago

Towards Autonomous Mathematics Research

Organizations

None yet

upvoted a paper about 2 months ago

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 146

upvoted a paper 3 months ago

Towards Autonomous Mathematics Research

Paper • 2602.10177 • Published Feb 10 • 36

upvoted a paper 4 months ago

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Paper • 2601.08468 • Published Jan 13 • 7

upvoted a paper 8 months ago

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Paper • 2510.00553 • Published Oct 1, 2025 • 9

upvoted a paper 10 months ago

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 138

upvoted a paper over 1 year ago

Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Paper • 2502.02508 • Published Feb 4, 2025 • 22

upvoted an article over 1 year ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

medmekk, marcsun13, lvwerra, pcuenq, osanseviero, thomwolf • Sep 18, 2024 • 280

upvoted 3 articles almost 2 years ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

yfleureau, liyongsea, edbeeching, lewtun, benlipkin, romansoletskyi, vwxyzjn, kashif • Jul 11, 2024 • 128

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 413

Article

Fine-tune Llama 3 with ORPO

mlabonne • Apr 22, 2024 • 240

upvoted an article about 2 years ago

Article

Personal Copilot: Train Your Own Coding Assistant

smangrul, sayakpaul • Oct 27, 2023 • 79
