Communication is All You Need: Persuasion Dataset Construction via Multi-LLM Communication Paper • 2502.08896 • Published Feb 13, 2025 • 1
Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models Paper • 2508.04196 • Published Aug 6, 2025 • 2
Language of Persuasion and Misrepresentation in Business Communication: A Textual Detection Approach Paper • 2508.09935 • Published Aug 13, 2025 • 1
Natural Emergent Misalignment from Reward Hacking in Production RL Paper • 2511.18397 • Published Nov 23, 2025 • 2
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models Paper • 2504.10430 • Published Apr 14, 2025 • 6
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published Oct 9, 2025 • 23
From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs Paper • 2510.05169 • Published Oct 5, 2025 • 3
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models Paper • 2604.10733 • Published Apr 12 • 1
Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models Paper • 2307.14539 • Published Jul 26, 2023 • 3
Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning Paper • 2605.14386 • Published 3 days ago • 50
A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond Article • karina-zadorozhny • Jan 19 • 18