Building on HF

Brian Moran

obversarystudios

AI & ML interests

AI evaluation, agent observability, memory substrates, failure traces, boundary mapping, prompt-injection defense, constrained policy learning, cognitive systems

Recent Activity

updated a collection about 18 hours ago

papers

updated a collection about 18 hours ago

papers

updated a collection about 18 hours ago

articles

Organizations

None yet

updated 2 collections about 18 hours ago

papers

Collection

22 items • Updated about 18 hours ago

articles

Collection

2 items • Updated about 18 hours ago

liked a Space about 18 hours ago

GIFT Eval

🥇

212

GIFT-Eval: A Benchmark for General Time Series Forecasting

upvoted a paper about 18 hours ago

Communication is All You Need: Persuasion Dataset Construction via Multi-LLM Communication

Paper • 2502.08896 • Published Feb 13, 2025 • 1

updated a collection about 18 hours ago

papers

Collection

22 items • Updated about 18 hours ago

upvoted 8 papers about 18 hours ago

Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models

Paper • 2508.04196 • Published Aug 6, 2025 • 2

Language of Persuasion and Misrepresentation in Business Communication: A Textual Detection Approach

Paper • 2508.09935 • Published Aug 13, 2025 • 1

Natural Emergent Misalignment from Reward Hacking in Production RL

Paper • 2511.18397 • Published Nov 23, 2025 • 2

LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

Paper • 2504.10430 • Published Apr 14, 2025 • 6

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

Paper • 2510.08211 • Published Oct 9, 2025 • 23

From Poisoned to Aware: Fostering Backdoor Self-Awareness in LLMs

Paper • 2510.05169 • Published Oct 5, 2025 • 3

Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models

Paper • 2604.10733 • Published Apr 12 • 1

Frontier Models are Capable of In-context Scheming

Paper • 2412.04984 • Published Dec 6, 2024 • 4

updated a collection about 18 hours ago

papers

Collection

22 items • Updated about 18 hours ago

upvoted a paper about 18 hours ago

Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models

Paper • 2307.14539 • Published Jul 26, 2023 • 3

upvoted a paper 1 day ago

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Paper • 2605.14386 • Published 3 days ago • 50

updated a collection 1 day ago

papers

Collection

22 items • Updated about 18 hours ago

upvoted an article 1 day ago

Article

A Guide to Reinforcement Learning Post-Training for LLMs: PPO, DPO, GRPO, and Beyond

karina-zadorozhny • Jan 19 • 18
