Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors Paper • 2509.06608 • Published Sep 8, 2025
Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Paper • 2505.24473 • Published May 30, 2025
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders Paper • 2606.12138 • Published 15 days ago
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation Paper • 2505.22255 • Published May 28, 2025
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models Paper • 2502.03032 • Published Feb 5, 2025