Submitted by
Federico Torrielli
AI & ML interests
Interpretability-informed control
Papers
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals