AI & ML interests
Interpretability-informed control
Papers
Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
aisilab's models
None public yet