Mohammed Suhail B Nadaf
Identifiers
No identifiers captured yet.
Papers (2)
- reward-lens: A Mechanistic Interpretability Library for Reward Models · cs.LG · 2026 · author #1
- Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens · cs.LG · 2026 · author #1
Mentions
No mention provenance yet.