Mohammed Suhail B Nadaf

Identifiers

No identifiers captured yet.

Papers (2)

reward-lens: A Mechanistic Interpretability Library for Reward Models · cs.LG · 2026 · author #1
Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens · cs.LG · 2026 · author #1

Mentions

No mention provenance yet.