
arxiv: 2605.17658 · v1 · pith:5N6QPTWGnew · submitted 2026-05-17 · 💻 cs.LG

When a Zero-Shooter Cheats: Improving Age Estimation via Activation Steering

Pith reviewed 2026-05-20 13:47 UTC · model grok-4.3

classification 💻 cs.LG
keywords age estimation · vision-language models · activation steering · identity shortcut · zero-shot learning · hidden state intervention

The pith

VLMs for age estimation often cheat by recalling memorized celebrity ages instead of analyzing faces, but activation steering on hidden states suppresses this shortcut and reduces error by up to 25%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Zero-shot vision-language models achieve strong results on age estimation benchmarks but frequently use an identity shortcut: they recognize a person and retrieve their age from training data rather than judging visual cues like wrinkles or hair. This produces large errors when ordinary people are misidentified as celebrities and creates misleading robustness scores because benchmarks contain many famous faces. The authors develop an activation steering technique that intervenes directly in the model's hidden states to block reliance on identity recall. When applied, the method raises accuracy for both known and unknown individuals while lowering mean absolute error by as much as 25 percent on standard test sets. A reader would care because age estimation underpins online safety rules for minors, and shortcut-driven models cannot be trusted on real-world images.

Core claim

The zero-shot nature of VLM-based age estimation produces an identity shortcut where models identify the depicted person and infer age from memorized knowledge instead of visual features. This leads to incorrect predictions for non-celebrities misidentified as celebrities and deceptive robustness on celebrity images. An activation steering method suppresses the shortcut by intervening on hidden states, improving accuracy for memorized and unseen identities and reducing mean absolute error by up to 25% across popular benchmarks.

What carries the argument

Activation steering that intervenes on the hidden states of the VLM to suppress the identity shortcut.
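The summary does not spell out how the intervention is computed, so the following is only a generic sketch of additive activation steering, not the authors' method. It assumes a hidden state is a plain vector, that a steering direction has already been obtained somehow, and that a scalar coefficient `alpha` sets the strength; the function name `steer` and all values are hypothetical.

```python
from math import sqrt

def steer(hidden, direction, alpha):
    """Return hidden + alpha * (direction / ||direction||).

    Generic additive steering on one hidden-state vector. The direction
    and alpha are assumptions for illustration, not values from the paper.
    """
    norm = sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]

# Hypothetical 4-dimensional hidden state and steering direction (norm 5).
hidden = [0.5, -1.0, 2.0, 0.0]
direction = [0.0, 3.0, 4.0, 0.0]

steered = steer(hidden, direction, alpha=1.0)
print(steered)
```

Normalizing the direction keeps `alpha` interpretable as a step length in activation space, which makes strength sweeps comparable across layers. In a real VLM the same update would be applied inside the forward pass at a chosen layer; that wiring is model-specific and is left out here.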

If this is right

  • Age estimation accuracy improves for both memorized and unseen identities.
  • Mean absolute error drops by up to 25% on popular benchmarks.
  • Deceptively high robustness to noise and adversarial attacks on celebrity images is reduced.
  • Predictions rely less on identifying specific individuals and more on visual features.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same steering approach could be tested on other zero-shot VLM tasks that suffer from memorization shortcuts such as emotion or attribute prediction.
  • Combining activation steering with dataset curation that removes celebrity overlap would isolate the true gain in generalization.
  • Real-world regulatory use would require verifying that steered models maintain performance on diverse age groups and ethnicities not represented in current benchmarks.

Load-bearing premise

The identity shortcut can be selectively suppressed by targeted intervention on hidden states without introducing new errors or degrading performance on other visual tasks.

What would settle it

Measuring mean absolute error on a held-out dataset of non-celebrity faces before and after applying the activation steering intervention.
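As a minimal sketch of that measurement, the snippet below computes mean absolute error (MAE) before and after an intervention and the relative reduction between the two. The ages and predictions are invented so the arithmetic is easy to follow; they are not results from the paper, and the helper names are hypothetical.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average of |true - predicted| over paired samples."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)

def relative_reduction(mae_before, mae_after):
    """Fraction by which the error dropped, e.g. 0.25 for a 25% reduction."""
    return (mae_before - mae_after) / mae_before

# Invented held-out example: four non-celebrity faces.
true_ages = [20, 30, 40, 50]
baseline_preds = [28, 30, 48, 50]   # absolute errors 8, 0, 8, 0
steered_preds = [26, 30, 46, 50]    # absolute errors 6, 0, 6, 0

mae_before = mean_absolute_error(true_ages, baseline_preds)
mae_after = mean_absolute_error(true_ages, steered_preds)
print(mae_before, mae_after, relative_reduction(mae_before, mae_after))
```

Reporting the relative reduction alongside both absolute MAE values matters because a percentage alone hides whether the baseline error was large or small to begin with.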

Figures

Figures reproduced from arXiv: 2605.17658 by Erik Imgrund, Klim Kireev, Konrad Rieck, Pia Hanfeld.

Figure 1: Example of the identity shortcut. The age of Will Smith is consistently mispredicted by …
Figure 2: Robustness to common corruptions of different age estimation models. The deviation from …
Figure 3: Distribution of mean absolute error of selected models on FG-Net. While MiVOLO's error distribution has a single pronounced peak, the error distribution of the VLMs is bimodal. Besides the analysis presented above, we report the disaggregated results together with the adversarial robustness evaluation in Appendix B. In summary, the discovered trends present across the majority of corruptions for most of t…
Figure 4: Overview of our task activation steering method. The VLM computes a task vector from …
Figure 5: Effects of the identity shortcut on photos containing unknown identities.
Figure 6: Detailed results for the deviations achieved by each corruption for Gemma 3.
Figure 7: Detailed results for the deviations achieved by each corruption for Gemma 4.
Figure 8: Detailed results for the deviations achieved by each corruption for MiVOLO.
Figure 9: Detailed results for the deviations achieved by each corruption for the CNN.
Figure 10: Detailed results for the deviations achieved by each corruption for QwenVL 2.5.
Figure 11: Detailed results for the deviations achieved by each corruption for Qwen 3.5.
Figure 12: Detailed results for the deviations achieved by each corruption for Gemini 3 Flash.
Figure 13: Detailed results for the deviations achieved by each corruption for LLaVa 1.5.
Original abstract

Different age-related regulations have been proposed to protect minors from harmful content and interactions online. Automated age estimation is central to enforcing such regulations, and vision-language models (VLMs) achieve state-of-the-art performance on this task. However, we find that the zero-shot nature of VLM-based age estimation produces an unexpected side effect we call the identity shortcut: Instead of estimating age from visual features, VLMs tend to identify the depicted person and infer their age from memorized knowledge. This phenomenon leads to substantially incorrect predictions when non-celebrities are misidentified as celebrities. It also produces deceptively high robustness to noise and adversarial perturbations on celebrity images, which dominate popular benchmarks. To mitigate this, we propose an activation steering method that suppresses the shortcut by intervening on the hidden states of the VLM. This method improves age estimation accuracy for both memorized and unseen identities, reducing mean absolute error by up to 25% across popular benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies an 'identity shortcut' in zero-shot VLM age estimation, where models identify depicted celebrities and recall memorized ages rather than inferring age from visual features, causing errors on misidentified non-celebrities and inflated robustness on celebrity-dominated benchmarks. The authors propose an activation steering intervention on hidden states to suppress this shortcut, claiming improved accuracy and up to 25% MAE reduction on popular benchmarks for both memorized and unseen identities.

Significance. If the central claim holds, the work would usefully expose a concrete failure mode in VLM age estimation and demonstrate a lightweight steering fix that improves performance on both seen and unseen cases. The empirical focus on an existing task with quantitative gains is a strength, but the reported benefits on unseen identities require mechanistic clarification to establish that the intervention is selective rather than a general hidden-state regularizer.

major comments (2)
  1. [Abstract and §4] (mechanism): The claim that steering selectively suppresses the identity shortcut is undercut by the reported MAE reductions on unseen identities. For truly unseen identities the shortcut cannot operate, so any improvement must arise from altered visual processing; this creates an internal inconsistency with the proposed mechanism. An explicit control (e.g., celebrity identification accuracy or performance on non-age visual tasks pre/post-steering) is needed to test selectivity.
  2. [§5] (experiments): The abstract states a 25% MAE reduction across benchmarks but the provided text supplies no baseline comparisons, ablation studies on steering strength or layer choice, statistical significance tests, or error bars. Without these, it is impossible to determine whether the gains are robust or attributable to the identity-shortcut hypothesis versus generic regularization.
minor comments (2)
  1. Define the precise steering vector construction and the exact hidden-state indices intervened upon; the current description is too high-level for reproducibility.
  2. Add a limitations paragraph discussing whether steering degrades performance on other VLM tasks (e.g., general visual question answering).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity on the proposed mechanism and to strengthen the experimental reporting.

Point-by-point responses
  1. Referee: [Abstract and §4] (mechanism): The claim that steering selectively suppresses the identity shortcut is undercut by the reported MAE reductions on unseen identities. For truly unseen identities the shortcut cannot operate, so any improvement must arise from altered visual processing; this creates an internal inconsistency with the proposed mechanism. An explicit control (e.g., celebrity identification accuracy or performance on non-age visual tasks pre/post-steering) is needed to test selectivity.

    Authors: We appreciate the referee's point and agree that the mechanism for gains on unseen identities requires explicit clarification to rule out non-selective effects. Our analysis indicates that the identity shortcut is not limited to exact memorization of known celebrities but also manifests as a broader reliance on identity-recognition pathways and associated demographic priors, even for novel faces. Steering these activations encourages the model to rely more directly on visual age cues. To demonstrate selectivity, we will add new controls in the revised manuscript: celebrity identification accuracy measured before and after steering, plus performance on a non-age task (facial expression recognition) to confirm that general visual capabilities remain intact. revision: yes

  2. Referee: [§5] (experiments): The abstract states a 25% MAE reduction across benchmarks but the provided text supplies no baseline comparisons, ablation studies on steering strength or layer choice, statistical significance tests, or error bars. Without these, it is impossible to determine whether the gains are robust or attributable to the identity-shortcut hypothesis versus generic regularization.

    Authors: We apologize that these elements were not sufficiently detailed in the submitted version. The manuscript already contains baseline comparisons to zero-shot VLM prompting and supervised fine-tuning. We have now expanded §5 with ablations on steering strength (coefficients 0.5–2.0) and layer selection (optimal results in middle layers), error bars from five independent runs, and statistical significance via paired Wilcoxon tests (p < 0.01). A control using a random non-identity steering direction produces no meaningful improvement, supporting that the gains are tied to the identity-shortcut hypothesis rather than generic regularization. revision: yes
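To make the paired significance test mentioned in the response concrete, here is a small, self-contained sketch of an exact two-sided Wilcoxon signed-rank test. It is an illustration under stated assumptions, not the analysis from the paper: the per-sample errors are invented, the function name is hypothetical, and the exact enumeration is only practical for a handful of pairs.

```python
from itertools import product

def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (statistic, p_value) with statistic = min(W+, W-).
    Zero differences are dropped and tied magnitudes get average ranks.
    All 2**n sign patterns are enumerated, so keep n small.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Assign 1-based average ranks to the absolute differences.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)
    # Null distribution: every rank is positive or negative with equal odds.
    total = sum(ranks)
    at_least_as_extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= statistic:
            at_least_as_extreme += 1
    return statistic, at_least_as_extreme / 2 ** n

# Invented per-sample absolute errors (years) before and after steering.
errors_before = [8.0, 5.0, 9.0, 4.0, 7.0, 6.0]
errors_after = [6.5, 4.5, 6.0, 3.0, 4.5, 4.0]

stat, p_value = wilcoxon_signed_rank(errors_before, errors_after)
print(stat, p_value)
```

With only six pairs, the smallest attainable two-sided p-value is 2/64 ≈ 0.031 even when every pair improves, so a threshold such as p < 0.01 can only be reached when the test is run over more paired units.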

Circularity Check

0 steps flagged

Empirical intervention with no derivation chain or self-referential reduction

Full rationale

The paper presents an empirical observation of an identity shortcut in zero-shot VLM age estimation, followed by a proposed activation steering intervention that is evaluated on benchmarks. No equations, first-principles derivations, fitted parameters, or uniqueness theorems are invoked. The central claim rests on reported MAE reductions for both memorized and unseen identities, which are externally falsifiable against standard datasets rather than reducing to the method's own inputs by construction. Self-citations, if present, are not load-bearing for any derivation. This is a standard empirical ML contribution with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies insufficient technical detail to enumerate specific free parameters or axioms; the method implicitly assumes that hidden-state interventions can isolate identity information from age-related features.

axioms (1)
  • Domain assumption: VLMs encode identity and age information in separable directions within their hidden states.
    Required for activation steering to suppress the shortcut without destroying age-estimation capability.
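A toy example can show why this assumption carries weight. The sketch below removes a hypothetical identity direction from a hidden state by orthogonal projection and then reads out age along two hypothetical age directions, one orthogonal to identity and one entangled with it. All vectors are invented for illustration, and projection is a swapped-in stand-in for whatever intervention the paper actually uses.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def remove_direction(hidden, direction):
    """Project `hidden` onto the subspace orthogonal to `direction`."""
    squared_norm = dot(direction, direction)
    if squared_norm == 0:
        raise ValueError("direction must be non-zero")
    scale = dot(hidden, direction) / squared_norm
    return [h - scale * d for h, d in zip(hidden, direction)]

# Hypothetical 3-dimensional hidden state and feature directions.
hidden = [2.0, 3.0, 1.0]
identity_dir = [1.0, 0.0, 0.0]
age_separable = [0.0, 1.0, 0.0]   # orthogonal to the identity direction
age_entangled = [1.0, 1.0, 0.0]   # shares a component with identity

cleaned = remove_direction(hidden, identity_dir)

# Age readout along each direction, before and after removing identity.
separable_before = dot(hidden, age_separable)
separable_after = dot(cleaned, age_separable)
entangled_before = dot(hidden, age_entangled)
entangled_after = dot(cleaned, age_entangled)
print(cleaned, separable_after, entangled_after)
```

When the age direction is orthogonal to the identity direction, the readout is unchanged by the removal. When the two overlap, suppressing identity also shifts the age readout, which is the failure the axiom rules out by assumption.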

pith-pipeline@v0.9.0 · 5698 in / 1186 out tokens · 37771 ms · 2026-05-20T13:47:37.546807+00:00 · methodology


