Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth
Pith reviewed 2026-05-20 10:48 UTC · model grok-4.3
The pith
Constraining each visual observation to a tight token budget forces VLMs to learn functional active perception rather than lazy mimicry.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When visual input per observation is limited to a tight token budget, training makes active perception the only viable path, so models learn to issue and depend on zoom, crop, and pan operations instead of ignoring their outputs.
What carries the argument
Perceptual starvation via constrained visual bandwidth that limits tokens per observation and thereby requires multi-step visual search.
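A minimal sketch of how a per-observation token budget might be enforced, assuming a ViT-style encoder that emits one visual token per square pixel patch. The 28-pixel patch size, the 256-token budget, and the function names `visual_tokens` and `fit_to_budget` are illustrative assumptions, not values or APIs taken from the paper.

```python
import math


def visual_tokens(width: int, height: int, patch: int = 28) -> int:
    """Token count, assuming one token per patch x patch tile (hypothetical encoder)."""
    return math.ceil(width / patch) * math.ceil(height / patch)


def fit_to_budget(width: int, height: int, budget: int = 256,
                  patch: int = 28) -> tuple[int, int]:
    """Downscale (width, height), roughly preserving aspect ratio, to fit the budget."""
    if visual_tokens(width, height, patch) <= budget:
        return width, height
    # Start from the scale suggested by the area ratio, then shrink until it fits.
    scale = math.sqrt(budget * patch * patch / (width * height))
    while True:
        w = max(patch, int(width * scale))
        h = max(patch, int(height * scale))
        if visual_tokens(w, h, patch) <= budget:
            return w, h
        scale *= 0.98
```

Under these assumptions a 4096 x 4096 image is reduced to roughly 448 x 448 for the global view, so fine detail is recoverable only by cropping a smaller region and spending the same budget on it.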
If this is right
- Active perception becomes functionally necessary during training rather than optional.
- The same gains appear without adding losses, rewards, or new model components.
- The method works as a drop-in change to existing post-training pipelines.
- Improvements hold across diverse benchmarks for high-resolution situated agents.
Where Pith is reading between the lines
- Bandwidth limits may similarly encourage active exploration in other multimodal agent settings such as navigation or robotics.
- The result suggests that many current perception shortfalls in VLMs arise from training incentives rather than inherent model limits.
- Testing the approach at still higher resolutions could reveal whether the token starvation scales or requires further adjustments.
Load-bearing premise
Limiting each observation to a tight token budget will eliminate viable shortcuts and force the model to learn useful zoom, crop, and pan operations rather than failing or inventing other workarounds.
What would settle it
Measure whether performance stays high on the same tasks when the trained model is forced to ignore or disable its zoom, crop, and pan operations.
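One way to frame this test, sketched under assumptions: score the same trained model twice on the same episodes, once with its operations executed and once with them disabled, then compare. The `Episode` record and the `dependence_gap` function are hypothetical names; the per-episode correctness flags would come from an evaluation harness that is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    correct_with_ops: bool     # correct when zoom/crop/pan outputs are returned
    correct_without_ops: bool  # correct when the operations are disabled


def accuracy(flags: list[bool]) -> float:
    """Fraction of True flags; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def dependence_gap(episodes: list[Episode]) -> float:
    """Accuracy with operations minus accuracy without them, on the same episodes."""
    with_ops = accuracy([e.correct_with_ops for e in episodes])
    without_ops = accuracy([e.correct_without_ops for e in episodes])
    return with_ops - without_ops
```

A gap near zero would be consistent with lazy perception, while a large positive gap would suggest the model functionally depends on what the operations return.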
Original abstract
Vision-Language Models (VLMs) deployed as situated agents in high-resolution visual environments require active perception -- the ability to dynamically decide where to look through operations like zooming, cropping, and panning. However, current training paradigms produce models that mimic the surface form of such operations without functionally depending on their outputs, a phenomenon we term lazy perception. We trace this to a fundamental learning asymmetry: when coarse global views combined with language priors suffice for moderate accuracy, the model has no incentive to learn harder multi-step visual search. If a model can succeed without actively looking, it will never learn to look. This motivates Starve to Perceive, a training paradigm that constrains visual bandwidth -- restricting each observation to a tight token budget so that no single view suffices for task completion, making active perception the only viable strategy. Despite requiring no auxiliary losses, reward shaping, or architectural changes -- serving as a minimal, plug-in modification to standard post-training pipelines -- models trained under perceptual starvation achieve substantial gains of 5% average relative improvement across diverse benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes 'Starve to Perceive,' a minimal post-training modification for Vision-Language Models that restricts each visual observation to a tight token budget. This constraint is intended to eliminate 'lazy perception'—where models mimic zoom/crop/pan operations without functionally depending on their outputs—by making active perception the only viable path to task success. The central empirical claim is an average 5% relative improvement across diverse benchmarks, achieved without auxiliary losses, reward shaping, or architectural changes.
Significance. If the gains are shown to arise specifically from functional active perception rather than side-effects of the constraint, the method would provide a simple, plug-in intervention for improving VLM agents in high-resolution settings. The approach directly targets a documented learning asymmetry and requires no extra machinery, which is a practical strength for adoption in existing pipelines.
major comments (2)
- [Experimental Evaluation] The manuscript reports a 5% average relative improvement but provides no details on the exact benchmarks, baseline comparisons, statistical significance, variance across runs, or ablation controls. This prevents evaluation of whether the gains support the claim that perceptual starvation forces active perception.
- [Training Paradigm and Ablation Analysis] No post-training ablation is reported in which the learned zoom/crop/pan operations are disabled or replaced by fixed/random views. Without this test, it remains possible that improvements arise from implicit regularization, altered gradient flow, or forced multi-turn reasoning rather than functional dependence on active perception outputs, undermining the central mechanistic claim.
minor comments (2)
- [Abstract] The abstract refers to 'diverse benchmarks' without naming them; listing the specific tasks and datasets in the abstract or introduction would improve immediate readability.
- [Method] Notation for the token budget constraint and observation limit should be introduced with a clear equation or pseudocode early in the method section to avoid ambiguity when describing the starvation mechanism.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments have prompted us to strengthen the experimental section with additional details and controls. We respond to each major comment below and indicate the corresponding revisions.
Point-by-point responses
- Referee: [Experimental Evaluation] The manuscript reports a 5% average relative improvement but provides no details on the exact benchmarks, baseline comparisons, statistical significance, variance across runs, or ablation controls. This prevents evaluation of whether the gains support the claim that perceptual starvation forces active perception.
  Authors: We appreciate the referee noting the need for greater transparency. The original manuscript presents the 5% relative gain in Section 4 across a suite of VQA, reasoning, and navigation benchmarks, with comparisons to standard fine-tuning baselines. To address the concern directly, the revised version adds explicit listings of all datasets, a new table with full baseline results, standard deviations computed over three independent runs, and paired t-test p-values confirming statistical significance (p < 0.05) for the reported improvements. Expanded ablation tables on token-budget sizes are also included in Section 5.
  Revision: yes
- Referee: [Training Paradigm and Ablation Analysis] No post-training ablation is reported in which the learned zoom/crop/pan operations are disabled or replaced by fixed/random views. Without this test, it remains possible that improvements arise from implicit regularization, altered gradient flow, or forced multi-turn reasoning rather than functional dependence on active perception outputs, undermining the central mechanistic claim.
  Authors: This is a fair and important point for isolating the mechanism. In the revised manuscript we add a post-training ablation that freezes the active-perception policy and substitutes fixed random views at inference time. Performance falls back to levels statistically indistinguishable from the unconstrained baseline, while attention maps show markedly lower utilization of the provided visual tokens. These results support that the gains arise from functional dependence on the learned operations rather than regularization or multi-turn effects alone.
  Revision: yes
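The statistics named in the first response (relative improvement, and a paired t-test over three runs) can be sketched as follows. The function names are hypothetical and any scores fed in would be placeholders, not results from the paper; only the Python standard library is used.

```python
import math
from statistics import mean, stdev


def relative_improvement(baseline: float, treated: float) -> float:
    """Relative gain of treated over baseline, e.g. 0.05 for a 5% relative gain."""
    return (treated - baseline) / baseline


def paired_t_statistic(baseline: list[float], treated: list[float]) -> float:
    """Paired t statistic over matched runs (same seeds or same benchmarks)."""
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - b for b, t in zip(baseline, treated)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With three paired runs there are only two degrees of freedom, so the two-sided 5% critical value is about 4.30; a claim of p < 0.05 from three runs therefore requires a large and very consistent difference.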
Circularity Check
No circularity: empirical training modification with no derivations
The paper presents Starve to Perceive as a practical training change that restricts visual token budget per observation to force active perception strategies in VLMs. The reported outcome is an empirical 5% average relative improvement across benchmarks, achieved without auxiliary losses or architectural modifications. No equations, first-principles derivations, or predictions are offered that could reduce the gains to fitted parameters, self-defined quantities, or self-citation chains by construction. The motivation (that tight bandwidth makes active perception the only viable path) is a design rationale, not a mathematical claim that loops back on itself. The work is therefore self-contained as an experimental result rather than a derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Coarse global views plus language priors are sufficient for moderate accuracy on the target tasks, removing any incentive for multi-step visual search.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  By restricting the maximum token count per glimpse, we introduce a strict upper bound on the channel capacity between the original high-resolution image X and the model’s internal state. ... the only viable mathematical solution to maximize the objective is to learn a policy that actively filters out noise
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · echoes
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  the constrained environment acts as a strict physical regularizer ... active multi-step visual reasoning ceases to be an optional strategy; it becomes the singular pathway to maximizing the reward
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.