Recognition: unknown
Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
Pith reviewed 2026-05-10 13:05 UTC · model grok-4.3
The pith
Alignment in early visual cortex predicts lower sycophancy in vision-language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Alignment specifically in early visual cortex (V1--V3) is a reliable negative predictor of sycophancy, with all leave-one-out correlations negative and the strongest effect for existence denial attacks. This anatomically specific relationship is absent in higher-order category-selective regions, suggesting that faithful low-level visual encoding provides a measurable anchor against adversarial linguistic override.
What carries the argument
The brain alignment metric based on predicting fMRI responses in visual cortex regions of interest from model activations, which quantifies how well model visual features match human neural patterns.
If this is right
- Greater alignment in V1-V3 leads to lower sycophancy across tested prompt categories.
- The protective effect is most pronounced against existence denial attacks.
- Models of varying sizes and architectures show this pattern consistently.
- No similar predictive power comes from alignment in higher visual areas.
Where Pith is reading between the lines
- Training objectives that enhance early visual fidelity could improve model robustness to manipulation.
- This link suggests testing similar alignments for resistance to other forms of adversarial input.
- Developers might use brain alignment scores as a proxy for safety properties in vision models.
- Extending this to other brain areas or modalities could reveal broader principles of model stability.
Load-bearing premise
That the accuracy of predicting fMRI responses from model features accurately represents the model's actual visual processing and directly affects its response to conflicting text prompts.
What would settle it
Measuring sycophancy in models before and after fine-tuning to improve or worsen prediction of V1-V3 fMRI responses using the same set of 76,800 prompts.
Figures
read the original abstract
Vision-language models are increasingly deployed in high-stakes settings, yet their susceptibility to sycophantic manipulation remains poorly understood, particularly in relation to how these models represent visual information internally. Whether models whose visual representations more closely mirror human neural processing are also more resistant to adversarial pressure is an open question with implications for both neuroscience and AI safety. We investigate this question by evaluating 12 open-weight vision-language models spanning 6 architecture families and a 40$\times$ parameter range (256M--10B) along two axes: brain alignment, measured by predicting fMRI responses from the Natural Scenes Dataset across 8 human subjects and 6 visual cortex regions of interest, and sycophancy, measured through 76,800 two-turn gaslighting prompts spanning 5 categories and 10 difficulty levels. Region-of-interest analysis reveals that alignment specifically in early visual cortex (V1--V3) is a reliable negative predictor of sycophancy ($r = -0.441$, BCa 95\% CI $[-0.740, -0.031]$), with all 12 leave-one-out correlations negative and the strongest effect for existence denial attacks ($r = -0.597$, $p = 0.040$). This anatomically specific relationship is absent in higher-order category-selective regions, suggesting that faithful low-level visual encoding provides a measurable anchor against adversarial linguistic override in vision-language models. We release our code on \href{https://github.com/aryashah2k/Gaslight-Gatekeep-Sycophantic-Manipulation}{GitHub} and dataset on \href{https://huggingface.co/datasets/aryashah00/Gaslight-Gatekeep-V1-V3}{Hugging Face}
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates 12 open-weight vision-language models across 6 architecture families on two axes: brain alignment, quantified via linear prediction of fMRI responses from the Natural Scenes Dataset in 6 visual ROIs across 8 subjects, and sycophancy, quantified via 76,800 two-turn gaslighting prompts in 5 categories. It reports a reliable negative correlation between alignment specifically in V1–V3 and overall sycophancy rate (r = −0.441, BCa 95% CI [−0.740, −0.031]), with all 12 leave-one-out correlations negative and the strongest effect for existence-denial attacks (r = −0.597, p = 0.040); the relationship is absent in higher-order ROIs.
Significance. If the correlation is robust, the result supplies a concrete, anatomically specific empirical link between low-level visual fidelity and resistance to linguistic override, with direct implications for both neuroscience-inspired model design and AI safety. Strengths include the multi-architecture, multi-scale model set, public code and dataset release, and explicit robustness checks (leave-one-out, bootstrap CI); the paper correctly frames the finding as correlational rather than causal.
minor comments (3)
- [Methods] Methods section on brain alignment: specify the exact regression procedure (ridge or otherwise), regularization parameter selection, and cross-validation scheme used to compute prediction accuracy for each ROI and subject.
- [Results] Results, Table 2 or equivalent: report whether the p = 0.040 for existence-denial attacks is corrected for the five attack categories and six ROIs; if uncorrected, add a note on family-wise error control.
- [Methods] Prompt construction: clarify how the 10 difficulty levels are operationalized and whether prompt templates were held constant across models.
Simulated Author's Rebuttal
We thank the referee for their careful reading of the manuscript, accurate summary of our methods and results, and recommendation for minor revision. We appreciate the recognition of the multi-architecture scope, public releases, and explicit robustness checks, as well as the correct framing of the findings as correlational.
Circularity Check
No circularity: empirical correlation between independent measurements
full rationale
The paper reports a correlation (r = -0.441) between two separately computed quantities: (1) brain alignment scores obtained by predicting fMRI responses from the Natural Scenes Dataset across subjects and ROIs, and (2) sycophancy rates measured via 76,800 prompt evaluations. No equation, ansatz, or self-citation reduces the reported relationship to a fitted parameter or prior result by construction. Leave-one-out checks and anatomical specificity are direct empirical observations, not forced outputs. The analysis is self-contained and externally falsifiable.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption fMRI responses from the Natural Scenes Dataset provide a reliable ground-truth measure of human early visual cortex activity
Reference graph
Works this paper leans on
-
[1]
@esa (Ref
\@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...
-
[2]
\@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...
-
[3]
@open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...
-
[4]
Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models, 2023 a . URL https://arxiv.org/abs/2301.12597
work page internal anchor Pith review arXiv 2023
-
[5]
Llava-next: Improved reasoning, ocr, and world knowledge, January 2024 a
Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. Llava-next: Improved reasoning, ocr, and world knowledge, January 2024 a . URL https://llava-vl.github.io/blog/2024-01-30-llava-next/
2024
-
[6]
Improved baselines with visual instruction tuning, 2023
Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning, 2023
2023
-
[7]
Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report, 2...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[8]
Performance-optimized hierarchical models predict neural responses in higher visual cortex
Daniel L K Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl. Acad. Sci. U. S. A., 111 0 (23): 0 8619--8624, June 2014
2014
-
[9]
Martin Schrimpf, Jonas Kubilius, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Franziska Geiger, Kailyn Schmidt, Daniel L. K. Yamins, and James J. DiCarlo. Brain-score: Which artificial neural network for object recognition is most brain-like? bioRxiv, 2020. doi:10.1101/407007. URL https://w...
-
[10]
A large-scale examination of inductive biases shaping high-level visual representation in brains and machines
Colin Conwell, Jacob S Prince, Kendrick N Kay, George A Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines. Nat. Commun., 15 0 (1): 0 9383, October 2024
2024
- [11]
-
[12]
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, and Ethan Perez. Towards understanding sycophancy in language models, 2025. URL h...
work page internal anchor Pith review arXiv 2025
-
[13]
In: Findings of the Association for Computational Linguistics: ACL 2023, pp
Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Ke...
-
[14]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback,...
work page internal anchor Pith review arXiv 2022
-
[15]
On evaluating ad- versarial robustness of large vision-language models
Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, and Min Lin. On evaluating adversarial robustness of large vision-language models, 2023. URL https://arxiv.org/abs/2305.16934
-
[16]
Survey of vulnerabilities in large language models revealed by adversarial attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, and Nael Abu-Ghazaleh. Survey of vulnerabilities in large language models revealed by adversarial attacks, 2023. URL https://arxiv.org/abs/2310.10844
-
[17]
arXiv preprint arXiv:2311.17600 , volume=
Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, and Yu Qiao. Mm-safetybench: A benchmark for safety evaluation of multimodal large language models, 2024 b . URL https://arxiv.org/abs/2311.17600
-
[18]
A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence
Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci., 25 0 (1): 0 116--126, January 2022
2022
-
[19]
Better bootstrap confidence intervals
Bradley Efron. Better bootstrap confidence intervals. J. Am. Stat. Assoc., 82 0 (397): 0 171--185, March 1987
1987
-
[20]
Encoding and decoding in fMRI
Thomas Naselaris, Kendrick N Kay, Shinji Nishimoto, and Jack L Gallant. Encoding and decoding in fMRI . Neuroimage, 56 0 (2): 0 400--410, May 2011
2011
-
[21]
Identifying natural images from human brain activity
Kendrick N Kay, Thomas Naselaris, Ryan J Prenger, and Jack L Gallant. Identifying natural images from human brain activity. Nature, 452 0 (7185): 0 352--355, March 2008
2008
-
[22]
Representational similarity analysis - connecting the branches of systems neuroscience
Nikolaus Kriegeskorte, Marieke Mur, and Peter Bandettini. Representational similarity analysis - connecting the branches of systems neuroscience. Front. Syst. Neurosci., 2: 0 4, November 2008
2008
-
[23]
Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting
Katherine R Storrs, Tim C Kietzmann, Alexander Walther, Johannes Mehrer, and Nikolaus Kriegeskorte. Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting. J. Cogn. Neurosci., 33 0 (10): 0 2044--2064, September 2021
2044
-
[24]
Limits to visual representational correspondence between convolutional neural networks and the human brain
Yaoda Xu and Maryam Vaziri-Pashkam. Limits to visual representational correspondence between convolutional neural networks and the human brain. Nat. Commun., 12 0 (1): 0 2065, April 2021
2065
-
[25]
A self-supervised domain-general learning framework for human ventral stream representation
Talia Konkle and George A Alvarez. A self-supervised domain-general learning framework for human ventral stream representation. Nat. Commun., 13 0 (1): 0 491, January 2022
2022
-
[26]
Vandermeulen, Katherine Hermann, Andrew K
Lukas Muttenthaler, Lorenz Linhardt, Jonas Dippel, Robert A. Vandermeulen, Katherine Hermann, Andrew K. Lampinen, and Simon Kornblith. Improving neural network representations using human similarity judgments, 2023. URL https://arxiv.org/abs/2306.04507
-
[27]
Visual field maps in human cortex
Brian A Wandell, Serge O Dumoulin, and Alyssa A Brewer. Visual field maps in human cortex. Neuron, 56 0 (2): 0 366--383, October 2007
2007
-
[28]
The fusiform face area: a module in human extrastriate cortex specialized for face perception
N Kanwisher, J McDermott, and M M Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci., 17 0 (11): 0 4302--4311, June 1997
1997
-
[29]
A cortical representation of the local visual environment
R Epstein and N Kanwisher. A cortical representation of the local visual environment. Nature, 392 0 (6676): 0 598--601, April 1998
1998
-
[30]
A cortical area selective for visual processing of the human body
P E Downing, Y Jiang, M Shuman, and N Kanwisher. A cortical area selective for visual processing of the human body. Science, 293 0 (5539): 0 2470--2473, September 2001
2001
- [31]
-
[32]
Alignment and adversarial robustness: Are more human-like models more secure?, 2025
Blaine Hoak, Kunyang Li, and Patrick McDaniel. Alignment and adversarial robustness: Are more human-like models more secure?, 2025. URL https://arxiv.org/abs/2502.12377
-
[33]
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, 2023. URL https://arxiv.org/abs/1706.03741
-
[34]
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...
work page Pith review arXiv 2022
-
[35]
Tulika Saha, Vaibhav Gakhreja, Anindya Sundar Das, Souhitya Chakraborty, and Sriparna Saha
Leonardo Ranaldi and Giulia Pucci. When large language models contradict humans? large language models' sycophantic behaviour, 2025. URL https://arxiv.org/abs/2311.09410
-
[36]
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob...
work page internal anchor Pith review arXiv 2023
-
[37]
Language models learn to mislead humans via rlhf
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, and Shi Feng. Language models learn to mislead humans via rlhf, 2024. URL https://arxiv.org/abs/2409.12822
-
[38]
Understanding the effects of iterative prompting on truthfulness, 2024
Satyapriya Krishna, Chirag Agarwal, and Himabindu Lakkaraju. Understanding the effects of iterative prompting on truthfulness, 2024. URL https://arxiv.org/abs/2402.06625
-
[39]
Bradley Efron and Robert J Tibshirani.An introduction to the bootstrap, volume
Stephanie Lin, Jacob Hilton, and Owain Evans. T ruthful QA : Measuring how models mimic human falsehoods. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3214--3252, Dublin, Ireland, May 2022. Association for Computa...
-
[40]
Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna ...
work page internal anchor Pith review arXiv 2024
-
[41]
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski,...
work page internal anchor Pith review arXiv 2022
-
[42]
Visual adversarial examples jailbreak aligned largelanguagemodels
Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, and Prateek Mittal. Visual adversarial examples jailbreak aligned large language models, 2023. URL https://arxiv.org/abs/2306.13213
-
[43]
arXiv preprint arXiv:2309.00236 , year=
Luke Bailey, Euan Ong, Stuart Russell, and Scott Emmons. Image hijacks: Adversarial images can control generative models at runtime, 2024. URL https://arxiv.org/abs/2309.00236
-
[44]
arXiv preprint arXiv:2403.09792 , year=
Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, and Ji-Rong Wen. Images are achilles' heel of alignment: Exploiting visual vulnerabilities for jailbreaking multimodal large language models, 2025. URL https://arxiv.org/abs/2403.09792
-
[45]
Eyes wide shut? exploring the visual shortcomings of multimodal llms
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, and Saining Xie. Eyes wide shut? exploring the visual shortcomings of multimodal llms, 2024. URL https://arxiv.org/abs/2401.06209
-
[46]
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models, 2023 b . URL https://arxiv.org/abs/2305.10355
work page internal anchor Pith review arXiv 2023
-
[47]
Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness, 2022. URL https://arxiv.org/abs/1811.12231
-
[48]
Shortcut learning in deep neural networks
Robert Geirhos, J \"o rn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. Shortcut learning in deep neural networks. Nat. Mach. Intell., 2 0 (11): 0 665--673, November 2020
2020
-
[49]
Multimodal neurons in artificial neural networks
Gabriel Goh, Nick Cammarata, Chelsea Voss, Shan Carter, Michael Petrov, Ludwig Schubert, Alec Radford, and Chris Olah. Multimodal neurons in artificial neural networks. Distill, 6 0 (3), March 2021
2021
-
[50]
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision, 2021. URL https://arxiv.org/abs/2103.00020
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[51]
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré, Merve Noyan, Elie Bakouch, Pedro Cuenca, Cyril Zakka, Loubna Ben Allal, Anton Lozhkov, Nouamane Tazi, Vaibhav Srivastav, Joshua Lochner, Hugo Larcher, Mathieu Morlon, Lewis Tunstall, Leandro von Werra, and Thomas Wolf. Smolvlm: Redefining small and efficient multimodal models, 2025. URL https://arxiv.org/abs/2504.05299
work page internal anchor Pith review arXiv 2025
-
[52]
Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Bey...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[53]
LFM2 technical report.arXiv:2511.23404, 2025
Alexander Amini, Anna Banaszak, Harold Benoit, Arthur Böök, Tarek Dakhran, Song Duong, Alfred Eng, Fernando Fernandes, Marc Härkönen, Anne Harrington, Ramin Hasani, Saniya Karwa, Yuri Khrustalev, Maxime Labonne, Mathias Lechner, Valentine Lechner, Simon Lee, Zetian Li, Noel Loo, Jacob Marks, Edoardo Mosca, Samuel J. Paech, Paul Pak, Rom N. Parnichkun, Ale...
-
[54]
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model's perception of the world at any resolution, 2024. URL https://arxiv.org/abs/2409.12191
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[55]
Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matt...
work page internal anchor Pith review arXiv 2024
-
[56]
What matters when building vision-language models?, 2024
Hugo Laurençon, Léo Tronchon, Matthieu Cord, and Victor Sanh. What matters when building vision-language models?, 2024. URL https://arxiv.org/abs/2405.02246
-
[57]
PaliGemma: A versatile 3B VLM for transfer
Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bau...
work page internal anchor Pith review arXiv 2024
-
[58]
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context, 2015. URL https://arxiv.org/abs/1405.0312
work page internal anchor Pith review arXiv 2015
-
[59]
Influence: Science and practice, 3rd ed
Robert Cialdini. Influence: Science and practice, 3rd ed. rd ed, 3: 0 253, 1993
1993
-
[60]
Statistical power analysis for the behavioral sciences
Jacob Cohen. Statistical power analysis for the behavioral sciences. Routledge, London, England, 2 edition, May 2013
2013
-
[61]
D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology, 195 0 (1): 0 215--243, 1968. doi:https://doi.org/10.1113/jphysiol.1968.sp008455. URL https://physoc.onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.1968.sp008455
- [62]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.