LogitDynamics: Reliable ViT Error Detection from Layerwise Logit Trajectories

Ido Beigelman; Moti Freiman

arxiv: 2604.10643 · v1 · submitted 2026-04-12 · 💻 cs.CV

Pith reviewed 2026-05-10 16:21 UTC · model grok-4.3

classification 💻 cs.CV

keywords: error detection, vision transformers, logit trajectories, layerwise features, confidence estimation, ViT, misclassification prediction, internal signals

The pith

A linear probe on how class logits evolve across the final layers of a Vision Transformer can predict when its classification is wrong.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops LogitDynamics to detect errors in ViT image classifications from signals available in one forward pass. Lightweight linear heads attached to intermediate layers collect the logits of the predicted class and its top competitors, along with statistics on how unstable the top class rankings are from layer to layer. A linear probe is trained on these features to output an error indicator. The method matches or exceeds baseline performance on area under the precision-recall curve across datasets while showing stronger generalization when the probe is trained on one dataset and tested on another, all with negligible added computation.

Core claim

The trajectories of class logits and the instability of top-ranked classes across the last L layers of a ViT supply a usable signal for whether the final prediction is an error, which can be extracted via simple auxiliary heads and fed to a linear classifier.

What carries the argument

LogitDynamics features: the logits of the predicted class and top-K competitors plus instability statistics of top-ranked classes, collected from the last L layers through attached linear heads and used as input to an error-predicting linear probe.
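
As a rough illustration of the feature set described above, the sketch below builds one feature vector from per-layer logits. Everything specific in it is an assumption and not the paper's definition: the function name `logit_dynamics_features`, the default `top_k`, and the choice of a top-1 flip count and a top-1 agreement rate as the instability statistics. The abstract does not define those statistics, so these two are stand-ins.

```python
from typing import List, Sequence


def logit_dynamics_features(
    layer_logits: Sequence[Sequence[float]], top_k: int = 2
) -> List[float]:
    """Build a feature vector from logits of the last L layers.

    layer_logits[l][c] is the logit for class c produced by the (assumed)
    auxiliary head on layer l; the last row is the model's final output.
    """
    final = layer_logits[-1]
    # Rank classes by final-layer logit, highest first (ties keep index order).
    ranked = sorted(range(len(final)), key=lambda c: -final[c])
    tracked = ranked[: top_k + 1]  # predicted class followed by top-K competitors

    # Logit trajectories: one value per (layer, tracked class).
    trajectories = [row[c] for row in layer_logits for c in tracked]

    # Hypothetical instability statistics (not taken from the paper):
    top1 = [max(range(len(row)), key=lambda c: row[c]) for row in layer_logits]
    flips = sum(a != b for a, b in zip(top1, top1[1:]))  # top-1 changes across depth
    agreement = sum(t == tracked[0] for t in top1) / len(top1)  # layers agreeing with output

    return trajectories + [float(flips), agreement]


# Toy example: 3 layers, 4 classes.
layers = [
    [2.0, 1.0, 0.0, -1.0],
    [0.5, 1.5, 0.0, -1.0],
    [3.0, 1.0, 0.5, -2.0],
]
features = logit_dynamics_features(layers, top_k=2)
```

In this toy input the top-1 class switches from class 0 to class 1 and back, so the flip count is 2 even though the first and last layers agree. That is the kind of depth-wise signal a final-layer confidence score cannot see.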

If this is right

Error detection requires only a single forward pass plus lightweight heads and a probe.
AUCPR matches or exceeds that of standard confidence baselines on the evaluated datasets.
Cross-dataset transfer of the error predictor is stronger than for competing methods.
The approach exploits internal depth-wise signals in a manner parallel to hallucination detection in language models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

ViTs appear to accumulate and sometimes revise class evidence progressively through later layers instead of committing to a decision only at the output.
The same layerwise logit tracking could be tested on convolutional networks or other sequential architectures to check whether comparable instability signals appear outside transformers.
In settings with distribution shift, combining these dynamics with final-layer confidence might yield more robust uncertainty estimates than either alone.
Replacing the linear probe with a small nonlinear model or selecting layers adaptively could be explored to see whether further gains are available without losing the method's simplicity.

Load-bearing premise

The logit values and instability statistics from the last L layers encode a signal about misclassifications that generalizes across datasets and is not an artifact introduced by the auxiliary heads.

What would settle it

Training the error probe on features from one dataset such as ImageNet and finding that its AUCPR on a disjoint domain such as medical images falls to the level of a random guess or below the strongest baseline would falsify the claim of a generalizable logit-trajectory signal.
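
To make "the level of a random guess" concrete: for a precision-recall curve, an uninformative scorer is expected to land near the fraction of positive examples, which here would be the classifier's error rate on the target set if errors are treated as the positive class. The sketch below computes a step-wise average precision (one common AUCPR estimator) next to that chance level. The function names and toy data are illustrative assumptions; the abstract does not say which AUCPR estimator the paper uses.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_error: Sequence[int]) -> float:
    """Step-wise average precision with errors (label 1) as the positive class.

    Samples are ranked by descending score; tied scores keep input order.
    """
    n_pos = sum(is_error)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if is_error[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each true error
    return total / n_pos


def chance_level(is_error: Sequence[int]) -> float:
    """Expected precision of an uninformative scorer: the error prevalence."""
    return sum(is_error) / len(is_error)


scores = [0.9, 0.8, 0.7, 0.6]   # hypothetical error scores from a probe
is_error = [1, 0, 1, 0]         # 1 = the classifier was wrong on this sample
ap = average_precision(scores, is_error)  # (1/1 + 2/3) / 2
baseline = chance_level(is_error)         # 0.5
```

Because the chance level moves with the error rate, raw AUCPR values from datasets where the classifier has different accuracy are not directly comparable without reporting that prevalence.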

Figures

Figures reproduced from arXiv: 2604.10643 by Ido Beigelman, Moti Freiman.

**Figure 1.** Cross-dataset AUCPR for each baseline method, reported as the performance difference relative to LogitDynamics (LogitDy…

**Figure 2.** Ablation study. Cross-dataset AUCPR differences reported as with Top-K dynamics features minus without Top-K dynamics features. Positive values indicate that adding Top-K dynamics improves AUCPR, while negative values indicate a decrease.

**Figure 3.** Cross-dataset AUCPR differences relative to Logit…

Original abstract

Reliable confidence estimation is critical when deploying vision models. We study error prediction: determining whether an image classifier's output is correct using only signals from a single forward pass. Motivated by internal-signal hallucination detection in large language models, we investigate whether similar depth-wise signals exist in Vision Transformers (ViTs). We propose a simple method that models how class evidence evolves across layers. By attaching lightweight linear heads to intermediate layers, we extract features from the last L layers that capture both the logits of the predicted class and its top-K competitors, as well as statistics describing instability of top-ranked classes across depth. A linear probe trained on these features predicts the error indicator. Across datasets, our method improves or matches AUCPR over baselines and shows stronger cross-dataset generalization while requiring minimal additional computation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper offers a lightweight logit-trajectory method for flagging ViT errors that adapts LLM-style monitoring, but the abstract supplies too few numbers and protocol details to judge whether the gains are real or setup-dependent.

The core idea is to watch how the logits for the predicted class and its top competitors evolve across the last L layers of a ViT, add a few instability statistics, and train a linear probe on those features to predict mistakes. This is a direct, low-cost transfer of internal-signal monitoring from language models to vision transformers, and the abstract claims it matches or beats baselines on AUCPR while transferring better across datasets with almost no extra compute at inference time. That combination of features and the emphasis on cross-dataset behavior is the part that feels new relative to prior error-detection work in vision. The method stays simple and the motivation is practical, which is useful for anyone who needs to know when a deployed ViT is likely to be wrong without running multiple passes or heavy ensembles.

The experimental claims are the weak point. The abstract states improvements and stronger generalization but gives no actual AUCPR values, no list of baselines, no ablation results, and no mention of statistical tests. That leaves the size and reliability of the gains unclear.

The stress-test note about the auxiliary linear heads is also worth checking in the full text. Those heads are trained on the source distribution to produce logits from intermediate features, so the extracted trajectories could embed dataset-specific class boundaries. If the heads must be retrained when moving to a new dataset, then the reported cross-dataset wins may not demonstrate intrinsic dynamics of the ViT but rather how well the heads fit each evaluation set. The abstract does not say whether the heads stay frozen.

This work is aimed at practitioners who care about reliable single-pass confidence for vision models rather than at theorists. The idea is coherent and the overhead is low, so it is worth sending to a referee who can examine the full experiments and the head-training protocol. A serious review would clarify whether the method delivers what the abstract promises.

Referee Report

3 major / 2 minor

Summary. The paper proposes LogitDynamics for error detection in Vision Transformers: lightweight linear heads are attached to intermediate layers to extract features from the last L layers, including logits of the predicted class and its top-K competitors plus depth-wise instability statistics of top-ranked classes. A linear probe is then trained on these features to predict whether the ViT's classification is erroneous. The method is claimed to improve or match AUCPR over baselines across datasets while showing stronger cross-dataset generalization and requiring minimal extra computation, motivated by internal-signal approaches in LLMs.

Significance. If the central claims hold with proper controls, the work could offer a practical, low-overhead technique for post-hoc error detection in ViT classifiers by leveraging layerwise logit trajectories. This would be valuable for reliable deployment in safety-critical vision applications and could bridge ideas from LLM hallucination detection to vision models. The emphasis on cross-dataset transfer and minimal computation is a potential strength if the features prove intrinsic rather than artifactual.

major comments (3)

[Abstract] Abstract: the claim of 'stronger cross-dataset generalization' is load-bearing for the paper's contribution, yet the description does not specify whether the auxiliary linear heads are frozen from the source training distribution or retrained on target data when evaluating transfer. If retrained, the extracted top-K logits and instability statistics (e.g., rank changes across depth) necessarily encode target-specific class boundaries, making apparent AUCPR gains and generalization an artifact of head adaptation rather than a property of ViT logit trajectories.
[Method] Method (feature extraction paragraph): the instability statistics are described only at a high level ('statistics describing instability of top-ranked classes across depth') with no explicit equations or definitions (e.g., whether variance of logit values, rank-flip counts, or normalized entropy). Without these, it is impossible to determine if the features are reproducible or if they reduce to quantities already captured by standard baselines such as maximum softmax probability.
[Results] Results (cross-dataset experiments): the abstract asserts AUCPR improvements and stronger generalization, but the manuscript must report the exact numerical deltas, the precise baseline definitions (e.g., MSP, temperature scaling, or other logit-based detectors), the value of L and K, and whether any statistical tests (e.g., paired t-tests or bootstrap confidence intervals) support the 'improves or matches' statement. Absent these, the central empirical claim cannot be assessed.
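
The statistical support asked for in the third comment could take the form of a paired bootstrap over test samples. The sketch below is one minimal way to do that and is not the paper's procedure: the function names, the percentile interval, the number of resamples, and the step-wise average-precision estimator are all assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple


def average_precision(scores: Sequence[float], is_error: Sequence[int]) -> float:
    """Step-wise average precision with errors (label 1) as positives."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_error[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(is_error)


def bootstrap_ap_difference(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    is_error: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile interval for AP(method A) - AP(method B) on paired resamples."""
    if sum(is_error) == 0:
        raise ValueError("need at least one error to compute average precision")
    rng = random.Random(seed)
    n = len(is_error)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both methods
        labels = [is_error[i] for i in idx]
        if sum(labels) == 0:
            continue  # AP is undefined for a resample with no errors
        ap_a = average_precision([scores_a[i] for i in idx], labels)
        ap_b = average_precision([scores_b[i] for i in idx], labels)
        diffs.append(ap_a - ap_b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

An interval that excludes zero would support "improves"; an interval that straddles zero is consistent with "matches". Resampling the same indices for both methods keeps the comparison paired, which matters because both detectors are scored on identical images.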

minor comments (2)

[Abstract] The abstract and introduction should explicitly cite the specific LLM hallucination-detection papers that motivate the layerwise approach, including any quantitative parallels drawn.
[Method] Notation for the linear probe and the error indicator variable should be introduced with equations rather than prose to avoid ambiguity when describing the training objective.
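
One way the requested notation could look is sketched below. It is an assumption and not the paper's own formulation: the symbols $e_i$, $\phi$, $w$, $b$ and the choice of a logistic (binary cross-entropy) objective are introduced here only for illustration, since the abstract says no more than that a linear probe predicts the error indicator.

```latex
% Hypothetical notation; symbols and loss are assumptions, not taken from the paper.
% x_i: input image, y_i: true label, \hat{y}_i: ViT prediction,
% \phi(x_i): LogitDynamics feature vector, \sigma: logistic sigmoid.
\begin{align}
  e_i &= \mathbb{1}\left[\hat{y}_i \neq y_i\right], \\
  p_i &= \sigma\left(w^\top \phi(x_i) + b\right), \\
  \mathcal{L}(w, b) &= -\frac{1}{N} \sum_{i=1}^{N}
      \left[ e_i \log p_i + (1 - e_i) \log\left(1 - p_i\right) \right].
\end{align}
```

Written this way, the probe output $p_i$ is read directly as the estimated probability that the ViT's prediction on $x_i$ is wrong.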

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable suggestions. We have carefully considered each comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.

Referee: [Abstract] Abstract: the claim of 'stronger cross-dataset generalization' is load-bearing for the paper's contribution, yet the description does not specify whether the auxiliary linear heads are frozen from the source training distribution or retrained on target data when evaluating transfer. If retrained, the extracted top-K logits and instability statistics (e.g., rank changes across depth) necessarily encode target-specific class boundaries, making apparent AUCPR gains and generalization an artifact of head adaptation rather than a property of ViT logit trajectories.

Authors: We agree that this clarification is essential. In the cross-dataset experiments, the auxiliary linear heads are trained on the source dataset and kept frozen for the target dataset. The linear probe is likewise trained using source features only. This design ensures the extracted features capture intrinsic ViT logit trajectories rather than target-specific adaptations. We have revised the abstract and the method section to explicitly describe this experimental protocol.

revision: yes

Referee: [Method] Method (feature extraction paragraph): the instability statistics are described only at a high level ('statistics describing instability of top-ranked classes across depth') with no explicit equations or definitions (e.g., whether variance of logit values, rank-flip counts, or normalized entropy). Without these, it is impossible to determine if the features are reproducible or if they reduce to quantities already captured by standard baselines such as maximum softmax probability.

Authors: We acknowledge the need for greater precision in describing the instability statistics. In the revised version, we will include explicit equations defining these features. These will be accompanied by pseudocode to facilitate reproducibility and to clarify how they differ from single-layer baselines like maximum softmax probability.

revision: yes

Referee: [Results] Results (cross-dataset experiments): the abstract asserts AUCPR improvements and stronger generalization, but the manuscript must report the exact numerical deltas, the precise baseline definitions (e.g., MSP, temperature scaling, or other logit-based detectors), the value of L and K, and whether any statistical tests (e.g., paired t-tests or bootstrap confidence intervals) support the 'improves or matches' statement. Absent these, the central empirical claim cannot be assessed.

Authors: We will expand the results section to provide a table with exact AUCPR values for LogitDynamics and all compared baselines, including clear definitions of MSP and other methods. The values of L and K used in the experiments will be stated, along with any statistical tests such as bootstrap confidence intervals to substantiate the performance claims.

revision: yes
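
For reference, the MSP baseline named in this exchange is usually defined as the largest softmax probability of the final-layer logits, following Hendrycks and Gimpel. The sketch below shows that definition in plain Python. The function names are illustrative, and turning MSP into an error score as `1 - MSP` is an assumed convention; the paper's exact baseline setup is not given in the abstract.

```python
import math
from typing import Sequence


def max_softmax_probability(logits: Sequence[float]) -> float:
    """Largest softmax probability of the final-layer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract the max for numerical stability
    return max(exps) / sum(exps)


def msp_error_score(logits: Sequence[float]) -> float:
    """Higher means 'more likely wrong' under the assumed 1 - MSP convention."""
    return 1.0 - max_softmax_probability(logits)


confident = max_softmax_probability([2.0, 1.0, 0.1])      # about 0.659
uniform = max_softmax_probability([0.0, 0.0, 0.0, 0.0])   # 0.25 for four tied classes
```

MSP uses only the final layer, so it cannot distinguish a prediction that was stable across depth from one that flipped at the last layer. That is the gap the layerwise features are meant to fill.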

Circularity Check

0 steps flagged

No circularity: independent linear probe trained on extracted logit features

The paper's core method attaches auxiliary linear heads to intermediate ViT layers, extracts logit values for the predicted class plus top-K competitors along with depth-wise instability statistics from the last L layers, and trains a separate linear probe on these features to predict the binary error indicator. This is a standard supervised feature-based classifier with no equations or definitions that reduce the error prediction to the input features or fitted quantities by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps in the provided derivation. Cross-dataset generalization is presented as an empirical outcome rather than a definitional necessity, and the approach remains self-contained against external benchmarks without renaming known results or smuggling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract only, the claim rests on the empirical premise that layerwise logit dynamics are informative for error prediction; no free parameters, axioms, or invented entities are explicitly introduced beyond standard neural-network components.

pith-pipeline@v0.9.0 · 5432 in / 1071 out tokens · 65019 ms · 2026-05-10T16:21:51.922421+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Jonathan Aigrain and Marcin Detyniecki. Detecting adversarial examples and other misclassifications in neural networks by introspection. arXiv preprint arXiv:1905.09186.

[2] Amos Azaria and Tom Mitchell. The internal state of an LLM knows when it's lying. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 967–976, Singapore, 2023. Association for Computational Linguistics.

[3] Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron, Yftah Ziser, and Haggai Maron. Beyond token probes: Hallucination detection via activation tensors with ACT-ViT. In Advances in Neural Information Processing Systems 38, 2025.

[4] Guy Bar-Shalom, Fabrizio Frasca, Derek Lim, Yoav Gelberg, Yftah Ziser, Ran El-Yaniv, Gal Chechik, and Haggai Maron. Beyond next token probabilities: Learnable, fast detection of hallucinations and data contamination on LLM output distributions. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36):30058–30066, 2026.

[5] Charles Corbière, Nicolas Thomé, Avner Bar-Hen, Matthieu Cord, and Patrick Pérez. Addressing failure prediction by learning model confidence. In Advances in Neural Information Processing Systems 32, pages 2902–2913, 2019.

[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[7] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.

[8] Leo Feng, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, and Amir H. Abdi. Towards better selective classification. In International Conference on Learning Representations, 2023.

[9] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning, pages 1050–1059. PMLR, 2016.

[10] Eduardo Dadalto Câmara Gomes, Marco Romanelli, Georg Pichler, and Pablo Piantanida. A data-driven measure of relative uncertainty for misclassification detection. In International Conference on Learning Representations, 2024.

[11] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2017.

[12] Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. Shallow-deep networks: Understanding and mitigating network overthinking. In Proceedings of the 36th International Conference on Machine Learning, pages 3301–3310. PMLR.

[13] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[14] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems 30, pages 6402–6413, 2017.

[15] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems 31, pages 7167–7177, 2018.

[16] Hengyue Liang, Le Peng, and Ju Sun. Selective classification under distribution shifts. Transactions on Machine Learning Research, 2024.

[17] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. In Advances in Neural Information Processing Systems 33, pages 21464–21475, 2020.

[18] Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, and Yonatan Belinkov. LLMs know more than they show: On the intrinsic representation of LLM hallucinations. In International Conference on Learning Representations, 2025.

[19] Bolei Zhou, Àgata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1452–1464, 2018.

A. Appendix, A.1. Additional Cross-Dataset Results: ACT-ViT mirrors the behavior of linear probing, demonstrating poor cross-data...
