In-context convergence of transformers. arXiv preprint arXiv:2310.05249

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

years: 2026 (4), 2025 (1)

verdicts: UNVERDICTED 5

roles: background 1

polarities: background 1

representative citing papers

Learning to Adapt: In-Context Learning Beyond Stationarity

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

Provable Knowledge Acquisition and Extraction in One-Layer Transformers

cs.LG · 2025-07-28 · unverdicted · novelty 6.0

In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds

Radiomics-Guided Vision Transformers for Survival Analysis

physics.med-ph · 2026-04-22 · unverdicted · novelty 5.0

A radiomics-guided hybrid Vision Transformer integrates pixel embeddings with interpretable radiomic features in a multimodal Cox model for survival analysis, yielding competitive discrimination and clinically meaningful attention maps on COVID-19 chest X-ray data.

citing papers explorer

Showing 5 of 5 citing papers.

  • Understanding and Improving Continuous Adversarial Training for LLMs via In-context Learning Theory · cs.LG · 2026-04-14 · unverdicted · none · ref 10

    Continuous adversarial training in the embedding space produces a robust generalization bound for linear transformers that decreases with perturbation radius, tied to singular values of the embedding matrix, and motivates a new regularizer that improves real LLM jailbreak robustness-utility tradeoff

  • Learning to Adapt: In-Context Learning Beyond Stationarity · cs.LG · 2026-04-13 · unverdicted · none · ref 20

    Gated linear attention enables lower training and test errors in non-stationary in-context learning by adaptively modulating past inputs through a learnable recency bias under an autoregressive model of task evolution.

  • Visual prompting reimagined: The power of the Activation Prompts · cs.CV · 2026-04-07 · unverdicted · none · ref 63

    Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.

  • Provable Knowledge Acquisition and Extraction in One-Layer Transformers · cs.LG · 2025-07-28 · unverdicted · none · ref 14

    In a stylized one-layer transformer, pre-training encodes factual knowledge via relation-specific feature directions and attention patterns; fine-tuning extracts it through a relation-covering mechanism that succeeds when enough latent templates are triggered, with a failure regime explaining inauds

  • Radiomics-Guided Vision Transformers for Survival Analysis · physics.med-ph · 2026-04-22 · unverdicted · none · ref 13

    A radiomics-guided hybrid Vision Transformer integrates pixel embeddings with interpretable radiomic features in a multimodal Cox model for survival analysis, yielding competitive discrimination and clinically meaningful attention maps on COVID-19 chest X-ray data.