Title resolution pending

Attention is all you need

12 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

SimCSE: Simple Contrastive Learning of Sentence Embeddings

cs.CL · 2021-04-18 · conditional · novelty 8.0

SimCSE achieves 76.3% unsupervised and 81.6% supervised Spearman's correlation on STS tasks with BERT-base, improving prior best results by 4.2% and 2.2% via simple contrastive learning.

Self-Improvement for Fast, High-Quality Plan Generation

cs.AI · 2026-05-05 · unverdicted · novelty 7.0

Self-improvement of a decoder-only transformer yields plans averaging 30% shorter than a source symbolic planner, over 80% optimal where known, with sub-exponential latency scaling.

LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations

cs.CV · 2026-05-01 · unverdicted · novelty 7.0

LIMSSR reformulates incomplete multimodal learning as LLM-driven sequence-to-score reasoning with prompt-guided imputation and mask-aware aggregation, outperforming baselines on action quality assessment without complete training data.

Hyperspherical Forward-Forward with Prototypical Representations

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

HFF replaces binary goodness-of-fit in Forward-Forward with hyperspherical prototypes for direct multi-class decisions, enabling single-forward-pass inference and training that scales to ImageNet while closing much of the gap to backpropagation.

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

cs.CV · 2023-10-09 · unverdicted · novelty 7.0

A new shared video-image tokenizer enables large language models to surpass diffusion models on standard visual generation benchmarks.

Beyond Normal References: Discriminative Few-Shot Anomaly Detection

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

IDEAL learns intrinsic deviation vectors via Normal Variation Eraser and Intrinsic Deviation Encoder to score query deviations for both seen and unseen anomalies in discriminative FSAD.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

Prototype-Based Test-Time Adaptation of Vision-Language Models

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

PTA adapts VLMs at test time by maintaining and updating class-specific knowledge prototypes from test samples, achieving higher accuracy than cache-based methods with far less speed loss.

PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

cs.CV · 2023-09-30 · accept · novelty 6.0

PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.

Vision Transformers Need Registers

cs.CV · 2023-09-28 · unverdicted · novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

The False Promise of Imitating Proprietary LLMs

cs.CL · 2023-05-25 · conditional · novelty 6.0

Finetuning open LMs on ChatGPT outputs creates models that mimic style and fool human raters but fail to close the performance gap to proprietary systems on tasks not well-represented in the imitation data.

SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

cs.LG · 2026-05-09 · 2 refs

citing papers explorer

Showing 12 of 12 citing papers.

SimCSE: Simple Contrastive Learning of Sentence Embeddings · cs.CL · 2021-04-18 · conditional · none · ref 129
Self-Improvement for Fast, High-Quality Plan Generation · cs.AI · 2026-05-05 · unverdicted · none · ref 19
LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations · cs.CV · 2026-05-01 · unverdicted · none · ref 12
Hyperspherical Forward-Forward with Prototypical Representations · cs.LG · 2026-04-30 · unverdicted · none · ref 39
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation · cs.CV · 2023-10-09 · unverdicted · none · ref 126
Beyond Normal References: Discriminative Few-Shot Anomaly Detection · cs.CV · 2026-05-22 · unverdicted · none · ref 98
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL · cs.LG · 2026-05-03 · unverdicted · none · ref 292
Prototype-Based Test-Time Adaptation of Vision-Language Models · cs.CV · 2026-04-23 · unverdicted · none · ref 43
PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis · cs.CV · 2023-09-30 · accept · none · ref 121
Vision Transformers Need Registers · cs.CV · 2023-09-28 · unverdicted · none · ref 151
The False Promise of Imitating Proprietary LLMs · cs.CL · 2023-05-25 · conditional · none · ref 111
SURGE: Surrogate Gradient Adaptation in Binary Neural Networks · cs.LG · 2026-05-09 · unreviewed · ref 79 · 2 links
