Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin · 2017

13 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

DASH discovers stronger hybrid attention architectures for LLMs via minutes-scale differentiable search, outperforming selector baselines and Jet-Nemotron on RULER while using 0.006% of prior search tokens.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

cs.AI · 2026-05-08 · conditional · novelty 7.0

LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.

HORST: Composing Optimizer Geometries for Sparse Transformer Training

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

R³AG: Retriever Routing for Retrieval-Augmented Generation

cs.IR · 2026-04-22 · unverdicted · novelty 6.0

R³AG routes queries to retrievers by decomposing capabilities into retrieval quality and generation utility, trained via contrastive learning on document assessments and downstream answer correctness to outperform static methods.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

cs.CL · 2021-09-02 · conditional · novelty 6.0

CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.

Adaptive Probe-based Steering for Robust LLM Jailbreaking

cs.CR · 2026-05-19 · unverdicted · novelty 5.0

Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

cs.CL · 2024-12-18 · unverdicted · novelty 5.0

ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

InternLM2 Technical Report

cs.CL · 2024-03-26 · unverdicted · novelty 5.0

InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · novelty 3.0

Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.

Skipping the Zeros in Diffusion Models for Sparse Data Generation

cs.LG · 2026-05-03

citing papers explorer

DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU
cs.LG · 2026-05-20 · unverdicted · none · ref 2
DASH discovers stronger hybrid attention architectures for LLMs via minutes-scale differentiable search, outperforming selector baselines and Jet-Nemotron on RULER while using 0.006% of prior search tokens.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
cs.LG · 2026-05-18 · unverdicted · none · ref 69
RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
cs.AI · 2026-05-08 · conditional · none · ref 97
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.

Taming the Entropy Cliff: Variable Codebook Size Quantization for Autoregressive Visual Generation
cs.CV · 2026-05-07 · unverdicted · none · ref 1
Variable codebook sizes that increase along the sequence in visual tokenizers reduce generation FID scores significantly for autoregressive models on ImageNet.

HORST: Composing Optimizer Geometries for Sparse Transformer Training
cs.LG · 2026-05-20 · unverdicted · none · ref 22
HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents
cs.CL · 2026-05-14 · unverdicted · none · ref 78
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

R³AG: Retriever Routing for Retrieval-Augmented Generation
cs.IR · 2026-04-22 · unverdicted · none · ref 38
R³AG routes queries to retrievers by decomposing capabilities into retrieval quality and generation utility, trained via contrastive learning on document assessments and downstream answer correctness to outperform static methods.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
cs.CL · 2021-09-02 · conditional · none · ref 48
CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.

Adaptive Probe-based Steering for Robust LLM Jailbreaking
cs.CR · 2026-05-19 · unverdicted · none · ref 6
Adaptive probe-based steering guided by model extraction and activation statistics improves LLM jailbreak success rates from 6% to 70% average harmfulness without extra contrastive prompts or manual tuning.

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
cs.CL · 2024-12-18 · unverdicted · none · ref 58
ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

InternLM2 Technical Report
cs.CL · 2024-03-26 · unverdicted · none · ref 215
InternLM2 is a new open-source LLM that outperforms prior versions on 30 benchmarks and long-context tasks through scaled pre-training to 32k tokens and a conditional online RLHF alignment strategy.

K-Quantization and its Impact on Output Performance
cs.CL · 2026-05-19 · unverdicted · none · ref 42
Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.

Skipping the Zeros in Diffusion Models for Sparse Data Generation
cs.LG · 2026-05-03 · unreviewed · ref 25
