super hub Mixed citations

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Anthony Moi, Clement Delangue, Julien Chaumond, Lysandre Debut, Thomas Wolf, Victor Sanh · 2019 · cs.CL · arXiv 1910.03771

Mixed citation behavior. Most common role is background (54%).

104 Pith papers citing it

Background 54% of classified citations

open full Pith review browse 104 citing papers more from Anthony Moi arXiv PDF

abstract

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. \textit{Transformers} is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at \url{https://github.com/huggingface/transformers}.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 14 method 8 other 4

citation-polarity summary

background 14 use method 8 unclear 4

claims ledger

abstract Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrain

authors

Anthony Moi Clement Delangue Julien Chaumond Lysandre Debut Thomas Wolf Victor Sanh

co-cited works

representative citing papers

EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data

econ.EM · 2026-05-13 · accept · novelty 8.0

EnergyAgentBench is a new benchmark with 70 task variants that evaluates LLM agents on live energy data for datacenter siting, long-horizon optimization, and causal grid diagnosis.

Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

cs.AR · 2026-05-11 · conditional · novelty 8.0

Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

cs.AI · 2026-05-07 · unverdicted · novelty 8.0

VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

cs.LG · 2024-07-05 · conditional · novelty 8.0

TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.

RULER: What's the Real Context Size of Your Long-Context Language Models?

cs.CL · 2024-04-09 · accept · novelty 8.0

RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

cs.CL · 2023-04-03 · accept · novelty 8.0

Pythia releases 16 identically trained LLMs with full checkpoints and data tools to study training dynamics, scaling, memorization, and bias in language models.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Test-Time Training Undermines Safety Guardrails

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Test-time training enables three new threat models that raise jailbreak attack success rates on language models to averages of 95% and 93% ASR@10 under LoRA for few-shot and generation-phase attacks across model families.

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

cs.CV · 2026-05-21 · conditional · novelty 7.0

Video-LLMs exhibit directional motion blindness from a direction binding gap; DeltaDirect projector objective lifts synthetic accuracy to 85.4% and real accuracy by 21.9 points while preserving other video capabilities.

Interference-Aware Multi-Task Unlearning

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

EdgeFlowerTune is a real-device benchmark that jointly assesses model quality and system costs for federated LLM fine-tuning on edge hardware using three protocols: Quality-under-Budget, Cost-to-Target, and Robustness.

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conformal survival methods.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Auto-FlexSwitch achieves efficient dynamic model merging by decomposing task vectors into sparse masks, signs, and scalars, then making the compression learnable via gating and adaptive bit selection with KNN-based retrieval.

SecureRouter: Encrypted Routing for Efficient Secure Inference

cs.CR · 2026-04-16 · unverdicted · novelty 7.0

SecureRouter accelerates secure transformer inference by 1.95x via an encrypted router that selects input-adaptive models from an MPC-optimized pool with negligible accuracy loss.

Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.

VertAX: a differentiable vertex model for learning epithelial tissue mechanics

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

VertAX supplies a differentiable JAX implementation of vertex models for confluent epithelia that enables forward simulation, mechanical parameter inference, and inverse design of tissue-scale behaviors.

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving

cs.DC · 2026-03-26 · unverdicted · novelty 7.0

GhostServe applies erasure coding to KV cache in host memory for fast recovery from failures in LLM serving, cutting checkpointing latency up to 2.7x and recovery latency 2.1x versus prior methods.

Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters

cs.LG · 2026-02-06 · unverdicted · novelty 7.0

Variability modeling from software engineering enables systematic sampling, measurement, and prediction of LLM inference configurations for energy, latency, and accuracy trade-offs.

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

cs.CV · 2025-12-18 · unverdicted · novelty 7.0

4D-RGPT uses perceptual 4D distillation to boost region-level 4D perception in multimodal LLMs and reports gains on existing and new video QA benchmarks.

LoRA on the Go: Instance-level Dynamic LoRA Selection and Merging

cs.CL · 2025-11-10 · unverdicted · novelty 7.0

LoGo is a training-free framework that dynamically selects and merges LoRA adapters at the instance level using signals from a single forward pass to handle diverse tasks.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

citing papers explorer

Showing 12 of 12 citing papers after filters.

Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs cs.CV · 2026-05-21 · conditional · none · ref 67 · internal anchor
Video-LLMs exhibit directional motion blindness from a direction binding gap; DeltaDirect projector objective lifts synthetic accuracy to 85.4% and real accuracy by 21.9 points while preserving other video capabilities.
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation cs.CV · 2025-12-18 · unverdicted · none · ref 80 · internal anchor
4D-RGPT uses perceptual 4D distillation to boost region-level 4D perception in multimodal LLMs and reports gains on existing and new video QA benchmarks.
High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 99 · internal anchor
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and
Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex cs.CV · 2026-05-15 · unverdicted · none · ref 68 · internal anchor
MINE uses mechanistic interpretability on language-aligned image representations to generate per-voxel feature descriptions, validated via image generation and counterfactual edits that causally shift brain activation.
BAMI: Training-Free Bias Mitigation in GUI Grounding cs.CV · 2026-05-07 · unverdicted · none · ref 30 · internal anchor
BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.
R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs cs.CV · 2026-04-22 · conditional · none · ref 48 · internal anchor
R-CoV is a six-step region-aware chain-of-verification technique that elicits coordinate and description outputs from LVLMs themselves to detect and reduce object hallucinations without external models or retraining.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection cs.CV · 2023-03-09 · accept · none · ref 49 · internal anchor
Grounding DINO fuses language and vision via feature enhancer, language-guided query selection, and cross-modality decoder in a DINO backbone, achieving 52.5 AP zero-shot on COCO and a new record of 26.1 AP mean on ODinW.
Reconstruction of a 3D wireframe from a single line drawing via generative depth estimation cs.CV · 2026-04-15 · unverdicted · none · ref 30 · internal anchor
A latent diffusion model conditioned on line drawings estimates dense depth to reconstruct 3D wireframes, reporting 5.3% average depth error after training on over one million pairs.
Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection cs.CV · 2025-12-05 · conditional · none · ref 33 · internal anchor
An automated annotation pipeline combining Grounded DINO and SAM produces usable bounding boxes and masks for weakly supervised defect detection in shearography.
Diffusion Models are Secretly Zero-Shot 3DGS Harmonizers cs.CV · 2025-03-09 · unverdicted · none · ref 34 · internal anchor
D3DR optimizes inserted 3DGS objects with a DDS-inspired diffusion objective plus a new personalization step to match scene lighting, reporting 2 dB PSNR gain over prior methods.
Are vision-language models ready to zero-shot replace supervised classification models in agriculture? cs.CV · 2025-12-17 · unverdicted · none · ref 1 · internal anchor
Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks while supervised models like YOLO11 perform markedly higher, indicating they are not ready to replace task-specific systems.
Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation cs.CV · 2026-04-02 · unreviewed · ref 48 · internal anchor

HuggingFace's Transformers: State-of-the-art Natural Language Processing

hub tools

citation-role summary

citation-polarity summary

claims ledger

authors

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer