pith. sign in

super hub Mixed citations

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Mixed citation behavior. Most common role is background (54%).

104 Pith papers citing it
Background 54% of classified citations
abstract

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. \textit{Transformers} is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at \url{https://github.com/huggingface/transformers}.

hub tools

citation-role summary

background 14 method 8 other 4

citation-polarity summary

claims ledger

  • abstract Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrain

authors

co-cited works

clear filters

representative citing papers

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

cs.AI · 2026-05-07 · unverdicted · novelty 8.0

VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Test-Time Training Undermines Safety Guardrails

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Test-time training enables three new threat models that raise jailbreak attack success rates on language models to averages of 95% and 93% ASR@10 under LoRA for few-shot and generation-phase attacks across model families.

Interference-Aware Multi-Task Unlearning

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

citing papers explorer

Showing 14 of 14 citing papers after filters.

  • Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models cs.AR · 2026-05-11 · conditional · none · ref 54 · internal anchor

    Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.

  • VibeServe: Can AI Agents Build Bespoke LLM Serving Systems? cs.AI · 2026-05-07 · unverdicted · none · ref 73 · internal anchor

    VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

  • Editing Models with Task Arithmetic cs.LG · 2022-12-08 · accept · none · ref 104 · internal anchor

    Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

  • QLoRA: Efficient Finetuning of Quantized LLMs cs.LG · 2023-05-23 · conditional · none · ref 64 · internal anchor

    QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

  • High-Resolution Image Synthesis with Latent Diffusion Models cs.CV · 2021-12-20 · conditional · none · ref 99 · internal anchor

    Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrained autoencoders with cross-attention conditioning, while cutting computational and

  • An Explanation of In-context Learning as Implicit Bayesian Inference cs.CL · 2021-11-03 · conditional · none · ref 40 · internal anchor

    In-context learning emerges as implicit Bayesian inference of latent concepts when pretraining data has long-range coherence, proven for mixture-of-HMM distributions and replicated on the synthetic GINC dataset.

  • Hopfield Networks is All You Need cs.NE · 2020-07-16 · unverdicted · none · ref 44 · internal anchor

    Modern Hopfield networks store exponentially many patterns, retrieve them in one update, and have an update rule equivalent to transformer attention, enabling new Hopfield layers that improve results on multiple instance learning and drug design tasks.

  • ModelLens: Finding the Best for Your Task from Myriads of Models cs.LG · 2026-05-08 · unverdicted · none · ref 3 · internal anchor

    ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.

  • Why Does Agentic Safety Fail to Generalize Across Tasks? cs.LG · 2026-05-07 · conditional · none · ref 114 · internal anchor

    Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.

  • Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study cs.SE · 2026-04-27 · unverdicted · none · ref 45 · internal anchor

    Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.

  • Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge cs.DC · 2026-04-08 · unverdicted · none · ref 41 · internal anchor

    ChainFed achieves memory-efficient private LLM fine-tuning on edge devices through sequential layer-by-layer adapter training with dynamic co-tuning, perceptive optimization, and adaptive starting point selection, improving accuracy by up to 46.46%.

  • HybridFlow: A Flexible and Efficient RLHF Framework cs.LG · 2024-09-28 · unverdicted · none · ref 91 · internal anchor

    HybridFlow combines single- and multi-controller paradigms with a 3D-HybridEngine to deliver 1.53x to 20.57x higher throughput for various RLHF algorithms compared to prior systems.

  • OpenVLA: An Open-Source Vision-Language-Action Model cs.RO · 2024-06-13 · unverdicted · none · ref 22 · internal anchor

    OpenVLA achieves 16.5% higher task success than the 55B RT-2-X model across 29 tasks with 7x fewer parameters while enabling effective fine-tuning and quantization without performance loss.

  • Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts cs.SE · 2026-04-20 · conditional · none · ref 45 · internal anchor

    STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.