pith. sign in

super hub Mixed citations

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Mixed citation behavior. Most common role is background (54%).

111 Pith papers citing it
Background 54% of classified citations
abstract

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. \textit{Transformers} is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at \url{https://github.com/huggingface/transformers}.

hub tools

citation-role summary

background 14 method 8 other 4

citation-polarity summary

claims ledger

  • abstract Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. \textit{Transformers} is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrain

authors

co-cited works

clear filters

representative citing papers

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

cs.AI · 2026-05-07 · unverdicted · novelty 8.0

VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

Editing Models with Task Arithmetic

cs.LG · 2022-12-08 · accept · novelty 8.0

Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.

Discovering Latent Knowledge in Language Models Without Supervision

cs.CL · 2022-12-07 · conditional · novelty 8.0

An unsupervised technique extracts latent yes-no knowledge from language model activations by locating a direction that satisfies logical consistency properties, outperforming zero-shot accuracy by 4% on average across models and datasets.

Test-Time Training Undermines Safety Guardrails

cs.LG · 2026-05-21 · unverdicted · novelty 7.0

Test-time training enables three new threat models that raise jailbreak attack success rates on language models to averages of 95% and 93% ASR@10 under LoRA for few-shot and generation-phase attacks across model families.

Interference-Aware Multi-Task Unlearning

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

Introduces interference-aware multi-task unlearning with task-aware gradient projection and instance-level gradient orthogonalization, reducing interference scores by 30.3% and 52.9% on vision benchmarks.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

citing papers explorer

Showing 9 of 9 citing papers after filters.

  • Learning to (Learn at Test Time): RNNs with Expressive Hidden States cs.LG · 2024-07-05 · conditional · none · ref 80 · internal anchor

    TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.

  • RULER: What's the Real Context Size of Your Long-Context Language Models? cs.CL · 2024-04-09 · accept · none · ref 34 · internal anchor

    RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.

  • Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 92 · internal anchor

    Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

  • Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction cs.CL · 2024-12-05 · conditional · none · ref 110 · internal anchor

    Aguvis presents a pure vision-based framework for autonomous GUI agents using structured reasoning via inner monologue, a new multimodal dataset, and two-stage training to reach SOTA on offline and online benchmarks.

  • HybridFlow: A Flexible and Efficient RLHF Framework cs.LG · 2024-09-28 · unverdicted · none · ref 91 · internal anchor

    HybridFlow combines single- and multi-controller paradigms with a 3D-HybridEngine to deliver 1.53x to 20.57x higher throughput for various RLHF algorithms compared to prior systems.

  • OpenVLA: An Open-Source Vision-Language-Action Model cs.RO · 2024-06-13 · unverdicted · none · ref 22 · internal anchor

    OpenVLA achieves 16.5% higher task success than the 55B RT-2-X model across 29 tasks with 7x fewer parameters while enabling effective fine-tuning and quantization without performance loss.

  • Crystallizing Schemas with Teleoscope: Thematic Curation of Large Text Corpora on Reddit cs.HC · 2024-02-09 · unverdicted · none · ref 65 · internal anchor

    Teleoscope enables thematic curation of large Reddit corpora via interactive refinement, with three deployments indicating benefits in serendipitous keyword discovery, search saturation confidence, and collaborative curation discussions.

  • The Platonic Representation Hypothesis cs.LG · 2024-05-13 · unverdicted · none · ref 118 · internal anchor

    Representations learned by large AI models are converging toward a shared statistical model of reality.

  • Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities cs.LG · 2024-08-14 · accept · none · ref 251 · internal anchor

    The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.