
arxiv: 2411.15102 · v3 · pith:QXGEYKZOnew · submitted 2024-11-22 · 💻 cs.LG

AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

classification 💻 cs.LG
keywords: context, attribution, attribot, models, computing, efficient, error, large

The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while remaining more faithful to a target model's LOO error than prior context attribution methods. This stark increase in performance makes computing context attributions for a given response 30x faster than generating the response itself, empowering real-world applications that require computing attributions at scale. We release a user-friendly and efficient implementation of AttriBoT to enable efficient LLM interpretability as well as encourage future development of efficient context attribution methods.
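
The LOO error described in the abstract can be illustrated with a minimal Python sketch. It is not the AttriBoT implementation: `loo_attributions`, the `Scorer` type, and `toy_log_likelihood` are hypothetical names, the toy scorer stands in for an actual LLM call, and the sign convention (full-context score minus ablated score) is an assumption.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps context spans to log p(response | context, query).
Scorer = Callable[[Sequence[str]], float]


def loo_attributions(spans: Sequence[str], log_likelihood: Scorer) -> List[float]:
    """Score each span by how much the response log-likelihood drops when it is removed."""
    full = log_likelihood(spans)  # one call with the full context
    scores = []
    for i in range(len(spans)):
        ablated = list(spans[:i]) + list(spans[i + 1:])  # context without span i
        scores.append(full - log_likelihood(ablated))    # one extra call per span
    return scores


def toy_log_likelihood(spans: Sequence[str]) -> float:
    # Stand-in for a model call (assumption): each span adds a fixed amount
    # to the log-likelihood of a fixed, imagined response.
    weights = {
        "Paris is the capital of France.": 2.0,
        "France borders Spain.": 0.5,
    }
    return -5.0 + sum(weights.get(s, 0.0) for s in spans)


context = [
    "Paris is the capital of France.",
    "France borders Spain.",
    "Bananas are yellow.",
]
scores = loo_attributions(context, toy_log_likelihood)
print(scores)  # [2.0, 0.5, 0.0]
```

For a context of n spans, this exact computation needs n + 1 scorer calls, each a full forward pass over the context in a real model. That cost is what the caching, hierarchical, and proxy-model techniques named in the abstract aim to reduce.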



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. In-Context Credit Assignment via the Core

    cs.GT 2026-05 unverdicted novelty 7.0

    Algorithms based on the least core approximate stable credit assignments for AI-generated content using orders of magnitude fewer LLM calls than alternatives.

  2. RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

    cs.LG 2026-06 unverdicted novelty 5.0

    RREDCoT approximates segment-level reward redistribution for CoT traces by querying the model itself, offering a lower-cost alternative to Monte Carlo credit assignment in reasoning-model RL.