pith.

arXiv: 2605.22372 · v1 · pith:K5HC64CC · submitted 2026-05-21 · cs.LG

ASAP: Attention Sink Anchored Pruning

Pith reviewed 2026-05-22 08:07 UTC · model grok-4.3

Classification: cs.LG
Keywords: vision transformers, token pruning, attention sink, random walk, diffusion distance, model efficiency, inference acceleration, visual recognition

The pith

Modeling Vision Transformer token flow as a lazy random walk lets pruning anchor to the attention sink, raising throughput by up to 48 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision Transformers are slow at high resolutions because self-attention scales quadratically with the number of tokens. Existing pruning approaches typically rely on attention scores from a single layer, and the attention sink effect leads them to keep uninformative background tokens over salient foreground objects. This paper argues that modeling the entire information flow as a lazy random walk exposes the sink as a central accumulator, and that measuring each token's diffusion distance from it separates useful tokens from redundant ones. Pruning based on this separation increases throughput by up to 48 percent on image, video, and vision-language tasks while accuracy stays the same or improves.

Core claim

The central discovery is that the attention sink can be turned into an asset for pruning by modeling the ViT as a lazy random walk on tokens. The sink accumulates most of the probability mass in the cumulative transition matrix, so the diffusion distance from this sink within that matrix identifies which tokens carry foreground information and which are background redundancy. Radial Diffusion Clustering then groups tokens by this distance, and Transition Weight Pooling merges the redundant ones, all in a single training-free step.
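A minimal NumPy sketch of the random-walk construction described above. It assumes, following the Attention Rollout convention the paper cites [6], that each layer's head-averaged attention W is made lazy as (1 - α)W + αI with α = 0.5, and that the cumulative matrix is the ordered product of the per-layer matrices. The function names, the multiplication order, and the "largest received mass" rule for locating the sink are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lazy_transition(attn, alpha=0.5):
    """Lazy per-layer transition built from one layer's attention.

    attn: (heads, N, N) attention weights; each row is a distribution.
    With probability alpha the walker stays on its current token.
    """
    W = attn.mean(axis=0)                    # average the heads
    W = W / W.sum(axis=1, keepdims=True)     # keep rows exactly stochastic
    return (1.0 - alpha) * W + alpha * np.eye(W.shape[0])

def cumulative_transition(attn_layers, alpha=0.5):
    """Multiply per-layer transitions, later layers on the left (rollout-style, assumed)."""
    N = attn_layers[0].shape[-1]
    P = np.eye(N)
    for attn in attn_layers:
        P = lazy_transition(attn, alpha) @ P
    return P

def find_sink(P_cum):
    """Assumed rule: the sink is the token that receives the most total mass."""
    return int(np.argmax(P_cum.sum(axis=0)))

# Toy example: 3 layers, 2 heads, 5 tokens; every token sends extra weight to token 0.
N, heads, layers = 5, 2, 3
row = np.full(N, 0.4 / N)
row[0] += 0.6
attn_layers = [np.tile(row, (heads, N, 1)) for _ in range(layers)]
P = cumulative_transition(attn_layers)
sink = find_sink(P)
```

Because every lazy per-layer matrix is row-stochastic, so is their product, which is what lets each row of the cumulative matrix be read as a distribution over where a token's information ends up.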

What carries the argument

The lazy random walk on the attention graph, where the attention sink acts as the main probability accumulator and diffusion distance to it determines token importance for pruning.

Load-bearing premise

That the attention sink reliably collects the bulk of the probability mass in a lazy random walk model of token interactions, making distance to it a good way to tell important tokens from compressible ones.
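One way to probe this premise on a given model is to measure how much of the walk's landing distribution the most-visited token absorbs, compared with the 1/N a uniform spread would give. The helper below is a hypothetical diagnostic, not something the paper defines, and the toy matrix is hand-built; what share counts as "the bulk" is left open.

```python
import numpy as np

def sink_mass_share(P_cum):
    """Landing-mass diagnostic for a row-stochastic cumulative transition matrix.

    A walker started at a uniformly random token ends on token j with
    probability P_cum[:, j].mean(). Returns the most-visited token, the
    share of mass it collects, and the 1/N share a uniform spread would give.
    """
    landing = P_cum.mean(axis=0)            # landing distribution; sums to 1
    s = int(np.argmax(landing))
    return s, float(landing[s]), 1.0 / P_cum.shape[0]

# Toy cumulative matrix (rows sum to 1) in which token 0 behaves like a sink.
P_cum = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.8, 0.0, 0.1, 0.1],
                  [0.5, 0.1, 0.1, 0.3]])
sink, share, uniform = sink_mass_share(P_cum)
```

On a real model, a share far above `uniform` would support the premise, while a share close to it would undercut it.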

What would settle it

A test in which tokens are pruned by their diffusion distance to the sink (compressing those closest to it) and the resulting model shows a larger accuracy drop on a standard benchmark, at a matched token or FLOPs budget, than a competing method that uses direct attention scores.
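The comparison in such a test amounts to two keep-set selectors at a matched token budget, whose outputs are then fed to the same model. The sketch below assumes the reading that ASAP keeps tokens whose routing diverges from the sink and compresses those closest to it (consistent with the abstract's "background redundancy"). Ranking by raw distance stands in for the paper's clustering step, `cls_attention` is a hypothetical stand-in for a competing method's single-layer scores, and the accuracy evaluation itself is not shown.

```python
import numpy as np

def keep_by_diffusion(P_cum, sink, budget):
    """Keep the `budget` tokens whose cumulative routing is farthest from the sink's."""
    dist = np.linalg.norm(P_cum - P_cum[sink], axis=1)   # D(x_i, x_s) per token
    order = np.argsort(-dist, kind="stable")
    return np.sort(order[:budget])

def keep_by_attention(scores, budget):
    """Baseline: keep the `budget` tokens with the highest single-layer attention."""
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:budget])

# Toy data: token 0 is the sink and token 1 routes exactly like it.
P_cum = np.array([[0.6, 0.2, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.1, 0.2, 0.6, 0.1],
                  [0.3, 0.2, 0.1, 0.4]])
cls_attention = np.array([0.5, 0.3, 0.05, 0.15])         # hypothetical single-layer scores
kept_asap_like = keep_by_diffusion(P_cum, sink=0, budget=2)
kept_baseline = keep_by_attention(cls_attention, budget=2)
```

In this toy case the two rules keep disjoint sets, which is exactly the situation in which the benchmark comparison becomes informative. In practice the sink or CLS token would likely be handled separately rather than ranked.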

Figures

Figures reproduced from arXiv: 2605.22372 by Donghun Lee, Hanyoung Kim, Jaehyuk Lee, Yanggee Kim.

Figure 1. Visual comparison of token reduction under a fixed budget. Local cosine similarity …
Figure 2. Overview of ASAP. (a) Lazy Random Walk models ViT information flow for stable …
Figure 3. Qualitative results across different backbones (DeiT-Base, ViT-AugReg, LV-ViT-S). Our method consistently preserves foreground objects across diverse architectures and token densities.
Figure 4. Necessity of cumulative attention. While our full framework (W/ Markov Chain) success…
Figure 5. Qualitative results on Kinetics-400 using CLIP ViT. (Top) Original sequence. (Middle) The …
Figure 6. Accuracy–FLOPs trade-off on DeiT-Base for varying K and τ. The red circle marks the selected operating point (K=6, τ=7).
Hyperparameter Sensitivity. ASAP introduces two primary hyperparameters: the cluster count K and the sink detection threshold τ. (We fix α = 0.5 following the convention of Attention Rollout [6]; sensitivity analysis for α is provided in Appendix J.)
Figure 7. Sink emergence dynamics for DeiT-Base (L=12, N=197) and CLIP ViT-Large (L=24, …
Figure 8. Accuracy–FLOPs trade-off on ViT-AugReg for varying …
Figure 9. Qualitative analysis of hallucination suppression on POPE. Each row shows the input image, …
Figure 10. Qualitative results on DeiT-Base
Figure 11. Qualitative results on ViT-AugReg
Figure 12. Qualitative results on LV-ViT-S
Figure 13. Negative case of random anchor selection. When the anchor is inadvertently assigned …
Figure 14. Positive case of random anchor selection. When the randomly selected anchor fortuitously …
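The hyperparameter note names the cluster count K as one of ASAP's two main knobs. The source does not describe how Radial Diffusion Clustering forms its K groups, so the sketch below assumes the simplest reading: one-dimensional k-means on each token's distance to the sink, producing concentric shells. The function name and quantile initialization are hypothetical.

```python
import numpy as np

def radial_clusters(dist, K=6, iters=50):
    """Group tokens into K radial shells by 1-D k-means on their distance to the sink.

    dist: (N,) diffusion distances D(x_i, x_s). Returns labels in [0, K),
    with shell 0 nearest the sink. Hypothetical stand-in for Radial
    Diffusion Clustering; K=6 matches the operating point reported in Figure 6.
    """
    centers = np.quantile(dist, np.linspace(0.0, 1.0, K))   # spread initial centres
    for _ in range(iters):
        labels = np.argmin(np.abs(dist[:, None] - centers[None, :]), axis=1)
        new = np.array([dist[labels == k].mean() if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(np.abs(dist[:, None] - centers[None, :]), axis=1)
    rank = np.empty(K, dtype=int)
    rank[np.argsort(centers)] = np.arange(K)                 # shell 0 = closest centre
    return rank[labels]

dist = np.array([0.0, 0.01, 0.02, 0.5, 0.52, 0.9, 0.95])
shells = radial_clusters(dist, K=3)
```

Because the distances are one-dimensional, nearest-centre assignment always yields contiguous intervals of distance, which is what makes the groups "radial" around the sink.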
Original abstract

Vision Transformers (ViTs) face severe computational bottlenecks due to the quadratic complexity of self-attention at high resolutions. Existing token reduction methods rely on local metrics, such as single-layer attention scores, that are inherently vulnerable to the attention sink phenomenon, where uninformative tokens are paradoxically preserved over salient foreground objects. We propose ASAP (Attention Sink Anchored Pruning), a training-free framework that recasts this sink as a feature. Modeling ViT information flow as a Lazy Random Walk, ASAP identifies the sink as a dominant accumulator of probability mass. By computing the diffusion distance to the sink within the cumulative transition matrix, ASAP partitions tokens via Radial Diffusion Clustering and compresses background redundancy through Transition Weight Pooling in a single shot. Extensive experiments across image, video, and vision-language tasks demonstrate that ASAP outperforms state-of-the-art methods, accelerating throughput by up to 48% while maintaining or even exceeding baseline accuracy.
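The abstract does not spell out how Transition Weight Pooling weights the tokens it merges. The sketch below assumes one plausible reading: tokens in the clusters chosen for compression collapse into a single token, averaged with weights equal to the probability mass each receives in the cumulative transition matrix. The function name and the weighting rule are assumptions.

```python
import numpy as np

def transition_weight_pool(x, P_cum, labels, merge_labels):
    """Collapse all tokens whose cluster id is in `merge_labels` into one token.

    x: (N, d) token features; P_cum: (N, N) cumulative transition matrix;
    labels: (N,) cluster ids. Assumed weighting: each merged token counts in
    proportion to the probability mass it receives, P_cum[:, j].sum().
    Returns the untouched tokens followed by the pooled token.
    """
    merge = np.isin(labels, list(merge_labels))
    if not merge.any():
        return x.copy()
    w = P_cum.sum(axis=0)[merge]
    pooled = (w[:, None] * x[merge]).sum(axis=0) / w.sum()
    return np.vstack([x[~merge], pooled[None, :]])

x = np.arange(8, dtype=float).reshape(4, 2)       # 4 tokens, 2 features each
P_cum = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.6, 0.1, 0.1],
                  [0.4, 0.1, 0.4, 0.1]])
labels = np.array([0, 0, 2, 1])                   # shell 0 = nearest the sink
out = transition_weight_pool(x, P_cum, labels, merge_labels={0})
```

Pooling rather than discarding keeps a summary of the background in the sequence, so the token count drops from 4 to 3 here while the merged features still contribute.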

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to introduce ASAP (Attention Sink Anchored Pruning), a training-free framework for token reduction in Vision Transformers. By modeling ViT information flow as a Lazy Random Walk, it identifies the attention sink as a dominant accumulator of probability mass using diffusion distance in the cumulative transition matrix. Tokens are partitioned via Radial Diffusion Clustering and background redundancy is compressed through Transition Weight Pooling. Extensive experiments on image, video, and vision-language tasks are said to show that ASAP outperforms state-of-the-art methods, with throughput acceleration up to 48% while maintaining or exceeding baseline accuracy.

Significance. If the results hold, this work could advance token pruning techniques by turning the attention sink phenomenon into an advantage rather than a liability. The training-free aspect and application across multiple modalities are notable strengths. However, the soundness of the lazy random walk modeling is central to the claims, and without detailed verification, the significance remains conditional on resolving the identified modeling concerns.

major comments (2)
  1. Abstract: The abstract asserts outperformance and a 48% throughput gain but supplies no quantitative tables, error bars, ablation details, or exact definitions of the cumulative transition matrix and radial clustering; the central empirical claim cannot be verified from the given text alone.
  2. Lazy Random Walk modeling: The lazy-random-walk modeling is presented as a way to justify using the sink as anchor, yet no equations show whether the diffusion distance is derived independently or simply restates attention scores under a new name; this creates moderate risk that the claimed advantage is definitional rather than substantive.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive comments on our paper 'ASAP: Attention Sink Anchored Pruning'. We address each major comment in detail below and outline the revisions we plan to make.

Point-by-point responses
  1. Referee: Abstract: The abstract asserts outperformance and a 48% throughput gain but supplies no quantitative tables, error bars, ablation details, or exact definitions of the cumulative transition matrix and radial clustering; the central empirical claim cannot be verified from the given text alone.

    Authors: We acknowledge that the abstract, due to its brevity, does not include tables or detailed definitions. The full manuscript provides these in Sections 4 and 3, respectively, with quantitative results in Tables 1-4 showing comparisons, including standard deviations where relevant, and ablations in Section 4.3. The cumulative transition matrix is defined in Equation (2) as the product of per-layer transition matrices, and radial diffusion clustering is detailed in Algorithm 1. To improve clarity, we will revise the abstract to briefly reference the key performance metrics and direct readers to the relevant sections for definitions and details. revision: partial

  2. Referee: Lazy Random Walk modeling: The lazy-random-walk modeling is presented as a way to justify using the sink as anchor, yet no equations show whether the diffusion distance is derived independently or simply restates attention scores under a new name; this creates moderate risk that the claimed advantage is definitional rather than substantive.

    Authors: The lazy random walk is not merely a renaming of attention scores. We model the information flow with a transition matrix that includes a laziness factor to account for the sink's accumulation of probability mass over multiple layers, as described in Section 3.1. The diffusion distance is then computed using the cumulative transition matrix raised to power t, which integrates information across layers. This is distinct from single-layer attention. We will add explicit equations in the revised manuscript (e.g., expanding Equation (1) to show the lazy transition P = (1 - alpha)W + alpha I, where W is the normalized attention, and the diffusion distance d(i,j) = || (P^t)_i - (P^t)_j ||) to demonstrate the independent derivation and its advantages over local metrics. revision: yes
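The two formulas proposed in this response can be written out directly. The NumPy sketch below implements only what is stated there, with W a row-normalized attention matrix; α = 0.5 matches the value the paper reports fixing, while t and the toy matrix are illustrative. Note that it uses a plain ℓ2 norm between rows, whereas the diffusion distance of Coifman and Lafon [18] weights each coordinate by the inverse of the stationary distribution.

```python
import numpy as np

def diffusion_distances(W, alpha=0.5, t=3):
    """Pairwise d(i, j) = ||(P^t)_i - (P^t)_j||_2 with P = (1 - alpha) W + alpha I."""
    N = W.shape[0]
    P = (1.0 - alpha) * W + alpha * np.eye(N)   # lazy transition
    Pt = np.linalg.matrix_power(P, t)           # t-step transition probabilities
    diff = Pt[:, None, :] - Pt[None, :, :]      # (N, N, N): row i minus row j
    return np.linalg.norm(diff, axis=-1)

W = np.array([[0.8, 0.1, 0.1],
              [0.6, 0.3, 0.1],
              [0.5, 0.1, 0.4]])                 # rows sum to 1; token 0 is sink-like
D = diffusion_distances(W)
print(D[:, 0])                                   # distance of each token to token 0
```

Because each row of P^t aggregates attention over t hops, two tokens can receive similar single-layer attention yet sit at different distances from the sink, which is the distinction this response draws.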

Circularity Check

0 steps flagged

No significant circularity in the ASAP derivation chain

full rationale

The paper presents the Lazy Random Walk modeling of ViT information flow as an interpretive framework to recast the attention sink as an anchor, followed by explicit construction of a cumulative transition matrix, diffusion distance computation, Radial Diffusion Clustering, and Transition Weight Pooling. These steps are introduced as new operations rather than reductions of existing quantities by definition or self-citation. No equations in the provided text show a fitted parameter or attention score being renamed as a 'prediction' or 'derived distance.' The central claims rest on this modeling choice plus empirical results across tasks, so the derivation is self-contained and its claims are tested against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on treating ViT attention as a lazy random walk whose cumulative matrix yields a meaningful diffusion distance to the sink; the full paper likely introduces additional clustering and pooling parameters whose values are not visible in the abstract.

free parameters (1)
  • number of diffusion steps or cluster radius
    Required for radial diffusion clustering and cumulative matrix construction; value not stated in abstract.
axioms (1)
  • domain assumption: ViT token interactions can be faithfully modeled as a lazy random walk on the attention graph
    Invoked to identify the sink as dominant probability-mass accumulator and to define diffusion distance.

pith-pipeline@v0.9.0 · 5688 in / 1477 out tokens · 39516 ms · 2026-05-22T08:07:41.046874+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 5 internal anchors

  1. [1]

    Token merging: Your ViT but faster

    Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, and Judy Hoffman. Token merging: Your ViT but faster. In The Eleventh International Conference on Learning Representations, 2023

  2. [2]

    An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models

    Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, and Baobao Chang. An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models. In European Conference on Computer Vision, pages 19–35. Springer, 2024

  3. [3]

    PPT: Token pruning and pooling for efficient vision transformers

    Xinjian Wu, Fanhu Zeng, Xiudong Wang, and Xinghao Chen. PPT: Token pruning and pooling for efficient vision transformers. arXiv preprint arXiv:2310.01812, 2023

  4. [4]

    Beyond text-visual attention: Exploiting visual cues for effective token pruning in VLMs

    Qizhe Zhang, Aosong Cheng, Ming Lu, Renrui Zhang, Zhiyong Zhuo, Jiajun Cao, Shaobo Guo, Qi She, and Shanghang Zhang. Beyond text-visual attention: Exploiting visual cues for effective token pruning in VLMs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20857–20867, 2025

  5. [5]

    Zero-TPrune: Zero-shot token pruning through leveraging of the attention graph in pre-trained transformers

    Hongjie Wang, Bhishma Dedhia, and Niraj K Jha. Zero-TPrune: Zero-shot token pruning through leveraging of the attention graph in pre-trained transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16070–16079, 2024

  6. [6]

    Quantifying attention flow in transformers

    Samira Abnar and Willem Zuidema. Quantifying attention flow in transformers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4190–4197, 2020

  7. [7]

    Not all patches are what you need: Expediting vision transformers via token reorganizations

    Youwei Liang, Chongjian Ge, Zhan Tong, Yibing Song, Jue Wang, and Pengtao Xie. Not all patches are what you need: Expediting vision transformers via token reorganizations. arXiv preprint arXiv:2202.07800, 2022

  8. [8]

    DynamicViT: Efficient vision transformers with dynamic token sparsification

    Yongming Rao, Wenliang Zhao, Benlin Liu, Jiwen Lu, Jie Zhou, and Cho-Jui Hsieh. DynamicViT: Efficient vision transformers with dynamic token sparsification. Advances in Neural Information Processing Systems, 34:13937–13949, 2021

  9. [9]

    SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

    Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, et al. SparseVLM: Visual token sparsification for efficient vision-language model inference. arXiv preprint arXiv:2410.04417, 2024

  10. [10]

    Prune redundancy, preserve essence: Vision token compression in VLMs via synergistic importance-diversity

    Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Guangming Lu, Jun Yu, and Wenjie Pei. Prune redundancy, preserve essence: Vision token compression in VLMs via synergistic importance-diversity. In The Fourteenth International Conference on Learning Representations, 2026

  11. [11]

    Rollout-guided token pruning for efficient video understanding

    Yonatan Dinai, Ishay Goldin, Avraham Raviv, and Niv Zehngut. Rollout-guided token pruning for efficient video understanding. In 2025 IEEE International Conference on Image Processing (ICIP), pages 37–42. IEEE, 2025

  12. [12]

    Efficient Streaming Language Models with Attention Sinks

    Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2023

  13. [13]

    When Attention Sink Emerges in Language Models: An Empirical View

    Xiangming Gu, Tianyu Pang, Chao Du, Qian Liu, Fengzhuo Zhang, Cunxiao Du, Ye Wang, and Min Lin. When attention sink emerges in language models: An empirical view. arXiv preprint arXiv:2410.10781, 2024

  14. [14]

    Vision transformers need registers

    Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Vision transformers need registers. In The Twelfth International Conference on Learning Representations, 2024

  15. [15]

    Why do LLMs attend to the first token?

    Federico Barbero, Alvaro Arroyo, Xiangming Gu, Christos Perivolaropoulos, Michael Bronstein, Petar Veličković, and Razvan Pascanu. Why do LLMs attend to the first token? arXiv preprint arXiv:2504.02732, 2025

  16. [16]

    What are you sinking? A geometric approach on attention sink

    Valeria Ruscio, Umberto Nanni, and Fabrizio Silvestri. What are you sinking? A geometric approach on attention sink. arXiv preprint arXiv:2508.02546, 2025

  17. [17]

    Matrix analysis

    Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 2012

  18. [18]

    Diffusion maps

    Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006

  19. [19]

    Adaptive token sampling for efficient vision transformers

    Mohsen Fayyaz, Soroush Abbasi Koohpayegani, Farnoush Rezaei Jafari, Sunando Sengupta, Hamid Reza Vaezi Joze, Eric Sommerlade, Hamed Pirsiavash, and Jürgen Gall. Adaptive token sampling for efficient vision transformers. In European Conference on Computer Vision, pages 396–414. Springer, 2022

  20. [20]

    ImageNet Large Scale Visual Recognition Challenge

    Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015

  21. [21]

    Training data-efficient image transformers & distillation through attention

    Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. In International conference on machine learning, pages 10347–10357. PMLR, 2021

  22. [22]

    How to train your ViT? Data, augmentation, and regularization in vision transformers

    Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, and Lucas Beyer. How to train your ViT? Data, augmentation, and regularization in vision transformers. arXiv preprint arXiv:2106.10270, 2021

  23. [23]

    All tokens matter: Token labeling for training better vision transformers

    Zi-Hang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, and Jiashi Feng. All tokens matter: Token labeling for training better vision transformers. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 18590–18602. Curran Asso...

  24. [24]

    Learning transferable visual models from natural language supervision

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021

  25. [25]

    Improved baselines with visual instruction tuning

    Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26296–26306, 2024

  26. [26]

    The Kinetics Human Action Video Dataset

    Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, et al. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950, 2017

  27. [27]

    LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

    Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge, 2024

  28. [28]

    Making the V in VQA matter: Elevating the role of image understanding in visual question answering

    Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6904–6913, 2017

  29. [29]

    GQA: A new dataset for real-world visual reasoning and compositional question answering

    Drew A Hudson and Christopher D Manning. GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6700–6709, 2019

  30. [30]

    VizWiz grand challenge: Answering visual questions from blind people

    Danna Gurari, Qing Li, Abigale J Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, and Jeffrey P Bigham. VizWiz grand challenge: Answering visual questions from blind people. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3608–3617, 2018

  31. [31]

    Learn to explain: Multimodal reasoning via thought chains for science question answering

    Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. Learn to explain: Multimodal reasoning via thought chains for science question answering. Advances in Neural Information Processing Systems, 35:2507–2521, 2022

  32. [32]

    Evaluating object hallucination in large vision-language models

    Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 292–305, 2023

  33. [33]

    MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

    Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, et al. MME: A comprehensive evaluation benchmark for multimodal large language models. arXiv preprint arXiv:2306.13394, 2023

  34. [34]

    MMBench: Is your multi-modal model an all-around player?

    Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, et al. MMBench: Is your multi-modal model an all-around player? In European Conference on Computer Vision, pages 216–233. Springer, 2024

  35. [35]

    VisionZip: Longer is better but not necessary in vision language models

    Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, and Jiaya Jia. VisionZip: Longer is better but not necessary in vision language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19792–19802, 2025

  36. [36]

    Full Convergence: $D(x_i, x_s) = 0 \iff x_i$ has been fully absorbed into the sink, yielding identical information routing $P^{(t^*)}_{i,*} = P^{(t^*)}_{s,*}$

  37. [37]

    Justification: (1) follows directly from the positive definiteness of the $\ell_2$ norm: $\|v\|_2 = 0 \iff v = 0$

    Trajectory Separation: Tokens with large $D(x_i, x_s)$ exhibit information routing patterns distinct from the sink's, guaranteed by their geometric separation in the $P^{(t^*)}$ manifold. Justification: (1) follows directly from the positive definiteness of the $\ell_2$ norm: $\|v\|_2 = 0 \iff v = 0$. (2) A large separation $D(x_i, x_s) = \delta > 0$ implies $\|P^{(t^*)}_{i,*} - \cdots$
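The Full Convergence property and justification (1) in entries [36] and [37] can be checked numerically: a token whose row of the cumulative transition matrix matches the sink's row has zero distance, and any token with different routing has strictly positive distance. A minimal sketch with a hand-built toy matrix; `distance_to_sink` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def distance_to_sink(P_t, sink):
    """D(x_i, x_s) = ||P_t[i, :] - P_t[sink, :]||_2 for every token i."""
    return np.linalg.norm(P_t - P_t[sink], axis=1)

P_t = np.array([[0.5, 0.2, 0.2, 0.1],    # sink, index 0
                [0.5, 0.2, 0.2, 0.1],    # fully absorbed: same routing as the sink
                [0.1, 0.2, 0.2, 0.5],    # clearly different routing
                [0.5, 0.2, 0.1, 0.2]])   # slightly different routing
D = distance_to_sink(P_t, sink=0)
absorbed = np.isclose(D, 0.0)             # zero distance <=> identical rows
```

Only the sink itself and the token with an identical row come out as absorbed; the larger routing difference of token 2 yields a larger distance than token 3's small one.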