
arxiv: 2604.07468 · v1 · submitted 2026-04-08 · 💻 cs.AI

M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery

Pith reviewed 2026-05-10 18:19 UTC · model grok-4.3

classification 💻 cs.AI
keywords implicit art influence · multimodal agent · art attribution · evidence-based reasoning · ReAct protocol · style analysis · iconographic retrieval · influence benchmark

The pith

M-ArtAgent reframes implicit art influence discovery as probabilistic adjudication using a four-phase evidence protocol.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an agent that treats undocumented artistic influences as a problem of building and testing verifiable evidence chains rather than measuring visual similarity alone. It follows a controlled four-phase sequence: investigation of images and biographies, corroboration against art-historical axioms, adversarial falsification by a separate critic, and a final verdict. Specialized operators ground style comparisons in formal analysis and iconographic retrieval in an established classification system (ICONCLASS), keeping intermediate steps auditable. On a balanced set of 100 artists and 2,000 directed influence pairs, the approach yields strong detection performance that holds after explicit influence phrases are masked. The authors take this as evidence that domain-constrained verification can improve attribution reliability over pattern matching or unguided language model output.

Core claim

M-ArtAgent assembles evidence chains from images and biographies under art-historical axioms, subjects each hypothesis to prompt-isolated adversarial falsification, and reaches 83.7 percent positive-class F1, 0.666 Matthews correlation coefficient, and 0.910 ROC-AUC on the WIB-100 benchmark; these gains remain after leakage controls and phrase masking, establishing that historically grounded adjudication outperforms embedding similarity or unguided multimodal output for implicit influence attribution.
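The headline numbers can be cross-checked against the recall (86.0%) and specificity (80.5%) quoted in the Figure 3 text. The sketch below is only a consistency check, not the paper's evaluation code. It assumes that the balanced 2,000-pair benchmark splits into 1,000 positive and 1,000 negative pairs, and that the fold-mean rates can be treated as pooled rates.

```python
from math import sqrt

# Assumption: balanced benchmark = 1,000 positive + 1,000 negative pairs,
# with fold-mean recall and specificity applied as if they were pooled rates.
positives, negatives = 1000, 1000
recall, specificity = 0.860, 0.805

tp = round(recall * positives)        # true positives
fn = positives - tp                   # false negatives
tn = round(specificity * negatives)   # true negatives
fp = negatives - tn                   # false positives


def f1(hits, false_alarms, misses):
    """F1 for one class from its hits, false alarms, and misses."""
    return 2 * hits / (2 * hits + false_alarms + misses)


pos_f1 = f1(tp, fp, fn)               # influence class
neg_f1 = f1(tn, fn, fp)               # no-influence class
macro_f1 = (pos_f1 + neg_f1) / 2
mcc = (tp * tn - fp * fn) / sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"positive-class F1 = {pos_f1:.3f}")   # 0.837
print(f"macro F1          = {macro_f1:.3f}") # 0.832
print(f"MCC               = {mcc:.3f}")      # 0.666
```

Under these assumptions the implied confusion matrix (860 TP, 140 FN, 805 TN, 195 FP) reproduces the reported 83.7% positive-class F1, 83.2% Macro-F1, and 0.666 MCC, so the quoted figures are at least mutually consistent.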

What carries the argument

Four-phase protocol (Investigation, Corroboration, Falsification, Verdict) run by a ReAct-style controller that deploys StyleComparator for formal style analysis and ConceptRetriever for ICONCLASS iconographic grounding to produce auditable claims.
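The control flow of the four phases can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: the class and function names, the scoring weights, the 0.5 threshold, and the temporal-precedence axiom are all hypothetical (the source does not list the axioms), and the numeric inputs stand in for whatever StyleComparator and ConceptRetriever actually return.

```python
from dataclasses import dataclass, field


@dataclass
class Artist:
    name: str
    active_from: int  # first active year
    active_to: int    # last active year


@dataclass
class Hypothesis:
    source: Artist                      # proposed influencer
    target: Artist                      # proposed influenced artist
    evidence: list = field(default_factory=list)
    score: float = 0.0


def investigate(hyp, style_similarity, shared_motifs):
    """Phase 1: record evidence (stand-ins for the two operators' outputs)."""
    hyp.evidence.append(("style", style_similarity))
    hyp.evidence.append(("iconography", list(shared_motifs)))
    # Arbitrary illustrative weighting; motif count is capped at 4.
    hyp.score = 0.5 * style_similarity + 0.5 * min(len(shared_motifs), 4) / 4
    return hyp


def corroborate(hyp):
    """Phase 2: apply a hypothetical axiom (source active before target ends)."""
    ok = hyp.source.active_from < hyp.target.active_to
    hyp.evidence.append(("axiom:temporal_precedence", ok))
    if not ok:
        hyp.score = 0.0
    return hyp


def falsify(hyp, counter_arguments):
    """Phase 3: each accepted counter-argument discounts the score."""
    for reason, penalty in counter_arguments:
        hyp.evidence.append(("critic", reason))
        hyp.score *= 1.0 - penalty
    return hyp


def verdict(hyp, threshold=0.5):
    """Phase 4: return a decision plus the auditable evidence chain."""
    return {
        "influence": hyp.score >= threshold,
        "score": round(hyp.score, 3),
        "evidence": hyp.evidence,
    }


def adjudicate(source, target, style_similarity, shared_motifs, objections):
    hyp = Hypothesis(source, target)
    hyp = investigate(hyp, style_similarity, shared_motifs)
    hyp = corroborate(hyp)
    hyp = falsify(hyp, objections)
    return verdict(hyp)


# Placeholder artists and motif identifiers, not real data.
a = Artist("Artist A", 1850, 1870)
b = Artist("Artist B", 1880, 1920)
motifs = ["motif_1", "motif_2", "motif_3"]
objections = [("shared regional school could explain the resemblance", 0.2)]

forward = adjudicate(a, b, 0.8, motifs, objections)
backward = adjudicate(b, a, 0.8, motifs, objections)
print(forward["influence"], forward["score"])    # True 0.62
print(backward["influence"], backward["score"])  # False 0.0
```

The two calls use identical visual evidence but opposite directions, and the reversed pair is rejected by the axiom check alone. That is the sense in which resemblance is treated as necessary but not sufficient for a directed influence claim.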

If this is right

  • Attributions become traceable to specific image features, biographical facts, and axiomatic checks rather than opaque similarity scores.
  • Performance stays high when obvious influence language is removed, indicating the method relies on deeper visual and contextual reasoning.
  • The same controller and operators can in principle be applied to other attribution tasks that require domain rules and falsification steps.
  • Benchmarks built around directed pairs and leakage controls provide a clearer testbed for evaluating evidence-based agents in cultural domains.
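A phrase-masking control of the kind mentioned in the second bullet might look like the sketch below. The phrase list and the [MASK] token are hypothetical; the masking vocabulary actually used in the paper is not given in this summary.

```python
import re

# Hypothetical phrase list, for illustration only.
INFLUENCE_PATTERNS = [
    r"\binfluenced by\b",
    r"\binspired by\b",
    r"\bstudent of\b",
    r"\bfollower of\b",
    r"\bstudied under\b",
]
_MASK_RE = re.compile("|".join(INFLUENCE_PATTERNS), flags=re.IGNORECASE)


def mask_influence_phrases(text, token="[MASK]"):
    """Replace explicit influence phrases so the agent cannot read them off."""
    return _MASK_RE.sub(token, text)


bio = "Artist B was strongly influenced by Artist A and studied under a local master."
masked_bio = mask_influence_phrases(bio)
print(masked_bio)
# Artist B was strongly [MASK] Artist A and [MASK] a local master.
```

Masking of this kind removes surface cues from the input text only. It does not address what a language model may already have memorized about the artists, which is the concern raised in the referee report.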

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on influence relations across other visual media such as photography or film to check whether the same evidence protocol transfers.
  • Incorporating newly digitized archival documents as additional input sources might further strengthen the corroboration and falsification phases.
  • If the critic component is made more independent, the overall system might serve as a template for AI tools in fields like legal precedent analysis or scientific claim verification where falsification is essential.

Load-bearing premise

The protocol with its isolated critic and art-historical axioms produces attributions that align more closely with historical validity than embedding similarity or unguided model output, and the WIB-100 labels form an unbiased ground truth for implicit influence.

What would settle it

Direct comparison of the agent's attributions on a new set of artist pairs against independent judgments by multiple art historians who have no access to the agent's evidence chains or the original benchmark labels.
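One way to score such a comparison is chance-corrected agreement between the agent's verdicts and the historians' majority judgment. The sketch below uses Cohen's kappa on toy data; the panel size, the labels, and the choice of kappa are assumptions for illustration, not a procedure described in the paper.

```python
from collections import Counter


def majority_vote(judgments):
    """Collapse an odd number of binary judgments per pair into one label."""
    return [1 if sum(votes) * 2 > len(votes) else 0 for votes in judgments]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data: three hypothetical historians judging six artist pairs (1 = influence).
historians = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]
agent = [1, 0, 1, 1, 0, 1]

consensus = majority_vote(historians)
kappa = cohen_kappa(agent, consensus)
print(consensus)        # [1, 0, 0, 1, 0, 1]
print(round(kappa, 3))  # 0.667
```

Agreement among the historians themselves would need to be reported alongside this figure, since a low-agreement panel limits how much any agent-versus-consensus score can mean.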

Figures

Figures reproduced from arXiv: 2604.07468 by Hanyi Liu, Heran Yang, Minghao Wang, Yuhang Xie, Zhonghao Jiu.

Figure 1. [image: figures/full_fig_p005_1.png]
Figure 2. [image: figures/full_fig_p005_2.png]
Figure 3. [image: figures/full_fig_p011_3.png] Table VII shows that M-ArtAgent achieves the strongest overall performance among all compared systems. Over five folds, it reaches 83.2 ± 1.1% Macro-F1 and 0.666 ± 0.021 MCC while preserving both high recall (86.0 ± 1.4%) and high specificity (80.5 ± 1.8%). GalleryGPT remains the strongest overall baseline, but it trails by 24.5 points in specificity and by 0.272 MCC. Among the newly added KG comparators,…
Figure 4. [image: figures/full_fig_p013_4.png]
Figure 5. [image: figures/full_fig_p013_5.png]
Original abstract

Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem: resemblance is necessary but not sufficient evidence. Most prior systems reduce influence discovery to embedding similarity or label-driven graph completion, while recent multimodal large language models (LLMs) remain vulnerable to temporal inconsistency and unverified attributions. This paper introduces M-ArtAgent, an evidence-based multimodal agent that reframes implicit influence discovery as probabilistic adjudication. It follows a four-phase protocol consisting of Investigation, Corroboration, Falsification, and Verdict governed by a Reasoning and Acting (ReAct)-style controller that assembles verifiable evidence chains from images and biographies, enforces art-historical axioms, and subjects each hypothesis to adversarial falsification via a prompt-isolated critic. Two theory-grounded operators, StyleComparator for Wölfflin formal analysis and ConceptRetriever for ICONCLASS-based iconographic grounding, ensure that intermediate claims are formally auditable. On the balanced WikiArt Influence Benchmark-100 (WIB-100) of 100 artists and 2,000 directed pairs, M-ArtAgent achieves 83.7% positive-class F1, 0.666 Matthews correlation coefficient (MCC), and 0.910 area under the receiver operating characteristic curve (ROC-AUC), with leakage-control and robustness checks confirming that the gains persist when explicit influence phrases are masked. By coupling multimodal perception with domain-constrained falsification, M-ArtAgent demonstrates that implicit influence analysis benefits from historically grounded adjudication rather than pattern matching alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 1 minor

Summary. The paper introduces M-ArtAgent, a multimodal agent that uses a four-phase ReAct-style protocol (Investigation, Corroboration, Falsification, Verdict) with art-historical axioms, a prompt-isolated critic, StyleComparator for formal analysis, and ConceptRetriever for iconographic grounding to perform evidence-based discovery of implicit artistic influences. It claims this approach yields historically valid attributions superior to embedding similarity or unguided LLMs, supported by 83.7% positive-class F1, 0.666 MCC, and 0.910 ROC-AUC on the balanced WIB-100 benchmark (100 artists, 2000 directed pairs), with robustness shown under explicit-phrase masking.

Significance. If the WIB-100 labels constitute independent ground truth and the protocol components are validated through ablations, the work would offer a structured, falsifiable framework for multimodal agents in art history that prioritizes verifiable evidence chains over pattern matching. This could influence agent design in cultural heritage domains by demonstrating the utility of domain axioms and adversarial falsification. The explicit operators and leakage controls are strengths that provide a reproducible template, though the absence of statistical details and component ablations currently limits the strength of the performance claims.

major comments (4)
  1. [Abstract] The reported metrics (83.7% F1, 0.666 MCC, 0.910 AUC) are presented without error bars, confidence intervals, number of runs, or statistical significance tests, making it impossible to determine whether the gains over baselines are robust or could arise from variance in the 2000-pair evaluation.
  2. [Abstract] No description is given of how the 2000 directed pairs in WIB-100 were labeled (e.g., source of annotations, inter-annotator agreement, or restriction to primary sources), which is load-bearing for the central claim that the protocol produces historically valid attributions rather than retrieving associations from training data.
  3. [Abstract] The manuscript provides no ablation isolating the contribution of the Falsification phase or the prompt-isolated critic, both of which are presented as essential to the evidence-based adjudication; without this, the superiority over unguided LLM output cannot be attributed to the claimed protocol elements.
  4. [Abstract] The leakage-control experiment masks explicit influence phrases but does not address potential broader overlap between WIB-100 labels (sourced from art-historical texts) and LLM pretraining corpora, leaving open the possibility that performance reflects retrieval rather than adjudication of undocumented implicit influences.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the number of artists and pairs in WIB-100 earlier in the performance sentence for immediate clarity.
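The interval estimate requested in major comment 1 could be obtained with a percentile bootstrap over evaluation pairs. The sketch below is one hedged way to do it, not the authors' procedure. The labels are synthetic, built from a confusion matrix (860 TP, 140 FN, 805 TN, 195 FP) that is itself inferred from the recall and specificity in the Figure 3 text under the assumption of a 1,000/1,000 class split.

```python
import random
from collections import Counter
from math import sqrt


def mcc_from_pairs(pairs):
    """Matthews correlation coefficient from (true, predicted) label pairs."""
    c = Counter(pairs)
    tp, tn, fp, fn = c[(1, 1)], c[(0, 0)], c[(0, 1)], c[(1, 0)]
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_ci(pairs, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample pairs with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(stat(rng.choices(pairs, k=n)) for _ in range(n_boot))
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


# Synthetic (true, predicted) labels; an assumption, not the paper's raw outputs.
pairs = [(1, 1)] * 860 + [(1, 0)] * 140 + [(0, 0)] * 805 + [(0, 1)] * 195

point = mcc_from_pairs(pairs)
lo, hi = bootstrap_ci(pairs, mcc_from_pairs)
print(f"MCC = {point:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```

Resampling individual pairs treats them as independent. Because the 2,000 directed pairs share only 100 artists, resampling by artist would probably give a wider and more honest interval.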

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each of the major comments point by point below, and indicate where revisions will be made to the manuscript.

  1. Referee: [Abstract] The reported metrics (83.7% F1, 0.666 MCC, 0.910 AUC) are presented without error bars, confidence intervals, number of runs, or statistical significance tests, making it impossible to determine whether the gains over baselines are robust or could arise from variance in the 2000-pair evaluation.

    Authors: We agree that providing statistical details is essential for assessing the robustness of our results. In the revised manuscript, we will report results from multiple independent runs (specifying the number, e.g., 5), include error bars and 95% confidence intervals for the metrics, and conduct appropriate statistical tests to compare against baselines. This will allow readers to evaluate the significance of the observed improvements.
    revision: yes

  2. Referee: [Abstract] No description is given of how the 2000 directed pairs in WIB-100 were labeled (e.g., source of annotations, inter-annotator agreement, or restriction to primary sources), which is load-bearing for the central claim that the protocol produces historically valid attributions rather than retrieving associations from training data.

    Authors: The WIB-100 benchmark draws its influence labels from established art-historical sources associated with the WikiArt dataset. We acknowledge that the current manuscript lacks a detailed account of the labeling process. In the revision, we will include an expanded description of the benchmark, specifying the sources of the annotations (art history references) and any available details on annotation methodology, and we will discuss the extent to which labels are based on documented rather than inferred influences. We will also note limitations regarding inter-annotator agreement if comprehensive statistics are not available.
    revision: yes

  3. Referee: [Abstract] The manuscript provides no ablation isolating the contribution of the Falsification phase or the prompt-isolated critic, both of which are presented as essential to the evidence-based adjudication; without this, the superiority over unguided LLM output cannot be attributed to the claimed protocol elements.

    Authors: We recognize the value of component ablations to validate the contributions of the Falsification phase and the prompt-isolated critic. Although the manuscript includes overall performance and some robustness checks, dedicated ablations for these elements were not reported. We will add these ablations in the revised version, comparing performance with and without the Falsification phase and the critic, to better attribute the gains to the specific protocol components.
    revision: yes

  4. Referee: [Abstract] The leakage-control experiment masks explicit influence phrases but does not address potential broader overlap between WIB-100 labels (sourced from art-historical texts) and LLM pretraining corpora, leaving open the possibility that performance reflects retrieval rather than adjudication of undocumented implicit influences.

    Authors: The leakage-control experiment was designed to test reliance on explicit phrases by masking them. We agree that it does not fully rule out retrieval from pretraining data for more implicit associations. In the revision, we will elaborate on this limitation in the discussion section and strengthen the argument by emphasizing how the falsification phase and evidence chain requirements mitigate pure retrieval. If possible, we will consider additional experiments, such as evaluating on influences documented after the model's training cutoff, though this may be constrained by data availability.
    revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical metrics on external benchmark


The paper presents an agent architecture evaluated via direct empirical measurements (F1, MCC, AUC) on the WIB-100 benchmark of 2000 directed pairs, with leakage controls that mask explicit phrases. No equations, fitted parameters, self-definitional loops, or load-bearing self-citations appear in the provided text. The four-phase ReAct protocol and operators (StyleComparator, ConceptRetriever) are described as design choices, not quantities derived from the reported performance numbers. The central claim reduces to measured accuracy against held-out labels rather than any construction that equates outputs to inputs by definition. This is a standard non-circular empirical evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the existence and enforceability of unspecified art-historical axioms plus the assumption that the WIB-100 labels are reliable ground truth for implicit influence.

axioms (1)
  • domain assumption: Art-historical axioms can be enforced inside the agent loop to constrain attributions
    Abstract states the controller 'enforces art-historical axioms' without listing them or showing how enforcement is implemented.

pith-pipeline@v0.9.0 · 5586 in / 1430 out tokens · 29781 ms · 2026-05-10T18:19:05.804480+00:00 · methodology


