NP-LoRA: Null Space Projection for Subject-Style LoRA Fusion
Pith reviewed 2026-05-25 07:56 UTC · model grok-4.3
The pith
Projecting the content LoRA onto the orthogonal complement of the style LoRA's dominant subspace suppresses interference during fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NP-LoRA defines a projection operator that maps the content LoRA into the orthogonal complement of the style LoRA's dominant directions. This attenuates parameter conflicts along those directions while retaining complementary content information. A soft variant interpolates continuously between ordinary linear merging and strict null-space projection via a single regularization parameter.
What carries the argument
Null-space projection operator that projects the content LoRA matrix onto the orthogonal complement of the principal subspace spanned by the style LoRA.
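A minimal NumPy sketch of the hard projection described above, under stated assumptions: the style and content updates are dense matrices of shape (d_out, d_in), the style subspace is taken from the top-k right singular vectors of the style update, and the projector acts on the input side. The summary here does not specify these choices, and the function names (`style_basis`, `hard_null_space_projector`, `fuse_hard`) are hypothetical.

```python
import numpy as np


def style_basis(delta_w_style: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors of the style update, shape (d_in, k)."""
    _, _, vt = np.linalg.svd(delta_w_style, full_matrices=False)
    return vt[:k].T


def hard_null_space_projector(v_k: np.ndarray) -> np.ndarray:
    """P = I - V_k V_k^T: orthogonal projector onto the complement of span(V_k)."""
    d_in = v_k.shape[0]
    return np.eye(d_in) - v_k @ v_k.T


def fuse_hard(delta_w_content: np.ndarray, delta_w_style: np.ndarray, k: int) -> np.ndarray:
    """Assumed fusion rule: style update plus the projected content update."""
    p = hard_null_space_projector(style_basis(delta_w_style, k))
    return delta_w_style + delta_w_content @ p


# Synthetic rank-4 updates stand in for real LoRA products B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 4
delta_w_style = rng.normal(size=(d_out, r)) @ rng.normal(size=(r, d_in))
delta_w_content = rng.normal(size=(d_out, r)) @ rng.normal(size=(r, d_in))

v_k = style_basis(delta_w_style, k=r)
p = hard_null_space_projector(v_k)
projected_content = delta_w_content @ p
fused = fuse_hard(delta_w_content, delta_w_style, k=r)
```

After projection, the content update has no component along the chosen style directions, which is the sense in which interference along those directions is suppressed in this sketch.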
If this is right
- Fusion becomes a controllable geometric operation rather than an empirical averaging step.
- A single scalar parameter governs the strength of style-subspace suppression.
- The method requires no retraining or additional data for each new LoRA pair.
- Content information outside the style principal subspace is preserved by construction.
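The single-scalar control in the list above can be illustrated with a ridge-style soft projector. This is a sketch under an assumed form, `P_mu = I - mu/(1+mu) * V_k V_k^T`, which is one closed form consistent with "interpolates between linear merging and hard projection"; the paper's exact operator may differ, and `soft_projector` and `mu` are hypothetical names.

```python
import numpy as np


def soft_projector(v_k: np.ndarray, mu: float) -> np.ndarray:
    """Assumed soft projector: I - mu/(1+mu) * V_k V_k^T, for mu >= 0."""
    if mu < 0:
        raise ValueError("mu must be non-negative")
    d_in = v_k.shape[0]
    return np.eye(d_in) - (mu / (1.0 + mu)) * (v_k @ v_k.T)


rng = np.random.default_rng(1)
# Orthonormal stand-in for the style basis (reduced QR gives a 12 x 3 factor).
v_k, _ = np.linalg.qr(rng.normal(size=(12, 3)))
content = rng.normal(size=(8, 12))

hard = np.eye(12) - v_k @ v_k.T
no_suppression = soft_projector(v_k, 0.0)   # leaves the content update unchanged
nearly_hard = soft_projector(v_k, 1e8)      # approaches the hard projector
```

In this form, content orthogonal to the style basis passes through unchanged for every `mu`, while the component inside the style subspace is scaled by `1/(1+mu)`.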
Where Pith is reading between the lines
- The same projection logic could be applied in reverse, projecting style onto the null space of content, to test symmetry of the interference.
- If the subspaces of multiple styles are known, successive projections might allow controlled multi-style composition.
- The closed-form solution suggests the method could extend to other low-rank adapters beyond diffusion models.
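The multi-style speculation above can be sketched as follows. One caveat worth stating: multiplying two orthogonal projectors in sequence does not, in general, remove both subspaces unless the projectors commute, so this sketch builds a single projector from the union of the style bases instead. The helper `joint_complement_projector` is hypothetical and not taken from the paper.

```python
import numpy as np


def joint_complement_projector(bases: list) -> np.ndarray:
    """Projector onto the complement of the span of all given bases (each d_in x k_i)."""
    stacked = np.concatenate(bases, axis=1)
    # The SVD orthonormalises the union and tolerates overlapping directions.
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    tol = s.max() * max(stacked.shape) * np.finfo(s.dtype).eps
    q = u[:, s > tol]
    return np.eye(stacked.shape[0]) - q @ q.T


rng = np.random.default_rng(2)
d_in = 12
v1, _ = np.linalg.qr(rng.normal(size=(d_in, 3)))  # stand-in basis for style 1
v2, _ = np.linalg.qr(rng.normal(size=(d_in, 3)))  # stand-in basis for style 2
content = rng.normal(size=(8, d_in))

p_joint = joint_complement_projector([v1, v2])
joint = content @ p_joint

# Naive successive projection: remove style 2 first, then style 1.
p1 = np.eye(d_in) - v1 @ v1.T
p2 = np.eye(d_in) - v2 @ v2.T
sequential = content @ p2 @ p1
```

Here `joint` has no component along either style basis, whereas `sequential` is clean along `v1` only and generally retains a residual along `v2`.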
Load-bearing premise
The principal directions extracted from the style LoRA capture the main directions of interference with content updates.
What would settle it
Generate images from the same subject-style LoRA pair using the hard projection, the soft projection at multiple regularization values, and standard merging, then measure whether subject fidelity drops sharply when the projection strength increases.
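The sweep described above needs generated images and a subject-fidelity metric, which this page does not provide. As a weight-space proxy only, the sketch below sweeps an assumed soft-projection strength `mu` from standard merging (`mu = 0`) to hard projection (`mu = inf`) and records how much content energy remains overall and inside the style subspace. The projector form and all names are assumptions.

```python
import numpy as np


def suppression_weight(mu: float) -> float:
    """Assumed weight mu/(1+mu); mu = inf corresponds to hard projection."""
    return 1.0 if np.isinf(mu) else mu / (1.0 + mu)


def sweep(content: np.ndarray, v_k: np.ndarray, mus: list) -> list:
    """Return (mu, style_energy, retained_fraction) for each projection strength."""
    d_in = v_k.shape[0]
    rows = []
    for mu in mus:
        p = np.eye(d_in) - suppression_weight(mu) * (v_k @ v_k.T)
        projected = content @ p
        style_energy = np.linalg.norm(projected @ v_k)            # left inside style subspace
        retained = np.linalg.norm(projected) / np.linalg.norm(content)
        rows.append((mu, style_energy, retained))
    return rows


rng = np.random.default_rng(3)
v_k, _ = np.linalg.qr(rng.normal(size=(12, 3)))
content = rng.normal(size=(8, 12))
rows = sweep(content, v_k, [0.0, 0.1, 1.0, 10.0, np.inf])

for mu, style_energy, retained in rows:
    print(f"mu={mu:>6}: style_energy={style_energy:.4f}, retained={retained:.4f}")
```

A sharp fall in the retained fraction as `mu` grows would indicate that much of the content update lies inside the style subspace, which is the situation where the load-bearing premise is most at risk. It does not replace the image-level measurement.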
Original abstract
Low-Rank Adaptation (LoRA) fusion enables the composition of subject and style representations for controllable generation without retraining. However, existing approaches primarily operate through weight-level merging, without explicitly modeling how independently trained LoRAs interact in the shared parameter space. We adopt a geometric perspective on LoRA fusion, interpreting content and style LoRAs as occupying overlapping, non-orthogonal low-rank subspaces, where such overlap can lead to conflicting parameter updates that affect generation quality. This observation motivates us to reformulate LoRA fusion not merely as parameter combination, but as a problem of controlling how updates from overlapping subspaces are combined. Based on this insight, we propose Null Space Projection LoRA (NP-LoRA), a training-free framework that employs projection as a fusion operator to explicitly modulate cross-LoRA interactions. Specifically, NP-LoRA uses principal directions of the style LoRA to define a projection subspace and projects the content LoRA onto the complementary subspace (i.e., the null space of the style LoRA), suppressing interference along dominant style directions while preserving complementary information. To avoid the overly aggressive suppression of hard projection, we further formulate soft projection as a regularized optimization problem that balances content preservation against style-subspace suppression. This objective admits a closed-form solution, yielding a projection operator controlled by a single parameter that continuously interpolates between linear merging and hard projection. Extensive experiments across multiple pretrained LoRA pairs show that NP-LoRA achieves more balanced content-style composition compared to strong baselines, without requiring retraining.
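The abstract states that the soft projection is a regularized problem with a closed-form solution but does not give the objective. One ridge-style formulation consistent with that description is sketched below. The symbols are assumptions: $\Delta W_c$ is the content update, $V_k$ holds $k$ orthonormal style directions, and $\mu \ge 0$ is the regularization parameter. The paper's actual objective may differ.

```latex
\begin{aligned}
P_{\mathrm{hard}} &= I - V_k V_k^{\top}, \qquad V_k^{\top} V_k = I_k,\\[4pt]
\widetilde{\Delta W}_c(\mu) &= \arg\min_{X}\;
  \lVert X - \Delta W_c \rVert_F^2 + \mu\, \lVert X V_k \rVert_F^2,
  \qquad \mu \ge 0,\\[4pt]
0 &= 2\,(X - \Delta W_c) + 2\mu\, X V_k V_k^{\top}
  \;\Longrightarrow\;
  X \bigl(I + \mu V_k V_k^{\top}\bigr) = \Delta W_c,\\[4pt]
\widetilde{\Delta W}_c(\mu) &= \Delta W_c \bigl(I + \mu V_k V_k^{\top}\bigr)^{-1}
  = \Delta W_c \Bigl(I - \tfrac{\mu}{1+\mu}\, V_k V_k^{\top}\Bigr).
\end{aligned}
```

The last equality follows from the Woodbury identity together with $V_k^{\top} V_k = I_k$. Setting $\mu = 0$ leaves $\Delta W_c$ unchanged, which corresponds to linear merging, and letting $\mu \to \infty$ recovers $\Delta W_c P_{\mathrm{hard}}$.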
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes NP-LoRA, a training-free geometric method for fusing independently trained subject and style LoRAs. It interprets the LoRAs as occupying overlapping low-rank subspaces and defines a projection operator that projects the content LoRA onto the orthogonal complement of the top singular vectors of the style LoRA matrix, thereby suppressing interference along dominant style directions. A soft-projection variant is derived as the closed-form solution to a regularized optimization problem controlled by a single scalar that interpolates between linear merging and hard projection. The central claim is that this yields more balanced content-style composition than existing weight-merging baselines across multiple pretrained LoRA pairs.
Significance. If the subspace separation assumption holds and the reported improvements are reproducible, the approach would supply a lightweight, parameter-efficient operator for controllable generation that avoids retraining or additional fine-tuning. The closed-form soft-projection solution and the explicit modeling of non-orthogonality are technically clean contributions that could be adopted in other low-rank adaptation settings.
major comments (3)
- [Abstract] Abstract and experimental claims: the assertion that NP-LoRA 'achieves more balanced content-style composition compared to strong baselines' is presented without any quantitative metrics, error bars, dataset sizes, or subject-consistency scores. This absence is load-bearing because the central claim rests entirely on an unverified experimental assertion rather than on verifiable numbers.
- [Method] Method section (soft-projection formulation): the single tunable scalar that controls the regularized projection is a free parameter; the manuscript does not state whether its value is chosen by cross-validation on the same evaluation set used to report results. If so, this introduces a circularity that undermines the claim of training-free superiority.
- [Method] Geometric construction: the claim that the top singular vectors of the style LoRA isolate the interference subspace while leaving subject-specific content directions largely intact is not accompanied by any diagnostic (e.g., cosine overlap between content and style singular vectors or rank preservation after projection). Without such a check the weakest assumption remains untested and the null-space guarantee is not established.
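The diagnostics requested in the last comment can be sketched as follows: cosines of the principal angles between content and style subspaces, the fraction of content energy inside the style subspace, and the rank of the content update before and after hard projection. This is an illustration on synthetic matrices built to share exactly one direction. The helper names and the right-singular-vector convention are assumptions.

```python
import numpy as np


def top_right_singular_vectors(delta_w: np.ndarray, k: int) -> np.ndarray:
    """Top-k right singular vectors, shape (d_in, k)."""
    _, _, vt = np.linalg.svd(delta_w, full_matrices=False)
    return vt[:k].T


def principal_cosines(v_a: np.ndarray, v_b: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between two orthonormal bases, descending."""
    return np.linalg.svd(v_a.T @ v_b, compute_uv=False)


def overlap_fraction(delta_w_content: np.ndarray, v_style: np.ndarray) -> float:
    """Share of the content update's squared Frobenius norm inside span(v_style)."""
    inside = np.linalg.norm(delta_w_content @ v_style) ** 2
    return float(inside / np.linalg.norm(delta_w_content) ** 2)


rng = np.random.default_rng(4)
q, _ = np.linalg.qr(rng.normal(size=(12, 12)))
# Style rows live in span(q0, q1, q2); content rows in span(q2, q3, q4).
style = rng.normal(size=(10, 3)) @ q[:, :3].T
content = rng.normal(size=(10, 3)) @ q[:, 2:5].T

v_s = top_right_singular_vectors(style, 3)
v_c = top_right_singular_vectors(content, 3)

cosines = principal_cosines(v_c, v_s)
fraction = overlap_fraction(content, v_s)
projected = content @ (np.eye(12) - v_s @ v_s.T)
rank_before = np.linalg.matrix_rank(content, tol=1e-8)
rank_after = np.linalg.matrix_rank(projected, tol=1e-8)
```

A leading cosine near 1 signals a shared direction, and a drop in rank after projection shows how many content directions the hard projector removes. On real LoRA pairs, these numbers would indicate whether the subspace-separation assumption is plausible.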
minor comments (2)
- [Method] Notation for the projection operator and the regularization parameter should be introduced with explicit definitions and ranges before the closed-form derivation is presented.
- [Abstract] The abstract states 'extensive experiments across multiple pretrained LoRA pairs' but supplies no table or figure reference; a results table with per-pair metrics would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point-by-point below, with clarifications on the abstract, parameter selection, and geometric assumptions, and indicate planned revisions.
Referee: [Abstract] Abstract and experimental claims: the assertion that NP-LoRA 'achieves more balanced content-style composition compared to strong baselines' is presented without any quantitative metrics, error bars, dataset sizes, or subject-consistency scores. This absence is load-bearing because the central claim rests entirely on an unverified experimental assertion rather than on verifiable numbers.
Authors: The abstract is intended as a concise overview; the full manuscript reports quantitative results including subject-consistency scores, style fidelity metrics, and comparisons over multiple LoRA pairs with dataset details. To address the concern directly, we will revise the abstract to include key quantitative highlights such as average improvements and evaluation scale. revision: yes
Referee: [Method] Method section (soft-projection formulation): the single tunable scalar that controls the regularized projection is a free parameter; the manuscript does not state whether its value is chosen by cross-validation on the same evaluation set used to report results. If so, this introduces a circularity that undermines the claim of training-free superiority.
Authors: The scalar is chosen via limited visual inspection on a small held-out set disjoint from the reported evaluation data, consistent with the training-free fusion claim. We will explicitly document this procedure in the revised method section to remove ambiguity. revision: yes
Referee: [Method] Geometric construction: the claim that the top singular vectors of the style LoRA isolate the interference subspace while leaving subject-specific content directions largely intact is not accompanied by any diagnostic (e.g., cosine overlap between content and style singular vectors or rank preservation after projection). Without such a check the weakest assumption remains untested and the null-space guarantee is not established.
Authors: We agree that direct diagnostics would strengthen the geometric claims. We will add cosine-similarity analysis between content and style singular vectors together with post-projection rank statistics in a new appendix or results subsection. revision: yes
Circularity Check
No significant circularity detected
The paper's derivation introduces a geometric view of LoRA subspaces and defines NP-LoRA via principal directions of the style LoRA to construct a projection operator (hard and soft variants with closed-form solution). This construction is independent of the reported experimental outcomes; the performance claims rest on external evaluation across multiple pretrained LoRA pairs rather than any fitted parameter or self-referential definition. No self-citations, uniqueness theorems, or ansatzes from prior author work are invoked as load-bearing steps. The single tunable parameter in soft projection is presented as part of the method definition, not as a post-hoc fit renamed as prediction. The derivation chain is therefore self-contained against the stated benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- soft-projection regularization parameter
axioms (1)
- domain assumption: Content and style LoRAs occupy overlapping, non-orthogonal low-rank subspaces whose overlap produces conflicting parameter updates.