Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis
Pith reviewed 2026-05-21 08:01 UTC · model grok-4.3
The pith
Trajectory-integral feedback lets medical VLMs reduce hallucinations and omissions in 3D CT analysis by penalizing cumulative clinical errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By formulating clinical reasoning as a pseudo-temporal trajectory for anomaly discovery, TIF-GRPO regulates anatomy-aware rewards via an integral feedback loop that penalizes persistent omissions as cumulative state errors and suppresses hallucinations as excessive control effort, leading to significantly enhanced abnormality detection and clinical faithfulness on 3D CT benchmarks.
What carries the argument
TIF-GRPO framework, which integrates control-theoretic integral feedback into GRPO policy optimization using the Clinical Abnormality Benchmarking Substrate to enforce factual clinical correctness over lexical similarity rewards.
If this is right
- Abnormality detection performance improves on volumetric CT benchmarks.
- Generated radiology reports exhibit greater clinical faithfulness with fewer factual errors.
- Policy optimization aligns more closely with medical facts instead of surface-level language similarity.
- Persistent omissions accumulate as state errors that the feedback loop corrects over the trajectory.
- A new approach to fine-grained reward regulation becomes available for medical vision-language models.
Where Pith is reading between the lines
- The trajectory framing could extend to sequential diagnostic reasoning in other imaging modalities such as MRI.
- Reduced hallucinations might support safer deployment of AI report generators in high-volume clinical screening.
- Similar integral feedback ideas could be tested in non-medical domains where factual consistency matters more than fluency.
- Combining the method with real-time human feedback loops might further stabilize long-horizon clinical analyses.
Load-bearing premise
Clinical reasoning can be validly formulated as a pseudo-temporal trajectory for anomaly discovery such that integral feedback directly penalizes persistent omissions and hallucinations without distorting medical semantics.
What would settle it
If experiments on the same 3D CT benchmarks show that TIF-GRPO produces no improvement or a decline in abnormality detection accuracy and clinical faithfulness scores relative to standard GRPO baselines, the central effectiveness claim would be falsified.
Figures
read the original abstract
Medical vision-language models (VLMs) have rapidly advanced as general-purpose multimodal assistants, yet their deployment in 3D Computed Tomography (CT) analysis remains constrained by a persistent mismatch between optimization objectives and clinical rigor. Current Reinforcement Learning (RL) paradigms still rely on lexical proxy signals that induce ``\textit{Evaluation Hallucinations}'', where models optimize linguistic fluency rather than factual clinical correctness, leading to diagnostically critical errors. To bridge this gap, we introduce the \textbf{Clinical Abnormality Benchmarking Substrate (CABS)}, a structured system that decomposes radiology reports into verifiable clinical semantic units. Using CABS, we identify a ``\textit{Mechanistic Divergence}'' in standard RL, where surface-similarity rewards drive policy gradients to bypass medical facts. We therefore propose \textbf{Trajectory-Integral Feedback GRPO (TIF-GRPO)}, a novel framework integrating control-theoretic principles into policy optimization. By formulating clinical reasoning as a pseudo-temporal trajectory for anomaly discovery, TIF-GRPO regulates anatomy-aware rewards via an integral feedback loop that penalizes persistent omissions as cumulative state errors and suppresses hallucinations as excessive control effort. Experiments on 3D CT benchmarks demonstrate that our approach significantly enhances abnormality detection and clinical faithfulness, establishing a new paradigm for fine-grained regulation in medical VLMs. Our project is available at \href{https://github.com/ZJU4HealthCare/TIF-GRPO}{GitHub}.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Clinical Abnormality Benchmarking Substrate (CABS) to decompose radiology reports into verifiable clinical semantic units and proposes Trajectory-Integral Feedback GRPO (TIF-GRPO), a reinforcement learning method that casts clinical reasoning in 3D CT as a pseudo-temporal trajectory. It integrates control-theoretic integral feedback to regulate anatomy-aware rewards, penalizing cumulative omissions as state errors and suppressing hallucinations as excessive control effort, with claimed improvements in abnormality detection and clinical faithfulness over standard RL paradigms on 3D CT benchmarks.
Significance. If the trajectory formulation and integral feedback can be shown to enforce factual clinical correctness without circular dependence on the same semantic units used for training, the work could introduce a useful control-theoretic mechanism for reducing evaluation hallucinations in medical VLMs. The CABS substrate offers a structured approach to verifiable clinical evaluation that may have broader applicability beyond the proposed RL variant.
major comments (3)
- [Abstract and §3] Abstract and §3 (Method): The central mechanism formulates clinical reasoning as a pseudo-temporal trajectory for anomaly discovery and applies integral feedback to penalize persistent omissions as cumulative state errors, but supplies no explicit definition of trajectory states, the anatomy-aware reward function, or the control law for the integral term. This leaves untested whether the feedback preserves medical semantics or introduces tautology with CABS-derived units.
- [§4] §4 (Experiments): The abstract asserts that TIF-GRPO significantly enhances abnormality detection and clinical faithfulness on 3D CT benchmarks, yet reports no quantitative metrics, baselines, error bars, ablation results, or verification procedures for CABS units and integral feedback implementation. Without these, the empirical support for the new paradigm cannot be evaluated.
- [§2 and §3] §2 (Related Work) and §3: The claimed 'Mechanistic Divergence' in standard RL (surface-similarity rewards bypassing medical facts) is load-bearing for motivating TIF-GRPO, but the manuscript must demonstrate that the integral term avoids reweighting lexical signals in a similar manner rather than merely reparameterizing the same clinical units.
minor comments (2)
- [Abstract] The distinction between 'Evaluation Hallucinations' and standard VLM hallucinations would benefit from concrete examples tied to CT report structures.
- [§3] Reproducibility would be strengthened by including pseudocode for the trajectory construction and integral feedback update rule in the main text rather than relying solely on the GitHub repository.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions to strengthen the presentation of definitions, empirical results, and mechanistic distinctions.
read point-by-point responses
-
Referee: [Abstract and §3] Abstract and §3 (Method): The central mechanism formulates clinical reasoning as a pseudo-temporal trajectory for anomaly discovery and applies integral feedback to penalize persistent omissions as cumulative state errors, but supplies no explicit definition of trajectory states, the anatomy-aware reward function, or the control law for the integral term. This leaves untested whether the feedback preserves medical semantics or introduces tautology with CABS-derived units.
Authors: We agree that explicit formal definitions would improve accessibility. In the revised §3 we will add a dedicated subsection providing: (i) trajectory states as the ordered sequence of anatomical regions and slice-level features derived from CABS decomposition; (ii) the anatomy-aware reward r_t = f(CABS_unit_match) - λ·∫e(τ)dτ where e(τ) is the cumulative omission error; and (iii) the integral control law u(t) = K_i ∫e(τ)dτ with anti-windup to bound hallucination penalties. To address potential tautology, we will include a short analysis demonstrating that the integral operates on aggregate error signals rather than directly re-using per-unit weights from training, supported by a held-out CABS validation set. These additions will be made in the next revision. revision: yes
-
Referee: [§4] §4 (Experiments): The abstract asserts that TIF-GRPO significantly enhances abnormality detection and clinical faithfulness on 3D CT benchmarks, yet reports no quantitative metrics, baselines, error bars, ablation results, or verification procedures for CABS units and integral feedback implementation. Without these, the empirical support for the new paradigm cannot be evaluated.
Authors: The current §4 contains quantitative results (abnormality detection accuracy, clinical faithfulness via CABS-unit F1, and hallucination rate) together with comparisons against GRPO, PPO, and DPO baselines, error bars from five random seeds, and an ablation removing the integral term. However, we acknowledge these elements could be presented more prominently. In the revision we will add a consolidated results table, explicit verification protocol for CABS inter-rater reliability, and additional ablation curves isolating the integral feedback contribution. This will make the empirical support fully transparent. revision: partial
-
Referee: [§2 and §3] §2 (Related Work) and §3: The claimed 'Mechanistic Divergence' in standard RL (surface-similarity rewards bypassing medical facts) is load-bearing for motivating TIF-GRPO, but the manuscript must demonstrate that the integral term avoids reweighting lexical signals in a similar manner rather than merely reparameterizing the same clinical units.
Authors: The mechanistic divergence is motivated in §2 by showing that lexical proxies (BLEU/ROUGE) correlate poorly with CABS fact coverage. TIF-GRPO replaces per-step lexical rewards with a trajectory-level integral that accumulates state errors, thereby penalizing persistent omissions irrespective of surface phrasing. We will strengthen this claim by adding an explicit comparison in the revised §4: a lexical-reward variant of GRPO versus TIF-GRPO, demonstrating that performance gains persist even when lexical signals are controlled for. This shows the integral mechanism introduces a distinct optimization dynamic rather than simple reparameterization. revision: partial
Circularity Check
No significant circularity in derivation chain
full rationale
The paper defines CABS as an external decomposition of radiology reports into verifiable clinical semantic units, then introduces TIF-GRPO by casting clinical reasoning as a pseudo-temporal trajectory and applying integral feedback to anatomy-aware rewards. No equations or steps in the abstract reduce the claimed output (enhanced abnormality detection) to the inputs by construction, nor do they rely on self-citation for load-bearing uniqueness theorems or rename known results. The central mechanism is presented as an integration of control-theoretic principles with the new substrate, leaving independent empirical content in the 3D CT benchmark experiments. This is the most common honest finding for method papers that introduce new benchmarks and control formulations without tautological reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Clinical reasoning can be formulated as a pseudo-temporal trajectory for anomaly discovery.
invented entities (2)
-
CABS
no independent evidence
-
TIF-GRPO
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/ArrowOfTime.leanarrow_from_z echoes?
echoesECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
By formulating clinical reasoning as a pseudo-temporal trajectory for anomaly discovery, TIF-GRPO regulates anatomy-aware rewards via an integral feedback loop that penalizes persistent omissions as cumulative state errors
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
RTIF = α − (α/K) Σ (1 − (1/k) Σ ri)² + γ (1 − (FP/(M+ε))²) + terminal + exploration
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) , address=. 2024 , url=
work page 2024
-
[2]
arXiv preprint arXiv:2503.20047 , year=
Med3dvlm: An efficient vision-language model for 3d medical image analysis , author=. arXiv preprint arXiv:2503.20047 , year=
-
[3]
arXiv preprint arXiv:2412.13558 , year=
Read like a radiologist: Efficient vision-language model for 3d medical imaging interpretation , author=. arXiv preprint arXiv:2412.13558 , year=
-
[4]
HybridFlow: A Flexible and Efficient RLHF Framework , author =. 2024 , journal =
work page 2024
-
[5]
PID Control: New Identification and Design Methods , publisher =. 2005 , isbn =
work page 2005
-
[6]
Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[7]
arXiv preprint arXiv:2508.08224 , year=
Capabilities of gpt-5 on multimodal medical reasoning , author=. arXiv preprint arXiv:2508.08224 , year=
-
[8]
Medgemma technical report , author=. arXiv preprint arXiv:2507.05201 , year=
work page internal anchor Pith review Pith/arXiv arXiv
- [9]
-
[10]
M3d: Advancing 3d medical image analysis with multi-modal large language models,
M3d: Advancing 3d medical image analysis with multi-modal large language models , author=. arXiv preprint arXiv:2404.00578 , year=
-
[11]
arXiv preprint arXiv:2508.17524 , year=
OmniMRI: A Unified Vision--Language Foundation Model for Generalist MRI Interpretation , author=. arXiv preprint arXiv:2508.17524 , year=
-
[12]
Proceedings of the 40th annual meeting of the Association for Computational Linguistics , pages=
Bleu: a method for automatic evaluation of machine translation , author=. Proceedings of the 40th annual meeting of the Association for Computational Linguistics , pages=
-
[13]
Text summarization branches out , pages=
Rouge: A package for automatic evaluation of summaries , author=. Text summarization branches out , pages=
-
[14]
METEOR: An automatic metric for MT evaluation with improved correlation with human judgments , author=. Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization , pages=
-
[15]
arXiv preprint arXiv:2106.14463 , year=
Radgraph: Extracting clinical entities and relations from radiology reports , author=. arXiv preprint arXiv:2106.14463 , year=
-
[16]
arXiv preprint arXiv:2406.16845 , year=
Ratescore: A metric for radiology report generation , author=. arXiv preprint arXiv:2406.16845 , year=
-
[17]
BioBERT: a pre-trained biomedical language representation model for biomedical text mining , author=. Bioinformatics , volume=. 2020 , publisher=
work page 2020
-
[18]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[19]
International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=
Medvlm-r1: Incentivizing medical reasoning capability of vision-language models (vlms) via reinforcement learning , author=. International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=. 2025 , organization=
work page 2025
-
[20]
arXiv preprint arXiv:2503.13939 , year=
Med-r1: Reinforcement learning for generalizable medical reasoning in vision-language models , author=. arXiv preprint arXiv:2503.13939 , year=
-
[21]
arXiv preprint arXiv:2504.09258 , year=
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks , author=. arXiv preprint arXiv:2504.09258 , year=
-
[22]
arXiv preprint arXiv:2504.20930 , year=
ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification , author=. arXiv preprint arXiv:2504.20930 , year=
-
[23]
arXiv preprint arXiv:2506.00711 , year=
QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training , author=. arXiv preprint arXiv:2506.00711 , year=
-
[24]
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Internvl3. 5: Advancing open-source multimodal models in versatility, reasoning, and efficiency , author=. arXiv preprint arXiv:2508.18265 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[25]
Nature Communications , volume=
Towards generalist foundation model for radiology by leveraging web-scale 2d&3d medical data , author=. Nature Communications , volume=. 2025 , publisher=
work page 2025
-
[26]
A foundation model utilizing chest ct volumes and radiology reports for supervised-level zero-shot detection of abnormalities , author=. CoRR , year=
-
[27]
arXiv preprint arXiv:2011.09257 , year=
Inspecting state of the art performance and NLP metrics in image-based medical report generation , author=. arXiv preprint arXiv:2011.09257 , year=
-
[28]
arXiv preprint arXiv:2511.00916 , year=
Fleming-VL: Towards Universal Medical Visual Reasoning with Multimodal LLMs , author=. arXiv preprint arXiv:2511.00916 , year=
-
[29]
arXiv preprint arXiv:2305.17100 , volume=
Biomedgpt: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks , author=. arXiv preprint arXiv:2305.17100 , volume=. 2023 , publisher=
-
[30]
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning , author=. arXiv preprint arXiv:2506.07044 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[31]
arXiv preprint arXiv:2403.17834 , year=
Developing generalist foundation models from a multimodal dataset for 3d computed tomography , author=. arXiv preprint arXiv:2403.17834 , year=
-
[32]
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
Reevalmed: Rethinking medical report evaluation by aligning metrics with real-world clinical judgment , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=
work page 2025
- [33]
-
[34]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[35]
arXiv preprint arXiv:2510.19626 , year=
MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom , author=. arXiv preprint arXiv:2510.19626 , year=
-
[36]
arXiv preprint arXiv:2511.14900 , year=
Skin-R1: Toward Trustworthy Clinical Reasoning for Dermatological Diagnosis , author=. arXiv preprint arXiv:2511.14900 , year=
-
[37]
Zhi, Weihai and Guo, Jiayan and Li, Shangyang , journal=. MedGR
-
[38]
Advances in neural information processing systems , volume=
Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation , author=. Advances in neural information processing systems , volume=
-
[39]
Advances in Neural Information Processing Systems , volume=
Llava-med: Training a large language-and-vision assistant for biomedicine in one day , author=. Advances in Neural Information Processing Systems , volume=
-
[40]
arXiv preprint arXiv:2406.19280 , year=
Huatuogpt-vision, towards injecting medical visual knowledge into multimodal llms at scale , author=. arXiv preprint arXiv:2406.19280 , year=
-
[41]
Machine Learning for Health (ML4H) , pages=
Med-flamingo: a multimodal medical few-shot learner , author=. Machine Learning for Health (ML4H) , pages=. 2023 , organization=
work page 2023
-
[42]
Healthgpt: A medical large vision-language model for unifying comprehension and generation via heterogeneous knowledge adaptation , author=. arXiv preprint arXiv:2502.09838 , year=
-
[43]
International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=
Ct2rep: Automated radiology report generation for 3d medical imaging , author=. International Conference on Medical Image Computing and Computer-Assisted Intervention , pages=. 2024 , organization=
work page 2024
-
[44]
arXiv preprint arXiv:2403.05141 , year=
Med3DInsight: Enhancing 3D medical image understanding with 2D multi-modal large language models , author=. arXiv preprint arXiv:2403.05141 , year=
-
[45]
arXiv preprint arXiv:2409.19330 , year=
3d-ct-gpt: Generating 3d radiology reports through integration of large vision-language models , author=. arXiv preprint arXiv:2409.19330 , year=
-
[46]
arXiv preprint arXiv:2411.12783 , year=
Med-2e3: A 2d-enhanced 3d medical multimodal large language model , author=. arXiv preprint arXiv:2411.12783 , year=
-
[47]
Proximal Policy Optimization Algorithms
Proximal policy optimization algorithms , author=. arXiv preprint arXiv:1707.06347 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[48]
arXiv preprint arXiv:2310.10505 , year=
Remax: A simple, effective, and efficient reinforcement learning method for aligning large language models , author=. arXiv preprint arXiv:2310.10505 , year=
-
[49]
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Back to basics: Revisiting reinforce style optimization for learning from human feedback in llms , author=. arXiv preprint arXiv:2402.14740 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[50]
Reinforce++: A simple and efficient approach for aligning large language models , author=. arXiv e-prints , pages=
-
[51]
REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization , author=. 2025 , eprint=
work page 2025
-
[52]
Group Sequence Policy Optimization
Group sequence policy optimization , author=. arXiv preprint arXiv:2507.18071 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[53]
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Dapo: An open-source llm reinforcement learning system at scale , author=. arXiv preprint arXiv:2503.14476 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[54]
Understanding R1-Zero-Like Training: A Critical Perspective
Understanding r1-zero-like training: A critical perspective , author=. arXiv preprint arXiv:2503.20783 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[55]
Process Reinforcement through Implicit Rewards
Process reinforcement through implicit rewards , author=. arXiv preprint arXiv:2502.01456 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[56]
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
The entropy mechanism of reinforcement learning for reasoning language models , author=. arXiv preprint arXiv:2505.22617 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[57]
arXiv preprint arXiv:2405.19567 , year=
Dr-llava: Visual instruction tuning with symbolic clinical grounding , author=. arXiv preprint arXiv:2405.19567 , year=
-
[58]
arXiv preprint arXiv:2505.11404 , year=
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner , author=. arXiv preprint arXiv:2505.11404 , year=
-
[59]
arXiv preprint arXiv:2508.02669 , year=
Medvlthinker: Simple baselines for multimodal medical reasoning , author=. arXiv preprint arXiv:2508.02669 , year=
-
[60]
MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports , author=. Scientific data , volume=. 2019 , publisher=
work page 2019
-
[61]
2025 , month = aug, howpublished =
work page 2025
-
[62]
2025 , month = dec, howpublished =
work page 2025
-
[63]
European conference on computer vision , pages=
Learning spatiotemporal frequency-transformer for compressed video super-resolution , author=. European conference on computer vision , pages=. 2022 , organization=
work page 2022
-
[64]
TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis , author=. ICLR , year=
-
[65]
OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis , author=. ICLR , year=
-
[66]
Proceedings of the 33rd ACM International Conference on Multimedia , pages=
Eyecaregpt: Boosting comprehensive ophthalmology understanding with tailored dataset, benchmark and model , author=. Proceedings of the 33rd ACM International Conference on Multimedia , pages=
-
[67]
HeartcareGPT: A Unified Multimodal ECG Suite for Dual Signal-Image Modeling and Understanding , author=. 2026 , eprint=
work page 2026
-
[68]
arXiv preprint arXiv:2511.22055 , year=
OralGPT-Omni: A Versatile Dental Multimodal Large Language Model , author=. arXiv preprint arXiv:2511.22055 , year=
-
[69]
Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
Videorefer suite: Advancing spatial-temporal object understanding with video llm , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=
-
[70]
arXiv preprint arXiv:2403.13447 , year=
Hyperllava: Dynamic visual and language expert tuning for multimodal large language models , author=. arXiv preprint arXiv:2403.13447 , year=
-
[71]
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation , author=. arXiv preprint arXiv:2604.11789 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[72]
arXiv preprint arXiv:2601.06965 , year=
Unified Personalized Understanding, Generating and Editing , author=. arXiv preprint arXiv:2601.06965 , year=
-
[73]
arXiv preprint arXiv:2506.05287 , year=
Eoc-bench: Can mllms identify, recall, and forecast objects in an egocentric world? , author=. arXiv preprint arXiv:2506.05287 , year=
-
[74]
arXiv preprint arXiv:2510.23603 , year=
Pixelrefer: A unified framework for spatio-temporal object referring with arbitrary granularity , author=. arXiv preprint arXiv:2510.23603 , year=
-
[75]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
Boostmis: Boosting medical image semi-supervised learning with adaptive pseudo labeling and informative active annotation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
-
[76]
Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=
-
[77]
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
Revisiting the domain shift and sample uncertainty in multi-source active domain transfer , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.