Pith Number

pith:27P4QCOH

pith:2025:27P4QCOHDNSQZP674SDDZJAAFE
not attested · not anchored · not stored · refs resolved

NoisyGRPO: Incentivizing Multimodal CoT Reasoning via Noise Injection and Bayesian Estimation

Jiaxuan Sun, Longtian Qiu, Shan Ning, Xuming He

Noise injection into visual inputs and Bayesian advantage estimation improve generalization in multimodal chain-of-thought reasoning.

arxiv:2510.21122 v3 · 2025-10-24 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{27P4QCOHDNSQZP674SDDZJAAFE}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Experiments on standard CoT quality, general capability, and hallucination benchmarks demonstrate that NoisyGRPO substantially improves generalization and robustness, especially in RL settings with small-scale MLLMs such as Qwen2.5-VL 3B.

C2 weakest assumption

The paper assumes that the injected Gaussian noise level can serve directly as the prior in a Bayesian model, and that the resulting posterior advantage estimate reliably prefers visually grounded trajectories over those that succeed only under noise. The description of Bayesian Advantage Estimation invokes this premise without further justifying the likelihood model or the prior calibration.

C3 one line summary

NoisyGRPO is an RL framework that perturbs visual inputs with Gaussian noise for exploration and computes trajectory advantages via Bayesian posterior fusion of noise prior and reward likelihood to improve multimodal CoT generalization.
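
To make "posterior fusion of noise prior and reward likelihood" concrete, here is a minimal sketch under strong assumptions. It is not the paper's formulation: the conjugate Gaussian update, the prior mean `1 - noise_level`, the variances `prior_var` and `lik_var`, and the function name `bayesian_group_advantages` are all hypothetical choices made for illustration. The only ideas taken from the summary are that each trajectory carries a noise level and a reward, and that the two are fused before advantages are computed within a group.

```python
import statistics

def bayesian_group_advantages(rewards, noise_levels, prior_var=0.25, lik_var=0.25):
    """Illustrative only: fuse a noise-derived prior with observed rewards.

    Assumptions (not from the paper): rewards and noise levels lie in [0, 1],
    the prior mean for a trajectory is 1 - noise_level, and both prior and
    likelihood are Gaussian with fixed variances.
    """
    # Weight given to the observed reward in a conjugate Gaussian update.
    weight = prior_var / (prior_var + lik_var)
    posterior_means = []
    for reward, noise in zip(rewards, noise_levels):
        prior_mean = 1.0 - noise  # hypothetical: more noise, lower expected quality
        posterior_means.append(prior_mean + weight * (reward - prior_mean))

    # Group-relative normalization: center and scale within the sampled group.
    mean = statistics.fmean(posterior_means)
    std = statistics.pstdev(posterior_means)
    if std == 0.0:
        return [0.0 for _ in posterior_means]
    return [(p - mean) / std for p in posterior_means]

# Four trajectories for one prompt: two succeed, two fail, at two noise levels.
advantages = bayesian_group_advantages(
    rewards=[1.0, 1.0, 0.0, 0.0],
    noise_levels=[0.0, 0.8, 0.0, 0.8],
)
print(advantages)
```

Under these assumptions, a correct answer obtained from a clean image ranks above a correct answer obtained from a heavily perturbed one, which is the preference the weakest-assumption note (C2) says the posterior is expected to encode.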

References

63 extracted · 63 resolved · 34 Pith anchors

[1] Gpt-4 technical report 2023
[2] Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022
[3] Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond 2023 · arXiv:2308.12966
[4] Qwen2.5-VL Technical Report 2025 · arXiv:2502.13923
[5] Are We on the Right Way for Evaluating Large Vision-Language Models? 2024 · arXiv:2403.20330

Formal links

2 machine-checked theorem links

Cited by

2 papers in Pith

Receipt and verification
First computed 2026-06-08T01:03:50.303122Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d7dfc809c71b650cbfdfe4863ca4002937cc5242ed4283c9829873f921addd41

Aliases

arxiv: 2510.21122 · arxiv_version: 2510.21122v3 · doi: 10.48550/arxiv.2510.21122 · pith_short_12: 27P4QCOHDNSQ · pith_short_16: 27P4QCOHDNSQZP67 · pith_short_8: 27P4QCOH
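
The Pith aliases on this record appear to follow from the canonical hash. Base32-encoding the first 16 bytes of the SHA-256 digest (RFC 4648 alphabet, padding removed) reproduces the full Pith Number shown here, and the short forms are its 8-, 12-, and 16-character prefixes. The sketch below checks that relationship for this record only. The derivation is inferred from the values on this page rather than taken from a published Pith specification, and the function name `pith_number_from_hash` is illustrative.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "d7dfc809c71b650cbfdfe4863ca4002937cc5242ed4283c9829873f921addd41"

def pith_number_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a hex digest, dropping '=' padding.

    Assumption: this rule is inferred from this record's values, not from a spec.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")

pith = pith_number_from_hash(CANONICAL_HASH)
# Short aliases as plain prefixes of the full identifier.
short_aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}

print(pith)           # 27P4QCOHDNSQZP674SDDZJAAFE
print(short_aliases)
```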
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/27P4QCOHDNSQZP674SDDZJAAFE \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d7dfc809c71b650cbfdfe4863ca4002937cc5242ed4283c9829873f921addd41
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a4f0131f9e76647b87de5c63a815a71e55aef40f8f739a33f990cc727e40a340",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-24T03:23:34Z",
    "title_canon_sha256": "6d195fa51f33faadd7c1379b4d88688ddf3cbbf68b8dcd5aedc87f7aad6e88f8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.21122",
    "kind": "arxiv",
    "version": 3
  }
}
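
The shell one-liner under "Verify this Pith Number yourself" can also be run offline against the record printed above. The sketch below mirrors its canonicalization step (sorted keys, compact separators, non-ASCII kept as-is, UTF-8 encoding) and compares the digest with the canonical hash listed on this page. It assumes the JSON shown here is exactly the `canonical_record` the API returns; the helper names `canonical_bytes` and `canonical_sha256` are illustrative.

```python
import hashlib
import json

# Canonical record copied from this page.
canonical_record = {
    "metadata": {
        "abstract_canon_sha256": "a4f0131f9e76647b87de5c63a815a71e55aef40f8f739a33f990cc727e40a340",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-10-24T03:23:34Z",
        "title_canon_sha256": "6d195fa51f33faadd7c1379b4d88688ddf3cbbf68b8dcd5aedc87f7aad6e88f8",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.21122", "kind": "arxiv", "version": 3},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED_HASH = "d7dfc809c71b650cbfdfe4863ca4002937cc5242ed4283c9829873f921addd41"

def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same json.dumps options as the shell one-liner."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

digest = canonical_sha256(canonical_record)
print(digest)
print("matches listed hash:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields happen to appear in the JSON.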