{"paper":{"title":"Thyme: Think Beyond Images","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Thyme lets multimodal models autonomously generate and run code to manipulate images and perform calculations during reasoning.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bin Wen, Changyi Liu, Chaoyou Fu, Fan Yang, Guorui Zhou, Haojie Ding, Haonan Fan, Jiankang Chen, Kaibing Chen, Kaiyu Jiang, Kaiyu Tang, Liang Wang, Shukang Yin, Tianke Zhang, Tingting Gao, Wei Chen, Xiao Hu, Xingyu Lu, Yi-Fan Zhang, Zhang Zhang","submitted_at":"2025-08-15T17:59:49Z","abstract_excerpt":"Following OpenAI's introduction of the ``thinking with images'' concept, recent efforts have explored stimulating the use of visual information in the reasoning process to enhance model performance in perception and reasoning tasks. However, to the best of our knowledge, no open-source work currently offers a feature set as rich as proprietary models (O3), which can perform diverse image manipulations and simultaneously enhance logical reasoning capabilities through code. In this paper, we make a preliminary attempt in this direction by introducing Thyme (Think Beyond Images), a novel paradigm"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Thyme yields significant and consistent performance gains, particularly in challenging high-resolution perception and complex reasoning tasks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the RL phase with GRPO-ATS will produce reliable autonomous decisions on when and how to apply code-based image manipulations without introducing execution errors or overfitting to the manually collected high-resolution QA pairs.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Thyme trains MLLMs to autonomously generate executable code for image processing and math computations, yielding gains on high-resolution perception and complex reasoning benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Thyme lets multimodal models autonomously generate and run code to manipulate images and perform calculations during reasoning.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a86b02f87756171900cbad8265d5db6c92c91b9bd57a339f8a368a9c0312e418"},"source":{"id":"2508.11630","kind":"arxiv","version":1},"verdict":{"id":"dac0483f-13bc-4c54-b7fd-eb255261162c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T00:28:32.861606Z","strongest_claim":"Thyme yields significant and consistent performance gains, particularly in challenging high-resolution perception and complex reasoning tasks.","one_line_summary":"Thyme trains MLLMs to autonomously generate executable code for image processing and math computations, yielding gains on high-resolution perception and complex reasoning benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the RL phase with GRPO-ATS will produce reliable autonomous decisions on when and how to apply code-based image manipulations without introducing execution errors or overfitting to the manually collected high-resolution QA pairs.","pith_extraction_headline":"Thyme lets multimodal models autonomously generate and run code to manipulate images and perform calculations during reasoning."},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"b6ce30ad5966026126835fb3a7680046ba90f4cc73d2c183ad1393529e2ba4d8"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}