{"paper":{"title":"Training-Free Multimodal Large Language Model Orchestration","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A training-free framework uses an off-the-shelf LLM to route and sequence separate modality experts into one unified multimodal system.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Jiayi Ji, Rongrong Ji, Tat-Seng Chua, Tianyu Xie, Wang Chen, Xiawu Zheng, Yuexiao Ma, Yuhang Wu","submitted_at":"2025-08-06T16:17:29Z","abstract_excerpt":"Building interactive omni-modal assistants often relies on end-to-end multimodal alignment to fuse heterogeneous modalities, which incurs substantial data and compute costs and limits extensibility. We present Training-Free Large Language Model Orchestration (LLM Orchestration), a training-free orchestration framework that integrates off-the-shelf modality experts into a unified multimodal input--output system without additional gradient-based training for integration. LLM Orchestration comprises three components: (1) an LLM controller that infers user intent and emits explicit control tokens "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Across diverse multimodal benchmarks, LLM Orchestration achieves strong performance under standard evaluation constraints while maintaining low orchestration overhead and modular upgradeability, providing a practical alternative to costly joint training for omni-modal systems.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That an off-the-shelf LLM controller can reliably infer user intent and emit correct explicit control tokens for expert selection and sequencing without introducing routing errors that degrade overall performance.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A training-free orchestration framework integrates off-the-shelf modality experts via an LLM controller, text-centric cross-modal memory, and unified interaction layer to enable multimodal input-output without joint training.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A training-free framework uses an off-the-shelf LLM to route and sequence separate modality experts into one unified multimodal system.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bc00d38d97cb8b59067d36e045c46db0905d74748539a2eb1537c9ec35cd39d1"},"source":{"id":"2508.10016","kind":"arxiv","version":4},"verdict":{"id":"d5843d97-2344-442c-b6f0-8178b8168352","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T00:09:26.309979Z","strongest_claim":"Across diverse multimodal benchmarks, LLM Orchestration achieves strong performance under standard evaluation constraints while maintaining low orchestration overhead and modular upgradeability, providing a practical alternative to costly joint training for omni-modal systems.","one_line_summary":"A training-free orchestration framework integrates off-the-shelf modality experts via an LLM controller, text-centric cross-modal memory, and unified interaction layer to enable multimodal input-output without joint training.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That an off-the-shelf LLM controller can reliably infer user intent and emit correct explicit control tokens for expert selection and sequencing without introducing routing errors that degrade overall performance.","pith_extraction_headline":"A training-free framework uses an off-the-shelf LLM to route and sequence separate modality experts into one unified multimodal system."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2508.10016/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"c8f3035a31a58b1b80f66c0530d5838a8af6aca5b6948db126fd44fdb7b8e73c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}