arxiv: 2604.10784 · v2 · pith:4XV7OROVnew · submitted 2026-04-12 · 💻 cs.AI

TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

Pith reviewed 2026-05-21 08:55 UTC · model grok-4.3

classification 💻 cs.AI
keywords unified multimodal models · evaluation framework · multimodal understanding · multimodal generation · multimodal editing · post-training · reproducible benchmarking

The pith

TorchUMM supplies the first unified codebase for evaluating, analyzing, and post-training diverse unified multimodal models across tasks and datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TorchUMM to tackle the problem of comparing unified multimodal models that vary widely in architecture and training approach. It builds a single framework that handles evaluation on understanding, generation, and editing tasks while incorporating both standard and new datasets for measuring perception, reasoning, compositionality, and instruction following. A unified interface together with fixed protocols is meant to remove inconsistencies that arise when each research group writes its own test code. If this approach works, different model designs could be ranked on equal terms, making it clearer which choices improve performance on concrete abilities. The effort also includes tools for post-training so that insights from evaluation can be turned directly into model improvements.

Core claim

TorchUMM is presented as the first unified codebase for comprehensive evaluation, analysis, and post-training across diverse UMM backbones, tasks, and datasets. It supports a broad spectrum of models covering a wide range of scales and design paradigms. The benchmark covers three core task dimensions—multimodal understanding, generation, and editing—and integrates both established and novel datasets to evaluate perception, reasoning, compositionality, and instruction-following abilities through a unified interface and standardized protocols.

What carries the argument

The unified interface and standardized evaluation protocols that let heterogeneous models be tested under the same conditions.
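
To make the idea concrete, here is a minimal Python sketch of what a shared interface plus a fixed protocol could look like. Every name in it (`UMMAdapter`, `EchoModel`, `ConstantModel`, `exact_match`, `evaluate`) is hypothetical and chosen for illustration; none of it is taken from the TorchUMM codebase, and only the understanding task is modeled.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    prompt: str      # question or instruction given to the model
    reference: str   # expected answer used by the toy metric


class UMMAdapter(ABC):
    """Hypothetical common interface; NOT TorchUMM's actual API."""

    @abstractmethod
    def understand(self, prompt: str) -> str:
        """Return a text answer for a text prompt."""


class EchoModel(UMMAdapter):
    # Stand-in backbone: repeats the prompt in upper case.
    def understand(self, prompt: str) -> str:
        return prompt.upper()


class ConstantModel(UMMAdapter):
    # Stand-in backbone: always gives the same answer.
    def understand(self, prompt: str) -> str:
        return "CAT"


def exact_match(prediction: str, reference: str) -> float:
    # One fixed normalization shared by every model under test.
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(model: UMMAdapter, samples: List[Sample],
             metric: Callable[[str, str], float]) -> float:
    # The protocol sees only the adapter, never model internals.
    scores = [metric(model.understand(s.prompt), s.reference) for s in samples]
    return sum(scores) / len(scores)


samples = [Sample("cat", "cat"), Sample("dog", "dog")]
models: Dict[str, UMMAdapter] = {"echo": EchoModel(), "constant": ConstantModel()}
results = {name: evaluate(m, samples, exact_match) for name, m in models.items()}
print(results)
```

The point of the design is that prompt handling, answer normalization, and scoring live in one place, so any score difference between two adapters comes from the models rather than from divergent test code. Whether a real abstraction layer achieves that for heterogeneous backbones is exactly the premise questioned further down.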

If this is right

  • Researchers can run reproducible comparisons across models that differ in scale and design paradigm.
  • Strengths and limitations in perception, reasoning, and instruction-following become visible under consistent conditions.
  • Post-training routines can be applied uniformly to improve models after evaluation.
  • New datasets can be added to the same benchmark structure without rewriting test harnesses.
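
The last bullet describes a plug-in pattern. A minimal sketch of one way such a pattern is commonly built in Python follows; the registry, the decorator, and the two toy datasets are assumptions made for illustration and do not describe how TorchUMM registers datasets.

```python
from typing import Callable, Dict, List, Tuple

QAPairs = List[Tuple[str, str]]

# Hypothetical global registry: dataset name -> loader function.
DATASETS: Dict[str, Callable[[], QAPairs]] = {}


def register_dataset(name: str):
    """Decorator that adds a loader to the registry under `name`."""
    def decorator(loader: Callable[[], QAPairs]):
        if name in DATASETS:
            raise ValueError(f"dataset {name!r} is already registered")
        DATASETS[name] = loader
        return loader
    return decorator


@register_dataset("toy_counting")
def load_toy_counting() -> QAPairs:
    return [("sides of a triangle", "3"), ("sides of a square", "4")]


@register_dataset("toy_colors")
def load_toy_colors() -> QAPairs:
    return [("color of grass", "green")]


def run_all(predict: Callable[[str], str]) -> Dict[str, float]:
    # The harness loops over whatever is registered; adding a dataset
    # means adding one decorated loader, with no change to this function.
    report = {}
    for name, loader in sorted(DATASETS.items()):
        pairs = loader()
        correct = sum(predict(q) == a for q, a in pairs)
        report[name] = correct / len(pairs)
    return report


report = run_all(lambda question: "3")
print(report)
```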

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of one codebase could cut the time teams spend re-implementing evaluation pipelines for each new model release.
  • The structure might make it easier to test whether gains on one task transfer to the other two task dimensions.
  • Community contributions could expand the set of post-training methods available inside the same standardized setting.

Load-bearing premise

A single unified interface and standardized protocols can fairly compare models with fundamentally different architectures and training paradigms without introducing framework-specific biases or implementation artifacts.

What would settle it

Evaluating the same set of models once inside TorchUMM and once with each model's original author-provided evaluation scripts, then finding large differences in reported scores or reversed model rankings, would show that the unified protocols do not remove bias.
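
A check of this kind can be sketched in a few lines of Python. The model names and scores below are placeholders invented for illustration, not results from the paper; the sketch reports per-model score shifts, the model pairs whose order flips, and a plain Kendall rank correlation (tau-a, with ties counted as neither concordant nor discordant).

```python
from itertools import combinations
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def score_deltas(unified: Scores, original: Scores) -> Scores:
    # Positive value: the unified framework reports a higher score.
    return {m: unified[m] - original[m] for m in unified}


def flipped_pairs(unified: Scores, original: Scores) -> List[Tuple[str, str]]:
    # Pairs of models whose relative order differs between the two runs.
    flipped = []
    for a, b in combinations(sorted(unified), 2):
        if (unified[a] - unified[b]) * (original[a] - original[b]) < 0:
            flipped.append((a, b))
    return flipped


def kendall_tau(unified: Scores, original: Scores) -> float:
    models = sorted(unified)
    concordant = discordant = 0
    for a, b in combinations(models, 2):
        product = (unified[a] - unified[b]) * (original[a] - original[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / total_pairs


# Placeholder numbers only; replace with real measurements.
unified = {"model_a": 80.0, "model_b": 75.0, "model_c": 60.0}
original = {"model_a": 78.0, "model_b": 79.0, "model_c": 61.0}

deltas = score_deltas(unified, original)
flipped = flipped_pairs(unified, original)
tau = kendall_tau(unified, original)
print(deltas, flipped, round(tau, 3))
```

Large absolute deltas, a non-empty list of flipped pairs among models separated by more than run-to-run noise, or a tau well below 1 would each count against the premise.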

Figures

Figures reproduced from arXiv: 2604.10784 by Hao Chen, Hayes Bai, Hongyu Zhu, Jindong Wang, Marios Savvides, Pan He, Sharon Li, Wenwen Wang, Yinyi Luo.

Figure 1: Overview of TorchUMM. A model that achieves notable gains on certain benchmarks may simultaneously experience performance degradation on others, or across different capability dimensions, including understanding, generation, and image editing. This inconsistency suggests that many reported improvements are localized rather than indicative of a holistic enhancement in model capability, raisin…
Figure 2: Representative UEval cases across models with different paradigms of unification. The first row …
Figure 3: Query-variation analysis under two backbone–model pairings. In each row, the left panel shows …
Original abstract

Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of understanding, generating, and editing across visual and textual modalities. However, developing a unified framework for UMMs remains challenging due to the diversity of model architectures and the heterogeneity of training paradigms and implementation details. In this paper, we present TorchUMM, the first unified codebase for comprehensive evaluation, analysis, and post-training across diverse UMM backbones, tasks, and datasets. TorchUMM supports a broad spectrum of models covering a wide range of scales and design paradigms. Our benchmark encompasses three core task dimensions: multimodal understanding, generation, and editing, and integrates both established and novel datasets to evaluate perception, reasoning, compositionality, and instruction-following abilities. By providing a unified interface and standardized evaluation protocols, TorchUMM enables fair and reproducible comparisons across heterogeneous models and fosters deeper insights into their strengths and limitations, facilitating the development of more capable unified multimodal systems. Code is available at: https://github.com/AIFrontierLab/TorchUMM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents TorchUMM as the first unified codebase for comprehensive evaluation, analysis, and post-training of diverse unified multimodal models (UMMs). It supports a broad range of model backbones spanning scales and design paradigms, covers three core task dimensions (multimodal understanding, generation, and editing), integrates established and novel datasets for assessing perception, reasoning, compositionality, and instruction-following, and supplies a unified interface with standardized protocols claimed to enable fair and reproducible comparisons across heterogeneous models.

Significance. If the abstraction layer successfully unifies heterogeneous tokenizers, vision encoders, fusion mechanisms, and objectives without introducing measurable implementation artifacts, TorchUMM could become a valuable community resource that reduces redundant engineering effort and promotes standardized benchmarking in multimodal AI.

major comments (1)
  1. [Abstract] The central claim that the unified interface and standardized protocols 'enable fair and reproducible comparisons across heterogeneous models' is load-bearing yet unsupported by any empirical evidence in the manuscript, such as side-by-side re-evaluations of models in TorchUMM versus their original codebases or ablations quantifying performance shifts attributable to the abstraction layer.
minor comments (2)
  1. The manuscript would benefit from an explicit table or section enumerating all supported UMM backbones together with the precise integration points (e.g., tokenizer wrappers, vision-encoder adapters) used to achieve unification.
  2. Clarify whether post-training routines are implemented uniformly or require model-specific overrides, and document any such overrides to allow readers to assess potential bias.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing TorchUMM. We address the single major comment below and outline the planned revisions.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the unified interface and standardized protocols 'enable fair and reproducible comparisons across heterogeneous models' is load-bearing yet unsupported by any empirical evidence in the manuscript, such as side-by-side re-evaluations of models in TorchUMM versus their original codebases or ablations quantifying performance shifts attributable to the abstraction layer.

    Authors: We acknowledge that the manuscript currently lacks direct empirical validation of this claim, such as side-by-side performance comparisons between TorchUMM and original model implementations or ablations isolating effects of the abstraction layer. The paper emphasizes the design of the unified interface to support heterogeneous tokenizers, encoders, fusion mechanisms, and objectives while aiming to avoid implementation artifacts, but does not quantify this through new experiments. To address the concern, we will add a dedicated subsection in the revised manuscript (likely in Section 4 or an appendix) presenting side-by-side evaluations on a representative subset of models and tasks. These will compare results obtained via TorchUMM against those from the original codebases or reported numbers, along with ablations measuring any performance shifts due to the standardization layer. This will provide the requested evidence for fair and reproducible comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering codebase presentation with no derivation chain

The manuscript introduces TorchUMM as a new unified codebase and standardized evaluation protocols for multimodal models. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claim is the existence and utility of the software artifact itself, which is externally verifiable via the released code and independent benchmarks rather than reducing to any self-referential input, self-citation chain, or ansatz. All enumerated circularity patterns are absent; the work is self-contained as an engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software-tool paper whose central claim rests on the existence and correctness of the released codebase rather than on mathematical axioms or fitted parameters. No free parameters, domain axioms, or invented entities are invoked in the abstract.

pith-pipeline@v0.9.0 · 5745 in / 987 out tokens · 38039 ms · 2026-05-21T08:55:01.344111+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning

    cs.MM 2026-05 unverdicted novelty 7.0

    UniPath adaptively models coordination-path diversity in unified multimodal models by training a path-conditioned executor and using a lightweight planner for input-dependent selection, improving performance over fixe...

  2. LatentUMM: Dual Latent Alignment for Unified Multimodal Models

    cs.CV 2026-05 unverdicted novelty 6.0

    LatentUMM proposes dual latent alignment at modality and capacity levels plus latent dynamics stabilization to reduce semantic drift and improve consistency in unified multimodal models.

Table 7: Geneval sub-score.

model             single_object  two_object  counting  colors  position  color_attr  overall
bagel(w/o think)  99.38          94.19       78.75     87.77   51        61.75       78.81
blip3o            98.12          93.18       73.44     86.17   72.75     64.5        81.36
show_o2(7B)       97.81          71.46       48.75     78.46   20        42.75       59.87
show_o2(1.5B)     96.88          64.39       46.88     76.06   16.75     32          55.49
Janus_pro         97.81          86.62       57.5      89.36   76        66.25       78.9...
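
The rows above suggest a simple relationship that can be checked by hand: in each complete row, the overall column appears to equal the unweighted mean of the six sub-scores, rounded to two decimals. That reading is an inference from the numbers shown here, not a statement from the paper. The short Python check below uses only the values in the table; the Janus_pro row is left out because its overall value is truncated in the source.

```python
# Sub-scores in table order: single_object, two_object, counting,
# colors, position, color_attr; second element is the reported overall.
rows = {
    "bagel(w/o think)": ([99.38, 94.19, 78.75, 87.77, 51, 61.75], 78.81),
    "blip3o": ([98.12, 93.18, 73.44, 86.17, 72.75, 64.5], 81.36),
    "show_o2(7B)": ([97.81, 71.46, 48.75, 78.46, 20, 42.75], 59.87),
    "show_o2(1.5B)": ([96.88, 64.39, 46.88, 76.06, 16.75, 32], 55.49),
}


def mean(values):
    return sum(values) / len(values)


# A row is consistent if the mean rounds to the reported overall,
# i.e. the two differ by at most half a unit in the second decimal.
consistent = {
    name: abs(mean(sub_scores) - overall) <= 0.005
    for name, (sub_scores, overall) in rows.items()
}

for name, (sub_scores, overall) in rows.items():
    print(f"{name}: mean={mean(sub_scores):.4f} reported={overall}")
print(consistent)
```

If the unweighted mean is indeed the aggregation rule, the low position scores for the two show_o2 variants account for a large part of their gap to the other models.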