arxiv: 2604.05522 · v1 · submitted 2026-04-07 · 💻 cs.CL

Cross-Modal Coreference Alignment: Enabling Reliable Information Transfer in Omni-LLMs

Hongcheng Liu , Yuhao Wang , Zhe Chen , Pingjie Wang , Zhiyuan Zhu , Yixuan Hou , Yanfeng Wang , Yu Wang This is my paper

Pith reviewed 2026-05-10 19:05 UTC · model grok-4.3

classification 💻 cs.CL

keywords cross-modal coreferenceomni-llmsmultimodal reasoningin-context learningsft+grporeferent alignmentCrossOmni datasetinformation transfer

0 comments

The pith

Omni-LLMs gain reliable cross-modal referent tracking through induced coreference-aware patterns.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that Omni-LLMs fail at complex omni-modal reasoning because they lack the ability to identify and track the same object or concept when it appears in different modalities. It formalizes this as the cross-modal coreference problem and releases the CrossOmni dataset of nine tasks with explicit reasoning rationales to measure the gap. The authors show that targeted interventions can induce the missing thinking patterns and produce measurable gains on the benchmark that also carry over to collaborative reasoning.

Core claim

We formalize cross-modal coreference as the requirement to localize a referent in a source modality and re-identify it in a target modality. We introduce the CrossOmni dataset comprising nine tasks with human-designed rationales. Experiments across thirteen Omni-LLMs reveal systematic weaknesses that we attribute to the absence of coreference-aware thinking patterns. We address this with a training-free In-Context Learning method and a training-based SFT+GRPO framework that induce such patterns, delivering substantial performance improvements that generalize to collaborative reasoning tasks.

What carries the argument

Cross-modal coreference alignment, the explicit localization and re-identification of shared referents across modalities, carried out through In-Context Learning examples or SFT+GRPO optimization to induce coreference-aware reasoning patterns.

If this is right

Models equipped with the proposed ICL or SFT+GRPO methods achieve higher accuracy on the nine CrossOmni tasks.
The same methods improve performance on collaborative reasoning tasks that require information transfer across modalities.
Systematic evaluation of thirteen existing Omni-LLMs confirms the coreference weakness is widespread rather than model-specific.
Human-designed reasoning rationales in the dataset provide a template for eliciting the desired thinking patterns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Architectures that bake coreference alignment into their attention or memory mechanisms could reduce reliance on post-hoc prompting or fine-tuning.
The same pattern-induction approach may apply to other fine-grained alignment problems such as temporal or spatial correspondence across modalities.
If coreference-aware patterns prove measurable in model activations, they could serve as a diagnostic for multimodal reasoning capacity.

Load-bearing premise

The weaknesses observed in Omni-LLMs are caused by missing coreference-aware thinking patterns that the ICL and SFT+GRPO methods specifically create, rather than by insufficient data diversity or other architectural limits.

What would settle it

A controlled comparison in which models receive equivalent additional training data or examples but without coreference-focused rationales or GRPO objectives, then show no comparable gains on the CrossOmni tasks.

Figures

Figures reproduced from arXiv: 2604.05522 by Hongcheng Liu, Pingjie Wang, Yanfeng Wang, Yixuan Hou, Yuhao Wang, Yu Wang, Zhe Chen, Zhiyuan Zhu.

**Figure 2.** Figure 2: Overview of the CROSSOMNI dataset. The top section illustrates the annotation pipeline, including preprocess, data annotation, modality alignment, and QA generation. The bottom section presents example instances and the distribution of the test set. Type # Train # Test # Total Videos 2,725 1,422 4,147 QA pairs 37,266 2,460 39,726 [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Average performance on fixed-source and fixed-target cross-modal questions. we construct two controlled settings. For fixedsource questions, we select 150 questions from each single-modality coreference task and convert them into cross-modality coreference questions by changing only the target modality while keeping the source information fixed. For fixed-target questions, we follow a similar process but… view at source ↗

**Figure 4.** Figure 4: CoT case study on text→audio task. The green denotes important information, the blue denotes the right choice, and the red denotes the error rationales. The model fails to establish the correct bridge between the textual information (the person who experienced the car accident is a man) and the visual information (the man is sitting on the chair), and therefore cannot answer the question correctly. The ful… view at source ↗

**Figure 5.** Figure 5: Accuracy comparison of 13 models under default and Chain-of-Thought prompting. The performance gap between single-modality and cross-modality coreference shrinks from 24.37% to 11.01% under ICL. These results are consistent with our diagnosis that the primary limitation in cross-modal alignment lies in coreferenceaware thinking patterns, indicating that future work should focus on explicitly learning an… view at source ↗

**Figure 6.** Figure 6: Case study on text [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Case study on visual→audio task via Qwen2.5-Omni-7b after training. The blue option denotes the right choice [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

**Figure 8.** Figure 8: Case study on audio→text task via Qwen2.5-Omni-3b after training. The blue option denotes the right choice [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

read the original abstract

Omni Large Language Models (Omni-LLMs) have demonstrated impressive capabilities in holistic multi-modal perception, yet they consistently falter in complex scenarios requiring synergistic omni-modal reasoning. Beyond understanding global multimodal context, effective reasoning also hinges on fine-grained cross-modal alignment, especially identifying shared referents across modalities, yet this aspect has been largely overlooked. To bridge this gap, we formalize the challenge as a cross-modal coreference problem, where a model must localize a referent in a source modality and re-identify it in a target modality. Building on this paradigm, we introduce CrossOmni, a dataset comprising nine tasks equipped with human-designed reasoning rationales to evaluate and enhance this capability. Experiments on 13 Omni-LLMs reveal systematic weaknesses in cross-modal coreference, which we attribute to the absence of coreference-aware thinking patterns. To address this, we enhance cross-modal alignment via two strategies: a training-free In-Context Learning method and a training-based SFT+GRPO framework designed to induce such thinking patterns. Both approaches yield substantial performance gains and generalize effectively to collaborative reasoning tasks. Overall, our findings highlight cross-modal coreference as a crucial missing piece for advancing robust omni-modal reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper carves out cross-modal coreference as a distinct issue and ships a new dataset with nine tasks, but the claim that failures stem from missing specific thinking patterns and that the fixes induce them rests on weak controls.

read the letter

The paper's main point is that omni-LLMs struggle with fine-grained cross-modal alignment, specifically the need to localize a referent in one modality and re-identify it in another. The authors formalize this as cross-modal coreference, release the CrossOmni dataset covering nine tasks with human rationales, evaluate thirteen models, and test two fixes: a training-free in-context learning method and an SFT+GRPO training framework. Both produce gains that carry over to collaborative reasoning tasks.

Referee Report

3 major / 1 minor

Summary. The paper formalizes cross-modal coreference as a key challenge for Omni-LLMs, introduces the CrossOmni dataset consisting of nine tasks with human-designed reasoning rationales, evaluates 13 Omni-LLMs revealing systematic weaknesses attributed to the absence of coreference-aware thinking patterns, and proposes two methods—a training-free In-Context Learning approach and a training-based SFT+GRPO framework—to induce these patterns, reporting substantial performance gains that generalize to collaborative reasoning tasks.

Significance. If the causal attribution to specific thinking patterns holds and the methods are shown to target them specifically rather than providing general improvements, this could be a significant contribution to multi-modal AI by highlighting and addressing fine-grained cross-modal alignment issues. The new dataset provides a valuable benchmark for future work in omni-modal reasoning.

major comments (3)

[Abstract] Abstract: The attribution of observed weaknesses in cross-modal coreference to the 'absence of coreference-aware thinking patterns' is presented without supporting controls, such as output tracing of referent localization steps, ablations against generic prompting or training methods, or comparisons demonstrating pattern-specific gains versus task-general capability increases.
[Experiments] Experiments: The reported performance gains on 13 Omni-LLMs lack details on baselines, statistical significance testing, error bars, or exact definitions of the nine tasks in CrossOmni, making it difficult to verify the claims and assess the reliability of the improvements.
[Methods] Methods: The SFT+GRPO framework and ICL method are claimed to 'induce such thinking patterns,' but the manuscript provides no analysis (e.g., qualitative examination of generated rationales) or ablation studies to confirm that the gains stem from coreference awareness rather than broader enhancements in model capabilities.

minor comments (1)

[Abstract] Abstract: The abstract mentions 'substantial performance gains' without quantifying them or referencing specific tables/figures for the results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which identifies key areas where our claims on cross-modal coreference can be strengthened with additional evidence and details. We respond point-by-point to the major comments below and describe the revisions we will make to improve clarity and rigor.

read point-by-point responses

Referee: [Abstract] Abstract: The attribution of observed weaknesses in cross-modal coreference to the 'absence of coreference-aware thinking patterns' is presented without supporting controls, such as output tracing of referent localization steps, ablations against generic prompting or training methods, or comparisons demonstrating pattern-specific gains versus task-general capability increases.

Authors: We agree that the causal link would be stronger with explicit controls. The systematic failures across 13 models with different architectures provide correlational support for a shared deficiency in coreference handling, and our methods explicitly incorporate referent localization and re-identification in their rationales and reward design. In revision we will add ablations against generic ICL and SFT baselines plus qualitative output examples tracing improved localization steps. Exhaustive per-step tracing across all instances would require new annotations beyond current resources, so we will provide representative traces instead. revision: partial
Referee: [Experiments] Experiments: The reported performance gains on 13 Omni-LLMs lack details on baselines, statistical significance testing, error bars, or exact definitions of the nine tasks in CrossOmni, making it difficult to verify the claims and assess the reliability of the improvements.

Authors: We will substantially expand the Experiments section. Revisions will include: precise textual definitions and input/output formats for each of the nine CrossOmni tasks with illustrative examples; explicit baseline descriptions (zero-shot, standard CoT, and generic prompting); results of statistical significance tests (paired t-tests with p-values); and error bars showing standard deviation across three independent runs with different seeds. These additions will enable direct verification of the reported gains. revision: yes
Referee: [Methods] Methods: The SFT+GRPO framework and ICL method are claimed to 'induce such thinking patterns,' but the manuscript provides no analysis (e.g., qualitative examination of generated rationales) or ablation studies to confirm that the gains stem from coreference awareness rather than broader enhancements in model capabilities.

Authors: We will add a dedicated analysis subsection. This will contain qualitative comparisons of model-generated rationales before and after our interventions, highlighting the appearance of explicit cross-modal referent localization and re-identification steps. We will also report ablation results that isolate the coreference-specific prompt components and GRPO reward terms against otherwise identical generic training or prompting setups, thereby distinguishing pattern-specific effects from general capability gains. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical claims with no derivations or self-referential reductions

full rationale

The paper is an empirical study that introduces the CrossOmni dataset, evaluates 13 Omni-LLMs on cross-modal coreference tasks, attributes observed failures to missing thinking patterns based on experimental results, and reports performance gains from ICL and SFT+GRPO interventions. No equations, formal derivations, uniqueness theorems, or parameter-fitting steps appear in the provided text. The central attribution and method descriptions are interpretive and data-driven rather than reducing by construction to prior inputs or self-citations. The work is self-contained against external benchmarks (new dataset + model evaluations) with no load-bearing self-referential loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claims rest on the assumption that coreference-aware thinking patterns are the missing ingredient and that the proposed methods induce them; no explicit free parameters, new physical entities, or ad-hoc axioms are named.

axioms (1)

domain assumption Weaknesses in Omni-LLMs on cross-modal tasks stem from absence of coreference-aware thinking patterns
Invoked when attributing poor performance on the new dataset to missing internal patterns rather than other causes.

pith-pipeline@v0.9.0 · 5538 in / 1385 out tokens · 52729 ms · 2026-05-10T19:05:50.287406+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we formalize the challenge as a cross-modal coreference problem... introduce two strategies: a training-free In-Context Learning method and a training-based SFT+GRPO framework designed to induce such thinking patterns
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Experiments on 13 Omni-LLMs reveal systematic weaknesses in cross-modal coreference

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

[1]

M2-omni: Advancing omni-mllm for com- prehensive modality support with competitive performance.arXiv preprint arXiv:2502.18778, 2025

M2-omni: Advancing omni-mllm for com- prehensive modality support with competitive perfor- mance.arXiv preprint arXiv:2502.18778. Jack Hong, Shilin Yan, Jiayin Cai, Xiaolong Jiang, Yao Hu, and Weidi Xie. 2025. Worldsense: Evaluating real-world omnimodal understanding for multimodal llms.arXiv preprint arXiv:2502.04326. Inclusion, Bowen Ma, Cheng Zou, Canx...

work page arXiv 2025
[2]

Ming-flash-omni: A sparse, unified architec- ture for multimodal perception and generation.arXiv preprint arXiv:2510.24821. Xu Jin, Zhifang Guo, Hangrui Hu, Yunfei Chu, Xiong Wang, Jinzheng He, Yuxuan Wang, Xian Shi, Ting He, Xinfa Zhu, Yuanjun Lv, Yongqi Wang, Dake Guo, He Wang, Linhan Ma, Pei Zhang, Xinyu Zhang, Hongkun Hao, Zishan Guo, and 19 oth- ers....

work page arXiv 2025
[3]

Hongcheng Liu, Pingjie Wang, Yu Wang, and Yanfeng Wang

Anchornet: Adaptive anchor token enhance- ment in video-grounded dialogue generation.IEEE Journal of Selected Topics in Signal Processing, pages 1–11. Hongcheng Liu, Pingjie Wang, Yu Wang, and Yanfeng Wang. 2024b. M2k-vdg: Model-adaptive multimodal knowledge anchor enhanced video-grounded dia- logue generation.arXiv preprint arXiv:2402.11875. Hongcheng Li...

work page arXiv 2024
[4]

Echoink-r1: Exploring audio-visual reasoning in multimodal llms via reinforcement learning.arXiv preprint arXiv:2505.04623,

Mmlu-pro: A more robust and challenging multi-task language understanding benchmark.Ad- vances in Neural Information Processing Systems, 37:95266–95290. Zhenghao Xing, Xiaowei Hu, Chi-Wing Fu, Wen- hai Wang, Jifeng Dai, and Pheng-Ann Heng. 2025. Echoink-r1: Exploring audio-visual reasoning in multimodal llms via reinforcement learning.arXiv preprint arXiv...

work page arXiv 2025
[5]

Omnivinci: Enhancing architecture and data for omni-modal understanding llm.arXiv preprint arXiv:2510.15870, 2025

Omnivinci: Enhancing architecture and data for omni-modal understanding llm.arXiv preprint arXiv:2510.15870. LI Yizhi, Ge Zhang, Yinghao Ma, Ruibin Yuan, King Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Noah Wang, Jian Yang, and 1 others. 2025. Omnibench: Towards the future of universal omni-language mod- els. Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zh...

work page arXiv 2025
[6]

In paral- lel, many works have been proposed to probe uni- modal reasoning capabilities (Yizhi et al., 2025; Xing et al., 2025)

for image understanding and MMLU (Wang et al., 2024) for text comprehension. In paral- lel, many works have been proposed to probe uni- modal reasoning capabilities (Yizhi et al., 2025; Xing et al., 2025). For example, DailyOmni (Zhou et al., 2025) targets audio and visual reasoning in everyday scenarios, and WorldSense (Hong et al.,

work page 2024
[7]

focuses on assessing collaborative under- standing and reasoning over omni-modal inputs. While these datasets are effective for measuring overall unimodal or holistic omni-modal perfor- mance, they typically do not isolate the key step of cross-modality coreference alignment, which lim- its their ability to diagnose why unimodal success does not translate...

work page 2025
[8]

Ola- 7B in our table), indicating that data/recipe matters as much as parameter count

Model size helps but is not sufficient: We observe that a smaller model can outperform a larger one (e.g., Qwen2.5-Omni-3B vs. Ola- 7B in our table), indicating that data/recipe matters as much as parameter count

work page
[9]

Architecture, especially the audio encoder, is critical: models using stronger Whisper variants tend to be markedly better, consis- tent with MiniCPM-o building on Whisper- medium while Qwen2.5-Omni is designed with Whisper-large-v3

work page
[10]

thinking

Training paradigm matters: “thinking”- enhanced omni variants (e.g., Qwen3-Omni- Thinking) show stronger reasoning and can improve cross-modal alignment

work page
[11]

think- ing

Data scaling is essential: expanding alignment-focused data consistently improves both SFT and GRPO in our experiments, highlighting the importance of sufficient, well-curated training signals. Overall, for cross-modal alignment, strong results typically require adequate model capacity, a high- quality audio front-end, explicit reasoning training, and suf...

work page 1985
[12]

b.Track any changes in their appearance or actions across images

People: a.Describe the individuals in each image (clothing, appearance, expressions, body language). b.Track any changes in their appearance or actions across images. 2.Background and Environment: a.Include detailed descriptions of the setting (e.g., buildings, nature, objects). b.Provide spatial context (left, right, foreground, background). c.Mention an...

work page
[13]

**Overall Video Description**: This description provides the general background, theme, and main content of the video

work page
[14]

**Segment Descriptions**: These provide detailed descriptions of each segment of the video, which may contain specific events or scenes. Please generate the final, complete description as follows: - Integrate the overall description with the individual segment descriptions, ensuring the final output covers all aspects of the video. - If there are any inco...

work page
[15]

**Place of birth and upbringing**: Where they were born and raised, family environment, education background, etc

work page
[16]

**Significant life events**: Major life events or turning points that shaped their life

work page
[17]

**Career and achievements**: Their career path, important achievements, and contributions

work page
[18]

**Relationships**: Key relationships with others, such as family, friends, enemies, or partners

work page
[19]

Ensure the biography covers all provided background details, and the name used in the biography must match the one provided

**Personality and psychological development**: Their personality traits and any psychological or emotional growth. Ensure the biography covers all provided background details, and the name used in the biography must match the one provided. The format is: Name + Biography. <background information>: Table 13: Prompt for modality annotation. Prompt Audio Dou...

work page
[20]

b.Track any changes in their appearance or actions across images

People: a.Describe the individuals in each image (clothing, appearance, expressions, body language). b.Track any changes in their appearance or actions across images. 2.Background and Environment: a.Include detailed descriptions of the setting (e.g., buildings, nature, objects). b.Provide spatial context (left, right, foreground, background). c.Mention an...

work page
[21]

child girl); mutually exclusive key actions/events (e.g., driving vs

People conflicts: significantly different number of main people; clearly contradictory identity/type (e.g., adult man vs. child girl); mutually exclusive key actions/events (e.g., driving vs. cooking at a desk)

work page
[22]

highway); strongly contradictory environment (e.g., heavy rain outdoors vs

Background conflicts: clearly different main scene type (e.g., kitchen vs. highway); strongly contradictory environment (e.g., heavy rain outdoors vs. quiet indoor office); incompatible main objects/interactions

work page
[23]

Allowed Differences (do NOT count as fundamental conflicts):

Narrative conflicts: overall story/sequence cannot reasonably align as the same video; descriptions refer to different scenarios/storylines. Allowed Differences (do NOT count as fundamental conflicts):

work page
[24]

Different level of detail (one richer, one concise)

work page
[25]

Minor omissions or reordering without contradicting main events

work page
[26]

Person Double Check Analyzes the consistency of descriptions for each name and outputs whether they are correct or not

Small secondary-detail differences (e.g., unmentioned background object, ambiguous colors/positions) when core people/setting/actions match. Person Double Check Analyzes the consistency of descriptions for each name and outputs whether they are correct or not. Rules:

work page
[27]

If the same name has multiple completely different descriptions, choose the description that appears most frequently as the final description

work page
[28]

If any description contains terms like ’failed’, ’cannot describe’, or anything that indicates the description is not valid, directly output <Failed>

work page
[29]

If different names have the same description (roughly similar), directly output <Failed>

work page
[30]

The format is: name + description

If there no failed casess, output the name and its description. The format is: name + description. Table 14: Prompt for modality annotation double check. Prompt QA Generation Design a question where the model infers **factual description** from **visual actions**, using the following information: - **Overall Descriptions** - **person biography** - **objec...

work page
[31]

**Visual Actions**: - These include body language, gestures, and facial expressions

work page
[32]

<failure>

**Factual description**: - These refer to stated details such as names, dates, locations, occupations, and major life events (e.g., education, career milestones, relationships). Design a question to infer **factual description** from **visual actions**. If you cannot design an appropriate question, directly output “<failure>”. QA Double Check: Unique Task...

work page
[33]

2) You MUST act as if all background/identity information about people comes from the person biography

You MUST act as if all visual information (who/what appears, appearance, position, actions, environment, etc.) comes directly from watching the video. 2) You MUST act as if all background/identity information about people comes from the person biography

work page
[34]

You MUST act as if all background/identity information about objects comes from the object biography

work page
[35]

Do not mention or allude to them

The Visual description and Person description are only internal tools to help you reconstruct what WOULD HA VE BEEN seen in the video and how it connects to the biographies. Do not mention or allude to them

work page
[36]

Reasoning Details Rules:

In your reasoning, always frame information as observations from: - watching the video, and - reading the relevant biography (person or object). Reasoning Details Rules:

work page
[37]

First, determine whether the Question is about a person or an object, and identify which specific person or object it refers to, as if you are using only the video content (internally you may rely on the Visual description, but never mention it)

work page
[38]

Second, if the Question is about a person, infer their name and identity as if you know it from the person’s biography (internally using the Person description + person biography); if it is about an object, infer its identity/type as if you recognize it from the video and the object biography

work page
[39]

Third, match this person to the person biography, or this object to the object biography, and explain how their background or properties are relevant to the Question

work page
[40]

Make the reasoning explicit and multi-step, not just one short sentence

Then, using the relevant biography information (person or object) together with what is seen in the video, logically derive and justify the correct Answer step by step. Make the reasoning explicit and multi-step, not just one short sentence

work page
[41]

Your final conclusion MUST exactly match the hidden Answer above, but you MUST NOT reveal that this Answer was given to you

work page
[42]

A”, “B”, “C

The Reasoning Details must be entirely in English. Output format: <Reasoning Details>: Your step-by-step reasoning here <Final Answer>: State the final answer here, matching the hidden Answer Table 16: Prompt for CoT generation. We use the visual→text questions as a case study. Prompt In-Context Learning Answer the question based on the video and the text...

work page
[43]

Question-guided visual localization (1 point): The reasoning interprets the question and uses it to identify and localize the relevant person/object in the video, including describing its visual appearance, rather than directly searching the text

work page
[44]

Text retrieval via visual description (1 point): Using this visual description, the reasoning explicitly finds the corresponding Additional Information about this person/object in the text (e.g., matching name or described appearance), clearly linking video appearance to a specific text entry

work page
[45]

Scoring rule: - Award 1 point for each criterion that is clearly satisfied

Answer grounded in retrieved text (1 point): The reasoning then uses the retrieved textual Additional Information as the key evidence to select the final option A/B/C/D, with the answer clearly supported by that text and without major unsupported assumptions or contradictions. Scoring rule: - Award 1 point for each criterion that is clearly satisfied. - T...

work page