MulDimIF: A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

· 2025 · cs.CL · arXiv 2505.07591

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open full Pith review browse 4 citing papers arXiv PDF

abstract

Instruction following refers to the ability of large language models (LLMs) to generate outputs that satisfy all specified constraints. Existing research has primarily focused on constraint categories, offering limited evaluation dimensions and little guidance for improving instruction-following abilities. To address this gap, we introduce MulDimIF, a multi-dimensional constraint framework encompassing three constraint patterns, four constraint categories, and four difficulty levels. Based on this framework, we design a controllable instruction generation pipeline. Through constraint expansion, conflict detection, and instruction rewriting, we construct 9,106 code-verifiable samples. We evaluate 18 LLMs from six model families and find marked performance differences across constraint settings. For instance, average accuracy decreases from 80.82% at Level I to 36.76% at Level IV. Moreover, training with data generated by our framework significantly improves instruction following without compromising general performance. In-depth analysis indicates that these gains stem largely from parameter updates in attention modules, which strengthen constraint recognition and adherence. Code and data are available in https://github.com/Junjie-Ye/MulDimIF.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

RTT bridges response-level rubrics to token-level rewards via a relevance discriminator and intra-sample group normalization, yielding higher instruction and rubric accuracy than baselines.

Auditing Data Membership in Reinforcement Learning With Verifiable Rewards

cs.CR · 2025-11-18 · unverdicted · novelty 6.0

DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

cs.CL · 2025-08-12 · unverdicted · novelty 5.0

An automated environment construction pipeline plus verifiable rewards enables RL training that improves LLM tool-use performance across scales without harming general capabilities.

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

cs.CL · 2026-02-13 · unverdicted · novelty 4.0

MedXIAOHE is a medical MLLM that claims state-of-the-art benchmark performance through specialized pretraining to cover long-tail diseases and RL-based reasoning training.

citing papers explorer

Showing 4 of 4 citing papers.

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks cs.CL · 2026-04-03 · unverdicted · none · ref 25 · internal anchor
RTT bridges response-level rubrics to token-level rewards via a relevance discriminator and intra-sample group normalization, yielding higher instruction and rubric accuracy than baselines.
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards cs.CR · 2025-11-18 · unverdicted · none · ref 42 · internal anchor
DIBA detects membership of prompts in RLVR training by measuring reward success changes and policy behavioral drift between pre- and post-RLVR model checkpoints.
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments cs.CL · 2025-08-12 · unverdicted · none · ref 55 · internal anchor
An automated environment construction pipeline plus verifiable rewards enables RL training that improves LLM tool-use performance across scales without harming general capabilities.
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs cs.CL · 2026-02-13 · unverdicted · none · ref 68 · internal anchor
MedXIAOHE is a medical MLLM that claims state-of-the-art benchmark performance through specialized pretraining to cover long-tail diseases and RL-based reasoning training.

MulDimIF: A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer