Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation. arXiv preprint arXiv:2512.06690, 2025

Chengbing Wang, Yang Zhang, Wenjie Wang, Xiaoyan Zhao, Fuli Feng, Xiangnan He, Tat-Seng Chua · 2025 · arXiv 2512.06690

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

cs.CL · 2026-03-09 · unverdicted · novelty 8.0

AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

LLMs fail to capture embodied cognition and cultural variation in demonstrative use, unlike humans who show language-specific proximal-distal and perspective-taking patterns.

Breaking User-Centric Agency: A Tri-Party Framework for Agent-Based Recommendation

cs.IR · 2026-03-11 · unverdicted · novelty 7.0

TriRec is a two-stage LLM-agent recommender that uses item self-promotion followed by platform-level sequential re-ranking to jointly optimize user utility, item exposure, and exposure fairness.

Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

cs.IR · 2026-04-30 · unverdicted · novelty 6.0

PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on four datasets while largely preserving recommendation quality.

Learning to Control Summaries with Score Ranking

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.

Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models

cs.NE · 2026-04-14 · unverdicted · novelty 6.0

Agent-GWO uses collaborative grey-wolf-inspired agents to jointly optimize LLM prompts and decoding settings, yielding higher accuracy and stability than prior single-agent prompt optimization methods on math and hybrid reasoning benchmarks.

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

cs.AI · 2026-04-12 · unverdicted · novelty 6.0

FACT-E uses controlled perturbations as an instrumental signal to measure intra-chain faithfulness in CoT reasoning and combines it with answer consistency to select trustworthy trajectories.

From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors

cs.CL · 2026-04-25 · unverdicted · novelty 5.0

A hybrid graph-based training-free framework for LLM context compression matches strong baselines and shows larger gains on long-document benchmarks.

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning

cs.AI · 2026-04-25 · unverdicted · novelty 5.0

CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.

Calibrating Model-Based Evaluation Metrics for Summarization

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

AGSC: Adaptive Granularity and Semantic Clustering for Uncertainty Quantification in Long-text Generation

cs.CL · 2026-04-08 · unverdicted · novelty 5.0

AGSC combines NLI neutral probabilities for adaptive granularity with GMM semantic clustering to improve uncertainty quantification in long-text LLM generation, claiming SOTA factuality correlation and 60% faster inference.

Medical Reasoning with Large Language Models: A Survey and MR-Bench

cs.CL · 2026-03-17 · accept · novelty 5.0

LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt

cs.CL · 2026-04-25 · unverdicted · novelty 4.0

A small language model resolves semantic risks and conflicts in prompts via multi-perspective consistency checks, yielding a 2.5-point gain in LLM reasoning performance at $0.02 cost.

citing papers explorer

Showing 13 of 13 citing papers.

AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment
cs.CL · 2026-03-09 · unverdicted · none · ref 44
AlpsBench supplies 2500 real-dialogue sequences with verified memories to benchmark LLM extraction, updating, retrieval, and utilization of personalized information.

Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives
cs.CL · 2026-04-28 · unverdicted · none · ref 1
LLMs fail to capture embodied cognition and cultural variation in demonstrative use, unlike humans who show language-specific proximal-distal and perspective-taking patterns.

Breaking User-Centric Agency: A Tri-Party Framework for Agent-Based Recommendation
cs.IR · 2026-03-11 · unverdicted · none · ref 24
TriRec is a two-stage LLM-agent recommender that uses item self-promotion followed by platform-level sequential re-ranking to jointly optimize user utility, item exposure, and exposure fairness.

Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
cs.IR · 2026-04-30 · unverdicted · none · ref 49
PAD-Rec augments standard draft models with item-position and step-position embeddings plus learnable gates, delivering up to 3.1x wall-clock speedup and 5% average gain over strong speculative-decoding baselines on four datasets while largely preserving recommendation quality.

Learning to Control Summaries with Score Ranking
cs.CL · 2026-04-19 · unverdicted · none · ref 58
A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.

Agent-GWO: Collaborative Agents for Dynamic Prompt Optimization in Large Language Models
cs.NE · 2026-04-14 · unverdicted · none · ref 8
Agent-GWO uses collaborative grey-wolf-inspired agents to jointly optimize LLM prompts and decoding settings, yielding higher accuracy and stability than prior single-agent prompt optimization methods on math and hybrid reasoning benchmarks.

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning
cs.AI · 2026-04-12 · unverdicted · none · ref 34
FACT-E uses controlled perturbations as an instrumental signal to measure intra-chain faithfulness in CoT reasoning and combines it with answer consistency to select trustworthy trajectories.

From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors
cs.CL · 2026-04-25 · unverdicted · none · ref 15
A hybrid graph-based training-free framework for LLM context compression matches strong baselines and shows larger gains on long-document benchmarks.

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
cs.AI · 2026-04-25 · unverdicted · none · ref 34
CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.

Calibrating Model-Based Evaluation Metrics for Summarization
cs.CL · 2026-04-19 · unverdicted · none · ref 177
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.

AGSC: Adaptive Granularity and Semantic Clustering for Uncertainty Quantification in Long-text Generation
cs.CL · 2026-04-08 · unverdicted · none · ref 5
AGSC combines NLI neutral probabilities for adaptive granularity with GMM semantic clustering to improve uncertainty quantification in long-text LLM generation, claiming SOTA factuality correlation and 60% faster inference.

Medical Reasoning with Large Language Models: A Survey and MR-Bench
cs.CL · 2026-03-17 · accept · none · ref 30
LLMs show strong exam performance on medical tasks but exhibit a clear gap in accuracy on authentic clinical decision-making as measured by the new MR-Bench benchmark and unified evaluations.

Small Language Model Helps Resolve Semantic Ambiguity of LLM Prompt
cs.CL · 2026-04-25 · unverdicted · none · ref 3
A small language model resolves semantic risks and conflicts in prompts via multi-perspective consistency checks, yielding a 2.5-point gain in LLM reasoning performance at $0.02 cost.
