hub

Text summarization branches out , pages=

Rouge: A package for automatic evaluation of summaries , author=

34 Pith papers cite this work. Polarity classification is still indexing.

34 Pith papers citing it

browse 34 citing papers

hub tools

JSON dossier citing papers JSON

representative citing papers

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

cs.CL · 2023-08-28 · unverdicted · novelty 8.0

LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).

Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models

cs.CV · 2026-05-05 · unverdicted · novelty 7.0

CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.

Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

S²R² improves robustness of LoRA-tuned LLMs to prompt perturbations by penalizing semantic-segment drift while preserving clean performance and cross-dataset transfer.

MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

MTA improves LLM knowledge distillation by aligning representations along layer-wise trajectories with adaptive granularity from words to phrases using dynamic structural and hidden representation alignment losses.

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

cs.CR · 2026-04-21 · unverdicted · novelty 7.0

ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.

Dual-Anchoring: Addressing State Drift in Vision-Language Navigation

cs.CV · 2026-04-19 · unverdicted · novelty 7.0

Dual-Anchoring Framework mitigates progress drift via structured instruction tokens and memory drift via landmark-centric retrospective prediction, yielding 15.2% success rate gain and 24.7% on long trajectories.

Detecting Pretraining Data from Large Language Models

cs.CL · 2023-10-25 · conditional · novelty 7.0

Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

cs.CL · 2021-04-18 · conditional · novelty 7.0

Presents the NATURAL INSTRUCTIONS meta-dataset and shows generative pre-trained language models achieve 19% better generalization to unseen tasks when using task instructions.

Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation

cs.CL · 2026-05-21 · conditional · novelty 6.0

LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.

Evaluating Multi-turn Human-AI Interaction

cs.HC · 2026-05-18 · unverdicted · novelty 6.0

Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.

Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.

From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents

cs.CL · 2026-05-14 · unverdicted · novelty 6.0

A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.

Learning Perturbations to Extrapolate Your LLM

stat.ML · 2026-05-13 · unverdicted · novelty 6.0

A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.

Perturbation is All You Need for Extrapolating Language Models

stat.ML · 2026-05-05 · unverdicted · novelty 6.0

Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.

Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks

cs.CV · 2026-05-05 · unverdicted · novelty 6.0

LVLM unlearning benchmarks fail due to initial memorization failures on fictitious data; ReMem benchmark with multi-hop and multi-image scaling plus Exposure metric enables reliable learning and unlearning diagnosis.

Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

cs.CL · 2026-05-03 · unverdicted · novelty 6.0

A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalization to unseen cases.

Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

SiPeR improves recommendation accuracy and response quality in situated conversations by estimating scene transitions and performing Bayesian inverse inference with multimodal LLMs.

Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives

cs.CL · 2026-04-22 · unverdicted · novelty 6.0

A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.

Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

cs.CL · 2026-04-21 · conditional · novelty 6.0

Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.

Are Large Language Models Economically Viable for Industry Deployment?

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.

Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

LLMs show mixed results on authorship verification, post generation, and attribute inference from Twitter data, with new frameworks and user studies establishing benchmarks for these analytics tasks.

MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model

cs.CE · 2026-04-20 · unverdicted · novelty 6.0

MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.

Learning to Control Summaries with Score Ranking

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.

citing papers explorer

Showing 34 of 34 citing papers.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding cs.CL · 2023-08-28 · unverdicted · none · ref 51
LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).
Erase Persona, Forget Lore: Benchmarking Multimodal Copyright Unlearning in Large Vision Language Models cs.CV · 2026-05-05 · unverdicted · none · ref 5
CoVUBench is the first benchmark framework for evaluating multimodal copyright unlearning in LVLMs via synthetic data, systematic variations, and a dual protocol for forgetting efficacy and utility preservation.
Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models cs.CL · 2026-05-02 · unverdicted · none · ref 62
S²R² improves robustness of LoRA-tuned LLMs to prompt perturbations by penalizing semantic-segment drift while preserving clean performance and cross-dataset transfer.
MTA: Multi-Granular Trajectory Alignment for Large Language Model Distillation cs.CL · 2026-05-02 · unverdicted · none · ref 74
MTA improves LLM knowledge distillation by aligning representations along layer-wise trajectories with adaptive granularity from words to phrases using dynamic structural and hidden representation alignment losses.
ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety cs.CR · 2026-04-21 · unverdicted · none · ref 105
ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.
Dual-Anchoring: Addressing State Drift in Vision-Language Navigation cs.CV · 2026-04-19 · unverdicted · none · ref 106
Dual-Anchoring Framework mitigates progress drift via structured instruction tokens and memory drift via landmark-centric retrospective prediction, yielding 15.2% success rate gain and 24.7% on long trajectories.
Detecting Pretraining Data from Large Language Models cs.CL · 2023-10-25 · conditional · none · ref 147
Min-K% Prob detects pretraining data in LLMs by flagging outlier low-probability words in text, achieving 7.4% better performance than prior methods on the new WIKIMIA benchmark.
Cross-Task Generalization via Natural Language Crowdsourcing Instructions cs.CL · 2021-04-18 · conditional · none · ref 63
Presents the NATURAL INSTRUCTIONS meta-dataset and shows generative pre-trained language models achieve 19% better generalization to unseen tasks when using task instructions.
Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation cs.CL · 2026-05-21 · conditional · none · ref 229
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
Evaluating Multi-turn Human-AI Interaction cs.HC · 2026-05-18 · unverdicted · none · ref 43
Introduces the TCR framework to evaluate educational LLM assistants on transparency, consistency, and refinement in multi-turn interactions, complementing aggregate metrics.
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization cs.CL · 2026-05-17 · unverdicted · none · ref 7
Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents cs.CL · 2026-05-14 · unverdicted · none · ref 148
A dataset-agnostic framework converts text tool-calling benchmarks to paired audio evaluations via TTS, speaker variation and noise, then evaluates seven omni-modal models showing model- and task-dependent performance with small text-to-voice gaps.
Learning Perturbations to Extrapolate Your LLM stat.ML · 2026-05-13 · unverdicted · none · ref 43
A learnable continuous perturbation framework for LLM token prefixes via latent vector transformations, optimized through unbiased estimating equations, yields gains in out-of-domain performance.
Perturbation is All You Need for Extrapolating Language Models stat.ML · 2026-05-05 · unverdicted · none · ref 33
Perturbing prefixes to semantic neighbors during training creates a hierarchical noise model that improves language model predictions on token sequences outside the training corpus support.
Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks cs.CV · 2026-05-05 · unverdicted · none · ref 5
LVLM unlearning benchmarks fail due to initial memorization failures on fictitious data; ReMem benchmark with multi-hop and multi-image scaling plus Exposure metric enables reliable learning and unlearning diagnosis.
Learn-to-learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM cs.CL · 2026-05-03 · unverdicted · none · ref 92
A hypernetwork generates meta-gating parameters for SwiGLU blocks to let LLMs adapt their nonlinearity to arbitrary textual conditions, outperforming finetuning and meta-learning baselines with reasonable generalization to unseen cases.
Where and What: Reasoning Dynamic and Implicit Preferences in Situated Conversational Recommendation cs.AI · 2026-04-22 · unverdicted · none · ref 71
SiPeR improves recommendation accuracy and response quality in situated conversations by estimating scene transitions and performing Bayesian inverse inference with multimodal LLMs.
Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives cs.CL · 2026-04-22 · unverdicted · none · ref 53
A proposed pipeline shows LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language cs.CL · 2026-04-21 · conditional · none · ref 36
Bangla Key2Text releases 2.6M keyword-text pairs and demonstrates that fine-tuned mT5 and BanglaT5 outperform zero-shot LLMs on keyword-conditioned Bangla text generation.
Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation cs.LG · 2026-04-21 · unverdicted · none · ref 92
Unsupervised single-generation confidence calibration for reasoning LLMs via offline self-consistency proxy distillation outperforms baselines on math and QA tasks and improves selective prediction.
Are Large Language Models Economically Viable for Industry Deployment? cs.CL · 2026-04-21 · unverdicted · none · ref 60
Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest cs.CL · 2026-04-21 · unverdicted · none · ref 77
LLMs show mixed results on authorship verification, post generation, and attribute inference from Twitter data, with new frameworks and user studies establishing benchmarks for these analytics tasks.
MFMDQwen: Multilingual Financial Misinformation Detection Based on Large Language Model cs.CE · 2026-04-20 · unverdicted · none · ref 2
MFMDQwen is the first open-source LLM for multilingual financial misinformation detection, backed by a new instruction dataset and benchmark on which it outperforms other open-source models.
Learning to Control Summaries with Score Ranking cs.CL · 2026-04-19 · unverdicted · none · ref 37
A score-ranking loss enables controllable summarization by aligning outputs to evaluation scores, matching SOTA performance with dimension-specific control on LLaMA, Qwen, and Mistral.
Lessons from the Trenches on Reproducible Evaluation of Language Models cs.CL · 2024-05-23 · accept · none · ref 95
The paper compiles practical lessons on reproducible LM evaluation and introduces the lm-eval library to mitigate common methodological problems in NLP.
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive cs.CL · 2024-02-20 · conditional · none · ref 120
DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods and outperforms standard DPO on diverse datasets, MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting cs.CL · 2023-10-17 · conditional · none · ref 54
LLMs are highly sensitive to prompt formatting in few-shot settings, with accuracy varying by up to 76 points across formats; FormatSpread samples formats to report performance intervals without model weights.
Exploring the Effectiveness of Using LLMs for Automated Assessment of Student Self Explanations in Programming Education cs.HC · 2026-05-20 · unverdicted · none · ref 51
Compares LLMs against semantic similarity for binary classification of student self-explanations in programming education.
Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis cs.CV · 2026-05-19 · unverdicted · none · ref 13
TIF-GRPO uses integral feedback on pseudo-temporal trajectories to regulate anatomy-aware rewards in RL for clinical faithfulness in volumetric CT analysis.
SRA: Span Representation Alignment for Large Language Model Distillation cs.CL · 2026-05-02 · unverdicted · none · ref 74
SRA reframes cross-tokenizer LLM distillation as alignment of attention-weighted span centers of mass in a multi-particle dynamical system and reports consistent gains over prior CTKD baselines.
FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion cs.LG · 2026-04-21 · unverdicted · none · ref 55
FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.
Think before Go: Hierarchical Reasoning for Image-goal Navigation cs.RO · 2026-04-19 · unverdicted · none · ref 120
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.
Calibrating Model-Based Evaluation Metrics for Summarization cs.CL · 2026-04-19 · unverdicted · none · ref 49
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 287
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

Text summarization branches out , pages=

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer