In Findings of the Association for Computational Linguistics: EMNLP 2024, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.)

Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan · 2024 · DOI 10.18653/v1/2024.findings-

Canonical reference. 100% of citing Pith papers cite this work as background.

18 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

Self-Improving In-Context Learning

cs.CL · 2026-05-22 · unverdicted · novelty 7.0

A test-time zeroth-order optimization of prompt embeddings using a bounded self-supervised proxy from demonstration log-probabilities improves ICL accuracy and correlates with gains across tasks.

Code Generation by Differential Test Time Scaling

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

DiffCodeGen clusters code candidates by behavioral similarity on fuzzing-synthesized inputs and selects the medoid of the largest cluster, matching or exceeding prior test-time scaling methods at far lower token and time cost.

Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support

cs.SE · 2026-05-14 · unverdicted · novelty 7.0

Hydra enables asynchronous static error checking and targeted checkpoint-rollback repair during LLM code generation, cutting latency by up to 71% and token use by up to 70% versus post-hoc repair on C/C++ tasks.

SimDiff: Depth Pruning via Similarity and Difference

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.

On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

cs.IR · 2026-04-17 · unverdicted · novelty 7.0

LLM-based dense retrievers generalize better when instruction-tuned but pay a specialization tax when optimized for reasoning; they resist typos and corpus poisoning better than encoder-only baselines yet remain vulnerable to semantic perturbations, with larger models and certain embedding geometry,

IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research

cs.CL · 2025-07-21 · unverdicted · novelty 7.0

IDRBench is presented as the first benchmark framework consisting of datasets and three evaluation tasks to measure LLMs' ability to perform interdisciplinary research.

Tracking Capabilities for Safer Agents

cs.AI · 2026-03-01 · unverdicted · novelty 6.0

AI agents can generate code in a capability-safe Scala dialect that statically prevents information leakage and malicious side effects while preserving task performance.

VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents

cs.CL · 2025-09-09 · unverdicted · novelty 6.0

VeriOS-Agent is an OS agent that proactively queries humans in untrustworthy scenarios via a query-driven framework and three-stage training, achieving 19.72% higher step-wise success rate over baselines while preserving normal performance.

What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience

cs.HC · 2026-05-18 · unverdicted · novelty 5.0

GPT produces click distributions significantly different from real humans in 53% of UX first-click tasks, with prompting techniques like personas and chain-of-thought failing to improve alignment.

Context Convergence Improves Answering Inferential Questions

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Passages built from high-convergence sentences improve LLM performance on inferential questions compared with cosine-similarity-based selection.

Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding

cs.MM · 2026-04-28 · unverdicted · novelty 5.0

CUCI-Net abstracts context-utterance dependency into an interpretation cue that combines local modality signals with global context and feeds it into the final multimodal interaction for context-conditioned predictions.

STAR: Semantic-Tuned and Tail-Adaptive Retriever for Graph-Augmented Generation

cs.IR · 2026-04-11 · unverdicted · novelty 5.0

STAR is a semantic-tuned and tail-adaptive retriever for GraphRAG that uses cross-attention interaction learning and path-weighted contrastive learning to mitigate Semantic Shortcut Bias and Long-Tail Path Bias, reporting 1.8% retrieval and 2.2% QA gains.

A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems

cs.CY · 2026-04-06 · unverdicted · novelty 5.0

A multi-agent generate-validate-revise framework reduces realism and authenticity failures in LLM-personalized math problems; a single revision iteration helps, and the effectiveness of different strategies varies by criterion.

Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

cs.AI · 2026-03-12 · unverdicted · novelty 5.0

Introduces an Explicit Logic Channel (ELC) that combines an LLM, a VFM, and probabilistic inference to validate, select, and enhance MLLMs on zero-shot tasks, using a Consistency Rate and cross-channel integration.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

cs.AI · 2025-01-27 · unverdicted · novelty 5.0

A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

cs.IR · 2026-05-13 · unverdicted · novelty 4.0

An LLM framework with RAG predicts query-specific validity horizons for web content expiration and shows gains in production A/B tests.

Multilingual Vision-Language Models, A Survey

cs.CL · 2025-09-26 · accept · novelty 3.0

The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.

citing papers explorer

Showing 18 of 18 citing papers.

Self-Improving In-Context Learning cs.CL · 2026-05-22 · unverdicted · none · ref 61
Code Generation by Differential Test Time Scaling cs.SE · 2026-05-19 · unverdicted · none · ref 39
Hydra: Efficient, Correct Code Generation via Checkpoint-and-Rollback Support cs.SE · 2026-05-14 · unverdicted · none · ref 3
SimDiff: Depth Pruning via Similarity and Difference cs.AI · 2026-04-21 · unverdicted · none · ref 17
On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability cs.IR · 2026-04-17 · unverdicted · none · ref 9
IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research cs.CL · 2025-07-21 · unverdicted · none · ref 39
Tracking Capabilities for Safer Agents cs.AI · 2026-03-01 · unverdicted · none · ref 84
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents cs.CL · 2025-09-09 · unverdicted · none · ref 77
What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience cs.HC · 2026-05-18 · unverdicted · none · ref 36
Context Convergence Improves Answering Inferential Questions cs.CL · 2026-05-12 · unverdicted · none · ref 21
Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding cs.MM · 2026-04-28 · unverdicted · none · ref 54
STAR: Semantic-Tuned and Tail-Adaptive Retriever for Graph-Augmented Generation cs.IR · 2026-04-11 · unverdicted · none · ref 15
A Multi-Agent Approach to Validate and Refine LLM-Generated Personalized Math Problems cs.CY · 2026-04-06 · unverdicted · none · ref 3
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks cs.AI · 2026-03-12 · unverdicted · none · ref 17
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models cs.AI · 2025-03-12 · unverdicted · none · ref 131
A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions cs.AI · 2025-01-27 · unverdicted · none · ref 48
RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search cs.IR · 2026-05-13 · unverdicted · none · ref 17
Multilingual Vision-Language Models, A Survey cs.CL · 2025-09-26 · accept · none · ref 124