Mixed citations

Title resolution pending

Alfred V · 1972

Mixed citation behavior. Most common role is background (50%).

123 Pith papers citing it

Background 50% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 11 · method 2 · other 1

citation-polarity summary

background 7 · unclear 5 · use method 2

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Agentic CLEAR automates multi-level evaluation of LLM agents, generating textual insights at system, trace, and node granularity that align with human annotations and predict task success.

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.

LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

LongBEL improves biomedical entity linking consistency by combining full-document context with memory of previous predictions trained via cross-validation rather than gold labels.

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

cs.CL · 2026-05-13 · accept · novelty 7.0

LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.

Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

A new benchmark dataset drawn from Japan's National Assessment of Academic Ability supplies real exam layouts, diagrams, Japanese text, and nationwide student response distributions for evaluating multimodal LLMs.

The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

Semantic Softmax aggregates probabilities from semantic synonyms around target labels to correct renormalization bias in zero-shot LLM classification, lowering calibration error and raising AUROC and F1.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

Accurate and Efficient Statistical Testing for Word Semantic Breadth

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

A new permutation test uses Householder reflection to align word embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and speeding up 23x on GPU.

Logic-Regularized Verifier Elicits Reasoning from LLMs

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.

POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

cs.SE · 2026-05-05 · unverdicted · novelty 7.0

POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.

Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

S²R² improves robustness of LoRA-tuned LLMs to prompt perturbations by penalizing semantic-segment drift while preserving clean performance and cross-dataset transfer.

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

SpanDec achieves competitive NER accuracy with improved efficiency by using a final-stage lightweight decoder for span representations and early candidate filtering to reduce redundant computation.

ATIR: Towards Audio-Text Interleaved Contextual Retrieval

cs.SD · 2026-04-22 · unverdicted · novelty 7.0

Defines the ATIR task and a benchmark for mixed audio-text queries; an MLLM with token compression shows substantial gains over strong baselines.

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

eess.AS · 2026-04-21 · unverdicted · novelty 7.0

Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using the Bhattacharyya distance.

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.

Structure Guided Retrieval-Augmented Generation for Factual Queries

cs.IR · 2026-04-21 · unverdicted · novelty 7.0

SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Cell-Based Representation of Relational Binding in Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.

citing papers explorer

Showing 9 of 9 citing papers after filters.

Token-weighted Direct Preference Optimization with Attention cs.CL · 2026-05-21 · unreviewed · ref 1
PRISM: Preference-Aware Influence Function Based Data Selection Method for Efficient Fine-Tuning cs.LG · 2026-05-20 · unreviewed · ref 1
Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation cs.CL · 2026-05-20 · unreviewed · ref 4
Language-Switching Triggers Take a Latent Detour Through Language Models cs.CL · 2026-05-18 · unreviewed · ref 1
An Efficient Streaming Video Understanding Framework with Agentic Control cs.CV · 2026-05-18 · unreviewed · ref 1
Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving cs.CL · 2026-04-22 · unreviewed · ref 111
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps cs.SD · 2026-04-21 · unreviewed · ref 1
KnowledgeBerg: Evaluating Systematic Knowledge Coverage and Compositional Reasoning in Large Language Models cs.AI · 2026-04-19 · unreviewed · ref 1
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unreviewed · ref 1
