Tetreault , title =

Mohammad Sadegh Rasooli and Joel R · 2015

Mixed citation behavior. Most common role is background (50%).

128 Pith papers citing it

Background 50% of classified citations

citation-role summary

background 7 other 1

citation-polarity summary

background 4 unclear 4

claims ledger

background comment-reply dataset for (dis)agreement detection in online debates. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). Miklos Z Rácz and Daniel E Rigobon. 2023. Towards consensus: Reducing polarization by perturbing social networks. IEEE Transactions on Network Science and Engineering, 10(6):3450-3464. ZP Rosen and Rick Dale. 2025. Antisemitic and islamophobic hate speech precedes a decrease in lexico-semantic diversity in comment
background 2005. Hahacronym: A computational humor system. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, pages 113-116. David Tomás, Reynier Ortega-Bueno, Guobiao Zhang, Paolo Rosso, and Rossano Schifanella. 2023. Transformer-based models for multimodal irony detection. Journal of Ambient Intelligence and Humanized Computing, 14(6):7399-7410. Robert West and Eric Horvitz. 2019. Reverse-engineering satire, or "paper on computational humor accepted despite making serious
background We define $N$ scales with two adapter sets: $\mathcal{G} = \{\mathcal{G}_1, \dots, \mathcal{G}_N\}$ (MGFA) and $\mathcal{C} = \{\mathcal{C}_1, \dots, \mathcal{C}_N\}$ (MCFA). At each scale $n$, features are reshaped to a grid $X_v^{(0)} \in \mathbb{R}^{H \times W \times D_v}$ and downsampled by $\mathrm{Down}(\cdot, 2^{n-1})$: $X_v^{(n)} = \mathrm{Down}(X_v^{(0)}, 2^{n-1})$ (4). Let $X_{v,n} = \mathrm{Seq}(X_v^{(n)})$ denote the flattened sequence. We then refine and fuse: $G_n = \mathcal{G}_n(X_{v,n})$, $C_n = \mathcal{C}_n(X_{v,n}, X_t)$ (5), $\tilde{X}_{v,n} = G_n + w\,C_n$ (6), where $w$ balances global and cross-modal adaptation. An interleave-repeat upsampling restores the (a) MGFA Module. (b)
background Householder mean-direction alignment. The nuisance mean-direction difference is removed by mapping the sample mean direction of $X$ onto that of $Y$ via Householder reflection. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{m}\sum_{j=1}^{m} y_j$, $\hat{\mu}_x = \bar{x} / \|\bar{x}\|_2$, $\hat{\mu}_y = \bar{y} / \|\bar{y}\|_2$. If $\hat{\mu}_x \neq \hat{\mu}_y$, the Householder axis is defined as $u = (\hat{\mu}_x - \hat{\mu}_y) / \|\hat{\mu}_x - \hat{\mu}_y\|_2$ (5) and the reflection matrix is $H = I - 2uu^{\top}$ (6), which satisfies $H\hat{\mu}_x = \hat{\mu}_y$ and $H^{\top}H = I$. We then align $X$ by applying $H$ to every vector in $X$: $x'_i = Hx_i$ ($i = 1, \dots, n$) (7), and Y is
other $t \to 1$ as the query requires more changes, thus $(1 - t) \to 1$ as the query increases in accuracy. 3.6 Query Mutation. Given the mutation temperature $t$ and assessment $A$ from the critic, the original candidate $QC$ is then rewritten via $\mathrm{LLM}_{\mathrm{mutate}}$, which is prompted to produce an updated query candidate $QC'$ that incorporates the changes recommended by the critic: $QC' = \mathrm{LLM}_{\mathrm{mutate}}(Q, S'_i, QC, H, A, t)$ (6). We consider a single refinement step to consist of a call to the critic, followed by a subsequent call to t
background contribution of $Q$ and $P$ without the CoT rationale. Correspondingly, $a^{l}_{\text{no-CoT}}$ represents the attention activation excluding CoT. The additional term $W_V R (W_K R)^{T} q$ represents the contribution of the CoT rationale $R$ to the hidden activation. We can get the hidden activation by transforming the attention activation by a nonlinear function $f$: $h^{l} \approx h^{l}_{\text{no-CoT}} + f\left(W_V R (W_K R)^{T} q\right)$ (7). Thus, we conclude that the rationale $R$ in the CoT primarily contributes a shift in hidden activation values, emphasi

authors

Mohammad Sadegh Rasooli and Joel R

representative citing papers

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Agentic CLEAR automates multi-level evaluation of LLM agents, generating textual insights at system, trace, and node granularity that align with human annotations and predict task success.

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.

Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation

cs.CL · 2026-05-14 · unverdicted · novelty 7.0

New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.

LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

LongBEL improves biomedical entity linking consistency by combining full-document context with memory of previous predictions trained via cross-validation rather than gold labels.

Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

A new benchmark dataset drawn from Japan's National Assessment of Academic Ability supplies real exam layouts, diagrams, Japanese text, and nationwide student response distributions for evaluating multimodal LLMs.

The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

Semantic Softmax aggregates probabilities from semantic synonyms around target labels to correct renormalization bias in zero-shot LLM classification, lowering calibration error and raising AUROC and F1.

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.

Accurate and Efficient Statistical Testing for Word Semantic Breadth

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

A new permutation test uses Householder reflection to align word embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and speeding up 23x on GPU.

Logic-Regularized Verifier Elicits Reasoning from LLMs

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.

POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference

cs.SE · 2026-05-05 · unverdicted · novelty 7.0

POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.

Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

S²R² improves robustness of LoRA-tuned LLMs to prompt perturbations by penalizing semantic-segment drift while preserving clean performance and cross-dataset transfer.

A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

SpanDec achieves competitive NER accuracy with improved efficiency by using a final-stage lightweight decoder for span representations and early candidate filtering to reduce redundant computation.

ATIR: Towards Audio-Text Interleaved Contextual Retrieval

cs.SD · 2026-04-22 · unverdicted · novelty 7.0

Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.

Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

eess.AS · 2026-04-21 · unverdicted · novelty 7.0

Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharyya distance.

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.

Structure Guided Retrieval-Augmented Generation for Factual Queries

cs.IR · 2026-04-21 · unverdicted · novelty 7.0

SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.

Cell-Based Representation of Relational Binding in Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.

LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LQM introduces a six-level linguistically motivated error taxonomy for MT evaluation and applies it via expert annotation to LLM outputs on a new 3,850-sentence multi-dialect Arabic corpus.

citing papers explorer

Showing 50 of 128 citing papers.

Evaluating Very Long-Term Conversational Memory of LLM Agents cs.CL · 2024-02-27 · unverdicted · none · ref 6
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents cs.CL · 2026-05-21 · unverdicted · none · ref 9
Agentic CLEAR automates multi-level evaluation of LLM agents, generating textual insights at system, trace, and node granularity that align with human annotations and predict task success.
From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach cs.LG · 2026-05-20 · unverdicted · none · ref 6
Introduces Causal Functional Signatures grounded in causal evidence and ILP-learned architectural signatures to enable explicit, comparable, and portable mechanistic claims across model scales.
Knowledge Beyond Language: Bridging the Gap in Multilingual Machine Unlearning Evaluation cs.CL · 2026-05-14 · unverdicted · none · ref 6
New metrics KSS and KPS are introduced to evaluate multilingual machine unlearning quality and cross-language consistency in LLMs, addressing limitations of single-language evaluation protocols.
LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking cs.CL · 2026-05-13 · unverdicted · none · ref 49
LongBEL improves biomedical entity linking consistency by combining full-document context with memory of previous predictions trained via cross-validation rather than gold labels.
Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability cs.CL · 2026-05-12 · unverdicted · none · ref 6
A new benchmark dataset drawn from Japan's National Assessment of Academic Ability supplies real exam layouts, diagrams, Japanese text, and nationwide student response distributions for evaluating multimodal LLMs.
The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods cs.CL · 2026-05-10 · unverdicted · none · ref 24
Semantic Softmax aggregates probabilities from semantic synonyms around target labels to correct renormalization bias in zero-shot LLM classification, lowering calibration error and raising AUROC and F1.
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation cs.CL · 2026-05-08 · unverdicted · none · ref 6
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.
Accurate and Efficient Statistical Testing for Word Semantic Breadth cs.CL · 2026-05-08 · unverdicted · none · ref 6
A new permutation test uses Householder reflection to align word embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and speeding up 23x on GPU.
Logic-Regularized Verifier Elicits Reasoning from LLMs cs.CL · 2026-05-07 · unverdicted · none · ref 26
LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference cs.SE · 2026-05-05 · unverdicted · none · ref 42
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models cs.CL · 2026-05-02 · unverdicted · none · ref 6
S²R² improves robustness of LoRA-tuned LLMs to prompt perturbations by penalizing semantic-segment drift while preserving clean performance and cross-dataset transfer.
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis cs.CL · 2026-05-02 · unverdicted · none · ref 139
Presents MBFC-2025 dataset and multi-view embeddings with fusion methods for media bias and factuality, reporting SOTA results on ACL-2020 and new benchmarks on MBFC-2025.
OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving cs.CL · 2026-04-23 · unverdicted · none · ref 132
OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.
Decoding Text Spans for Efficient and Accurate Named-Entity Recognition cs.CL · 2026-04-22 · unverdicted · none · ref 19
SpanDec achieves competitive NER accuracy with improved efficiency by using a final-stage lightweight decoder for span representations and early candidate filtering to reduce redundant computation.
ATIR: Towards Audio-Text Interleaved Contextual Retrieval cs.SD · 2026-04-22 · unverdicted · none · ref 24
Defines ATIR task and benchmark for mixed audio-text queries; MLLM model with token compression shows substantial gains over strong baselines.
Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data cs.AI · 2026-04-22 · unverdicted · none · ref 55
MALMAS is a memory-augmented multi-agent LLM system that generates diverse, high-quality features for tabular data via agent decomposition, routing, and iterative memory-guided refinement.
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context cs.CL · 2026-04-22 · unverdicted · none · ref 6
Quantile tokens inserted into LLM inputs combined with neighbor retrieval enable direct prediction of full distributions, yielding lower MAPE and narrower intervals than baselines on Airbnb and StackSample tasks.
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages eess.AS · 2026-04-21 · unverdicted · none · ref 6
Introduces the Indic-CodecFake dataset for Indic codec deepfakes and SATYAM, a novel hyperbolic ALM that outperforms baselines through dual-stage semantic-prosodic fusion using Bhattacharyya distance.
Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation cs.CL · 2026-04-21 · unverdicted · none · ref 6
Translation function vectors extracted from English to one target language improve correct token ranking for translations to multiple other unseen target languages in decoder-only multilingual LLMs.
Structure Guided Retrieval-Augmented Generation for Factual Queries cs.IR · 2026-04-21 · unverdicted · none · ref 5
SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.
From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning cs.AI · 2026-04-21 · unverdicted · none · ref 6
MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and evaluation protocol.
Cell-Based Representation of Relational Binding in Language Models cs.CL · 2026-04-21 · unverdicted · none · ref 6
Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.
LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation cs.CL · 2026-04-20 · unverdicted · none · ref 6
LQM introduces a six-level linguistically motivated error taxonomy for MT evaluation and applies it via expert annotation to LLM outputs on a new 3,850-sentence multi-dialect Arabic corpus.
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework cs.CV · 2026-04-20 · conditional · none · ref 6
Introduces the first large-scale 3D PET/CT dataset with fine-grained RoI annotations for Vietnamese and a graph-enhanced HiRRA framework that achieves SOTA report generation by modeling RoI dependencies.
Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition eess.AS · 2026-04-19 · unverdicted · none · ref 6
NOVA-ARC is a hyperbolic geometry framework that transfers emotion supervision from labeled non-verbal vocalizations to unlabeled verbal speech in multiple languages via optimal transport prototype alignment and consistency regularization.
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unverdicted · none · ref 6
CoT-PoT ensembling achieves self-consistency accuracy in LLMs with only two samples for 78.6% of tasks, reducing computation by 9.3x compared to standard methods.
GaLa: Hypergraph-Guided Visual Language Models for Procedural Planning cs.RO · 2026-04-19 · unverdicted · none · ref 38
GaLa uses hypergraph representations of objects and a TriView encoder with contrastive learning to improve vision-language models on procedural planning benchmarks.
HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads cs.IR · 2026-04-19 · unverdicted · none · ref 6
HeadRank lifts preference optimization into attention space via entropy-regularized head selection and distribution regularizers to sharpen discriminability for efficient listwise reranking.
Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism cs.LO · 2026-04-07 · unverdicted · none · ref 82
ProofGrid is a new benchmark for LLM reasoning that uses machine-checkable proofs in minimal formal notation, revealing progress on basic tasks but major gaps in complex combinatorial and synthesis reasoning.
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation cs.CL · 2025-02-28 · unverdicted · none · ref 7
CODI compresses explicit CoT into continuous space via self-distillation and is the first implicit method to match explicit CoT performance on GSM8k at GPT-2 scale with 3.1x compression and 28.2% higher accuracy than prior implicit approaches.
Cognitive offloading and the speedup illusion in human-AI interaction cs.CY · 2026-05-22 · unverdicted · none · ref 56
Preregistered behavioral study identifies a speedup illusion where users overestimate time savings from AI assistance on cognitive tasks despite no actual difference in completion times.
GHI: Graphormer over Conditioned Hypergraph Incidence for Aspect-Based Sentiment Analysis cs.CL · 2026-05-21 · unverdicted · none · ref 6
GHI introduces an incidence-based structural reasoning layer using Graphormer on conditioned hypergraphs for ABSA, reporting outperformance on SemEval benchmarks, near-parity with 11B models at 247M parameters, and robustness on ARTS.
When Support Escalates Distress: Regulation and Escalation in LLM Responses to Venting and Advice-Seeking cs.HC · 2026-05-20 · unverdicted · none · ref 9
LLM responses mirror venting with higher regulation and escalation; therapist personas lower escalation while preserving regulation, and lay raters miss escalation.
Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation cs.CL · 2026-05-20 · conditional · none · ref 9
DPR-BAG generates factually grounded biomedical abstracts from full texts via structured BOMRC decomposition, parallel LLM prompting, and coherence refinement without any model training.
ContextRAG: Extraction-Free Hierarchical Graph Construction for Retrieval-Augmented Generation cs.CL · 2026-05-19 · unverdicted · none · ref 23
ContextRAG constructs extraction-free hierarchical graphs via residual-quantization k-means and Formal Concept Analysis with Lukasiewicz residuated logic on embeddings, using 30 LLM calls and 22k tokens to reach 33.6% F1 on a 130-task UltraDomain subset.
AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code cs.CL · 2026-05-18 · unverdicted · none · ref 6
AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.
DMN: A Compositional Framework for Jailbreaking Multimodal LLMs with Multi-Image Inputs cs.CR · 2026-05-18 · unverdicted · none · ref 6
DMN achieves over 90% attack success rate on GPT-4o, Gemini-2.5-pro and Claude Sonnet 4 by distributing instructions, supplying multimodal evidence, and adding number chain tasks across multiple images.
Prefix-Adaptive Block Diffusion for Efficient Document Recognition cs.CV · 2026-05-16 · unverdicted · none · ref 6
PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.
Defining Cultural Capabilities for AI Evaluation: A Taxonomy Grounded in Intercultural Communication Theory cs.CL · 2026-05-15 · unverdicted · none · ref 6
Proposes a three-level taxonomy of Cultural Awareness, Cultural Sensitivity, and Cultural Competence for AI evaluation, grounded in intercultural communication scholarship to improve validity in multicultural contexts.
Evaluating Chinese Ambiguity Understanding in Large Language Models cs.CL · 2026-05-15 · unverdicted · none · ref 15
Introduces the CHA-Gen dataset for Chinese ambiguity based on Potential Ambiguity Theory and shows LLMs struggle to detect ambiguity, exhibiting specific failure modes and overconfidence after instruction tuning.
GiLT: Augmenting Transformer Language Models with Dependency Graphs cs.CL · 2026-05-15 · unverdicted · none · ref 1
GiLT augments Transformers with semantic dependency graphs by modulating attention to improve syntactic generalization while keeping perplexity competitive and enabling better finetuning on downstream tasks.
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents cs.CL · 2026-05-13 · unverdicted · none · ref 198 · 2 links
A dual hierarchical RL framework with two agents coordinates high-level dialogue strategy and low-level question generation to emulate judicial questioning and extract key information from Supreme Court arguments, outperforming baselines.
History Anchors: How Prior Behavior Steers LLM Decisions Toward Unsafe Actions cs.AI · 2026-05-13 · unverdicted · none · ref 32
A single consistency instruction with harmful prior actions causes aligned frontier LLMs to select unsafe options at 91-98% rates in high-stakes domains, with escalation and inverse scaling by model size.
STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes cs.CL · 2026-05-13 · unverdicted · none · ref 6
STOP uses structured on-policy analysis to prune long reasoning traces to their earliest correct node, cutting token usage 19-42% with little accuracy loss on math benchmarks.
Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks cs.SI · 2026-05-12 · unverdicted · none · ref 6
A pipeline derives continuous signed edges from LLM stance scores on text and links discourse signals such as toxicity and extreme claims to changes in structural polarization measured by spectral and frustration scores on Reddit Brexit data.
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation cs.CL · 2026-05-12 · unverdicted · none · ref 39
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling cs.CL · 2026-05-09 · unverdicted · none · ref 13
Context-Aligned Contrastive Regression combines cross-view context alignment and ordinal soft contrastive learning with ridge ensembles to improve lexical difficulty prediction across L1 backgrounds on three datasets.
Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding cs.CV · 2026-05-08 · unverdicted · none · ref 6 · 2 links
Response-G1 uses query-guided scene graphs, memory retrieval, and augmented prompting to improve when Video-LLMs decide to respond during streaming videos.
Effective Performance Measurement: Challenges and Opportunities in KPI Extraction from Earnings Calls cs.CL · 2026-05-04 · unverdicted · none · ref 97
Encoder models trained on SEC filings struggle with earnings calls due to domain shift, while LLMs enable open-ended KPI extraction with 79.7% human-verified precision on newly introduced benchmarks.