Title resolution pending

URL https://aclanthology · 2024 · DOI 10.34740/kaggle/m/3301

24 Pith papers cite this work. Polarity classification is still indexing.

24 Pith papers citing it

open at publisher browse 24 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 3

citation-polarity summary

background 2 unclear 1

representative citing papers

Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control

cs.LG · 2026-04-21 · conditional · novelty 7.0

Local linearity of LLM layers enables LQR-based closed-loop activation steering with theoretical tracking guarantees.

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

cs.CL · 2025-10-10 · unverdicted · novelty 7.0

FinAuditing is a taxonomy-structured multi-document benchmark with 1,102 instances averaging over 33k tokens from XBRL filings, defining three tasks to evaluate LLMs on financial auditing capabilities.

FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information

cs.CL · 2025-05-27 · unverdicted · novelty 7.0

FinTagging decomposes XBRL tagging into FinNI extraction and FinCL full-taxonomy linking, showing LLMs handle extraction but struggle with fine-grained concept alignment in zero-shot settings.

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Introduces C²R cross-sample consistency regularization that mitigates feature splitting and absorption in SAEs while preserving reconstruction fidelity.

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

The paper constructs an SCPI dataset via LLM-based annotation and trains classifiers to detect sensitive personal information in Japanese pre-training corpora, claiming this is the first such exploration.

MOC: Multi-Order Communication in LLM-based Multi-Agent Systems

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

MOC formalizes a multi-order evidence stream and Semantic-Topological Merging algorithm that improves task performance while cutting communication costs on six datasets.

A Multi-Agent Framework for Feature-Constrained Difficulty Control in Reading Comprehension Item Generation

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

MAFIG is a multi-agent framework that uses LLM agents and evaluators to generate reading comprehension items with significantly higher adherence to specified feature constraints than single-agent baselines.

Alignment Dynamics in LLM Fine-Tuning

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

The paper introduces a dynamical model that decomposes alignment updates in LLM fine-tuning into rebound and driving forces and predicts a rehearsal priming effect.

Tree SAE: Learning Hierarchical Feature Structures in Sparse Autoencoders

cs.LG · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Tree SAE learns hierarchical feature structures by combining activation coverage with a new reconstruction condition, outperforming prior SAEs on hierarchical pair detection while matching state-of-the-art benchmark performance.

You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation

cs.CR · 2026-05-06 · unverdicted · novelty 6.0

NeWTral is a non-linear weight translation framework using MoE routing that reduces average attack success rate from 70% to 13% on unsafe domain adapters across Llama, Mistral, Qwen, and Gemma models up to 72B while retaining 90% knowledge fidelity.

Debiasing Reward Models via Causally Motivated Inference-Time Intervention

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Neuron-level inference-time intervention reduces multiple biases in reward models, enabling 2B and 7B models to match 70B performance on LLM alignment benchmarks without trade-offs.

GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

GRPO-VPS improves GRPO by using segment-wise conditional probabilities of the correct answer to supply process-level feedback, yielding up to 2.6-point accuracy gains and 13.7% shorter reasoning on math tasks.

All Public Voices Are Equal, But Are Some More Equal Than Others to LLMs?

cs.CY · 2026-04-19 · unverdicted · novelty 6.0

LLMs produce lower-fidelity summaries of identical public comments when attributed to lower-status occupations like street vendors versus financial analysts, with inconsistent race effects and no gender effects.

The Realignment Problem: When Right becomes Wrong in LLMs

cs.CL · 2025-11-04 · unverdicted · novelty 6.0

TRACE is a three-stage optimization framework that realigns LLMs to new policies by categorizing preference conflicts, scoring impact via bi-level optimization, and applying hybrid losses without new human annotations.

CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

cs.CL · 2025-09-19 · unverdicted · novelty 6.0

CFDLLMBench is a new benchmark suite with CFDQuery, CFDCodeBench, and FoamBench to evaluate LLMs on graduate-level CFD knowledge, numerical reasoning, and context-dependent code implementation.

A StrongREJECT for Empty Jailbreaks

cs.LG · 2024-02-15 · conditional · novelty 6.0

StrongREJECT provides a standardized benchmark and evaluator for jailbreak attacks that aligns better with human judgments than prior methods and reveals that successful jailbreaks often reduce model capabilities.

MOSAIC: Efficient Mixture-of-Agent Scheduling via Adaptive Aggregation and Inference Concurrency

cs.LG · 2026-06-02 · unverdicted · novelty 5.0

MOSAIC uses an Integer Linear Program scheduler for expert placement and prompt assignment plus adaptive aggregation to achieve 1.7-2.3x end-to-end speedup on 4-GPU MoA workloads while keeping accuracy within 0.1pp.

When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models

cs.CL · 2026-06-01 · unverdicted · novelty 5.0

HybridMoE with controlled hybridization and idiomatic property signals yields 5-6% gains in figurative language representation for multilingual vision-language models.

MAPLE: A Meta-learning Framework for Cross-Prompt Essay Scoring

cs.CL · 2026-04-19 · unverdicted · novelty 5.0

MAPLE uses meta-learning with prototypical networks to learn transferable representations and achieves state-of-the-art cross-prompt essay scoring on ELLIPSE, LAILA, and parts of ASAP datasets.

Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings

cs.SI · 2026-04-08 · unverdicted · novelty 5.0

LLMs handle skin tone emoji modifiers better than dedicated embedding models but display systemic disparities in sentiment and semantic consistency across tones.

MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

MedConclusion is a 5.7M-instance benchmark dataset for generating biomedical conclusions from structured PubMed abstracts, with LLM evaluations showing conclusion writing differs from summarization and that judge choice affects scores.

Food Noise & False Safety: A Systematic Evaluation of How LLMs Fail to Adapt to Eating Disorder Queries with Clinician Feedback

cs.AI · 2026-06-01 · unverdicted · novelty 4.0

Systematic evaluation shows LLMs frequently give unsafe responses to eating disorder prompts when linguistic cues signal risk, as measured by varying prompt danger levels with clinician feedback.

From Fragments to Facts: A Curriculum-Driven DPO Approach for Generating Hindi News Veracity Explanations

cs.CL · 2025-07-07

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer