Mixed citations

Title resolution pending

Anthony Hughes, Nikolaos Aletras, Ning Ma · 2025 · DOI 10.18653/v1/2025.emnlp-

Mixed citation behavior. Most common role is background (40%).

12 Pith papers citing it

Background 40% of classified citations

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 2 · method 2 · baseline 1

citation-polarity summary

background 2 · use method 2 · baseline 1

representative citing papers

C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment

cs.CL · 2026-04-17 · unverdicted · novelty 7.0

C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to synthesize better instruction data.

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.

LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training

cs.AR · 2026-04-12 · unverdicted · novelty 6.0

LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.

Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

RETINA-SAFE benchmark and ECRT two-stage triage improve hallucination risk detection in medical LLMs for retinal decisions by 0.15-0.19 balanced accuracy over baselines using internal representations and logit shifts.

Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

cs.AI · 2025-12-22 · conditional · novelty 6.0

Physician oversight reveals high error rates in LLM-generated labels for a clinical benchmark and demonstrates that corrected labels improve both evaluation accuracy and downstream model training.

Can Large Language Models Really Recognize Your Name?

cs.CR · 2025-05-20 · unverdicted · novelty 6.0

LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.

Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Constraining visual token budget per observation during VLM training forces genuine active perception and delivers 5% average relative improvement without auxiliary losses or architecture changes.

CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation

cs.CL · 2026-04-28 · unverdicted · novelty 5.0

CroSearch-R1 applies search-augmented RL with cross-lingual integration and multilingual rollouts to improve RAG effectiveness on multilingual collections.

Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions

cs.SE · 2026-04-27 · unverdicted · novelty 4.0

LLM-based SE tools lack stable ground truth and deterministic outputs, making standard evaluation assumptions invalid and requiring new approaches for reliable assessment.

GraphMind: From Operational Traces to Self-Evolving Workflow Automation

cs.AI · 2026-05-17

Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

cs.CV · 2026-05-11

citing papers explorer

C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment · cs.CL · 2026-04-17 · unverdicted · none · ref 40
C-Mining automatically mines high-fidelity Culture Points from raw multilingual text by treating cross-lingual geometric isolation in embeddings as a quantifiable signal for cultural specificity, then uses them to synthesize better instruction data.
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook · cs.SE · 2026-05-08 · unverdicted · none · ref 42
Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training · cs.AR · 2026-04-12 · unverdicted · none · ref 21
LLMs resist low-frequency permanent GPU faults but certain datapaths and precision formats trigger catastrophic training divergence even at moderate fault rates.
Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models · cs.CV · 2026-04-09 · unverdicted · none · ref 28
Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.
From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs · cs.AI · 2026-04-07 · unverdicted · none · ref 13
RETINA-SAFE benchmark and ECRT two-stage triage improve hallucination risk detection in medical LLMs for retinal decisions by 0.15-0.19 balanced accuracy over baselines using internal representations and logit shifts.
Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight · cs.AI · 2025-12-22 · conditional · none · ref 19
Physician oversight reveals high error rates in LLM-generated labels for a clinical benchmark and demonstrates that corrected labels improve both evaluation accuracy and downstream model training.
Can Large Language Models Really Recognize Your Name? · cs.CR · 2025-05-20 · unverdicted · none · ref 35
LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.
Starve to Perceive: Taming Lazy Perception in VLMs with Constrained Visual Bandwidth · cs.CV · 2026-05-18 · unverdicted · none · ref 23
Constraining visual token budget per observation during VLM training forces genuine active perception and delivers 5% average relative improvement without auxiliary losses or architecture changes.
CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation · cs.CL · 2026-04-28 · unverdicted · none · ref 12
CroSearch-R1 applies search-augmented RL with cross-lingual integration and multilingual rollouts to improve RAG effectiveness on multilingual collections.
Evaluation of LLM-Based Software Engineering Tools: Practices, Challenges, and Future Directions · cs.SE · 2026-04-27 · unverdicted · none · ref 23
LLM-based SE tools lack stable ground truth and deterministic outputs, making standard evaluation assumptions invalid and requiring new approaches for reliable assessment.
GraphMind: From Operational Traces to Self-Evolving Workflow Automation · cs.AI · 2026-05-17 · unreviewed · ref 27
Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization · cs.CV · 2026-05-11 · unreviewed · ref 39
