hub Canonical reference

Ellie Pavlick and Tom Kwiatkowski

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou · 2020 · DOI 10.18653/v1/2020

Canonical reference. 85% of citing Pith papers cite this work as background.

49 Pith papers citing it

Background 85% of classified citations


citation-role summary

background 12 · method 1

citation-polarity summary

background 11 · extend 1 · unclear 1

representative citing papers

Pretraining Exposure Explains Popularity Judgments in Large Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.

SafePyramid: A Hierarchical Benchmark for In-context Policy Guardrailing

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

SafePyramid is a three-level benchmark showing frontier LLMs identify all violated rules in only 54.0%, 35.3%, and 12.9% of cases on L0, L1, and L2 respectively, indicating in-context policy guardrailing remains difficult.

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

cs.CV · 2026-04-30 · unverdicted · novelty 7.0

Introduces ViTextCaps dataset and PhonoSTFG phonological graph fusion framework for Vietnamese scene-text image captioning, showing cross-modal graph edges harm performance.

LASQ: A Low-resource Aspect-based Sentiment Quadruple Extraction Dataset

cs.CL · 2026-04-12 · unverdicted · novelty 7.0

LASQ is a new quadruple extraction dataset for Uzbek and Uyghur that includes a syntax-aware model showing gains over baselines on the task.

Scaling Laws for Cross-Encoder Reranking

cs.IR · 2026-03-05 · unverdicted · novelty 7.0

Cross-encoder reranker performance scales predictably via power laws with model size and training exposure, allowing accurate forecasts for 400M and 1B models and data-heavy compute allocation.

The Challenge and Reward of Fair Play in Narrative: A Computational Approach

cs.CL · 2025-07-18 · unverdicted · novelty 7.0

Develops an information-theoretic framework showing surprise and coherence trade off in single reader models but coexist via pre- and post-revelation modes, operationalized as reference-less LLM metrics for fair play and validated on generated stories plus classic detective fiction.

Accelerating Large Language Model Decoding with Speculative Sampling

cs.CL · 2023-02-02 · accept · novelty 7.0

Speculative sampling accelerates LLM decoding 2-2.5x by letting a draft model propose short sequences that the target model scores in parallel, then applies modified rejection sampling to keep the exact target distribution.

Multitask Prompted Training Enables Zero-Shot Task Generalization

cs.LG · 2021-10-15 · conditional · novelty 7.0

Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.

An Information-Geometric Justification for Composite Coherence in Event-Based Narrative Extraction

cs.IT · 2026-06-28 · unverdicted · novelty 6.0

The paper justifies the composite coherence metric in event-based narrative extraction via an information-geometric decomposition on the product manifold and an axiomatic uniqueness proof for the geometric mean.

TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation

cs.AI · 2026-06-25 · unverdicted · novelty 6.0

TAVR-VLM introduces Risk-Conditioned Causal Grounding Attention to achieve SOTA AUROC 0.896, CIDEr 0.936, and 8.1% hallucination rate on a 1,482-patient TAVR cohort.

STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning

cs.AI · 2026-06-07 · unverdicted · novelty 6.0

STAR rethinks MoE routing as structure-aware subspace learning by adding a GHA-tracked principal subspace to standard routers, yielding more stable specialization and better performance on synthetic, language, and vision tasks.

Diagnosing Evidence Utilization in Long-Context and Retrieval-Augmented Language Models under Matched Evidence Conditions

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

Introduces a matched four-condition protocol and ONCU metric to diagnose evidence utilization in long-context and RAG models across synthetic and multi-hop QA tasks.

Link Prediction or Perdition: the Seeds of Instability in Knowledge Graph Embeddings

cs.LG · 2026-06-02 · unverdicted · novelty 6.0

KGEMs for link prediction exhibit high instability in predictions and embeddings from initialization, negative sampling, and other factors, with better MRR not ensuring higher stability.

Not All Tokens Matter Equally: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

DIVE improves in-context vector distillation for medical report generation via decisive-token supervision on pathology terms and EOS plus state-conditioned dynamic steering, achieving top BLEU-4, ROUGE-L and RadGraph F1 on MIMIC-CXR and CheXpert Plus.

CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

CLIF applies influence functions to pinpoint influential samples and concepts in CBMs on CEBaB and Yelp datasets, enabling performance restoration via adjustments without retraining.

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

cs.NE · 2026-05-10 · unverdicted · novelty 6.0 · 2 refs

QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.

Retrieval from Within: An Intrinsic Capability of Attention-Based Models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.

The Refusal-Compliance Tradeoff: A Large-Scale Safety Behavior Audit of Large Language Models

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

A large-scale audit of 21 LLMs on OR-Bench, XSTest, ToxiGen and BOLD using composition adjustment reveals distinct conservative vs permissive safety strategies, unequal demographic protection, and post-training stability within model families.

When AI reviews science: Can we trust the referee?

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference submissions.

CIR: Lightweight Container Image for Cross-Platform Deployment

cs.DC · 2026-04-12 · unverdicted · novelty 6.0

CIR is a cross-platform container image format for Python/R-style apps that defers dependency assembly to deployment, cutting image size by 95% and deployment time by 40-60% versus traditional bundled images.

Context-Aware Dialectal Arabic Machine Translation with Interactive Region and Register Selection

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

A metadata-conditioned mT5 model trained on rule-augmented dialectal Arabic data produces translations that better match intended regional varieties than high-resource baselines, despite lower BLEU scores.

JUÁ - A Benchmark for Information Retrieval in Brazilian Legal Text Collections

cs.IR · 2026-04-07 · accept · novelty 6.0

JUÁ is a new heterogeneous benchmark for Brazilian legal IR that distinguishes retrieval methods and shows domain-adapted models excel on aligned subsets while BM25 stays competitive elsewhere.

TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

TimelineReasoner applies large reasoning models in a Global Cognition plus Detail Exploration loop to produce more accurate, complete, and coherent timelines from news than prior LLM-based methods.

citing papers explorer

Showing 4 of 4 citing papers after filters.

A Reproducible Benchmark and Evidence-Retrieval Software Framework for Silicon Detector R&D Literature · physics.ins-det · 2026-06-23 · unreviewed · ref 22
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning · cs.CL · 2025-09-26 · unreviewed · ref 7
Clotho: Measuring Task-Specific Pre-Generation Test Adequacy for LLM Inputs · cs.SE · 2025-09-22 · unreviewed · ref 10
Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models · cs.LG · 2025-08-06 · unreviewed · ref 81
