Reliable and

10 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AGC-Bench: Measuring Artificial General Creativity

cs.CL · 2026-07-01 · unverdicted · novelty 6.0 · 2 refs

AGC-Bench introduces a multi-domain creativity benchmark for LLMs, recovers a general 'c' factor explaining 81.5% of variance, and finds humans still outperform top models on matched tasks.

Quality Is Not a Safety Proxy Under Quantization

cs.LG · 2026-06-08 · conditional · novelty 6.0

Across 51 quantized checkpoints, quality metrics fail to predict safety drops in 36 pairings and 10 hidden-danger cases, while a new RTSI screen routes all 10 dangerous rows to testing at matched bucket size.

Why Do Safety Guardrails Degrade Across Languages?

cs.CL · 2026-05-16 · conditional · novelty 6.0

A latent variable IRT framework decouples four safety-driving factors across 61 model configurations and 10 languages using 1.9 million evaluations, revealing that safety is largely unidimensional and that high cross-lingual gaps cluster in physical harm prompts and lower-resource languages.

Latent Confidence Alignment for LLM Self-Assessment

cs.CY · 2026-06-20 · unverdicted · novelty 5.0

LCAE is introduced as a Rasch-model metric that aligns LLM self-reported confidence with latent error probability derived from ability and item difficulty, shown to improve calibration on a medical dataset across 20 models.
