TraceGuard formulates antidistillation as a detectability-constrained Stackelberg game and poisons sparsely located thought anchors via branching-token detection to degrade student models while preserving trace quality.
6 Pith papers cite this work.
representative citing papers
DialectLLM generates parallel multi-dialect dialog data and a 50k-dialog benchmark showing frontier LLMs achieve under 70% accuracy on dialect tasks while the generated data can improve post-training.
Image autoregressive models leak substantially more training data than diffusion models under membership inference, dataset inference with as few as 4 samples, and data extraction attacks.
AttriCoT is a black-box algorithm that attributes causal importance to units in a specific CoT trace via a structural causal model estimated with linear forward passes.
Introduces a benchmark with 34,560 instances for selective QA over conflicting multi-source personal memory and compares fusion methods against LLMs.
RE-TAB uses a deterministic LCS-based table-state reward for stepwise guidance and test-time scaling, raising LLM table-reasoning accuracy by 26.7 pp on average across six backbones and three benchmarks.
citing papers explorer
- Local Causal Attribution of Chain-of-Thought Reasoning
AttriCoT is a black-box algorithm that attributes causal importance to units in a specific CoT trace via a structural causal model estimated with linear forward passes.