Transformers learn connectivity on low-dimensional grid graphs but fail on high-dimensional grids or graphs with many disconnected components, with larger models showing better generalization on grids.
Entropy law: The story behind data compression and llm performance
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5representative citing papers
DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.
MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.
YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
citing papers explorer
-
Transformers Can Learn Connectivity in Some Graphs but Not Others
Transformers learn connectivity on low-dimensional grid graphs but fail on high-dimensional grids or graphs with many disconnected components, with larger models showing better generalization on grids.
-
Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories
DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.
-
Foundation Models for Discovery and Exploration in Chemical Space
MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.
-
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition
YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.
-
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.