Entropy Law: The Story Behind Data Compression and LLM Performance

Entropy Law: The Story Behind Data Compression and LLM Performance · 2024 · arXiv 2407.06645

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Transformers Can Learn Connectivity in Some Graphs but Not Others

cs.CL · 2025-09-26 · unverdicted · novelty 7.0

Transformers learn connectivity on low-dimensional grid graphs but fail on high-dimensional grids or graphs with many disconnected components, with larger models showing better generalization on grids.

Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories

cs.AI · 2026-06-26 · unverdicted · novelty 6.0 · 2 refs

DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.

Foundation Models for Discovery and Exploration in Chemical Space

physics.chem-ph · 2025-10-20 · unverdicted · novelty 6.0

MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap

cs.CL · 2025-08-06 · unverdicted · novelty 5.0

Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Transformers Can Learn Connectivity in Some Graphs but Not Others · cs.CL · 2025-09-26 · unverdicted · none · ref 27
Transformers learn connectivity on low-dimensional grid graphs but fail on high-dimensional grids or graphs with many disconnected components, with larger models showing better generalization on grids.
Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories · cs.AI · 2026-06-26 · unverdicted · none · ref 19 · 2 links
DynaSteer is a dynamic representation editing framework that uses pattern clustering, Fisher-LDA, and lookahead entropy monitoring to steer LLM reasoning trajectories toward truth on MATH and coding tasks.
Foundation Models for Discovery and Exploration in Chemical Space · physics.chem-ph · 2025-10-20 · unverdicted · none · ref 285
MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition · cs.CL · 2026-06-04 · unverdicted · none · ref 61
YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap · cs.CL · 2025-08-06 · unverdicted · none · ref 50
Selecting preference pairs whose DPO implicit reward gap is small yields better LLM alignment than random or baseline selection while using only 10% of the data.
