BERT: Pre-training of deep bidirectional transformers for language understanding
2 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
citing papers explorer
- Lattice Deduction Transformers
  An 800K-parameter Lattice Deduction Transformer reaches 100% accuracy on Sudoku-Extreme and Snowflake Sudoku and 99.9% on Maze-Hard by using lattice projections and abstract-interpretation supervision, while frontier LLMs score 0%.
- GD-FPS: Growth-Driven Feedforward Parameter Selection for Efficient Fine-Tuning
  GD-FPS is a gradient-free, forward-pass-only parameter selection method for PEFT that identifies important weights by scaling magnitudes with relative activation growth against a pre-training anchor, matching or beating gradient-based baselines on 26 visual tasks while cutting memory by ~18x and run