…some" task with SSA for model trained from scratch … Figure 6: Heatmap showing the evolution of errors for the task …

a learning rate of 10⁻⁴ for all models · 2022

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SSA: Improving Performance With a Better Scoring Function

cs.CL · 2025-08-20 · unverdicted · novelty 5.0

Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.

citing papers explorer

SSA: Improving Performance With a Better Scoring Function cs.CL · 2025-08-20 · unverdicted · none · ref 19