We load the original self-attention model and, for each pass over D with a given mask z, handle any layer ℓ that requires hybrid SA/SWA heads as follows

· 2023

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

BOSCH decomposes attention-head selection for short-context hybridization into layer probing, adaptive ratio assignment, and grouped binary optimization, yielding better efficiency-performance tradeoffs than static or layer-wise baselines.

citing papers explorer

Showing 1 of 1 citing paper.

BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs

cs.CL · 2026-04-07 · unverdicted · none · ref 9

BOSCH decomposes attention-head selection for short-context hybridization into layer probing, adaptive ratio assignment, and grouped binary optimization, yielding better efficiency-performance tradeoffs than static or layer-wise baselines.
