An exploration of hierarchical attention transformers for efficient long document classification

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.LG 2, cs.AI 1

years: 2026 2, 2023 1

roles: background 1

polarities: background 1

representative citing papers

Hierarchical Attention via Domain Decomposition

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

A two-level overlapping Schwarz domain decomposition constructs a hierarchical attention operator that trains faster and approximates the inverse of a discretized 1D diffusion operator more accurately than global low-rank attention while using fewer parameters.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • NEST: Nested Event Stream Transformer for Sequences of Multisets

    cs.LG · 2026-01-31 · unverdicted · none · ref 3

    NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams such as electronic health records (EHRs).

  • Hierarchical Attention via Domain Decomposition

    cs.LG · 2026-06-16 · unverdicted · none · ref 16

    A two-level overlapping Schwarz domain decomposition constructs a hierarchical attention operator that trains faster and approximates the inverse of a discretized 1D diffusion operator more accurately than global low-rank attention while using fewer parameters.