An exploration of hierarchical attention transformers for efficient long document classification

Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott · 2022 · arXiv 2210.05529

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

NEST: Nested Event Stream Transformer for Sequences of Multisets

cs.LG · 2026-01-31 · unverdicted · novelty 7.0

NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

Hierarchical Attention via Domain Decomposition

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

A two-level overlapping Schwarz domain decomposition constructs a hierarchical attention operator that trains faster and approximates the inverse of a discretized 1D diffusion operator more accurately than global low-rank attention while using fewer parameters.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

citing papers explorer

Showing 2 of 2 citing papers after filters.

NEST: Nested Event Stream Transformer for Sequences of Multisets

cs.LG · 2026-01-31 · unverdicted · none · ref 3

NEST is a nested transformer for sequences of multisets that uses masked set modeling to learn improved set-level representations from hierarchical event streams like EHRs.

Hierarchical Attention via Domain Decomposition

cs.LG · 2026-06-16 · unverdicted · none · ref 16

A two-level overlapping Schwarz domain decomposition constructs a hierarchical attention operator that trains faster and approximates the inverse of a discretized 1D diffusion operator more accurately than global low-rank attention while using fewer parameters.
