pith. machine review for the scientific record.

A Structured Self-attentive Sentence Embedding. arXiv preprint arXiv:1703.03130

5 Pith papers cite this work. Polarity classification is still being indexed.

5 Pith papers citing it

representative citing papers

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
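
As a rough illustration of the mechanism this summary describes, the sketch below computes single-head GAT-style attention logits e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalizes them with a softmax restricted to each node's neighborhood, and aggregates the projected features. The names (gat_layer, W, a, adj) are illustrative stand-ins rather than the paper's code, and the original uses multiple attention heads whose outputs are concatenated or averaged.

    # Minimal single-head sketch of a graph attention layer, assuming
    # node features h (N, F), an adjacency mask adj (N, N) that includes
    # self-loops, and learned parameters W (F, F_out) and a (2 * F_out,).
    import numpy as np

    def gat_layer(h, adj, W, a, alpha=0.2):
        z = h @ W                                    # project node features: (N, F_out)
        N = z.shape[0]
        e = np.empty((N, N))
        for i in range(N):                           # logits e_ij = LeakyReLU(a^T [z_i || z_j])
            for j in range(N):
                score = a @ np.concatenate([z[i], z[j]])
                e[i, j] = score if score > 0 else alpha * score  # LeakyReLU
        e = np.where(adj > 0, e, -1e9)               # keep only graph neighborhoods
        att = np.exp(e - e.max(axis=1, keepdims=True))
        att = att / att.sum(axis=1, keepdims=True)   # softmax over each neighborhood
        return att @ z                               # weighted feature aggregation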

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting; under certain assumptions they are Turing-complete, and they outperform standard Transformers on algorithmic and language tasks.
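
The recurrence-with-halting idea in this summary can be sketched as follows: one shared transition block is applied at every depth step, and each position stops being refined once its accumulated halting probability crosses a threshold (ACT-style). The transition and halt_prob callables, the threshold, and the step cap are stand-ins, and the remainder-weighted output averaging of full ACT is omitted for brevity.

    # Sketch of Universal-Transformer-style recurrence with dynamic halting:
    # the SAME transition function is reused at every step (weight sharing
    # across depth), and halted positions keep their current representation.
    import numpy as np

    def universal_transformer(state, transition, halt_prob,
                              max_steps=8, threshold=0.99):
        seq_len = state.shape[0]                      # state: (seq_len, d_model)
        cum_halt = np.zeros(seq_len)                  # accumulated halting probability
        active = np.ones(seq_len, dtype=bool)         # positions still being refined

        for _ in range(max_steps):
            state = np.where(active[:, None], transition(state), state)
            p = halt_prob(state)                      # (seq_len,) halting prob per position
            cum_halt = cum_halt + np.where(active, p, 0.0)
            active = active & (cum_halt < threshold)  # halt positions past the threshold
            if not active.any():                      # every position has halted
                break
        return state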

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

The Transformer architecture relies entirely on multi-head self-attention, dispensing with recurrence and convolutions, and achieves state-of-the-art results on WMT 2014 English-German and English-French machine translation.

citing papers explorer

Showing 5 of 5 citing papers.

  • FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences cs.AI · 2026-05-09 · unverdicted · none · ref 38

    FRACTAL integrates a fractional recurrent architecture into state space models (SSMs) using a tunable singularity index to capture multi-scale temporal features, reporting an 87.11% average on Long Range Arena and outperforming S5.

  • Graph Attention Networks stat.ML · 2017-10-30 · accept · none · ref 11

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

  • Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis cs.MM · 2026-04-07 · unverdicted · none · ref 30

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  • Universal Transformers cs.CL · 2018-07-10 · unverdicted · none · ref 17

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting; under certain assumptions they are Turing-complete, and they outperform standard Transformers on algorithmic and language tasks.

  • Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 22

    The Transformer architecture relies entirely on multi-head self-attention, dispensing with recurrence and convolutions, and achieves state-of-the-art results on WMT 2014 English-German and English-French machine translation.