pith. machine review for the scientific record.

A Structured Self-attentive Sentence Embedding. arXiv preprint arXiv:1703.03130

5 Pith papers cite this work. Polarity classification is still being indexed.

5 Pith papers citing it

representative citing papers

Graph Attention Networks

stat.ML · 2017-10-30 · accept · novelty 7.0

Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
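
As a rough illustration of the mechanism this summary describes, the sketch below computes single-head GAT-style attention logits e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalizes them with a softmax restricted to each node's neighborhood, and aggregates the projected features. The names (gat_layer, W, a, adj) are illustrative stand-ins rather than the paper's code, and the original uses multiple attention heads whose outputs are concatenated or averaged.

    # Minimal single-head sketch of a graph attention layer, assuming
    # node features h (N, F), an adjacency mask adj (N, N) that includes
    # self-loops, and learned parameters W (F, F_out) and a (2 * F_out,).
    import numpy as np

    def gat_layer(h, adj, W, a, alpha=0.2):
        z = h @ W                                    # project node features: (N, F_out)
        N = z.shape[0]
        e = np.empty((N, N))
        for i in range(N):                           # logits e_ij = LeakyReLU(a^T [z_i || z_j])
            for j in range(N):
                score = a @ np.concatenate([z[i], z[j]])
                e[i, j] = score if score > 0 else alpha * score  # LeakyReLU
        e = np.where(adj > 0, e, -1e9)               # keep only graph neighborhoods
        att = np.exp(e - e.max(axis=1, keepdims=True))
        att = att / att.sum(axis=1, keepdims=True)   # softmax over each neighborhood
        return att @ z                               # weighted feature aggregation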

Universal Transformers

cs.CL · 2018-07-10 · unverdicted · novelty 6.0

Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting; under certain assumptions they are Turing-complete, and they outperform standard Transformers on algorithmic and language tasks.
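
The recurrence-with-halting idea in this summary can be sketched as follows: one shared transition block is applied at every depth step, and each position stops being refined once its accumulated halting probability crosses a threshold (ACT-style). The transition and halt_prob callables, the threshold, and the step cap are stand-ins, and the remainder-weighted output averaging of full ACT is omitted for brevity.

    # Sketch of Universal-Transformer-style recurrence with dynamic halting:
    # the SAME transition function is reused at every step (weight sharing
    # across depth), and halted positions keep their current representation.
    import numpy as np

    def universal_transformer(state, transition, halt_prob,
                              max_steps=8, threshold=0.99):
        seq_len = state.shape[0]                      # state: (seq_len, d_model)
        cum_halt = np.zeros(seq_len)                  # accumulated halting probability
        active = np.ones(seq_len, dtype=bool)         # positions still being refined

        for _ in range(max_steps):
            state = np.where(active[:, None], transition(state), state)
            p = halt_prob(state)                      # (seq_len,) halting prob per position
            cum_halt = cum_halt + np.where(active, p, 0.0)
            active = active & (cum_halt < threshold)  # halt positions past the threshold
            if not active.any():                      # every position has halted
                break
        return state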

Attention Is All You Need

cs.CL · 2017-06-12 · unverdicted · novelty 5.0

The Transformer architecture relies entirely on multi-head self-attention, dispensing with recurrence and convolutions, and achieves state-of-the-art results on WMT 2014 English-German and English-French machine translation.

citing papers explorer

Showing 5 of 5 citing papers.

  • FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences cs.AI · 2026-05-09 · unverdicted · none · ref 38

    FRACTAL integrates a fractional recurrent architecture into state space models (SSMs) using a tunable singularity index to capture multi-scale temporal features, reporting an 87.11% average on Long Range Arena and outperforming S5.

  • Graph Attention Networks stat.ML · 2017-10-30 · accept · none · ref 11

    Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.

  • Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis cs.MM · 2026-04-07 · unverdicted · none · ref 30

    PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.

  • Universal Transformers cs.CL · 2018-07-10 · unverdicted · none · ref 17

    Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting; under certain assumptions they are Turing-complete, and they outperform standard Transformers on algorithmic and language tasks.

  • Attention Is All You Need cs.CL · 2017-06-12 · unverdicted · none · ref 22

    The Transformer architecture relies entirely on multi-head self-attention, dispensing with recurrence and convolutions, and achieves state-of-the-art results on WMT 2014 English-German and English-French machine translation.