Minimal collectives of three LLM agents develop spontaneous cooperation, storage strategies, and complex evolving cultural artifacts via interaction with a decaying shared text store and evolutionary pressure.
A Structured Self-attentive Sentence Embedding
12 Pith papers cite this work.
abstract
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending to a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on three different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods on all three tasks.
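The abstract names the mechanism but does not spell it out. The following is a minimal pure-Python sketch. It assumes the formulation usually attributed to this paper: an r×n attention matrix A = softmax(W_s2 tanh(W_s1 H^T)), with the softmax taken across the n tokens; a matrix embedding M = AH; and a redundancy penalty ||AA^T − I||_F^2 that pushes different rows to attend to different tokens. The names `H`, `W_s1`, `W_s2`, `r` and `d_a` are assumptions of this sketch. In a real model the weights are learned and `H` comes from a recurrent encoder.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def self_attentive_embedding(H: Matrix, W_s1: Matrix, W_s2: Matrix) -> Tuple[Matrix, Matrix]:
    # Assumed shapes (sketch only):
    #   H:    n x 2u  (one hidden state per token)
    #   W_s1: d_a x 2u
    #   W_s2: r x d_a (r = number of attention rows / "hops")
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    A = softmax_rows(matmul(W_s2, hidden))  # r x n; each row is a distribution over tokens
    M = matmul(A, H)  # r x 2u; the 2-D matrix sentence embedding
    return A, M


def redundancy_penalty(A: Matrix) -> float:
    # Squared Frobenius norm of (A A^T - I): zero when rows are one-hot on distinct tokens.
    AAT = matmul(A, transpose(A))
    r = len(A)
    return sum(
        (AAT[i][j] - (1.0 if i == j else 0.0)) ** 2 for i in range(r) for j in range(r)
    )


if __name__ == "__main__":
    H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, 2u = 2
    W_s1 = [[1.0, 0.0], [0.0, 1.0]]          # d_a = 2
    W_s2 = [[1.0, -1.0], [-1.0, 1.0]]        # r = 2
    A, M = self_attentive_embedding(H, W_s1, W_s2)
    print("A =", A)
    print("M =", M)
    print("penalty =", redundancy_penalty(A))
```

The penalty is added to the task loss during training. Because each row of A sums to 1, the diagonal of AA^T is at most 1, and it reaches 1 only when that row concentrates on a single token. The off-diagonal terms vanish only when rows attend to non-overlapping tokens, which is what gives each row its own focus.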
representative citing papers
FRACTAL integrates a fractional recurrent architecture into state-space models (SSMs), using a tunable singularity index to capture multi-scale temporal features; it reports an 87.11% average on Long Range Arena and outperforms S5.
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
VR head and hand motion data can be adapted to motion foundation models to classify cognitive states like confusion and hesitation at 82% accuracy with better cross-user generalization than baseline models on a new 24-participant dataset.
DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.
DG-STA builds dynamic graphs from hand skeletons, applies spatial-temporal self-attention to learn features, and uses a mask to cut computational cost by 99%, outperforming prior methods on DHG-14/28 and SHREC'17.
DMPP models spatio-temporal event intensity as a mixture of kernels weighted by a deep neural network, incorporating high-dimensional context while keeping likelihood integration tractable.
PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.
Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under certain assumptions, and they outperform standard Transformers on algorithmic and language tasks.
AMAD is an end-to-end model using adversarial autoencoders and RNNs with attention for multiscale anomaly detection on time-evolving high-dimensional categorical data.
Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.