{"total":12,"items":[{"citing_arxiv_id":"2606.30668","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Emergent Culture in Minimal LLM Systems","primary_cat":"cs.NE","submitted_at":"2026-06-21T15:56:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Minimal collectives of three LLM agents develop spontaneous cooperation, storage strategies, and complex evolving cultural artifacts via interaction with a decaying shared text store and evolutionary pressure.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08833","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences","primary_cat":"cs.AI","submitted_at":"2026-05-09T09:38:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"We verify $\\langle p_m, p_n \\rangle_{\\mu(t)} = \\delta_{mn}$. Step 1: Expand the inner product: $\\langle p_m, p_n \\rangle_{\\mu(t)} = \\gamma_m \\gamma_n \\cdot \\frac{1-\\alpha}{t^{1-\\alpha}} \\int_0^t P_m^{(-\\alpha,0)}\\left(\\frac{2x}{t} - 1\\right) P_n^{(-\\alpha,0)}\\left(\\frac{2x}{t} - 1\\right) (t-x)^{-\\alpha} \\, dx$. (36) Step 2: Substitute $y = \\frac{2x}{t} - 1$ with $dx = \\frac{t}{2} \\, dy$ and $(t-x)^{-\\alpha} = \\left(\\frac{t}{2}\\right)^{-\\alpha} (1-y)^{-\\alpha}$: $= \\gamma_m \\gamma_n \\cdot \\frac{1-\\alpha}{t^{1-\\alpha}} \\cdot \\left(\\frac{t}{2}\\right)^{1-\\alpha} \\int_{-1}^{1} P_m^{(-\\alpha,0)}(y) P_n^{(-\\alpha,0)}(y) (1-y)^{-\\alpha} \\, dy$ (37) $= \\gamma_m \\gamma_n \\cdot (1-\\alpha) \\cdot 2^{\\alpha-1} \\cdot h_n^{(-\\alpha,0)} \\delta_{mn}$. 
(38) Step 3: For $m = n$, substituting $h_n^{(-\\alpha,0)} = \\frac{2^{1-\\alpha}}{2n+1-\\alpha}$: $\\langle p_n, p_n \\rangle_{\\mu(t)} = \\gamma_n^2 \\cdot (1-\\alpha) \\cdot 2^{\\alpha-1} \\cdot \\frac{2^{1-\\alpha}}{2n+1-\\alpha} = \\gamma_n^2 \\cdot \\frac{1-\\alpha}{2n+1-\\alpha}$. (39) Setting this equal to 1 yields $\\gamma_n = \\sqrt{\\frac{2n+1-\\alpha}{1-\\alpha}}$. Corollary D.5 (LegS Recovery). When $\\alpha = 0$: $\\gamma_n = \\sqrt{2n+1}$ and $P^{(0,0)}$"},{"citing_arxiv_id":"2604.05873","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis","primary_cat":"cs.MM","submitted_at":"2026-04-07T13:31:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.24255","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Cognitive State Inference from VR Motion via Motion Foundation Model","primary_cat":"cs.HC","submitted_at":"2025-09-29T03:59:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VR head and hand motion data can be adapted to motion foundation models to classify cognitive states like confusion and hesitation at 82% accuracy with better cross-user generalization than baseline models on a new 24-participant dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11769","ref_index":51,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Automatically Learning Construction Injury Precursors from 
Text","primary_cat":"cs.CL","submitted_at":"2019-07-26T19:43:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11065","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks","primary_cat":"cs.CL","submitted_at":"2019-07-25T14:03:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.08871","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention","primary_cat":"cs.CV","submitted_at":"2019-07-20T22:24:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DG-STA builds dynamic graphs from hand skeletons, applies spatial-temporal self-attention to learn features, and uses a mask to cut cost by 99%, outperforming prior methods on DHG-14/28 and SHREC'17.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06582","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AMAD: Adversarial Multiscale Anomaly Detection on High-Dimensional and Time-Evolving Categorical 
Data","primary_cat":"cs.LG","submitted_at":"2019-07-12T05:51:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AMAD is an end-to-end model using adversarial autoencoders and RNNs with attention for multiscale anomaly detection on time-evolving high-dimensional categorical data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.08952","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information","primary_cat":"stat.ML","submitted_at":"2019-06-21T05:29:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DMPP models spatio-temporal event intensity as a deep NN-weighted mixture of kernels to incorporate high-dimensional context while keeping likelihood integration tractable.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1807.03819","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Universal Transformers","primary_cat":"cs.CL","submitted_at":"2018-07-10T18:39:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1710.10903","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Graph Attention 
Networks","primary_cat":"stat.ML","submitted_at":"2017-10-30T12:41:12+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1706.03762","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Attention Is All You Need","primary_cat":"cs.CL","submitted_at":"2017-06-12T17:57:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Pith review generated a malformed one-line summary.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"[19] Yoon Kim, Carl Denton, Luong Hoang, and Alexander M. Rush. Structured attention networks. In International Conference on Learning Representations, 2017. [20] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015. [21] Oleksii Kuchaiev and Boris Ginsburg. Factorization tricks for LSTM networks. arXiv preprint arXiv:1703.10722, 2017. [22] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017. [23] Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. Multi-task sequence to sequence learning. arXiv preprint arXiv:1511.06114, 2015."}],"limit":50,"offset":0}