{"total":22,"items":[{"citing_arxiv_id":"2606.12928","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Continuum Neural Momentum Eigenstate for Variationally Solving Quasiparticles","primary_cat":"cond-mat.quant-gas","submitted_at":"2026-06-11T05:42:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EVE is a neural quantum state that enforces exact momentum eigenstates by construction, allowing VMC to variationally solve quasiparticle states across multiple phases in 2D interacting bosons.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01172","ref_index":115,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Revisiting Neural Processes via Fourier Transform and Volterra Series","primary_cat":"cs.LG","submitted_at":"2026-05-31T11:27:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces SFConvCNPs and SFVConvCNPs using set Fourier convolutions and Volterra expansions for translation-equivariant neural processes on irregular data with global receptive fields and linear scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20635","ref_index":69,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The General Theory of Localization Methods","primary_cat":"cs.LG","submitted_at":"2026-05-20T02:42:14+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"non-negativity of K. Eq. (2) is only a reference for comparison with the feature mapping of kernel methods and is not the only form that the local feature mapping must adopt. 
An alternative formulation defines the kernel via K(x, x′) = e^{⟨ϕ(x), ψ(x′)⟩}, (3) or, more generally, K(x, x′) = F(ϕ(x), ψ(x′)), where F is a well-defined binary function, called the relation function [69, 89, 116]. Meanwhile, the feature space H need not be restricted to a Hilbert space. Definition 2.4 reveals another significant difference between localization methods and kernel methods: the localization kernel can correspond to two different feature mappings (which can be respectively called the \"query-feature mapping\" and the \"key-feature mapping\" of the kernel)."},{"citing_arxiv_id":"2605.06578","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Resource-Efficient CSI Prediction: A Gated Fusion and Factorized Projection Approach","primary_cat":"eess.SP","submitted_at":"2026-05-07T17:07:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A gated-fusion CSI predictor using GRU, attention, and DSLH reaches -13.84 dB NMSE with 26% fewer parameters and 2.3x higher throughput than a LinFormer baseline on 3GPP channels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22374","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Selective Contrastive Learning For Gloss Free Sign Language Translation","primary_cat":"cs.CL","submitted_at":"2026-04-24T09:08:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A pair selection strategy based on negative similarity dynamics strengthens contrastive supervision in gloss-free sign language translation by reducing noisy negatives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21088","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Jet 
Quenching Identification via Supervised Learning in Simulated Heavy-Ion Collisions","primary_cat":"hep-ph","submitted_at":"2026-04-22T21:11:47+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02451","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Skeleton-based Coherence Modeling in Narratives","primary_cat":"cs.CL","submitted_at":"2026-04-02T18:35:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Sentence-level models outperform skeleton-based approaches for narrative coherence despite a new SSN network improving on cosine and Euclidean baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.12370","ref_index":33,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Positional Encoding in Transformer-Based Time Series Models: A Survey","primary_cat":"cs.LG","submitted_at":"2025-02-17T23:21:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey of positional encoding methods in transformer-based time series models that evaluates fixed, learnable, relative, and hybrid approaches on classification tasks and links effectiveness to data characteristics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.00118","ref_index":138,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Gemma 2: Improving Open Language Models at a Practical 
Size","primary_cat":"cs.CL","submitted_at":"2024-07-31T19:13:07+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11769","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Automatically Learning Construction Injury Precursors from Text","primary_cat":"cs.CL","submitted_at":"2019-07-26T19:43:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11512","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Investigating Self-Attention Network for Chinese Word Segmentation","primary_cat":"cs.CL","submitted_at":"2019-07-26T12:29:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Self-attention networks achieve competitive results to BiLSTM-CRF on Chinese word segmentation, with BERT and word integration yielding the best reported performance on six heterogeneous domain benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.07449","ref_index":44,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"OGNet: Salient Object Detection with Output-guided Attention 
Module","primary_cat":"cs.CV","submitted_at":"2019-07-17T11:36:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"OGNet proposes an output-guided attention module from multi-scale outputs and an intractable area F-measure loss to enhance salient object detection in edges and confusing areas while remaining lightweight.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.07769","ref_index":54,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Hierarchical Sequence to Sequence Voice Conversion with Limited Data","primary_cat":"eess.AS","submitted_at":"2019-07-15T07:54:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Hierarchical seq2seq model for parallel voice conversion pretrained as autoencoder on single-speaker data then adapted to limited multispeaker data, using mel spectrograms converted via wavenet vocoder.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.06205","ref_index":6,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Automatic Repair and Type Binding of Undeclared Variables using Neural Networks","primary_cat":"cs.SE","submitted_at":"2019-07-14T11:14:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Neural network trained on AST structural details repairs undeclared variable errors and infers types, reporting 81% success on location/identification and 80% on types for 1059 programs in the prutor dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.02226","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Graph-based Knowledge 
Distillation by Multi-head Attention Network","primary_cat":"cs.LG","submitted_at":"2019-07-04T05:29:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Multi-head attention constructs a graph of dataset relations from the teacher embedding procedure and transfers it to the student via multi-task learning, yielding 7.05% higher CIFAR-100 accuracy than the student alone and 2.46% above prior SOTA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.00570","ref_index":12,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?","primary_cat":"cs.CL","submitted_at":"2019-07-01T06:46:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Analysis of transformer attention heads in abstractive summarization shows specialization in some heads and proposes a method to measure model reliance on learned attention distributions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10910","ref_index":28,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Creating A Neural Pedagogical Agent by Jointly Learning to Review and Assess","primary_cat":"cs.LG","submitted_at":"2019-06-26T08:37:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Bidirectional RNN with attention models real-time user knowledge from question-response sequences to predict correctness, outperforming baselines especially for new users on a large TOEIC mobile app 
dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10907","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Leveraging Text Repetitions and Denoising Autoencoders in OCR Post-correction","primary_cat":"cs.CL","submitted_at":"2019-06-26T08:28:51+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Error distributions estimated from text repetitions enable training of denoising autoencoders that improve OCR post-correction on historical Finnish newspapers without manual training data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.08584","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Improving Zero-shot Translation with Language-Independent Constraints","primary_cat":"cs.CL","submitted_at":"2019-06-20T12:49:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Language-independent constraints and regularization in multilingual Transformer NMT yield a 2.23 BLEU average gain on zero-shot pairs from the IWSLT 2017 dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.08089","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Predicting Drug Responses by Propagating Interactions through Text-Enhanced Drug-Gene Networks","primary_cat":"cs.SI","submitted_at":"2019-06-19T13:23:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A text-enhanced drug-gene network is constructed from articles and data, with edge embeddings estimated from cell line records to enable explainable drug sensitivity predictions at 94.74% 
accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1804.03999","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Attention U-Net: Learning Where to Look for the Pancreas","primary_cat":"cs.CV","submitted_at":"2018-04-11T14:13:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Attention gates added to U-Net automatically focus on target organs in CT images and improve segmentation performance on abdominal datasets.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"The gating vector contains contextual information to prune lower-level feature responses as suggested in [32], which uses AGs for natural image classification. We use additive attention [2] to obtain the gating coefficient. Although this is computationally more expensive, it has experimentally shown to achieve higher accuracy than multiplicative attention [19]. Additive attention is formulated as follows: q_{att}^l = ψ^T(σ_1(W_x^T x_i^l + W_g^T g_i + b_g)) + b_ψ (1), α_i^l = σ_2(q_{att}^l(x_i^l, g_i; Θ_{att})) (2), where σ_2(x_{i,c}) = 1/(1 + exp(−x_{i,c})) correspond to sigmoid activation function. 
AG is characterised by a set of parameters Θ_{att} containing: linear transformations W_x ∈ R^{F_l×F_int}, W_g ∈ R^{F_g×F_int}, ψ ∈ R^{F_int×1} and bias terms b_ψ ∈ R, b_g ∈ R^{F_int}."},{"citing_arxiv_id":"1706.03762","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Attention Is All You Need","primary_cat":"cs.CL","submitted_at":"2017-06-12T17:57:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Pith review generated a malformed one-line summary.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"[22] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017. [23] Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. Multi-task sequence to sequence learning. arXiv preprint arXiv:1511.06114, 2015. [24] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015. [25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: The penn treebank. Computational linguistics, 19(2):313-330, 1993."}],"limit":50,"offset":0}