{"total":32,"items":[{"citing_arxiv_id":"2605.14055","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"We analyze the joint optimization of (θ, α) under simultaneous projected SGD, proving two results: (i) the coupled updates converge at the standard non-convex SGD rate with an explicit coupling penalty, and (ii) the continuous-to-discrete gap introduced by PrefixNAS is controlled by entropy regularization. 7.2.1 Setup Let θ = (θ_LoRA, θ_Prefix) and consider min_{θ, α∈∆} f(θ, α), (16) where ∆ = {α : Σ_k α_k = 1, α_k ≥ 0}. Updates follow simultaneous projected SGD: θ_{t+1} = θ_t − η_t g_θ^t, α_{t+1} = Π_∆(α_t − η_t g_α^t), (17) with g_θ^t, g_α^t unbiased stochastic gradients. Since Π_∆ is non-expansive (∥Π_∆(x) − Π_∆(y)∥ ≤ ∥x − y∥), the standard descent lemma applies without modification. 
7.2.2 Assumptions Assumption 1 (Smoothness). f is L-smooth in (θ, α): for all (θ_1, α_1), (θ_2, α_2),"},{"citing_arxiv_id":"2605.08423","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms","primary_cat":"cs.LG","submitted_at":"2026-05-08T19:32:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Queryable LoRA adds dynamic routing over shared low-rank atoms with attention and language-instruction regularization to make parameter-efficient fine-tuning more adaptive across inputs and layers.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"This vector is used in place of the block-entry rank-state sentry ℓb in Eq. (26) for attention blocks. Thus, the router receives a compact summary of the low-rank attention state, while avoiding a circular dependence on the output projection. Given the routed rank-space operator S_b(c), the adapted update for each projection type p ∈ {Q, K, V, O} is ∆W^p_ℓ(H_ℓ; b, c) = (α/r) B^p_ℓ (I_r + g^p_ℓ S_b(c)) A^p_ℓ, (17) g^p_ℓ = σ(η^p_ℓ), (18) Here, g^p_ℓ ∈ (0, 1). 
The adapted attention projections are then: Q_ℓ = H_ℓ (W^{0,Q}_ℓ + ∆W^Q_ℓ)^⊤, (19) K_ℓ = H_ℓ (W^{0,K}_ℓ + ∆W^K_ℓ)^⊤, (20) V_ℓ = H_ℓ (W^{0,V}_ℓ + ∆W^V_ℓ)^⊤, (21) O_ℓ = Attention(Q_ℓ, K_ℓ, V_ℓ) (W^{0,O}_ℓ + ∆W^O_ℓ)^⊤. (22) The same routed operator S_b(c) is reused across Q, K, V, O within the block, whereas the local LoRA factors A^p"},{"citing_arxiv_id":"2605.06402","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask","primary_cat":"cs.LG","submitted_at":"2026-05-07T15:11:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SparseForge achieves 57.27% zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity using only 5B retraining tokens, beating the dense baseline and nearly matching a 40B-token SOTA method.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02285","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Complexity Horizons of Compressed Models in Analog Circuit Analysis","primary_cat":"cs.AI","submitted_at":"2026-05-04T07:19:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Prerequisite graphs map compressed LLM performance boundaries in analog circuit analysis to allow selecting the smallest viable model for a given task complexity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08885","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Uncertainty-Aware Transformers: Conformal Prediction for Language 
Models","primary_cat":"cs.LG","submitted_at":"2026-04-10T02:48:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.14685","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SSA: Improving Performance With a Better Scoring Function","primary_cat":"cs.CL","submitted_at":"2025-08-20T13:01:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Replacing Softmax with Scaled Signed Averaging in transformer attention improves generalization under distribution shifts for in-context learning and boosts results on NLP benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.04501","ref_index":55,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Ultra-Low-Dimensional Prompt Tuning via Random Projection","primary_cat":"cs.CL","submitted_at":"2025-02-06T21:00:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ULPT optimizes prompts in ultra-low dimensions with frozen random up-projection to cut training parameters by 98% while matching vanilla prompt tuning performance on NLP tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.14249","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Humanity's Last 
Exam","primary_cat":"cs.LG","submitted_at":"2025-01-24T05:27:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Humanity's Last Exam is a new 2,500-question benchmark at the frontier of human knowledge where state-of-the-art LLMs show low accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Mmlu-pro: A more robust and challenging multi-task language understanding benchmark (published at neurips 2024 track datasets and benchmarks), 2024. URL https://arxiv.org/abs/2406.01574. [56] J. Wei, N. Karina, H. W. Chung, Y. J. Jiao, S. Papay, A. Glaese, J. Schulman, and W. Fedus. Measuring short-form factuality in large language models, 2024. URL https://arxiv.org/abs/2411.04368. [57] H. Wijk, T. Lin, J. Becker, S. Jawhar, N. Parikh, T. Broadley, L. Chan, M. Chen, J. Clymer, J. Dhyani, E. Ericheva, K. Garcia, B. Goodrich, N. Jurkovic, M. Kinniment, A. Lajko, S. Nix, L. Sato, W. Saunders, M. Taran, B. West, and E. Barnes. Re-bench: Evaluating frontier ai r&d capabilities of language model agents against human experts, 2024. 
URL https://arxiv."},{"citing_arxiv_id":"2403.14720","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Defending Against Indirect Prompt Injection Attacks With Spotlighting","primary_cat":"cs.CR","submitted_at":"2024-03-20T15:26:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.14509","ref_index":147,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models","primary_cat":"cs.LG","submitted_at":"2023-09-25T20:15:57+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2308.14132","ref_index":85,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Detecting Language Model Attacks with Perplexity","primary_cat":"cs.CL","submitted_at":"2023-08-27T15:20:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Jailbreak prompts with adversarial suffixes have high GPT-2 perplexity, and a LightGBM model on perplexity and length detects most 
attacks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2308.03958","ref_index":45,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Simple synthetic data reduces sycophancy in large language models","primary_cat":"cs.CL","submitted_at":"2023-08-07T23:48:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Scaling and instruction tuning increase sycophancy in LLMs on opinion and fact tasks, but a synthetic data fine-tuning intervention reduces it on held-out prompts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2307.08621","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Retentive Network: A Successor to Transformer for Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-07-17T16:40:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.14824","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Kosmos-2: Grounding Multimodal Large Language Models to the World","primary_cat":"cs.CL","submitted_at":"2023-06-26T16:32:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Kosmos-2 grounds text to image regions by encoding referring expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text 
pairs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.16264","ref_index":122,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scaling Data-Constrained Language Models","primary_cat":"cs.CL","submitted_at":"2023-05-25T17:18:55+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Repeating training data up to 4 epochs yields negligible loss increase versus unique data for fixed compute, and a new scaling law accounts for the decaying value of repeated tokens and excess parameters.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"then start dropping, which aligns with our results on test loss in §6. Filling up to 50% of data with code (42 billion tokens) also shows no deterioration. Beyond that, performance decreases quickly on natural language tasks. However, adding more code data may benefit non-natural language tasks, which are not considered in the benchmarking. Two of the tasks benchmarked, WebNLG [17, 34], a generation task, and bAbI [122, 57], a reasoning task, see jumps in performance as soon as code is added, possibly due to code enabling models to learn long-range state-tracking capabilities beneficial for these tasks. Of the filtering approaches, we find perplexity-filtering to be effective, while deduplication does not help. 
Prior work found deduplication was able to improve perplexity [55]; however, it did not"},{"citing_arxiv_id":"2305.10403","ref_index":150,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PaLM 2 Technical Report","primary_cat":"cs.CL","submitted_at":"2023-05-17T17:46:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2302.14045","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Language Is Not All You Need: Aligning Perception with Language Models","primary_cat":"cs.CL","submitted_at":"2023-02-27T18:55:27+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2210.11610","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Models Can Self-Improve","primary_cat":"cs.CL","submitted_at":"2022-10-20T21:53:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A 540B-parameter LLM improves reasoning performance on GSM8K, DROP, OpenBookQA, and ANLI-A3 by fine-tuning on self-generated high-confidence CoT solutions from unlabeled 
data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2207.05221","ref_index":144,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Language Models (Mostly) Know What They Know","primary_cat":"cs.CL","submitted_at":"2022-07-11T22:59:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.04359","ref_index":285,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Ethical and social risks of harm from Language Models","primary_cat":"cs.CL","submitted_at":"2021-12-08T16:09:48+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.00861","ref_index":86,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A General Language Assistant as a Laboratory for Alignment","primary_cat":"cs.CL","submitted_at":"2021-12-01T22:24:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model 
size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2110.14168","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Training Verifiers to Solve Math Word Problems","primary_cat":"cs.LG","submitted_at":"2021-10-27T04:49:45+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces GSM8K dataset and demonstrates that verifier-based selection of solutions from multiple candidates outperforms fine-tuning baselines on math word problems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2102.01293","ref_index":198,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scaling Laws for Transfer","primary_cat":"cs.LG","submitted_at":"2021-02-02T04:07:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Effective data transferred from pre-training to fine-tuning is described by a power law in model parameter count and fine-tuning dataset size, acting like a multiplier on the fine-tuning data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2009.03300","ref_index":272,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Measuring Massive Multitask Language Understanding","primary_cat":"cs.CY","submitted_at":"2020-09-07T17:59:25+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional 
domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2005.11401","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","primary_cat":"cs.CL","submitted_at":"2020-05-22T21:34:34+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In ICLR, 2018. URL https://openreview.net/forum?id=rJl3yM-Ab. [64] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1410.3916. [65] Jason Weston, Emily Dinan, and Alexander Miller. Retrieve and refine: Improved sequence generation models for dialogue. In Proceedings of the 2018 EMNLP Workshop SCAI: The 2nd International Workshop on Search-Oriented Conversational AI, pages 87-92, Brussels, Belgium, October 2018. Association for Computational Linguistics. 
doi: 10.18653/v1/W18-5713."},{"citing_arxiv_id":"2002.05202","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GLU Variants Improve Transformer","primary_cat":"cs.LG","submitted_at":"2020-02-12T19:57:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Some GLU variants using non-sigmoid nonlinearities improve Transformer quality over ReLU and GELU in feed-forward sublayers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2001.08361","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Scaling Laws for Neural Language Models","primary_cat":"cs.LG","submitted_at":"2020-01-23T03:59:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1910.10683","ref_index":76,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","primary_cat":"cs.LG","submitted_at":"2019-10-23T17:37:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled 
Corpus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1910.03771","ref_index":186,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing","primary_cat":"cs.CL","submitted_at":"2019-10-09T03:23:22+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11692","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","primary_cat":"cs.CL","submitted_at":"2019-07-26T17:48:29+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":5.0,"formal_verification":"none","one_line_summary":"With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10002","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LIAAD at SemDeep-5 Challenge: Word-in-Context (WiC)","primary_cat":"cs.CL","submitted_at":"2019-06-24T14:49:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"An adapted WSD system with contextual and sense embeddings places second in the WiC challenge while avoiding task-specific training 
data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.08230","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Evaluating Protein Transfer Learning with TAPE","primary_cat":"cs.LG","submitted_at":"2019-06-19T17:19:31+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"TAPE benchmark of five protein tasks shows self-supervised pretraining improves performance but often lags non-neural baselines, with code and data released publicly.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}