{"total":33,"items":[{"citing_arxiv_id":"2605.27105","ref_index":11,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Lost in the Evidence? Reproducing Document Position and Context Size Effects in RAG","primary_cat":"cs.IR","submitted_at":"2026-05-26T14:44:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Reproducibility study shows position and context size effects in RAG depend on topic sampling and retrieval quality, proposes calibration for stable trends, and releases code after finding discrepancies with prior industry work.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22393","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Nf-PEAK: Process-Based Energy Attribution for Nextflow Workflows on Kubernetes Clusters","primary_cat":"cs.DC","submitted_at":"2026-05-21T12:26:47+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Nf-PEAK is a containerized method that attributes energy to Nextflow tasks with 6.6% MAPE in isolated runs and 10.9% under co-located load, outperforming Kepler on nf-core workflows.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19605","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"deadtrees.earth-aerial: A Multi-Resolution Aerial Image Dataset for Tree Cover and Mortality Detection","primary_cat":"cs.CV","submitted_at":"2026-05-19T09:43:29+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Releases DTE-aerial-train (385K patches) and DTE-aerial-bench (25 global orthoimages) as the first harmonized multi-resolution datasets for joint tree cover and mortality segmentation across biomes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18893","ref_index":92,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Position: Graph Condensation Needs a Reset -- Move Beyond Full-dataset Training and Model-Dependence","primary_cat":"cs.LG","submitted_at":"2026-05-17T07:08:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper claims current graph condensation approaches are flawed due to full-dataset training requirements, high overhead, poor generalization, and misleading evaluation metrics, calling for a reset toward lightweight and architecture-agnostic methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14624","ref_index":12,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-14T09:39:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces the Amortized Efficiency Threshold (AET) to identify the deployment volume at which neural combinatorial optimization solvers achieve lower total energy use than heuristic baselines after accounting for training costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14550","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Multi-Dimensional Model Integrity and Responsibility Assessment Index and Scoring Framework","primary_cat":"cs.LG","submitted_at":"2026-05-14T08:29:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MIRAI is a unified index that combines five responsibility dimensions into one score for tabular models, demonstrating that predictive performance does not ensure high overall integrity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14249","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization","primary_cat":"cs.LG","submitted_at":"2026-05-14T01:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EnergyLens predicts multi-GPU LLM inference energy consumption with 9-13% MAPE and identifies configurations with up to 52x energy efficiency differences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14055","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Letη t =c/ √ Tfor some constantc >0. Under Assumptions 1-3: 1 T T−1X t=0 E∥∇f(θ t, αt)∥2 ≤ C1 +C 2L2 θα√ T ,(20) where C1 = 2(f0 −f ∗) c + cLσ2 2 , C 2 =c 2Lσ2,(21) f0 =f(θ 0, α0), andf ∗ = inf θ,α f(θ, α). Proof.Step 1: Per-step descent with coupling.ByL-smoothness and the update rule (17): f(θ t+1, αt+1)≤f(θ t, αt)−η t∥∇f(θ t, αt)∥2 + Lη2 t 2 ∥gt∥2.(22) Because θ and α are updated simultaneously, theα-step uses ∇αf(θ t, αt) rather than ∇αf(θ t+1, αt). By Assumption 3 and the update rule: ∥∇αf(θ t+1, αt)− ∇ αf(θ t, αt)∥ ≤L θα∥θt+1 −θ t∥=L θαηt∥gt θ∥.(23) This staleness error propagates into the descent inequality. Incorporating (23) into (22) and taking expectations: E[f(θ t+1, αt+1)]≤E[f(θ t, αt)]−η t E∥∇f(θ t, αt)∥2"},{"citing_arxiv_id":"2605.11764","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Decomposing the Generalization Gap in PROTAC Activity Prediction: Variance Attribution and the Inter-Laboratory Ceiling","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:35:02+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Inter-laboratory measurement variance dominates the generalization gap in PROTAC activity prediction, capping LOTO AUROC near 0.67 across models and architectures.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"cohort with at least three publications per target enables the inter-laboratory variance attribution in Section 5, and a reusable dual-LLM metadata-enrichment pipeline (Appendix H) recovers cell-line, readout-method, timepoint, and concentration annotations on 70.9 percent of the source corpus. Table 1: PROTAC-Bench dataset summary. Property Value Total entries 10,748 Unique targets (UniProt) 173 LOTO-eligible targets (n≥10, pos rate∈[0.1,0.9]) 65 LOFO-eligible targets (covered by 22-family map) 61 Cross-lab eligible targets (≥3papers,≥5entries each) 36 E3 ligase distribution CRBN 7,727 / VHL 2,896 / other 125 Activity rate (full benchmark / LOTO cohort) 0.658 / 0.672 Label criterion DC50<1µM OR Dmax>50% Unique Murcko scaffolds (singletons) 7,427 (5,771) 3.2 Evaluation Protocol Four primary evaluation splits are defined, with random cross-validation referenced as the published-"},{"citing_arxiv_id":"2605.11733","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Position: LLM Inference Should Be Evaluated as Energy-to-Token Production","primary_cat":"cs.CE","submitted_at":"2026-05-12T08:15:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM inference should be reframed and evaluated as energy-to-token production with a Token Production Function that accounts for power, cooling, and efficiency ceilings.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Sustainable ai: Environmental implications, challenges and opportunities. InProceedings of Machine Learning and Systems (MLSys), volume 4, pages 795-813, 2022. URL https: //arxiv.org/abs/2111.00364. [12] A. Lacoste, A. Luccioni, V . Schmidt, and T. Dandres. Quantifying the carbon emissions of machine learning, 2019. URL https://arxiv.org/abs/1910.09700 . arXiv preprint arXiv:1910.09700. [13] E. Strubell, A. Ganesh, and A. McCallum. Energy and policy considerations for deep learning in nlp. InProceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pages 3645-3650, 2019. URLhttps://arxiv.org/abs/1906.02243. [14] A. S. Luccioni, Y . Jernite, and E. Strubell. Power hungry processing: Watts driving the cost"},{"citing_arxiv_id":"2605.06597","ref_index":51,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"UniSD: Towards a Unified Self-Distillation Framework for Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-07T17:22:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"UniSD unifies self-distillation components for autoregressive LLMs and its full integrated version improves base models by 5.4 points and baselines by 2.8 points across six benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05416","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From Cradle to Cloud: A Life Cycle Review of AI's Environmental Footprint","primary_cat":"cs.CY","submitted_at":"2026-05-06T20:20:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A review of AI sustainability studies finds inconsistent life cycle definitions and predominant reliance on coarse CO2e proxies, with limited coverage of water, materials, and multi-impact assessments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20927","ref_index":125,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints","primary_cat":"cs.CR","submitted_at":"2026-04-22T08:18:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Nearly every arXiv submission leaks hidden sensitive information through its source files, existing cleaners fail, and ALC-NG provides a more reliable fix.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Ten Years Later,\" arXiv:2102.00856, 2021. [122] arXiv, \"Terms of Use for arXiv APIs,\" https://info.arxiv.org/help/api/ tou.html#rate-limits, 2019. [123] Z. Durumericet al., \"ZMap: Fast Internet-wide Scanning and Its Security Applications,\" inSEC, 2013. [124] arXiv, \"Legacy Submission System,\" https://info.arxiv.org/help/ submit_legacy_differences.html, 2025. [125] A. Lacosteet al., \"Quantifying the Carbon Emissions of Machine Learning,\" arXiv:1910.09700, 2019. Appendix A: Ethics Considerations At its core, this research includes handling sensitive (confidential), potentially personally identifiable information, reviewer comments, credentials/key material, links/references to other resources, and software/scripts that are not meant"},{"citing_arxiv_id":"2604.12421","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agentic Insight Generation in VSM Simulations","primary_cat":"cs.CL","submitted_at":"2026-04-14T08:11:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11104","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds","primary_cat":"cs.AI","submitted_at":"2026-04-13T07:20:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A frugal zero-shot local-LLM pipeline extracts relations at F1 0.70 and reaches 0.55 EM on multi-hop QA through self-consistency, cross-model oracles, and confidence routing, while identifying an agreement paradox where strong consensus signals hallucination.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09316","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ChatGPT, is this real? The influence of generative AI on writing style in top-tier cybersecurity papers","primary_cat":"cs.CR","submitted_at":"2026-04-10T13:34:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Top-tier cybersecurity papers exhibit a post-2022 increase in AI marker words and higher lexical complexity, suggesting generative AI is influencing academic writing style.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.22487","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Coordinating GPU Data Centers and Power Grid Regulation Service for Exogenous Carbon Benefits","primary_cat":"cs.DC","submitted_at":"2026-01-30T02:57:34+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.16719","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SAM 3: Segment Anything with Concepts","primary_cat":"cs.CV","submitted_at":"2025-11-20T18:59:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SAM 3 introduces promptable concept segmentation that doubles accuracy of prior systems on images and videos while improving standard SAM segmentation performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.06943","ref_index":4,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PlantTraitNet: An Uncertainty-Aware Multimodal Framework for Global-Scale Plant Trait Inference from Citizen Science Data","primary_cat":"cs.CV","submitted_at":"2025-11-10T10:51:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PlantTraitNet is an uncertainty-aware multimodal deep learning framework that infers four plant traits from citizen science images and produces global trait maps that outperform prior products when validated against independent survey data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.04776","ref_index":20,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid","primary_cat":"cs.CY","submitted_at":"2025-11-06T19:52:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"G-TRACE provides region-aware estimates of GenAI carbon emissions including 4309 MWh and 2068 tCO2 for a 2024-2025 image generation trend, paired with a seven-level AI Sustainability Pyramid for policy guidance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.02850","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users","primary_cat":"cs.CL","submitted_at":"2025-07-03T17:55:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A single attacker can use strategic upvoting and downvoting on language model outputs to inject facts, security flaws, or fake news that persist in the model for all users after preference tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.08223","ref_index":127,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices","primary_cat":"cs.DC","submitted_at":"2025-03-11T09:41:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[125] Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:1910.09700, 2019. [126] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems , volume 25, pages 1097-1105, 2012. [127] Neil C Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F Manso. The computa- tional limits of deep learning. arXiv preprint arXiv:2007.05558, 10, 2020. [128] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organiza- tion in the brain. Psychological review, 65(6):386, 1958. [129] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner."},{"citing_arxiv_id":"2503.10666","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Green Prompting: Characterizing Prompt-driven Energy Costs of LLM Inference","primary_cat":"cs.CL","submitted_at":"2025-03-09T19:49:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Empirical tests on three LLMs show prompt semantics and task keywords drive inference energy costs more than length, with varying patterns by task.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.00714","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SAM 2: Segment Anything in Images and Videos","primary_cat":"cs.CV","submitted_at":"2024-08-01T17:00:08+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAM 2 delivers more accurate video segmentation with 3x fewer user interactions and 6x faster image segmentation than the original SAM by training a streaming-memory transformer on the largest video segmentation dataset collected to date.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"999 gradient clipping type: ℓ2, max: 0.1 weight decay 0.1 learning rate (lr) 4e-4 lr schedule reciprocal sqrt, timescale=1000 warmup linear, 1k iters cooldown linear, 5k iters layer-wise decay 0.8 (T, S), 0.9 (B+), 0.925 (L) augmentation hflip, resize to 1024 (square) batch size 256 drop path 0.1 (T, S), 0.2 (B+), 0.3 (L) mask losses (weight) focal (20), dice (1) IoU loss (weight) ℓ1 (1) max # masks per image 64 # correction points 7 global attn. blocks 5-7-9 (T), 7-10-13 (S), 12-16-20 (B+), 23-33-43 (L) (a) Pre-training config value data SA-1B, Internal, SA-V steps ∼150k resolution 1024 precision bfloat16 optimizer AdamW optimizer momentum β1, β2=0.9, 0.999 gradient clipping type: ℓ2, max: 0.1 weight decay 0."},{"citing_arxiv_id":"2402.19173","ref_index":220,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"StarCoder 2 and The Stack v2: The Next Generation","primary_cat":"cs.SE","submitted_at":"2024-02-29T13:53:35+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.14509","ref_index":158,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models","primary_cat":"cs.LG","submitted_at":"2023-09-25T20:15:57+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.06161","ref_index":287,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"StarCoder: may the source be with you!","primary_cat":"cs.CL","submitted_at":"2023-05-09T08:16:42+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2304.02643","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Segment Anything","primary_cat":"cs.CV","submitted_at":"2023-04-05T17:59:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A promptable model trained on 1B masks achieves competitive zero-shot segmentation performance across tasks and is released publicly with its dataset.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"CVPR, 2019. 4 [60] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. The open images dataset v4: Uniﬁed image classiﬁcation, object detection, and visual relationship detection at scale. IJCV, 2020. 2, 6, 7, 18, 19 [61] Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the carbon emissions of machine learning. arXiv:1910.09700, 2019. 28 [62] Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Explor- ing plain vision transformer backbones for object detection. ECCV, 2022. 5, 10, 11, 16, 21, 23, 24 [63] Yin Li, Zhefan Ye, and James M."},{"citing_arxiv_id":"2303.09014","ref_index":129,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ART: Automatic multi-step reasoning and tool-use for large language models","primary_cat":"cs.CL","submitted_at":"2023-03-16T01:04:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2211.05100","ref_index":259,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"BLOOM: A 176B-Parameter Open-Access Multilingual Language Model","primary_cat":"cs.CL","submitted_at":"2022-11-09T18:48:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2205.01068","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OPT: Open Pre-trained Transformer Language Models","primary_cat":"cs.CL","submitted_at":"2022-05-02T17:49:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2204.06745","ref_index":51,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GPT-NeoX-20B: An Open-Source Autoregressive Language Model","primary_cat":"cs.CL","submitted_at":"2022-04-14T04:00:27+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2110.08207","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Multitask Prompted Training Enables Zero-Shot Task Generalization","primary_cat":"cs.LG","submitted_at":"2021-10-15T17:08:57+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Multitask fine-tuning of an encoder-decoder model on prompted datasets produces zero-shot generalization that often beats models up to 16 times larger on standard benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}