Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Eric Bieber, Gheorghe Comanici, Ice Pasupat, Inderjit Dhillon, Mike Schaekermann, Noveen Sachdeva · 2025 · cs.CL · arXiv 2507.06261

Mixed citation behavior. The most common role is background (55% of classified citations).

1032 Pith papers cite it.

abstract

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

citation-role summary

background 122 · baseline 46 · method 28 · other 8 · dataset 3

citation-polarity summary

background 114 · baseline 47 · use method 28 · unclear 12 · support 3 · use dataset 3

representative citing papers

EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries

cs.CL · 2026-06-14 · unverdicted · novelty 8.0

EHRNote-ChatQA is the first benchmark for evidence-grounded multi-turn clinical QA over longitudinal discharge summaries, containing 16,072 medical-expert-verified pairs across eight categories and revealing LLM weaknesses in evidence grounding and multi-turn consistency.

HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule

cs.CL · 2026-06-04 · accept · novelty 8.0

HKJudge is a new ~290k-sentence expert-annotated corpus of Hong Kong criminal judgments with 26 rhetorical roles and 3 sentencing elements, plus benchmarks on classification and extraction tasks.

RRP-Voice: A Longitudinal Dataset and Benchmark for Recurrent Respiratory Papillomatosis Detection

eess.AS · 2026-06-01 · unverdicted · novelty 8.0

Introduces the first longitudinal voice dataset for RRP with benchmarks across handcrafted features, deep networks, self-supervised models, and audio LLMs under patient-level validation.

VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents

cs.CV · 2026-05-28 · unverdicted · novelty 8.0

VideoFDB is a new benchmark and LM-as-judge framework for evaluating full-duplex audio-visual-to-audio-visual conversational agents on nonverbal dynamics from real video calls.

EgoIntrospect: An Egocentric Dataset and Benchmark for User-Centric Internal State Reasoning

cs.CV · 2026-05-17 · unverdicted · novelty 8.0

EgoIntrospect provides the first egocentric dataset with self-annotations for internal state tasks and shows multimodal LLMs struggle to infer subjective states from combined signals.

Tracing Persona Vectors Through LLM Pretraining

cs.CL · 2026-05-13 · unverdicted · novelty 8.0

Persona vectors form within the first 0.22% of LLM pretraining and remain effective for steering post-trained models, with continued refinement and transfer to other models.

Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

cs.AR · 2026-05-11 · conditional · novelty 8.0

Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.

Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

M³Att poisons medical multimodal RAG by pairing covert textual misinformation with query-agnostic visual perturbations that increase retrieval of the bad content, causing LLMs to generate clinically plausible but incorrect responses.

Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search

cs.SD · 2026-05-09 · unverdicted · novelty 8.0

Omni-DeepSearch is a 640-sample benchmark for audio-driven omni-modal search where the best model reaches only 43.44% accuracy, exposing bottlenecks in audio inference, tool use, and cross-modal reasoning.

TraceAV-Bench: Benchmarking Multi-Hop Trajectory Reasoning over Long Audio-Visual Videos

cs.CV · 2026-05-08 · unverdicted · novelty 8.0

TraceAV-Bench is the first benchmark for multi-hop trajectory reasoning over long audio-visual videos, showing top models reach only 51-68% accuracy with substantial room for improvement.

S1-VL: Scientific Multimodal Reasoning Model with Thinking-with-Images

cs.CV · 2026-04-23 · unverdicted · novelty 8.0

S1-VL combines structured scientific reasoning with iterative image manipulation via code execution to reach state-of-the-art results on visual and scientific reasoning benchmarks.

Lost in Translation: Do LVLM Judges Generalize Across Languages?

cs.CL · 2026-04-21 · unverdicted · novelty 8.0

MM-JudgeBench shows substantial cross-lingual performance variance in 22 LVLM judges, with model size and architecture as poor predictors of multilingual robustness.

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

cs.SD · 2026-04-21 · unverdicted · novelty 8.0

HalluAudio is the first large-scale benchmark spanning speech, environmental sound, and music that uses human-verified QA pairs, adversarial prompts, and mixed-audio tests to measure hallucinations in large audio-language models.

When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models

cs.CV · 2026-04-19 · unverdicted · novelty 8.0

VLMs hallucinate by prioritizing contradictory on-screen text over visual content, addressed via the VisualTextTrap benchmark with 6,057 human-validated samples and the VTHM-MoE dual-encoder framework using dimension-specific experts and adaptive routing.

Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models

cs.CL · 2026-04-13 · conditional · novelty 8.0

Large language models display the identifiable victim effect at roughly twice the human baseline, strongly amplified by instruction tuning and chain-of-thought prompting but inverted by reasoning-specialized models.

MMRareBench: A Rare-Disease Multimodal and Multi-Image Medical Benchmark

cs.CV · 2026-04-12 · unverdicted · novelty 8.0 · 2 refs

MMRareBench provides 1,756 QA pairs and 7,958 images from PMC rare-disease cases to evaluate 23 MLLMs, revealing low treatment-planning scores and medical models underperforming general models on multi-image tasks due to capacity dilution.

HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

cs.CV · 2026-04-10 · accept · novelty 8.0

HM-Bench is the first benchmark for MLLMs on hyperspectral images, showing models struggle with complex spatial-spectral reasoning and perform better with visual PCA images than textual reports.

DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues

cs.AI · 2026-04-09 · unverdicted · novelty 8.0

DialBGM is a new benchmark dataset revealing that existing AI models fall far short of human performance when recommending fitting background music for open-domain conversations.

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

cs.RO · 2026-04-03 · conditional · novelty 8.0

V2X-QA provides a view-decoupled benchmark showing infrastructure views aid macroscopic traffic understanding while cooperative reasoning requires explicit cross-view alignment, with V2X-MoE as a routing-based baseline that improves performance.

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

cs.CV · 2026-02-15 · conditional · novelty 8.0

ScreenParse dataset and ScreenVLM model deliver dense screen parsing that outperforms larger VLMs on PageIoU and transfers to better UI grounding.

EgoSound: Benchmarking Sound Understanding in Egocentric Videos

cs.CV · 2026-02-15 · unverdicted · novelty 8.0

EgoSound is a new benchmark with 7315 QA pairs across seven tasks to evaluate egocentric sound understanding in multimodal large language models.

VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing

cs.CV · 2026-02-04 · unverdicted · novelty 8.0

VLRS-Bench is the first benchmark dedicated to complex vision-language reasoning in remote sensing, with 2000 QA pairs across 14 tasks in cognition, decision, and prediction dimensions.

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

cs.CV · 2026-01-15 · unverdicted · novelty 8.0

Molmo2 delivers state-of-the-art open-weight video VLMs with new grounding datasets and training methods that outperform prior open models and match or exceed some proprietary ones on pointing and tracking tasks.

ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors

cs.CV · 2025-12-09 · unverdicted · novelty 8.0

ConceptPose delivers state-of-the-art zero-shot relative pose estimation by matching open-vocabulary 3D concept vectors derived from VLM saliency maps, beating the strongest baseline by 62% in ADD(-S) without training.

citing papers explorer

Showing 50 of 1032 citing papers.

UniTranslator: A Unified Multi-modal Framework for End-to-end In-Image Machine Translation
cs.CV · 2026-06-23 · unverdicted · none · ref 9 · internal anchor
UniTranslator adds an Understand-Generation Alignment Module and Spatial Mask Decoder to a unified multimodal model to fix translation inconsistency and spatial misalignment in in-image machine translation, reporting SOTA results on multiple benchmarks.

Token-to-Token Alignment of Text Embeddings for Semantic Blending
cs.CV · 2026-06-22 · unverdicted · none · ref 10 · internal anchor
Token-to-Token alignment rephrases prompts into shared structure then matches token embeddings by semantic similarity, making linear interpolation a meaningful operation for blending in text-to-image models.

Music Playlist Captioning at Scale with Large Language Models
cs.IR · 2026-06-21 · unverdicted · none · ref 13 · internal anchor
Deezer deployed an LLM-driven playlist captioning system in 2025 for its Daily Mix recommendations, claiming significant gains in user engagement from the added natural-language descriptions.

Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages
cs.AI · 2026-06-18 · unverdicted · none · ref 18 · internal anchor
Multi-LCB extends LiveCodeBench to 12 languages by translating Python tasks, revealing Python overfitting and performance disparities when evaluating 24 LLMs.

ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots
cs.LG · 2026-06-16 · unverdicted · none · ref 20 · internal anchor
ASTRA automates simpilot roles in ATCO training with a fine-tuned ASR pipeline that cuts WER to 23.45% on Singaporean aviation speech and an AI evaluator scoring 86.9-91.7% on accuracy, brevity, and completeness.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
cs.CL · 2026-06-13 · unverdicted · none · ref 37 · internal anchor
Technical report announcing Ling-2.6 and Ring-2.6 models with hybrid linear attention, evolutionary CoT, and KPop RL for efficient agentic intelligence at scale.

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
cs.CV · 2026-06-10 · unverdicted · none · ref 10 · internal anchor
InternVideo3 introduces Multimodal Contextual Reasoning and M^2LA attention to enable closed-loop evidence accumulation in long-video understanding and agentic tool use, reporting strong benchmark results.

MSUE: Multi-Modal Soccer Understanding Expert
cs.CV · 2026-06-10 · unverdicted · none · ref 5 · internal anchor
MSUE routes questions via LLM to text/image/video experts and an external KB after VLM-driven data synthesis, achieving 0.95 accuracy on SoccerNet VQA.

Language-Driven Cost Optimization for Autonomous Driving
cs.RO · 2026-06-09 · unverdicted · none · ref 3 · internal anchor
LLM interprets user language to set parameters of a risk-aware MPPI controller, with human-in-the-loop validation for adaptive autonomous driving behavior.

Task Robustness via Re-Labelling Vision-Action Robot Data
cs.RO · 2026-06-09 · unverdicted · none · ref 10 · internal anchor
TREAD augments robotics datasets via VLM-based sub-task generation, video segmentation, and linguistic diversity to improve policy generalization on novel tasks in LIBERO benchmarks.

Building Customer Support AI Agents at 100M-User Scale: An Evaluation-Driven Framework
cs.CL · 2026-06-07 · unverdicted · none · ref 8 · internal anchor
An evaluation-driven framework for customer support AI agents at Nubank integrates context engineering, LLM judges, and A/B testing to deliver up to 37pp NPS gains and strong offline-online correlation across five production domains.

IEA: Amateur-Friendly Conversational Image Editing Agent via Three Stages of Multitask Alignment
cs.CV · 2026-06-06 · unverdicted · none · ref 7 · internal anchor
IEA is a tool-calling VLM for conversational image editing trained in three multitask stages that reports lower pixel distance, higher ROUGE-L, and top user-study rankings versus baselines.

Neutrality Bites: Gender Representation in AI-Generated Animal Stories
cs.CL · 2026-06-06 · unverdicted · none · ref 11 · internal anchor
LLMs exhibit masculine bias when assigning gender to animal characters in generated stories, with neutrality often resulting in erasure of feminine perspectives.

Beyond Semantic Dominance: Cognitive Affective Reasoning and Empathetic Response Alignment in Audio Language Models
eess.AS · 2026-06-05 · unverdicted · none · ref 32 · internal anchor
CogAudio-LLM introduces LIME-440K dataset, EIPS chain-of-thought reasoning, and DR-SAPO optimization to address semantic dominance and improve affective responses in audio language models.

Breaking the Lock-in: Diversifying Text-to-Image Generation via Representation Modulation
cs.CV · 2026-06-05 · unverdicted · none · ref 45 · internal anchor
Early DC component convergence in text-to-image Transformer features causes output homogeneity; selective early attenuation via DAVE improves diversity without retraining or extra cost.

UNIVID: Unified Vision-Language Model for Video Moderation
cs.MM · 2026-06-04 · unverdicted · none · ref 20 · internal anchor
UNIVID generates policy-aware captions for video moderation, reducing violation leakage by 42.7% and overkill rate by 37.0% while replacing over 1,000 policy-specific models with a single backbone.

Libra: Efficient Resource Management for Agentic RL Post-Training
cs.LG · 2026-06-02 · unverdicted · none · ref 9 · internal anchor
Libra optimizes GPU allocation across rollout and training in agentic RL via an elastic hybrid pool and C-MLFQ scheduler based on tool-return causal signals, claiming up to 3.0x throughput and 2.5x faster reward convergence on 48 A800 GPUs.

Self-Distilled Policy Gradient
cs.LG · 2026-06-02 · unverdicted · none · ref 24 · internal anchor
SDPG combines group-relative verifier advantages, normalized standard deviation, full-vocabulary on-policy self-distillation, and reference-policy KL regularization to improve stability and performance over RLVR and self-distillation baselines in language model RL.

Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning
cs.CV · 2026-06-01 · unverdicted · none · ref 9 · internal anchor
Training-free composed video retrieval pipeline using DINOv3 for candidate selection and video-LLM reasoning achieves 48.78 Recall@1 and 51.48 Recall@5 on the CVPR 2026 challenge test set.

WALL-WM: Carving World Action Modeling at the Event Joints
cs.RO · 2026-06-01 · unverdicted · none · ref 17 · internal anchor
WALL-WM introduces event-grounded Vision-Language-Action pretraining that uses semantic events as the atomic unit to address granularity mismatch in world action models and reports state-of-the-art generalization.

Effects of Varying LLM Access on Essay Writing Behavior
cs.CL · 2026-05-29 · unverdicted · none · ref 50 · internal anchor
Pilot experiment shows limited LLM access maintains higher student ownership and strategic use than unlimited access, with no difference in essay quality.

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
eess.AS · 2026-05-29 · unverdicted · none · ref 7 · internal anchor
SwanSphere introduces a causal autoregressive diffusion transformer architecture with SVAC contrastive learning and ODPO optimization for streaming spatial audio generation from video and text.

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows
cs.LG · 2026-05-29 · unverdicted · none · ref 2 · internal anchor
GEM bridges trajectory-based unlearning and teacher-guided erasure to create a geometric guidance objective for targeted concept suppression in Rectified Flow models.

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs
cs.HC · 2026-05-28 · unverdicted · none · ref 10 · internal anchor
Humans exhibit greater source-label bias in logical fallacy judgments than LLMs, which maintain more consistent evaluations regardless of source cues.

OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration
cs.CL · 2026-05-27 · unverdicted · none · ref 3 · internal anchor
OmniVerifier-M1 is a generalist visual verifier using symbolic outputs for meta-verification and decoupled RL to outperform joint optimization for robust verification and agentic self-correction.

Audio-Mind: An Auditable Agentic Framework for Audio Understanding
eess.AS · 2026-05-27 · unverdicted · none · ref 7 · internal anchor
Audio-Mind introduces a conditional, auditable agentic framework for audio understanding that preserves frontend judgment and acquires bounded external evidence only when needed, reporting 80.4% on MMAR and 82.8% on MSU-Bench.

Measuring Progress Toward AGI: A Cognitive Framework
cs.AI · 2026-05-27 · unverdicted · none · ref 36 · internal anchor
The paper introduces a 10-faculty Cognitive Taxonomy and a held-out task protocol to generate cognitive profiles for measuring AI progress toward AGI.

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini
cs.CV · 2026-05-26 · unverdicted · none · ref 5 · internal anchor
A native multimodal embedding model from Gemini achieves reported state-of-the-art results on retrieval benchmarks across modalities via large-scale contrastive learning.

MuChator: Enabling Active Music Discovery via Conversational Music LLMs in Douyin Music
cs.IR · 2026-05-26 · unverdicted · none · ref 4 · internal anchor
MuChator introduces a three-component MusicLLM system (staged knowledge pre-training, automated triplet instruction tuning, hybrid RM with GRPO) that outperforms Gemini-3-Pro on internal datasets and yields 46.49% higher user active days after deployment on Douyin Music.

VEN-VL: A Visual Ensemble MoE Framework for Effective and Efficient Multi-Modal Understanding
cs.CV · 2026-05-25 · unverdicted · none · ref 2 · internal anchor
VEN-VL introduces an enrich-then-compact visual ensemble MoE approach claiming superior performance-efficiency trade-off in multimodal tasks using fewer condensed visual tokens.

VectorArk: Learning Practical Image Vectorization with Rounded Polygon Representation
cs.CV · 2026-05-23 · unverdicted · none · ref 8 · internal anchor
A vision-language model for robust image vectorization via rounded polygon primitives and input degradation simulation.

Tracing the ongoing emergence of human-like reasoning in Large Language Models
cs.CL · 2026-05-20 · unverdicted · none · ref 70 · internal anchor
LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.

Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models
cs.CV · 2026-05-08 · unverdicted · none · ref 38 · 2 links · internal anchor
Vision-language models can serve as zero-shot ODD sensors for autonomous driving when using definition-anchored chain-of-thought prompting with persona decomposition.

JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
cs.GR · 2026-05-05 · unverdicted · none · ref 23 · 2 links · internal anchor
JoyAI-Image unifies visual understanding and generation via an MLLM-MMDiT architecture with spatial training signals to reach competitive benchmark performance and stronger spatial intelligence.

HealthNLP_Retrievers at ArchEHR-QA 2026: Cascaded LLM Pipeline for Grounded Clinical Question Answering
cs.CL · 2026-04-29 · unverdicted · none · ref 9 · internal anchor
A cascaded LLM pipeline for grounded question answering over electronic health records achieved competitive rankings in the ArchEHR-QA 2026 shared task.

LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling
physics.comp-ph · 2026-04-24 · unverdicted · none · ref 41 · internal anchor
LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.

From Handwriting to Structured Data: Benchmarking AI Digitisation of Handwritten Forms
cs.CV · 2026-04-14 · unverdicted · none · ref 19 · internal anchor
Frontier multimodal LLMs achieve ~85% accuracy and ~90% weighted F1 on digitizing complex handwritten medical forms, with Gemini 3.1 strongest overall and prompt optimization lifting macro metrics over 60%.

LLM-Based Automated Diagnosis Of Integration Test Failures At Google
cs.SE · 2026-04-13 · unverdicted · none · ref 7 · internal anchor
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.

ClinQueryAgent: A Conversational Agent for Population Health Management
cs.IR · 2026-04-13 · unverdicted · none · ref 123 · internal anchor
The paper introduces ClinQueryAgent, a conversational agent that converts natural language queries into database queries for population health management while keeping patient data secure, and reports its use by 128 staff across 15 NHS practices covering 148,319 patients.

Retrieve Only Relevant Tables Whether Few or Many: Adaptive Table Retrieval Method
cs.IR · 2026-04-12 · unverdicted · none · ref 98 · internal anchor
An adaptive thresholding mechanism combined with sliding-window reranking retrieves a query-dependent number of tables from large corpora, improving retrieval and downstream text-to-SQL performance on Spider, BIRD, and Spider 2.0.

DietDelta: A Vision-Language Approach for Dietary Assessment via Before-and-After Images
cs.CV · 2026-04-07 · unverdicted · none · ref 33 · internal anchor
DietDelta uses vision-language prompts on paired before-and-after RGB images to localize food items, estimate their weights, and compute consumption differences, reporting better results than prior single-image methods on three public datasets.

OmniFysics: Towards Physical Intelligence Evolution via Omni-Modal Signal Processing and Network Optimization
cs.CV · 2026-02-05 · unverdicted · none · ref 38 · internal anchor
OmniFysics is an omni-modal network using a dynamic physical data engine and evolutive tuning to improve performance on multimodal benchmarks and physics-oriented tasks.

Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval
cs.CV · 2026-01-21 · unverdicted · none · ref 13 · internal anchor
Large-scale profiling of recent AI literature shows growth in safety, multimodal reasoning, and agent studies alongside stabilization in neural machine translation and graph methods.

AI for Mathematics: Progress, Challenges, and Prospects
math.HO · 2026-01-19 · unverdicted · none · ref 34 · internal anchor
AI for math combines task-specific architectures and general foundation models to support research and advance AI reasoning capabilities.

Agentic Reasoning for Large Language Models
cs.AI · 2026-01-18 · unverdicted · none · ref 242 · internal anchor
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.

MUSEKG: A Knowledge Graph Over Museum Collections
cs.AI · 2025-11-20 · unverdicted · none · ref 3 · internal anchor
MuseKG builds a typed knowledge graph over museum collections to support natural-language queries and relation-aware exploration of objects, people, images, and extracted entities.

Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid
cs.CY · 2025-11-06 · unverdicted · none · ref 9 · 2 links · internal anchor
G-TRACE provides region-aware estimates of GenAI carbon emissions including 4309 MWh and 2068 tCO2 for a 2024-2025 image generation trend, paired with a seven-level AI Sustainability Pyramid for policy guidance.

Evaluating Reasoning Models for Queries with Presuppositions
cs.CL · 2026-05-04 · unverdicted · none · ref 2
Reasoning models achieve only 2-11% higher accuracy than non-reasoning models when handling queries with false presuppositions, failing to challenge 26-42% of them and remaining sensitive to presupposition strength.

Audio Editing in the Era of Foundation Models: A Survey
eess.AS · 2026-06-22 · unverdicted · none · ref 3 · internal anchor
A survey that presents a unified taxonomy of audio editing tasks, summarizes training-based and training-free foundation model approaches, reviews datasets and evaluation protocols, and identifies future challenges.

An Analysis of the Effectiveness of Synthetic Speech Data for ASR Fine-tuning in Selected Indic Languages
eess.AS · 2026-06-16 · unverdicted · none · ref 21 · internal anchor
Empirical study measuring ASR performance gains from synthetic speech augmentation in three Indic languages, varying script sources, synthesis models, and cloned voice counts.