Title resolution pending

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution · 2024

9 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

cs.AI · 2026-04-20 · unverdicted · novelty 7.0

VLMs as judges exhibit informativeness bias by favoring detailed but image-inconsistent answers; BIRCH mitigates it by first correcting answers against the image, reducing bias up to 17% and improving performance up to 9.8%.

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

PA-BDM adapts block diffusion by switching to causal intra-block denoising and dynamically committing reliable prefixes to KV cache, yielding higher accuracy and 71.6% higher throughput than a comparable baseline on document benchmarks.

GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing Perspective

cs.LG · 2026-05-15 · unverdicted · novelty 6.0

GOMA refines frozen multimodal embeddings via modality-aware graph signal smoothing on attributed graphs to improve retrieval while avoiding over-smoothing.

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

High OCR accuracy on standard metrics does not guarantee strong downstream RAG performance because structural and semantic errors cause retrieval and generation failures on challenging industrial documents.

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

cs.LG · 2026-04-19

citing papers explorer

Showing 9 of 9 citing papers.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · none · ref 29 · 2 links

When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

cs.AI · 2026-04-20 · unverdicted · none · ref 49

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · none · ref 42

Prefix-Adaptive Block Diffusion for Efficient Document Recognition

cs.CV · 2026-05-16 · unverdicted · none · ref 23

GOMA: Toward Structure-Driven Multimodal Alignment from a Graph Signal Smoothing Perspective

cs.LG · 2026-05-15 · unverdicted · none · ref 41

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · none · ref 141

When Good OCR Is Not Enough: Benchmarking OCR Robustness for Retrieval-Augmented Generation

cs.CV · 2026-04-29 · unverdicted · none · ref 6

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20 · unreviewed · ref 27

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference

cs.LG · 2026-04-19 · unreviewed · ref 5
