Infinity parser: Layout aware reinforcement learning for scanned document parsing

Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Zuming Huang, Jun Huang, Haozhe Wang, Yanjie Liang, Ling Chen, Wei Chu, et al · 2025 · arXiv 2506.03197

4 Pith papers cite this work.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards

eess.SP · 2026-05-09 · unverdicted · novelty 7.0

SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compared to standard GraphRAG.

TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction

cs.CL · 2026-04-24 · unverdicted · novelty 7.0

A 2B-parameter model trained with RL on verifiable LaTeX unit tests produces more compilable page-to-LaTeX reconstructions than prior OCR systems across structural and compilation metrics.

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

UIPress is the first encoder-side learned optical compression method for UI-to-Code that compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% CLIP score and the best inference-time baseline by 4.6% while delivering 9.1x TTFT speedup.

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

PresentAgent-2 generates query-driven multimodal presentation videos with research grounding, supporting single-speaker, multi-speaker discussion, and interactive question-answering modes.

citing papers explorer


SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards eess.SP · 2026-05-09 · unverdicted · none · ref 27
TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction cs.CL · 2026-04-24 · unverdicted · none · ref 2
UIPress: Bringing Optical Token Compression to UI-to-Code Generation cs.CL · 2026-04-10 · unverdicted · none · ref 51
PresentAgent-2: Towards Generalist Multimodal Presentation Agents cs.CV · 2026-05-12 · unverdicted · none · ref 12
