Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures. arXiv preprint arXiv:2403.02308
6 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (6)
verdicts: UNVERDICTED (6)
citing papers explorer
- PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
  PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
- HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet
  HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
- SCRWKV: Ultra-Compact Structure-Calibrated Vision-RWKV for Topological Crack Segmentation
  SCRWKV is a 1.22M-parameter Vision-RWKV model using a Structure-Field Encoder with AMCM and SCIU modules plus a CSHF decoder; it reports F1 0.8428 and mIoU 0.8512 on the TUT crack dataset and claims to outperform prior SOTA.
- PestVL-Net: Enabling Multimodal Pest Learning via Fine-grained Vision-Language Interaction
  PestVL-Net combines an RWKV visual backbone with saliency-guided window partitioning and MLLM-derived linguistic priors via multimodal chain-of-thought to enable fine-grained multimodal pest recognition on dedicated datasets.
- MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction
  MFC-RFNet integrates multi-scale bidirectional communication, condition-guided alignment, and rectified flow to produce clearer and more skillful radar precipitation forecasts than prior baselines on four public datasets.
- Cross-Modal Iteration Distillation for Robust IHD Screening: The IDNet Framework and A New Benchmark
  IDNet uses cross-modal distillation to integrate eye images and clinical variables, outperforming baselines on a new benchmark of 50,410 UK Biobank images for IHD screening.