Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, March 2024
5 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (5)
verdicts: UNVERDICTED (5)
citing papers explorer
- PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
  PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
- HAMSA: Scanning-Free Vision State Space Models via SpectralPulseNet
  HAMSA achieves 85.7% ImageNet-1K top-1 accuracy as a spectral-domain SSM with 2.2x faster inference and lower memory than transformers or scanning-based SSMs.
- SCRWKV: Ultra-Compact Structure-Calibrated Vision-RWKV for Topological Crack Segmentation
  SCRWKV is a 1.22M-parameter Vision-RWKV model using Structure-Field Encoder with AMCM and SCIU modules plus CSHF decoder that reports F1 0.8428 and mIoU 0.8512 on TUT crack dataset while claiming to outperform prior SOTA.
- PestVL-Net: Enabling Multimodal Pest Learning via Fine-grained Vision-Language Interaction
  PestVL-Net combines an RWKV visual backbone with saliency-guided window partitioning and MLLM-derived linguistic priors via multimodal chain-of-thought to enable fine-grained multimodal pest recognition on dedicated datasets.
- MFC-RFNet: A Multi-scale Guided Rectified Flow Network for Radar Sequence Prediction
  MFC-RFNet integrates multi-scale bidirectional communication, condition-guided alignment, and rectified flow to produce clearer and more skillful radar precipitation forecasts than prior baselines on four public datasets.