UniDoc: A universal large multimodal model for simultaneous text detection, recognition, spotting and understanding
9 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 9
verdicts: UNVERDICTED 9
roles: background 1
polarities: background 1
citing papers explorer
- Local Spatiotemporal Convolutional Network for Robust Gait Recognition
  LSTCN is a dual-branch CNN that extracts temporal gait features by pooling spatial data into strips and applying local spatiotemporal convolutions with asymmetric kernels.
- Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting
  SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.
- Lightweight Real-Time Rendering Parameter Optimization via XGBoost-Driven Lookup Tables
  LUT-Opt distills XGBoost regressors into lookup tables to enable sub-millisecond adaptive optimization of rendering parameters such as subsurface scattering and ambient occlusion.
- Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion
  CPIFNet decomposes non-homogeneous dehazing into multiple homogeneous sub-problems via specialized IENet branches trained on different haze concentrations, then uses IFNet to fuse advantageous regions through deep feature merging.
- Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection
  FPFNet reports state-of-the-art AUROC scores on MVTec-AD and VisA for unified multi-class defect detection by adding feature perturbation and hierarchical fusion to UniAD with no extra parameters.
- Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
  A multilevel perceptual CRF model using Swin Transformer, HPF fusion, HA adapters, and dynamic scaling attention achieves state-of-the-art monocular depth estimation on NYU Depth v2, KITTI, and MatterPort3D with reduced error and fast inference.
- Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion
  A multi-branch ResNet architecture with HRNet pose estimation and channel-attention fusion reaches 94.52% Rank-1 gait recognition accuracy on CASIA-B normal walking and leads skeleton-based methods on coat-wearing cases.
- Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation
  RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
- A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends
  A survey of MLLM-based Visually Rich Document Understanding covering feature integration techniques, training paradigms, challenges like data scarcity, and emerging trends such as RAG and agentic frameworks.