TextSquare: Scaling up text-centric visual instruction tuning
9 Pith papers cite this work. Polarity classification is still in progress.
Fields: cs.CV (9)
Years: 2026 (9)
Verdicts: UNVERDICTED (9)
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
-
Local Spatiotemporal Convolutional Network for Robust Gait Recognition
LSTCN is a dual-branch CNN that extracts temporal gait features by pooling spatial data into strips and applying local spatiotemporal convolutions with asymmetric kernels.
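The two operations named in this summary, strip pooling and a local convolution with an asymmetric kernel, can be sketched as follows. This is a minimal illustration under stated assumptions, not the LSTCN implementation: it assumes average pooling over equal-height horizontal strips, a kernel of shape (k_t x 1) that spans time but not neighbouring strips, and 'valid' padding, and it omits the dual-branch structure entirely. All function names are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one H x W silhouette or feature map


def strip_pool(frame: Frame, num_strips: int) -> List[float]:
    """Average-pool an H x W map into `num_strips` horizontal strips (assumed pooling)."""
    height = len(frame)
    if height % num_strips != 0:
        raise ValueError("frame height must be divisible by num_strips")
    rows_per_strip = height // num_strips
    pooled = []
    for s in range(num_strips):
        rows = frame[s * rows_per_strip:(s + 1) * rows_per_strip]
        values = [v for row in rows for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def temporal_conv(seq: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Slide a (k_t x 1) kernel along time only, separately for each strip.

    CNN-style cross-correlation with 'valid' padding: output has T - k_t + 1 steps.
    """
    k = len(kernel)
    num_strips = len(seq[0])
    return [
        [sum(kernel[i] * seq[t + i][s] for i in range(k)) for s in range(num_strips)]
        for t in range(len(seq) - k + 1)
    ]


def local_spatiotemporal_features(frames: List[Frame], num_strips: int,
                                  kernel: List[float]) -> List[List[float]]:
    """Strip-pool every frame into a T x S sequence, then convolve each strip over time."""
    pooled = [strip_pool(f, num_strips) for f in frames]
    return temporal_conv(pooled, kernel)
```

The asymmetric (k_t x 1) shape is the design point here: it mixes information across neighbouring frames while keeping each body strip separate, so local motion of, say, the legs is not blurred into the torso strip.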
-
Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery
ASAHI adaptively slices high-resolution images into 6 or 12 patches, adds slicing-assisted fine-tuning, and uses Cluster-DIoU-NMS to reach 56.8% mAP on VisDrone2019 and 22.7% mAP on xView, while running 20-25% faster than fixed-slicing baselines.
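Cluster-DIoU-NMS builds on DIoU-based suppression, where overlap is measured as IoU minus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box. The sketch below shows only plain greedy DIoU-NMS as an illustration of that criterion; it is not ASAHI's cluster variant or its adaptive slicing, and the 0.5 threshold and function names are assumptions. Boxes are assumed to be (x1, y1, x2, y2) with positive area.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), positive area assumed


def diou(a: Box, b: Box) -> float:
    """IoU minus (squared centre distance / squared enclosing-box diagonal)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # squared distance between the two box centres
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # squared diagonal of the smallest box enclosing both
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    return iou - rho2 / c2


def diou_nms(boxes: Sequence[Box], scores: Sequence[float],
             threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best box, drop others whose DIoU with it is >= threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep
```

For example, with boxes (0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30) scored 0.9, 0.8, 0.7, the second box is suppressed and indices [0, 2] are kept. The centre-distance penalty makes DIoU less eager than plain IoU to suppress boxes that overlap but sit apart, which matters for densely packed small objects.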
-
Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting
SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.
-
Lightweight Real-Time Rendering Parameter Optimization via XGBoost-Driven Lookup Tables
LUT-Opt distills XGBoost regressors into lookup tables to enable sub-millisecond adaptive optimization of rendering parameters such as subsurface scattering and ambient occlusion.
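The general idea of distilling a regressor into a lookup table can be sketched as follows: evaluate the slow model once per grid point offline, then answer runtime queries by clamping and linear interpolation. This is a one-dimensional illustration under assumptions, not LUT-Opt's method; the single scalar input, the grid size, linear interpolation, and the `stand_in_regressor` function (a hypothetical placeholder for a trained XGBoost model's prediction) are all assumptions.

```python
import math
from typing import Callable, List


def build_lut(model: Callable[[float], float], lo: float, hi: float, size: int) -> List[float]:
    """Offline step: query the (slow) regressor once per evenly spaced grid point."""
    if size < 2 or hi <= lo:
        raise ValueError("need size >= 2 and hi > lo")
    step = (hi - lo) / (size - 1)
    return [model(lo + i * step) for i in range(size)]


def lut_lookup(table: List[float], lo: float, hi: float, x: float) -> float:
    """Runtime step: clamp x to [lo, hi] and linearly interpolate two table entries."""
    x = min(max(x, lo), hi)
    pos = (x - lo) / (hi - lo) * (len(table) - 1)
    i = min(int(pos), len(table) - 2)  # keep i + 1 inside the table when x == hi
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac


def stand_in_regressor(load: float) -> float:
    """Hypothetical smooth stand-in for a trained XGBoost model's prediction."""
    return 1.0 / (1.0 + math.exp(load - 16.0))


if __name__ == "__main__":
    table = build_lut(stand_in_regressor, 0.0, 33.0, 34)
    print(lut_lookup(table, 0.0, 33.0, 12.3))
```

The runtime cost of `lut_lookup` is a clamp, two reads, and one blend, independent of how large the original tree ensemble was; the price is interpolation error between grid points and the memory needed for the table.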
-
Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion
CPIFNet decomposes non-homogeneous dehazing into multiple homogeneous sub-problems via specialized IENet branches trained on different haze concentrations, then uses IFNet to fuse advantageous regions through deep feature merging.
-
Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection
FPFNet reports state-of-the-art AUROC scores on MVTec-AD and VisA for unified multi-class defect detection by adding feature perturbation and hierarchical fusion to UniAD with no extra parameters.
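As a rough illustration of how feature perturbation can be added without extra parameters, the sketch below injects zero-mean Gaussian noise into a feature vector during training only, with the noise scaled to the vector's L2 norm. This is a generic stand-in, not FPFNet's perturbation pool; the scaling rule, the `alpha` value, and the function name are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def perturb_features(features: Sequence[float], alpha: float = 0.1, training: bool = True,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Parameter-free perturbation: add Gaussian noise during training only.

    The per-element std is alpha * ||f|| / sqrt(n), so the expected squared
    norm of the added noise equals (alpha * ||f||) ** 2.
    """
    if not training or alpha == 0.0:
        return list(features)  # identity at evaluation time
    rng = rng or random.Random()
    norm = math.sqrt(sum(v * v for v in features))
    sigma = alpha * norm / math.sqrt(len(features))
    return [v + rng.gauss(0.0, sigma) for v in features]
```

Because the noise has no learnable weights and is switched off at evaluation, such a perturbation changes only the training signal, which is consistent with a method that adds no extra parameters.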
-
Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
A multilevel perceptual CRF model combining a Swin Transformer, hybrid pyramid feature (HPF) fusion, hierarchical awareness (HA) adapters, and dynamic scaling attention achieves state-of-the-art monocular depth estimation on NYU Depth v2, KITTI, and Matterport3D, with reduced error and fast inference.
-
Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion
A multi-branch ResNet architecture with HRNet pose estimation and channel-attention fusion reaches 94.52% Rank-1 gait recognition accuracy on CASIA-B under normal walking and achieves the best results among skeleton-based methods under the coat-wearing condition.
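Channel-attention fusion of several branches is commonly done in a squeeze-and-excitation style: concatenate the branch descriptors, pass them through a small bottleneck, and use sigmoid outputs to reweight each channel. The sketch below assumes that SE-style design and assumes each branch already yields a pooled feature vector; it is not the paper's fusion module, and the weight matrices here are hand-set placeholders rather than learned values.

```python
import math
from typing import List, Sequence


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention_fusion(branches: Sequence[Sequence[float]],
                             w_squeeze: Sequence[Sequence[float]],
                             w_excite: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate pooled branch descriptors, then gate each channel SE-style.

    w_squeeze: r x C matrix (C = total channels); w_excite: C x r matrix.
    """
    fused = [v for branch in branches for v in branch]  # concatenation: C channels
    if any(len(row) != len(fused) for row in w_squeeze):
        raise ValueError("each w_squeeze row needs one weight per channel")
    hidden = [max(0.0, sum(w * x for w, x in zip(row, fused)))  # ReLU(W1 z)
              for row in w_squeeze]
    gates = [_sigmoid(sum(w * h for w, h in zip(row, hidden)))  # sigmoid(W2 h)
             for row in w_excite]
    return [g * x for g, x in zip(gates, fused)]  # reweight each channel
```

With all-zero weights every gate is sigmoid(0) = 0.5, so branches [1, 2] and [3] fuse to [0.5, 1.0, 1.5]; in a trained network the gates would instead emphasise whichever branch (for example, appearance or pose) is more reliable for a given input.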
-
Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation
RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
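One plausible reading of "random dilated convolution" is a convolution whose dilation rate is sampled from a small set on each call, so the receptive field varies during training. The one-dimensional sketch below assumes uniform sampling from the rates (1, 2, 3) and 'same' zero padding so the output length does not depend on the sampled rate; the rate set, the sampling scheme, and the function names are assumptions, not RDCNet's design.

```python
import random
from typing import List, Optional, Sequence, Tuple


def dilated_conv1d_same(x: Sequence[float], kernel: Sequence[float],
                        dilation: int) -> List[float]:
    """'Same'-padded 1-D dilated cross-correlation; taps are `dilation` samples apart."""
    if len(kernel) % 2 == 0 or dilation < 1:
        raise ValueError("odd kernel length and dilation >= 1 required")
    half = (len(kernel) // 2) * dilation  # zero padding on each side
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x))]


def random_dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                          dilations: Sequence[int] = (1, 2, 3),
                          rng: Optional[random.Random] = None) -> Tuple[int, List[float]]:
    """Sample a dilation rate uniformly, then apply the dilated convolution."""
    rng = rng or random.Random()
    d = rng.choice(dilations)
    return d, dilated_conv1d_same(x, kernel, d)
```

For instance, `dilated_conv1d_same([1, 2, 3, 4, 5], [1, 0, 0], 2)` returns [0.0, 0.0, 1.0, 2.0, 3.0]: the first tap now reaches two samples back instead of one. Keeping the output length fixed is what lets the sampled rate change without breaking the layers that follow.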