TextSquare: Scaling up text-centric visual instruction tuning

2024 · arXiv 2404.12803

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Local Spatiotemporal Convolutional Network for Robust Gait Recognition

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

LSTCN is a dual-branch CNN that extracts temporal gait features by pooling spatial data into strips and applying local spatiotemporal convolutions with asymmetric kernels.

Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

ASAHI adaptively slices high-res images into 6 or 12 patches, adds slicing-assisted fine-tuning, and uses Cluster-DIoU-NMS to hit 56.8% mAP on VisDrone2019 and 22.7% on xView while running 20-25% faster than fixed slicing baselines.

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.

Lightweight Real-Time Rendering Parameter Optimization via XGBoost-Driven Lookup Tables

cs.CV · 2026-04-28 · unverdicted · novelty 5.0

LUT-Opt distills XGBoost regressors into lookup tables to enable sub-millisecond adaptive optimization of rendering parameters such as subsurface scattering and ambient occlusion.

Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion

cs.CV · 2026-04-27 · unverdicted · novelty 5.0

CPIFNet decomposes non-homogeneous dehazing into multiple homogeneous sub-problems via specialized IENet branches trained on different haze concentrations, then uses IFNet to fuse advantageous regions through deep feature merging.

Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

FPFNet reports state-of-the-art AUROC scores on MVTec-AD and VisA for unified multi-class defect detection by adding feature perturbation and hierarchical fusion to UniAD with no extra parameters.

Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction

cs.CV · 2026-04-03 · unverdicted · novelty 5.0

A multilevel perceptual CRF model using Swin Transformer, HPF fusion, HA adapters, and dynamic scaling attention achieves state-of-the-art monocular depth estimation on NYU Depth v2, KITTI, and MatterPort3D with reduced error and fast inference.

Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion

cs.CV · 2026-04-30 · unverdicted · novelty 3.0

A multi-branch ResNet architecture with HRNet pose estimation and channel-attention fusion reaches 94.52% Rank-1 gait recognition accuracy on CASIA-B normal walking and leads skeleton-based methods on coat-wearing cases.

Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation

cs.CV · 2026-04-28 · unverdicted · novelty 3.0

RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.

citing papers explorer

Showing 9 of 9 citing papers.

Local Spatiotemporal Convolutional Network for Robust Gait Recognition
cs.CV · 2026-05-14 · unverdicted · none · ref 9
LSTCN is a dual-branch CNN that extracts temporal gait features by pooling spatial data into strips and applying local spatiotemporal convolutions with asymmetric kernels.

Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery
cs.CV · 2026-04-21 · unverdicted · none · ref 51
ASAHI adaptively slices high-res images into 6 or 12 patches, adds slicing-assisted fine-tuning, and uses Cluster-DIoU-NMS to hit 56.8% mAP on VisDrone2019 and 22.7% on xView while running 20-25% faster than fixed slicing baselines.

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting
cs.CV · 2026-05-18 · unverdicted · none · ref 3
SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.

Lightweight Real-Time Rendering Parameter Optimization via XGBoost-Driven Lookup Tables
cs.CV · 2026-04-28 · unverdicted · none · ref 33
LUT-Opt distills XGBoost regressors into lookup tables to enable sub-millisecond adaptive optimization of rendering parameters such as subsurface scattering and ambient occlusion.

Multi-Branch Non-Homogeneous Image Dehazing via Concentration Partitioning and Image Fusion
cs.CV · 2026-04-27 · unverdicted · none · ref 50
CPIFNet decomposes non-homogeneous dehazing into multiple homogeneous sub-problems via specialized IENet branches trained on different haze concentrations, then uses IFNet to fuse advantageous regions through deep feature merging.

Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection
cs.CV · 2026-04-21 · unverdicted · none · ref 50
FPFNet reports state-of-the-art AUROC scores on MVTec-AD and VisA for unified multi-class defect detection by adding feature perturbation and hierarchical fusion to UniAD with no extra parameters.

Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
cs.CV · 2026-04-03 · unverdicted · none · ref 52
A multilevel perceptual CRF model using Swin Transformer, HPF fusion, HA adapters, and dynamic scaling attention achieves state-of-the-art monocular depth estimation on NYU Depth v2, KITTI, and MatterPort3D with reduced error and fast inference.

Gait Recognition via Deep Residual Networks and Multi-Branch Feature Fusion
cs.CV · 2026-04-30 · unverdicted · none · ref 29
A multi-branch ResNet architecture with HRNet pose estimation and channel-attention fusion reaches 94.52% Rank-1 gait recognition accuracy on CASIA-B normal walking and leads skeleton-based methods on coat-wearing cases.

Image Classification via Random Dilated Convolution with Multi-Branch Feature Extraction and Context Excitation
cs.CV · 2026-04-28 · unverdicted · none · ref 43
RDCNet reports state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, Imagenette, and Imagewoof by combining random dilated convolutions with multi-branch and attention modules.
