Deep residual learning for image recognition
Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
citing papers explorer
-
iMiGUE-3K: A Large-Scale Benchmark for Micro-Gesture Analysis with Self-Supervised Learning
iMiGUE-3K is the largest in-the-wild micro-gesture video dataset with 3.4K clips and 37M frames from real interviews, supporting self-supervised foundation models and benchmarks that show micro-gestures improve emotion understanding.
-
MetaEarth-MM: Unified Multimodal Remote Sensing Image Generation with Scene-centered Joint Modeling
MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
-
Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution
A Mamba-based interactive state space model with cross-modal local scanning achieves competitive guided depth super-resolution performance at linear computational cost.
-
Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation
Class-level unlearning can take a shortcut by suppressing bias in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.
-
InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery
InterMesh explicitly incorporates human-object interaction semantics into multi-person mesh recovery via a detector and two lightweight modules, delivering up to 9.9% MPJPE reduction on interaction-heavy datasets.
-
ShapeGrasp: Simultaneous Visuo-Haptic Shape Completion and Grasping for Improved Robot Manipulation
ShapeGrasp improves grasp success on unknown objects to 84-91% by iteratively updating a 3D shape model with visuo-haptic feedback during real-world grasp attempts.
-
Learning from Compressed CT: Feature Attention Style Transfer and Structured Factorized Projections for Resource-Efficient Medical Image Analysis
CT-Lite combines Feature Attention Style Transfer (FAST) and Structured Factorized Projections (SFP) with contrastive learning to reach AUROC within 5-7% of uncompressed baselines on compressed CT volumes across three datasets while using far fewer parameters.
-
Hierarchical Spatio-Channel Clustering for Efficient Model Compression in Medical Image Analysis
A spatio-channel clustering framework for CNN compression reduces FLOPs by 81% and raises brain tumor MRI classification accuracy from 87.76% to 89.80% compared with global SVD and Tucker baselines.
-
Latent Space Probing for Adult Content Detection in Video Generative Models
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
-
Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training Paradigms
Unlearnable examples fail under pretraining-finetuning due to semantic filtering by frozen layers, but Shallow Semantic Camouflage restores effectiveness by confining perturbations to semantically valid subspaces.
-
Physically-Induced Atmospheric Adversarial Perturbations: Enhancing Transferability and Robustness in Remote Sensing Image Classification
FogFool creates fog-based adversarial perturbations using Perlin noise optimization to achieve high black-box transferability (83.74% TASR) and robustness to defenses in remote sensing classification.
-
CDPR: Cross-modal Diffusion with Polarization for Reliable Monocular Depth Estimation
CDPR integrates polarization priors into a diffusion-based monocular depth estimator via shared latent space and adaptive gating, outperforming RGB-only methods in challenging scenes.
-
Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark
TinySet-9M dataset and DEAL point-prompted framework deliver 31.4% relative AP75 gain over supervised baselines for small object detection with one click at inference and generalization to unseen categories.
-
Sparse Bayesian Learning Algorithms Revisited: From Learning Majorizers to Structured Algorithmic Learning using Neural Networks
SBL algorithms are unified under majorization-minimization with new convergence results, and a dimension-invariant neural network learns superior data-driven update rules that generalize across matrices and parameters.
-
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
-
Membership Inference for Contrastive Pre-training Models with Text-only PII Queries
UMID infers membership in contrastive pre-training data using only text queries by performing latent inversion and comparing similarity and variability signals to synthetic gibberish references via unsupervised anomaly detection.
-
CBEN -- A Multimodal Machine Learning Dataset for Cloud Robust Remote Sensing Image Understanding
CBEN provides paired optical-radar images with cloud occlusion, revealing 23-33 point AP drops in clear-sky trained models and 17-29 point relative gains when models are trained on cloudy data.
-
CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation
CoLA-Flow Policy encodes action sequences into a continuous latent space and learns an explicit flow there, yielding near-single-step inference with up to 93.7% smoother trajectories and 25-point higher task success than raw-action flow baselines.
-
Building Deep Graph Predictors with Graph Imitation Learning
GRAIL trains graph predictors via imitation learning by modeling generation as sequential decisions on partial graph embeddings, matching or exceeding prior methods on 18 benchmarks.
-
Re-Key-Free, Risky-Free: Adaptable Model Usage Control
AdaLoc keeps a model locked to authorized users by confining all post-deployment updates to a chosen subset of weights, preserving both task performance for authorized use and near-random accuracy for unauthorized use across vision and language models.
-
EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems
OD-TTA enables resource-efficient test-time adaptation on edge devices by triggering updates only on detected domain shifts, achieving comparable accuracy with lower energy and computation costs for embodied visual systems.
-
Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition
Open-source neural network iris matchers (TripletIris using batch-hard triplet loss and ArcIris using ArcFace loss) plus compliant C++ implementations of HDBIF and CRYPTS are released, evaluated on IREX X and eight academic datasets, and accompanied by segmentation tools to lower entry barriers for
-
TAR: Text Semantic Assisted Cross-modal Image Registration Framework for Optical and SAR Images
TAR uses frozen text encoders on remote sensing scene descriptions to boost high-level features for coarse-to-fine optical-SAR image registration under large deformations.
-
Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection
RGSE adapts text embeddings at test time via evolutionary search, using cosine similarity rewards from high-confidence visual proposals to improve open-vocabulary object detection under distribution shifts.
-
RFPrompt: Prompt-Based Expert Adaptation of the Large Wireless Model for Modulation Classification
RFPrompt adapts the Large Wireless Model via deep prompt tokens to improve out-of-distribution robustness in modulation classification while training only a small number of parameters.
-
MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation
MSACT improves localization stability and task success rates in limited-data bimanual manipulation by extracting stable 2D attention points and aligning predicted attention sequences across frames without keypoint labels.
-
Stereo Multistage Spatial Attention for Real-Time Mobile Manipulation Under Visual Scale Variation and Disturbances
A stereo multistage spatial attention deep predictive learning system improves robustness and success rates for real-time mobile manipulation under visual scale variation and disturbances.
-
HFS-TriNet: A Three-Branch Collaborative Feature Learning Network for Prostate Cancer Classification from TRUS Videos
HFS-TriNet applies heuristic frame selection and a three-branch network (ResNet50, SAM-based with temporal attention, WTCR) to classify prostate cancer from TRUS videos.
-
CODO: An Automated Compiler for Comprehensive Dataflow Optimization
CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.
-
BiasIG: Benchmarking Multi-dimensional Social Biases in Text-to-Image Models
BiasIG is a multi-dimensional benchmark for social biases in T2I models that shows debiasing interventions frequently cause confounding discrimination effects.
-
Hierarchical, Interpretable, Label-Free Concept Bottleneck Model
HIL-CBM is a hierarchical label-free concept bottleneck model that improves classification accuracy and explanation quality over prior single-level CBMs using a visual consistency loss and dual heads.
-
Learnable Quantum Efficiency Filters for Urban Hyperspectral Segmentation
LQE is a physics-constrained learnable dimensionality reduction technique that improves average mIoU in hyperspectral urban segmentation on three datasets while using only 12-36 parameters.
-
FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources
FedACT schedules devices across concurrent FL jobs via alignment scoring and fairness to reduce average job completion time by up to 8.3x and raise accuracy by up to 44.5% versus baselines.
-
Practical Quantum Federated Learning for Privacy-Sensitive Healthcare: Communication Efficiency and Noise Resilience
Hybrid QFL cuts quantum transmissions from 3TNMP to {3t + 2(T-t)}NMP over T rounds while preserving near-centralized convergence and improving depolarizing-noise resilience via decentralized aggregation and Steane-code QEC.
-
TFusionOcc: T-Primitive Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction
TFusionOcc uses a family of Student's t-distribution T-primitives and a T-mixture model for multi-sensor 3D occupancy prediction, reporting state-of-the-art results on nuScenes.
-
Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records
SolarCHIP contrastively pretrains CNN and Vision Transformer backbones on SDO AIA-HMI data with multi-granularity objectives, achieving SOTA on cross-modal translation and flare classification especially in low-resource settings.
-
MapRF: Weakly Supervised Online HD Map Construction via NeRF-Guided Self-Training
MapRF reaches about 75% of fully supervised HD map accuracy on Argoverse 2 and nuScenes by generating view-consistent pseudo labels via a NeRF conditioned on map predictions and refining them with Map-to-Ray Matching in self-training.
-
X-IONet: Cross-Platform Inertial Odometry Network for Pedestrian and Legged Robot
X-IONet combines rule-based platform classification with a dual-stage attention network to predict displacement and uncertainty from IMU data, then fuses outputs via EKF, achieving reported error reductions on pedestrian and quadruped datasets.
-
Remote Rowhammer Attack using Adversarial Observations on Federated Learning Clients
A reinforcement learning attacker manipulates client sensor observations in federated learning to induce repetitive server memory updates, achieving around 70% repeated update rate and enabling remote Rowhammer bit flips on an automatic speech recognition model.
-
Fixed-Length Dense Fingerprint Representation with Alignment and Robust Enhancement
FLARE introduces a fixed-length 3D dense fingerprint descriptor integrated with pose-based alignment and ridge enhancement for robust cross-modality matching.
-
Replacement Learning: Training Neural Networks with Fewer Parameters
Replacement Learning replaces selected blocks in CNNs and ViTs with learnable parameter-fusion surrogates derived from adjacent layers to reduce full-depth backpropagation redundancy.
-
LymphNode: A Plug-and-Play Access Control Method for Deep Neural Networks
LymphNode enforces default-deny access control on DNNs by injecting GSUAP into the feature space to neutralize utility for unauthorized queries and selectively restore it for authorized inputs carrying a stealthy credential, using under 100 samples from surrogate data.
-
iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning
iPay fuses RGB and skeleton expert streams via dual-attention and a prior-driven Spatial Difference Discriminator to reach 83.45% accuracy on 500+ real-world payment clips from onboard transit cameras.
-
Evidence-based Decision Modeling for Synthetic Face Detection with Uncertainty-driven Active Learning
EMSFD uses Dirichlet-based evidence modeling to capture prediction uncertainty in synthetic face detection and applies uncertainty-driven active learning to achieve 15% higher accuracy than prior methods.
-
Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding
MB2L achieves 80.5% top-1 and 97.6% top-5 accuracy on zero-shot EEG-to-image retrieval by using biomimetic modules and bidirectional contrastive learning to align neural and visual features.
-
Deep Reprogramming Distillation for Medical Foundation Models
DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT and KD methods on 18 tasks.
-
FedPLT: Scalable, Resource-Efficient, and Heterogeneity-Aware Federated Learning via Partial Layer Training
FedPLT assigns client-specific model layers for training and matches or beats full-model federated learning accuracy with 71-82 percent fewer trainable parameters per client.
-
Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification
Dual-LoRA with a language-anchored adversary achieves 0.91% EER on the TidyVoice benchmark for cross-lingual speaker verification by targeting true linguistic cues while preserving speaker discriminability.
-
Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification
Meta-ensemble learning on diverse ICBHI data splits reaches 66.49% Score and improves generalization on two external datasets.
-
Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance
ST-STORM introduces a dual-branch SSL framework that disentangles semantic content from stylistic appearance using gated latent streams, JEPA for content invariance, and adversarial constraints for style capture.