iMiGUE-3K is the largest in-the-wild micro-gesture video dataset with 3.4K clips and 37M frames from real interviews, supporting self-supervised foundation models and benchmarks that show micro-gestures improve emotion understanding.
hub Tool reference
Deep residual learning for image recognition
Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency, not as a substantive claim.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
MetaEarth-MM unifies multi-modal remote sensing image generation and any-to-any translation across five modalities via scene-centered joint modeling on the new EarthMM dataset.
A Mamba-based interactive state space model with cross-modal local scanning achieves competitive guided depth super-resolution performance at linear computational cost.
Class-level unlearning shortcuts via bias suppression in the classification head; new bias-aware training mechanisms and bias-specific metrics are introduced to diagnose and reduce this dependence.
InterMesh explicitly incorporates human-object interaction semantics into multi-person mesh recovery via a detector and two lightweight modules, delivering up to 9.9% MPJPE reduction on interaction-heavy datasets.
ShapeGrasp improves grasp success on unknown objects to 84-91% by iteratively updating a 3D shape model with visuo-haptic feedback during real-world grasp attempts.
CT-Lite combines Feature Attention Style Transfer (FAST) and Structured Factorized Projections (SFP) with contrastive learning to reach AUROC within 5-7% of uncompressed baselines on compressed CT volumes across three datasets while using far fewer parameters.
A spatio-channel clustering framework for CNN compression reduces FLOPs by 81% and raises brain tumor MRI classification accuracy from 87.76% to 89.80% compared with global SVD and Tucker baselines.
Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
Unlearnable examples fail under pretraining-finetuning due to semantic filtering by frozen layers, but Shallow Semantic Camouflage restores effectiveness by confining perturbations to semantically valid subspaces.
FogFool creates fog-based adversarial perturbations using Perlin noise optimization to achieve high black-box transferability (83.74% TASR) and robustness to defenses in remote sensing classification.
CDPR integrates polarization priors into a diffusion-based monocular depth estimator via shared latent space and adaptive gating, outperforming RGB-only methods in challenging scenes.
TinySet-9M dataset and DEAL point-prompted framework deliver 31.4% relative AP75 gain over supervised baselines for small object detection with one click at inference and generalization to unseen categories.
SBL algorithms are unified under majorization-minimization with new convergence results, and a dimension-invariant neural network learns superior data-driven update rules that generalize across matrices and parameters.
SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
UMID infers membership in contrastive pre-training data using only text queries by performing latent inversion and comparing similarity and variability signals to synthetic gibberish references via unsupervised anomaly detection.
CBEN provides paired optical-radar images with cloud occlusion, revealing 23-33 point AP drops in clear-sky trained models and 17-29 point relative gains when models are trained on cloudy data.
CoLA-Flow Policy encodes action sequences into a continuous latent space and learns an explicit flow there, yielding near-single-step inference with up to 93.7% smoother trajectories and 25-point higher task success than raw-action flow baselines.
GRAIL trains graph predictors via imitation learning by modeling generation as sequential decisions on partial graph embeddings, matching or exceeding prior methods on 18 benchmarks.
AdaLoc keeps a model locked to authorized users by confining all post-deployment updates to a chosen subset of weights, preserving both task performance for authorized use and near-random accuracy for unauthorized use across vision and language models.
TAR uses frozen text encoders on remote sensing scene descriptions to boost high-level features for coarse-to-fine optical-SAR image registration under large deformations.
RGSE adapts text embeddings at test time via evolutionary search, using cosine similarity rewards from high-confidence visual proposals to improve open-vocabulary object detection under distribution shifts.
RFPrompt adapts the Large Wireless Model via deep prompt tokens to improve out-of-distribution robustness in modulation classification while training only a small number of parameters.
MSACT improves localization stability and task success rates in limited-data bimanual manipulation by extracting stable 2D attention points and aligning predicted attention sequences across frames without keypoint labels.
citing papers explorer
- GFSR: Geometric Fidelity and Spatial Refinement for Reliable Lane Detection
- Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition
- Explainable AI in Speaker Recognition -- Making Latent Representations Understandable
- EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems