Attention is all you need
Canonical reference. 100% of citing Pith papers cite this work as background.
roles: background 5
polarities: background 5
citing papers explorer
- Efficient and Adaptive Human Activity Recognition via LLM Backbones
  Pretrained LLMs adapted via convolutional projections and LoRA act as efficient frozen backbones for sensor-based human activity recognition, delivering strong data efficiency and cross-dataset transfer.
- Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
  A joint fullband-subband model using high-resolution 44.1 kHz audio outperforms standard 16 kHz detectors for singing voice deepfake detection by exploiting spectrum-specific synthesis artifacts.
- MDS-DETR: DETR with Masked Duplicate Suppressor
  MDS-DETR introduces a masked duplicate suppressor in self-attention to enable one-to-many supervision inside a single decoder, yielding +2.8 mAP over Deformable-DETR on COCO with 5% more training time and outperforming MR.DETR by 0.3 mAP while training 20% faster.
- MSACT: Multistage Spatial Alignment for Stable Low-Latency Fine Manipulation
  MSACT improves localization stability and task success rates in limited-data bimanual manipulation by extracting stable 2D attention points and aligning predicted attention sequences across frames without keypoint labels.
- Stereo Multistage Spatial Attention for Real-Time Mobile Manipulation Under Visual Scale Variation and Disturbances
  A stereo multistage spatial attention deep predictive learning system improves robustness and success rates for real-time mobile manipulation under visual scale variation and disturbances.
- Diffusion Sequence Models for Generative In-Context Meta-Learning of Robot Dynamics
  Diffusion models for in-context meta-learning of robot dynamics outperform deterministic Transformers in robustness to distribution shifts while enabling real-time operation via warm-started sampling.
- Unsupervised Equivalent Contrastive Learning for Radio Signal Recognition
  Unsupervised contrastive learning with multi-domain equivalent transformations produces robust radio signal embeddings that outperform baselines in few-shot and cross-domain settings.
- Contrastive Feedback Mechanism for Simultaneous Speech Translation
  CFM uses unstable predictions via contrastive learning to improve SST quality on 3 decision policies and 8 languages in MuST-C v1.0.
- MTA-RL: Robust Urban Driving via Multi-modal Transformer-based 3D Affordances and Reinforcement Learning
  MTA-RL predicts 3D driving affordances from multi-modal sensors with a transformer and uses them as the observation space for an RL policy, yielding better route completion and generalization than baselines in CARLA urban scenarios.
- Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models
  A Meta AutoEncoder framework enables adaptive, progressive compression of visual features for low-latency edge-cloud VLM inference without model fine-tuning.
- Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation
  REINA-SAN and REINA-TAN add temporal context to information-based read/write policies, improving the quality-latency tradeoff in simultaneous speech translation by up to 7.1% on Normalized Streaming Efficiency.
- Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping
  A rectified flow model trained on 30 actuation-space demonstrations produces control sequences that yield 97.5% grasp success across the workspace, with generalization to object size changes of ±33% and execution speed scaling from 20% to 200%.
- DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
  DIVER uses RL-guided diffusion to produce diverse feasible trajectories from one ground-truth path, addressing mode collapse in imitation learning for autonomous driving.
- MSDformer: Multi-scale Discrete Transformer For Time Series Generation
  MSDformer introduces a multi-scale discrete transformer that tokenizes time series at multiple scales and models them autoregressively in discrete space, claiming superior performance over prior DTM methods with rate-distortion theoretical support.
- Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation
  A conditional flow matching model generates realistic safety-critical traffic scenarios by turning nominal scenes into dangerous rollouts using combined simulation and real data.
- From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles
  An agentic LLM/LVM framework generates adaptive behavior trees on-the-fly for AV navigation in CARLA+Nav2 simulation, succeeding in obstacle avoidance where static BTs fail.
- A Comprehensive Survey on Network Traffic Synthesis: From Statistical Models to Deep Learning
  A survey reviewing statistical and deep learning approaches to synthetic network traffic generation, with comparisons, an AI comparison tool, open challenges, and future directions.
- Sustainable Code Generation Using Large Language Models: A Systematic Literature Review
  A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.
- Redefining End-of-Life: Intelligent Automation for Electronics Remanufacturing Systems
  A literature review of intelligent automation approaches using robotics, AI, and control for disassembly, inspection, sorting, and reprocessing of end-of-life electronics.
- Adaptive Head Budgeting for Efficient Multi-Head Attention