arXiv preprint arXiv:2405.04404 (2024)
3 Pith papers cite this work.
citation summary
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
citing papers explorer
- ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning
  ABMamba uses Mamba-based linear-complexity processing plus a novel Aligned Hierarchical Bidirectional Scan to deliver competitive video captioning on VATEX and MSR-VTT at roughly 3x higher throughput than typical Transformer MLLMs.
- StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping
  StampFormer fuses geometry and material properties in a Swin-UNet backbone with custom modules to predict stamping FEA fields at <8.5% relative error in under one second.
- A Survey of Mamba
  The paper consolidates existing research on Mamba models, their architecture variants, adaptations to different data modalities, and applications across domains.