EdgeDAM: Real-time Object Tracking for Mobile Devices
Pith reviewed 2026-05-15 16:16 UTC · model grok-4.3
The pith
EdgeDAM uses a dual-buffer memory system and confidence-based switching to enable real-time bounding-box tracking on mobile devices while resisting distractors and occlusions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EdgeDAM reformulates distractor-aware memory for bounding-box tracking. A Recent-Aware Memory preserves temporally consistent target hypotheses, while a Distractor-Resolving Memory stores hard negatives and penalizes their re-selection. Confidence-Driven Switching activates detection or memory-guided recovery only when reliability criteria indicate that the tracker is unreliable, and a held-box mechanism freezes and expands the estimate to block distractor intrusion. Together, these components are claimed to improve robustness under occlusion and fast motion while running in real time on mobile devices.
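The following minimal Python sketch shows how the two buffers might interact during recovery. The abstract gives no buffer sizes, feature type, or penalty formula, so all of these are assumptions. Each candidate is a hypothetical appearance embedding, similarity is cosine similarity, and both buffers are fixed-length FIFOs. A candidate's score is its best match against the Recent-Aware Memory minus a weight `lam` times its best match against the Distractor-Resolving Memory. The class name `DualBufferMemory` is illustrative and does not come from the paper.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class DualBufferMemory:
    """Hypothetical dual-buffer memory: recent target features plus hard negatives.

    Buffer sizes and the penalty weight `lam` are illustrative assumptions,
    not values reported for EdgeDAM.
    """

    def __init__(self, recent_size: int = 5, distractor_size: int = 10, lam: float = 0.5):
        # Recent-Aware Memory: the oldest target features drop out automatically.
        self.recent: Deque[List[float]] = deque(maxlen=recent_size)
        # Distractor-Resolving Memory: features of candidates judged to be hard negatives.
        self.distractors: Deque[List[float]] = deque(maxlen=distractor_size)
        self.lam = lam

    def add_target(self, feat: Sequence[float]) -> None:
        self.recent.append(list(feat))

    def add_distractor(self, feat: Sequence[float]) -> None:
        self.distractors.append(list(feat))

    def score(self, feat: Sequence[float]) -> float:
        """Target affinity minus a penalty for resembling any stored hard negative."""
        pos = max((cosine(feat, m) for m in self.recent), default=0.0)
        neg = max((cosine(feat, d) for d in self.distractors), default=0.0)
        return pos - self.lam * neg

    def select(self, candidates: Sequence[Sequence[float]]) -> int:
        """Index of the best-scoring candidate (assumes at least one candidate)."""
        return max(range(len(candidates)), key=lambda i: self.score(candidates[i]))
```

With target memory `[1.0, 0.0]`, `lam=0.5`, and candidates `[0.8, 0.6]` and `[0.6, -0.8]`, the first candidate wins on raw similarity (0.8 vs 0.6). After `[0.8, 0.6]` is stored as a hard negative, its score drops to 0.8 - 0.5 = 0.3, so the second candidate is selected instead.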
What carries the argument
Dual-Buffer Distractor-Aware Memory (DAM), which separates recent target hypotheses from stored hard negatives, together with Confidence-Driven Switching, which adaptively triggers re-identification, and Held-Box Stabilization, which freezes the estimate during low-confidence periods.
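Held-box stabilization is described only as freezing and expanding the estimate during low-confidence periods. The following minimal Python sketch uses assumed parameters. Below a hypothetical confidence threshold `tau`, it ignores the tracker's output, holds the last reliable box center fixed, and grows the box's width and height by a factor `grow` per frame, capped at `max_scale`. The expansion schedule and all three values are illustrative; the paper's actual rule is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


class HeldBox:
    """Hypothetical held-box stabilizer: freeze the last reliable box and enlarge it.

    `tau`, `grow`, and `max_scale` are illustrative assumptions.
    """

    def __init__(self, tau: float = 0.5, grow: float = 1.1, max_scale: float = 2.0):
        self.tau, self.grow, self.max_scale = tau, grow, max_scale
        self.anchor: Optional[Box] = None  # last box accepted with high confidence
        self.scale = 1.0

    def update(self, box: Box, conf: float) -> Box:
        if conf >= self.tau or self.anchor is None:
            # Reliable frame (or nothing held yet): accept the tracker output, reset the hold.
            self.anchor, self.scale = box, 1.0
            return box
        # Low confidence: discard the possibly distractor-contaminated output,
        # keep the anchor's center fixed, and enlarge the box up to the cap.
        self.scale = min(self.scale * self.grow, self.max_scale)
        a = self.anchor
        return Box(a.cx, a.cy, a.w * self.scale, a.h * self.scale)
```

Freezing the center prevents a nearby look-alike from dragging the estimate away, while the growing extent leaves room for the true target to reappear slightly displaced after an occlusion.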
If this is right
- Tracking remains accurate on benchmarks that emphasize distractors and temporary occlusion.
- The system sustains 25 frames per second on an iPhone 15 while producing only bounding-box output.
- No mask prediction or attention-driven memory updates are required, lowering overhead relative to segmentation trackers.
- Detection is invoked only when confidence criteria indicate that the tracker is unreliable, preserving throughput under normal conditions.
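The gating in the last point, which the abstract describes as tracker reliability and temporal consistency criteria, might look like the following Python sketch. It assumes two things: reliability means the tracker's confidence clears a threshold, and temporal consistency means the IoU between consecutive boxes clears a second threshold. Both thresholds, the `(x1, y1, x2, y2)` box format, and the returned mode names are hypothetical.

```python
from typing import Tuple

BoxXYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: BoxXYXY, b: BoxXYXY) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def choose_mode(conf: float, prev_box: BoxXYXY, cur_box: BoxXYXY,
                tau_conf: float = 0.6, tau_iou: float = 0.3) -> str:
    """Return 'track' if the lightweight tracker looks reliable, else 'recover'.

    'recover' stands for running the detector plus memory-guided
    re-identification. Both thresholds are hypothetical.
    """
    reliable = conf >= tau_conf                      # tracker reliability criterion
    consistent = iou(prev_box, cur_box) >= tau_iou   # temporal consistency criterion
    return "track" if reliable and consistent else "recover"
```

A check of this kind costs one IoU computation per frame, so the detector can stay idle on every frame that passes it.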
Where Pith is reading between the lines
- The held-box stabilization could be tested on other intermittent-visibility tasks such as hand tracking in AR.
- Scaling the two memory buffers might support short-term multi-object scenarios on the same edge hardware.
- The separation of recent and negative memories offers a template for lightweight memory modules in other real-time vision pipelines.
Load-bearing premise
The dual-buffer memory and switching rules will increase robustness to occlusions and distractors without exceeding the strict compute and memory limits of typical mobile hardware.
What would settle it
Independent reproduction on the DiDi dataset that measures accuracy below 80 percent or frame rate below 20 FPS on an iPhone 15 would falsify the performance claims.
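The criterion above reduces to two threshold checks. The 20 FPS floor corresponds to a per-frame budget of 1000 / 20 = 50 ms, compared with 40 ms at the reported 25 FPS. The following is a minimal sketch, assuming throughput is computed as the number of frames divided by total per-frame latency; the function names are illustrative.

```python
from typing import Sequence


def measured_fps(latencies_ms: Sequence[float]) -> float:
    """Average throughput implied by per-frame latencies in milliseconds."""
    return 1000.0 * len(latencies_ms) / sum(latencies_ms)


def falsifies_claims(accuracy_pct: float, fps: float,
                     min_accuracy_pct: float = 80.0, min_fps: float = 20.0) -> bool:
    """True if a reproduction falls below either threshold stated in the criterion."""
    return accuracy_pct < min_accuracy_pct or fps < min_fps
```

For example, a run averaging 55 ms per frame yields about 18.2 FPS and would meet the falsification condition even if DiDi accuracy matched the reported 88.2%.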
Original abstract
Single-object tracking (SOT) on edge devices is a critical computer vision task, requiring accurate and continuous target localization across video frames under occlusion, distractor interference, and fast motion. However, recent state-of-the-art distractor-aware memory mechanisms are largely built on segmentation-based trackers and rely on mask prediction and attention-driven memory updates, which introduce substantial computational overhead and limit real-time deployment on resource-constrained hardware; meanwhile, lightweight trackers sustain high throughput but are prone to drift when visually similar distractors appear. To address these challenges, we propose EdgeDAM, a lightweight detection-guided tracking framework that reformulates distractor-aware memory for bounding-box tracking under strict edge constraints. EdgeDAM introduces two key strategies: (1) Dual-Buffer Distractor-Aware Memory (DAM), which integrates a Recent-Aware Memory to preserve temporally consistent target hypotheses and a Distractor-Resolving Memory to explicitly store hard negative candidates and penalize their re-selection during recovery; and (2) Confidence-Driven Switching with Held-Box Stabilization, where tracker reliability and temporal consistency criteria adaptively activate detection and memory-guided re-identification during occlusion, while a held-box mechanism temporarily freezes and expands the estimate to suppress distractor contamination. Extensive experiments on five benchmarks, including the distractor-focused DiDi dataset, demonstrate improved robustness under occlusion and fast motion while maintaining real-time performance on mobile devices, achieving 88.2% accuracy on DiDi and 25 FPS on an iPhone 15. Code will be released.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EdgeDAM, a lightweight detection-guided single-object tracking framework for edge devices. It introduces Dual-Buffer Distractor-Aware Memory (DAM). DAM combines a Recent-Aware Memory that maintains temporally consistent target hypotheses with a Distractor-Resolving Memory that stores and penalizes hard negative candidates. The framework adds Confidence-Driven Switching and Held-Box Stabilization to handle occlusion and distractors adaptively while preserving real-time performance. The abstract reports experiments on five benchmarks, citing 88.2% accuracy on the DiDi dataset and 25 FPS on an iPhone 15, and states that code will be released.
Significance. If the reported results and robustness claims are substantiated, the work could meaningfully advance practical single-object tracking on resource-constrained mobile hardware by adapting distractor-aware memory techniques to bounding-box outputs without the overhead of segmentation-based methods. The dual-buffer design and stabilization mechanism address a clear gap between heavy accurate trackers and fast but drift-prone alternatives.
major comments (2)
- [Abstract] Abstract: The central claims of 88.2% accuracy on DiDi and 25 FPS on iPhone 15, plus improved robustness under occlusion and fast motion, are presented without any experimental details, baselines, ablations, error bars, or failure-case analysis. This directly undermines evaluation of whether the Dual-Buffer DAM and switching strategy deliver the stated gains within edge constraints.
- [Abstract] Abstract: The Dual-Buffer Distractor-Aware Memory is described only at a conceptual level (Recent-Aware and Distractor-Resolving components, penalization of hard negatives); no implementation specifics such as buffer sizes, update rules, or computational overhead are provided, which is load-bearing for the real-time edge-deployment claim.
minor comments (1)
- [Abstract] Abstract: The manuscript states experiments on five benchmarks but reports quantitative results only for DiDi; a brief summary of performance on the remaining benchmarks would strengthen the abstract.
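The second major comment asks for buffer sizes and per-frame overhead. The following sketch shows one way such numbers might be reported. The buffer sizes, embedding dimension, and float32 storage are hypothetical placeholders, not values from the paper. Overhead is taken here as the difference in mean per-frame latency with and without the memory module.

```python
from typing import Sequence


def memory_footprint_bytes(recent_size: int, distractor_size: int,
                           feat_dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold every buffered embedding (float32 by default)."""
    return (recent_size + distractor_size) * feat_dim * bytes_per_value


def per_frame_overhead_ms(with_memory_ms: Sequence[float],
                          without_memory_ms: Sequence[float]) -> float:
    """Mean latency added per frame by the memory module (hypothetical protocol)."""
    mean_with = sum(with_memory_ms) / len(with_memory_ms)
    mean_without = sum(without_memory_ms) / len(without_memory_ms)
    return mean_with - mean_without
```

For instance, 5 recent entries and 10 hard negatives of hypothetical 256-dimensional float32 embeddings occupy (5 + 10) × 256 × 4 = 15,360 bytes.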
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We agree that the current version lacks sufficient supporting details for the performance claims and the Dual-Buffer DAM description. We will revise the abstract to address these points while preserving its brevity and focus.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claims of 88.2% accuracy on DiDi and 25 FPS on iPhone 15, plus improved robustness under occlusion and fast motion, are presented without any experimental details, baselines, ablations, error bars, or failure-case analysis. This directly undermines evaluation of whether the Dual-Buffer DAM and switching strategy deliver the stated gains within edge constraints.
Authors: We accept this observation. The revised abstract will include a concise reference to the experimental protocol, noting comparisons to representative baselines across the five benchmarks and highlighting ablation results that quantify the contribution of the dual-buffer memory and switching strategy to robustness under occlusion and fast motion. Detailed tables with error bars, full baseline comparisons, and failure-case analysis remain in the experimental section due to abstract length limits, but we will add a supporting clause to indicate that the reported metrics are backed by these evaluations. revision: yes
- Referee: [Abstract] Abstract: The Dual-Buffer Distractor-Aware Memory is described only at a conceptual level (Recent-Aware and Distractor-Resolving components, penalization of hard negatives); no implementation specifics such as buffer sizes, update rules, or computational overhead are provided, which is load-bearing for the real-time edge-deployment claim.
Authors: We agree that the abstract's high-level description is insufficient to substantiate the real-time claim. In the revision we will add brief implementation specifics drawn from the method section, such as buffer capacities, the update and penalization rules, and the measured per-frame overhead on the target mobile platform. This will be done concisely to fit within abstract constraints while directly addressing the load-bearing aspects of the edge-deployment argument. revision: yes
Circularity Check
No significant circularity in abstract-only description
Full rationale
The abstract presents EdgeDAM as a new lightweight detection-guided tracking framework that reformulates distractor-aware memory concepts for bounding-box tracking under edge constraints. It introduces Dual-Buffer DAM (Recent-Aware Memory plus Distractor-Resolving Memory) and Confidence-Driven Switching with Held-Box Stabilization, then reports empirical results: 88.2% accuracy on the external DiDi benchmark and 25 FPS on an iPhone 15. It provides no equations, parameter fits, self-citations, uniqueness theorems, or ansatzes that could reduce a claimed prediction or derivation to its inputs by construction. The central claims rest on external evaluation rather than self-reference, so the description is self-contained given the available text.