Proposal-level spatio-temporal context aggregation for video object detection achieves 80.3% mAP on ImageNet VID, improving Faster R-CNN baseline by 5.8%.
DenseBox: Unifying Landmark Localization with End to End Object Detection
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
How can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences through all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when incorporating with landmark localization during multi-task learning, DenseBox further improves object detection accuray. We present experimental results on public benchmark datasets including MALF face detection and KITTI car detection, that indicate our DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.
citing papers explorer
-
Object Detection in Video with Spatial-temporal Context Aggregation
Proposal-level spatio-temporal context aggregation for video object detection achieves 80.3% mAP on ImageNet VID, improving Faster R-CNN baseline by 5.8%.
-
Adapted Center and Scale Prediction: More Stable and More Accurate
Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.
-
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.