DenseBox: Unifying Landmark Localization with End to End Object Detection

Lichao Huang; Yafeng Deng; Yinan Yu; Yi Yang

arxiv: 1509.04874 · v3 · pith:4KTIBGRZnew · submitted 2015-09-16 · 💻 cs.CV

Lichao Huang, Yi Yang, Yafeng Deng, Yinan Yu

classification 💻 cs.CV

keywords detection · densebox · object · landmark · localization · objects · single · accurately

How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences across all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that when landmark localization is incorporated through multi-task learning, DenseBox further improves object detection accuracy. Experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, indicate that DenseBox is a state-of-the-art system for detecting challenging objects such as faces and cars.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Object Detection in Video with Spatial-temporal Context Aggregation
cs.CV 2019-07 unverdicted novelty 6.0

Proposal-level spatio-temporal context aggregation for video object detection achieves 80.3% mAP on ImageNet VID, improving Faster R-CNN baseline by 5.8%.

Adapted Center and Scale Prediction: More Stable and More Accurate
cs.CV 2020-02 unverdicted novelty 4.0

Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
cs.MM 2024-10 unverdicted novelty 3.0

Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.