arxiv: 1509.04874 · v3 · pith:4KTIBGRZnew · submitted 2015-09-16 · 💻 cs.CV

DenseBox: Unifying Landmark Localization with End to End Object Detection

classification 💻 cs.CV
keywords detection · densebox · object · landmark · localization · objects · single · accurately

How well can a single fully convolutional neural network (FCN) perform on object detection? We introduce DenseBox, a unified end-to-end FCN framework that directly predicts bounding boxes and object class confidences across all locations and scales of an image. Our contribution is two-fold. First, we show that a single FCN, if designed and optimized carefully, can detect multiple different objects extremely accurately and efficiently. Second, we show that incorporating landmark localization through multi-task learning further improves DenseBox's object detection accuracy. We present experimental results on public benchmark datasets, including MALF face detection and KITTI car detection, which indicate that DenseBox is the state-of-the-art system for detecting challenging objects such as faces and cars.
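
To make "predicts bounding boxes and object class confidences at every location" concrete, here is a minimal sketch of how such dense output could be decoded into boxes. It is not the authors' implementation. The function name `decode_dense_boxes`, the (left, top, right, bottom) distance encoding, the output stride of 4, and the 0.5 threshold are all assumptions made for illustration. Non-maximum suppression and multi-scale testing are omitted.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score) in input-image pixels
Box = Tuple[float, float, float, float, float]


def decode_dense_boxes(
    score_map: Sequence[Sequence[float]],
    offset_map: Sequence[Sequence[Tuple[float, float, float, float]]],
    stride: int = 4,          # assumed downsampling factor of the output map
    threshold: float = 0.5,   # assumed confidence cutoff
) -> List[Box]:
    """Turn per-location predictions into candidate boxes.

    score_map[y][x] is a confidence in [0, 1].
    offset_map[y][x] is (left, top, right, bottom): distances, measured in
    output-map cells, from that location to the four box edges. This encoding
    is a hypothetical choice for the sketch.
    """
    boxes: List[Box] = []
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score < threshold:
                continue  # keep only confident locations
            left, top, right, bottom = offset_map[y][x]
            boxes.append((
                (x - left) * stride,
                (y - top) * stride,
                (x + right) * stride,
                (y + bottom) * stride,
                score,
            ))
    # Highest-confidence candidates first, ready for a later NMS step.
    boxes.sort(key=lambda b: b[4], reverse=True)
    return boxes


# Toy 2x2 output map: two locations pass the threshold.
score_map = [[0.1, 0.9], [0.2, 0.6]]
offset_map = [
    [(0.0, 0.0, 0.0, 0.0), (1.0, 0.5, 0.5, 1.0)],
    [(0.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5, 0.5)],
]
boxes = decode_dense_boxes(score_map, offset_map)
```

Each retained location yields one candidate box, so neighbouring locations inside the same object produce overlapping candidates. A full detector would merge these with non-maximum suppression.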

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Object Detection in Video with Spatial-temporal Context Aggregation

    cs.CV 2019-07 unverdicted novelty 6.0

    Proposal-level spatio-temporal context aggregation for video object detection achieves 80.3% mAP on ImageNet VID, improving Faster R-CNN baseline by 5.8%.

  2. Adapted Center and Scale Prediction: More Stable and More Accurate

    cs.CV 2020-02 unverdicted novelty 4.0

    Adaptations to CSP including compressing width prediction achieve 9.3% MR on CityPersons reasonable set, showing anchor-free one-stage detectors can reach high accuracy.

  3. Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

    cs.MM 2024-10 unverdicted novelty 3.0

    Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.