Segment Anything
Pith reviewed 2026-05-11 06:08 UTC · model grok-4.3
The pith
A promptable model trained on a billion masks enables zero-shot segmentation that often matches supervised results.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date, with over 1 billion masks on 11M images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results.
What carries the argument
The promptable Segment Anything Model (SAM) that takes user-provided prompts such as points, boxes, or coarse masks and outputs object segmentation masks.
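Concretely, the released segment_anything package exposes this interface through a predictor that embeds an image once and then accepts point, box, or mask prompts. A minimal sketch follows, assuming the public package API; the checkpoint and image paths are placeholders and signatures may differ across versions.

```python
# Minimal promptable-inference sketch with the released
# `segment_anything` package; checkpoint and image paths are
# placeholders, and signatures may differ across versions.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # compute the image embedding once

masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # one foreground click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
    box=np.array([100, 80, 540, 400]),    # optional coarse box (XYXY)
    multimask_output=True,                # 3 candidates for ambiguous prompts
)
best_mask = masks[np.argmax(scores)]      # HxW boolean mask
```

Because the image embedding is computed once, subsequent prompts on the same image are cheap, which is what makes interactive use practical.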
If this is right
- The model can be applied directly to new image types and tasks without collecting new labeled data or retraining.
- Prompt-based interaction becomes a practical way to guide segmentation on arbitrary images.
- The released dataset supports training or fine-tuning of additional vision models at large scale (see the loading sketch after this list).
- Releasing both the model and data lowers the barrier for researchers to experiment with promptable segmentation.
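The released SA-1B annotations are documented as per-image JSON records whose masks use COCO run-length encoding, decodable with pycocotools. Below is a minimal loading sketch; the file path is a placeholder, and the field names follow the dataset card, so verify them against the version you download.

```python
# Sketch of reading one SA-1B record. Field names follow the dataset
# documentation (an "annotations" list with COCO-RLE "segmentation"
# entries); the file path is a placeholder.
import json
import numpy as np
from pycocotools import mask as mask_utils

with open("sa_000000/sa_1.json") as f:  # placeholder path
    record = json.load(f)

masks = [
    mask_utils.decode(ann["segmentation"])  # HxW uint8 binary mask
    for ann in record["annotations"]
]
print(f"{len(masks)} masks; first mask covers {int(masks[0].sum())} pixels")
```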
Where Pith is reading between the lines
- The model-in-the-loop data collection process may offer a template for building large datasets in other vision domains where annotation is expensive.
- Promptable architectures could extend beyond segmentation to tasks like detection or editing with similar zero-shot benefits.
- Interactive tools built on this model might reduce the need for per-task model training in applied settings such as medical imaging or content creation.
- Performance on video sequences or 3D data would test whether the promptable property holds across temporal and spatial dimensions.
Load-bearing premise
That the zero-shot results reflect genuine generalization to new distributions rather than overfitting to the self-collected data or the evaluation tasks.
What would settle it
A controlled test on a fresh image domain and segmentation task where the model's zero-shot accuracy falls clearly below a model trained with full supervision on that same task.
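One concrete form of that test, sketched below under stated assumptions: score the zero-shot model and a fully supervised model per image on the fresh domain, then bootstrap the paired IoU gap. The score arrays are synthetic stand-ins, not results from the paper; the premise fails on that domain if the confidence interval sits entirely below zero.

```python
# Paired bootstrap on the per-image IoU gap between a zero-shot
# promptable model and a fully supervised baseline. The score arrays
# are synthetic stand-ins, not results from the paper.
import numpy as np

rng = np.random.default_rng(0)
zero_shot_iou = rng.uniform(0.50, 0.90, size=200)   # stand-in scores
supervised_iou = rng.uniform(0.55, 0.92, size=200)  # stand-in scores

def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05):
    """Confidence interval for mean(a - b) over resampled image sets."""
    diff = np.asarray(a) - np.asarray(b)
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), (lo, hi)

gap, (lo, hi) = paired_bootstrap_ci(zero_shot_iou, supervised_iou)
print(f"mean IoU gap {gap:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
# hi < 0 would mean zero-shot accuracy falls clearly below supervised.
```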
Original abstract
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Segment Anything (SA) project, comprising a new promptable segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset of over 1 billion masks on 11 million images. The dataset is constructed via a multi-stage data engine that uses an efficient version of the model itself to propose, refine, and automatically generate masks. The central claim is that the resulting promptable model transfers zero-shot to new image distributions and tasks, with performance that is often competitive with or superior to prior fully supervised methods; the model and dataset are released publicly.
Significance. If the zero-shot generalization results are robust, this work would mark a notable advance toward foundation models in computer vision by demonstrating a single promptable model that can handle diverse segmentation tasks without task-specific training or fine-tuning. The unprecedented scale of SA-1B and the open release of both model and data constitute clear strengths that could enable substantial follow-on research.
Major comments (2)
- Data Engine section: The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.
- Experiments section: The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.
Minor comments (2)
- Abstract: The phrase 'numerous tasks' is vague; a short parenthetical list of the primary evaluation benchmarks would improve immediate clarity.
- Model section (notation): The distinction between the 'efficient' model used in the data engine and the final SAM should be introduced with explicit symbols or subsection headings to avoid reader confusion in later sections.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the data engine and experimental reporting. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.
Point-by-point responses
-
Referee: Data Engine section: The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.
Authors: We acknowledge the potential circularity concern arising from the model's role in stages 2 and 3 of the data engine. However, the zero-shot evaluations use entirely external benchmarks and image distributions that were never seen during data collection or training. To directly address this, we will add a new analysis subsection in the revised manuscript that includes: (1) mask statistic comparisons (e.g., size, complexity, and diversity metrics) before and after the automatic stages, and (2) performance ablations on held-out image sources excluded from the data engine. These additions will demonstrate that the reported zero-shot gains on external tasks are not artifacts of distribution alignment; a sketch of such a statistics comparison follows these responses. Revision: yes.
-
Referee: Experiments section: The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.
Authors: We agree that the central performance claim requires more granular reporting for full substantiation. In the revised manuscript, we will expand the Experiments section with complete per-task tables that include exact metrics for all evaluated tasks, the specific baselines used, error bars or variance estimates (from multiple seeds or cross-validation where feasible), and results of statistical significance tests. We will also add explicit descriptions of the comparison protocols, including how prompts were generated and how zero-shot transfer was measured against fully supervised methods. Revision: yes.
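To make the first response concrete, here is a minimal sketch of a before/after mask-statistics comparison. The isoperimetric complexity measure and the toy masks are illustrative assumptions, not the authors' chosen metrics.

```python
# Sketch of comparing mask statistics between data-engine stages.
# Masks are HxW boolean arrays; complexity is perimeter^2 / area
# (an isoperimetric ratio), one plausible choice among many.
import numpy as np
import cv2

def mask_stats(mask):
    m = mask.astype(np.uint8)
    area = int(m.sum())
    contours, _ = cv2.findContours(m, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    perimeter = sum(cv2.arcLength(c, closed=True) for c in contours)
    return area, perimeter**2 / max(area, 1)

def summarize(masks):
    areas, cplx = zip(*(mask_stats(m) for m in masks))
    return {"median_area": float(np.median(areas)),
            "median_complexity": float(np.median(cplx))}

rng = np.random.default_rng(0)
manual_stage = [rng.random((64, 64)) > 0.6 for _ in range(8)]  # stand-ins
auto_stage = [rng.random((64, 64)) > 0.7 for _ in range(8)]    # stand-ins
print("manual:", summarize(manual_stage))
print("auto:  ", summarize(auto_stage))
```

A large shift in these summaries after the automatic stages would be evidence that the model's inductive biases are shaping the training distribution.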
Circularity Check
No circularity: empirical zero-shot evaluations remain independent of the data engine
Full rationale
The paper's core claim is empirical: a promptable model trained on SA-1B achieves competitive zero-shot results on external tasks and image distributions. The data engine (staged collection using an efficient model variant) is a practical annotation procedure whose outputs are then used for supervised training; the reported evaluations use separate benchmarks whose ground truth and image sources are not generated by the same loop. No equation, uniqueness theorem, or prediction reduces by construction to a fitted parameter or self-generated input. Self-citations are absent from the load-bearing steps, and the architecture's promptability is justified by design and training rather than by renaming prior results.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: standard deep learning assumptions on generalization from large-scale training data
Forward citations
Cited by 58 Pith papers
-
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Current CAC models often count the wrong objects because they misalign text prompts with visual content, as demonstrated by new negative-label and distractor tests on the MUCCA dataset.
-
MedCore: Boundary-Preserving Medical Core Pruning for MedSAM
MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.
-
Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding
Qwen3-VL-Seg decodes MLLM bounding boxes into pixel-level referring segmentation via a lightweight box-guided mask decoder, new SA1B-ORS training data, and ORS-Bench evaluation, showing strong open-world performance.
-
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
-
Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.
-
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
-
IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory
A contract-based multi-agent system maintains a claim-level semantic memory for long videos, enabling targeted corrections that raise VQA accuracy from 0.71 to 0.79 and cut human arbitration cost by 4.8x on VidOR.
-
A 3D SAM-Based Progressive Prompting Framework for Multi-Task Segmentation of Radiotherapy-induced Normal Tissue Injuries in Limited-Data Settings
A progressive prompting framework on 3D SAM with text, dose-box, and click prompts plus small-target loss achieves reliable multi-task segmentation of osteoradionecrosis, cerebral edema, and cerebral radiation necrosi...
-
Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection
Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
-
Off-the-shelf Vision Models Benefit Image Manipulation Localization
ReVi adapter enables off-the-shelf vision models to localize image manipulations by separating and enhancing manipulation cues from semantic features without full model retraining.
-
Training a Student Expert via Semi-Supervised Foundation Model Distillation
A semi-supervised framework distills vision foundation models into compact instance segmentation experts that outperform their teachers by up to 11.9 AP on Cityscapes and 8.6 AP on ADE20K while being 11 times smaller.
-
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
SEED-Bench is a new benchmark of 19K multiple-choice questions for evaluating generative comprehension in multimodal LLMs across 12 image and video dimensions.
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
-
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
-
ASIP-Planner: Adaptive Planning for UAV Surface Inspection in Partially Known Indoor Environments
ASIP-Planner achieves near-complete surface coverage and shorter trajectories in partially known indoor environments by clustering inspection targets globally and adapting viewing angles locally to handle occlusions.
-
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
HeteroGenManip decouples grasp localization from interaction planning using task-conditioned foundation models and multi-model diffusion policies, delivering 31% average gains in broad simulation tasks and 36.7% in fo...
-
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.
-
Few-Click-Driven Interactive 3D Segmentation with Semantic Embedding
A point-Transformer interactive 3D instance segmentation model handles multiple clicks jointly in one pass and reports over 20% mIoU gains versus baselines plus 8-10% cross-dataset improvement for one-click-per-instan...
-
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
Mixing real UAV imagery with 2101 AI-generated image-mask pairs improves semantic segmentation F1 scores for fine-grained forest species by over 15 percentage points overall and up to 30 points for rare classes.
-
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...
-
Approaching human parity in the quality of automated organoid image segmentation
A composite SAM-based method segments organoid images with accuracy matching or approaching inter-observer variability among human annotators.
-
Learning Equivariant Neural-Augmented Object Dynamics From Few Interactions
PIEGraph augments a spring-mass particle model with an equivariant GNN and novel action representation to predict accurate object dynamics for robotic manipulation from few interactions.
-
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...
-
DiffuSAM: Diffusion-Based Prompt-Free SAM2 for Few-Shot and Source-Free Medical Image Segmentation
DiffuSAM synthesizes SAM2-compatible mask embeddings via a diffusion prior conditioned on prior slices to enable accurate prompt-free medical image segmentation under SF-UDA and few-shot settings.
-
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
AgentLens adaptively deploys Full UI, Partial UI, and GenUI modalities with virtual display overlays for mobile GUI agents, yielding 85.7% user preference and best-in-study usability in a 21-participant evaluation.
-
SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces
SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.
-
Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction
COIN provides 50 interactive robotic tasks, a 1000-demonstration dataset collected via AR teleoperation, and metrics showing that CodeAsPolicy, VLA, and H-VLA models fail at causally-dependent interactive reasoning du...
-
One-Shot Cross-Geometry Skill Transfer through Part Decomposition
Part decomposition with generative shape models allows one-shot robot skill transfer across unfamiliar object geometries in simulation and real settings.
-
From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation
Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
-
Granularity-Aware Transfer for Tree Instance Segmentation in Synthetic and Real Forests
Granularity-aware distillation improves tree instance segmentation accuracy on real forest images by merging logits and unifying masks from fine-grained synthetic teachers despite coarse real labels.
-
GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality
GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-on...
-
Self-supervised Pretraining of Cell Segmentation Models
DINOCell achieves a SEG score of 0.784 on LIVECell by self-supervised domain adaptation of DINOv2, improving 10.42% over SAM-based models and showing strong zero-shot transfer.
-
Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents
Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.
-
GESS: Multi-cue Guided Local Feature Learning via Geometric and Semantic Synergy
GESS introduces joint semantic-normal and depth stability prediction heads, the SDAK keypoint mechanism, and the UTCF descriptor fusion module to leverage multi-cue synergy for improved robustness and discriminability.
-
Moondream Segmentation: From Words to Masks
Moondream Segmentation achieves 80.2% cIoU on RefCOCO by autoregressively decoding paths from referring expressions and using RL to refine masks, plus releases a cleaned RefCOCO-M dataset.
-
DeepSeek-OCR: Contexts Optical Compression
DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.
-
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
-
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks
Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.
-
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.
-
TD-MPC2: Scalable, Robust World Models for Continuous Control
TD-MPC2 scales an implicit world-model RL method to a 317M-parameter agent that masters 80 tasks across four domains with a single hyperparameter configuration.
-
CrackMorph-XAI-Net: A Topology-Preserving and Explainable Framework for Automated Crack Morphology
CrackMorph-XAI-Net extracts crack skeletons with Dice 0.991 and topology preservation in 98.5% of cases, detects junctions with F1 0.887, and computes morphology descriptors with correlations above 0.95 on an extended...
-
FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers
FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable accurate semantic mapping.
-
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations
Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.
-
Semantic Foam: Unifying Spatial and Semantic Scene Decomposition
Semantic Foam unifies spatial Voronoi decomposition with cell-level semantic features to achieve superior object segmentation by enabling direct spatial regularization that avoids occlusion and view-inconsistency artifacts.
-
DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery
DiffuSAM fuses diffusion-based localization cues with SAM models to deliver over 14% higher Acc@0.5 in zero-shot object grounding for remote sensing imagery compared to prior methods.
-
SGP-SAM: Self-Gated Prompting for Transferring 3D Segment Anything Models to Lesion Segmentation
SGP-SAM transfers 3D SAM to lesion segmentation using a self-gated module for conditional multi-scale enhancement and a Zoom Loss, achieving 7.3% mDice gain over fine-tuning on MSD Liver Tumor data.
-
STEP-Parts: Geometric Partitioning of Boundary Representations for Large-Scale CAD Processing
STEP-Parts produces tessellation-robust geometric part labels from STEP B-Reps by deterministic merging of same-primitive faces, enabling consistent supervision on 180k+ models.
-
MyoVision: A Mobile Research Tool and NEATBoost-Attention Ensemble Framework for Real Time Chicken Breast Myopathy Detection
Smartphone transillumination imaging paired with a neuroevolution-tuned ensemble model classifies chicken breast myopathies at 82.4% accuracy on 336 fillets, matching costly hyperspectral systems.
-
Robotic Nanoparticle Synthesis via Solution-based Processes
Screw-based motion planning extracted from single demonstrations enables robots to autonomously execute long-horizon nanoparticle synthesis protocols.
-
FF3R: Feedforward Feature 3D Reconstruction from Unconstrained views
FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.
-
F3G-Avatar : Face Focused Full-body Gaussian Avatar
F3G-Avatar improves full-body Gaussian avatars by adding a dedicated face-focused deformation branch to better preserve facial geometry and expressions from multi-view RGB video.
-
Edge Deep Learning in Computer Vision and Medical Diagnostics: A Comprehensive Survey
A comprehensive survey of edge deep learning in computer vision and medical diagnostics that presents a novel categorization of hardware platforms by performance and usage scenarios.
-
Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.
-
AMO-ENE: Attention-based Multi-Omics Fusion Model for Outcome Prediction in Extra Nodal Extension and HPV-associated Oropharyngeal Cancer
An attention-based fusion model combining semi-supervised CT segmentation, radiomics, and clinical features predicts metastatic recurrence, overall survival, and disease-free survival in HPV+ oropharyngeal cancer with...
-
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder,...
-
Semantic-Fast-SAM: Efficient Semantic Segmenter
Semantic-Fast-SAM matches prior SAM-based semantic segmentation accuracy on Cityscapes and ADE20K while running about 20 times faster by combining FastSAM with SSA labeling and CLIP for open-vocabulary cases.