Segment Anything

Alexander C. Berg; Alexander Kirillov; Chloe Rolland; Eric Mintun; Hanzi Mao; Laura Gustafson; Nikhila Ravi; Piotr Dollár; Ross Girshick; Spencer Whitehead

arXiv: 2304.02643 · v1 · submitted 2023-04-05 · 💻 cs.CV · cs.AI · cs.LG

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

Pith reviewed 2026-05-11 06:08 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG

keywords image segmentation · zero-shot transfer · promptable models · large-scale dataset · foundation models · computer vision · SAM · SA-1B

The pith

A promptable model trained on a billion masks enables zero-shot segmentation that often matches supervised results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new task, model, and dataset for image segmentation. Using an efficient version of the model to collect data in a loop, the authors assembled the largest segmentation dataset to date, containing over one billion masks across eleven million images. The resulting model is built to accept prompts such as points or boxes, allowing it to generalize zero-shot to new image distributions and tasks without retraining. Evaluations across many tasks show that this zero-shot performance is often competitive with or better than earlier models trained with full supervision for each specific task. The work releases both the model and dataset to support further research on foundation models for vision.

Core claim

What carries the argument

The promptable Segment Anything Model (SAM) that takes user-provided prompts such as points, boxes, or coarse masks and outputs object segmentation masks.
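The prompt-in, mask-out contract can be illustrated with a deliberately tiny stand-in. The sketch below is not SAM and does not use the released code: `Prompt` and `toy_segment` are hypothetical names, and the "model" is a plain flood fill over equal pixel values. It only shows how a point prompt selects an object and how adding a box prompt narrows the result.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grid = List[List[int]]
Mask = List[List[bool]]


@dataclass
class Prompt:
    """Hypothetical prompt container: a foreground point and/or a box."""
    point: Optional[Tuple[int, int]] = None          # (row, col)
    box: Optional[Tuple[int, int, int, int]] = None  # (r0, c0, r1, c1), inclusive


def toy_segment(image: Grid, prompt: Prompt) -> Mask:
    """Toy stand-in for a promptable segmenter (NOT the SAM architecture).

    Grows a 4-connected region of equal pixel value from the prompt point,
    restricted to the prompt box when one is given.
    """
    h, w = len(image), len(image[0])
    r0, c0, r1, c1 = prompt.box if prompt.box else (0, 0, h - 1, w - 1)
    # Without a point, fall back to the centre of the box.
    seed = prompt.point if prompt.point else ((r0 + r1) // 2, (c0 + c1) // 2)
    mask = [[False] * w for _ in range(h)]
    sr, sc = seed
    if not (r0 <= sr <= r1 and c0 <= sc <= c1):
        return mask  # a seed outside the box yields an empty mask
    target = image[sr][sc]
    mask[sr][sc] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = r0 <= nr <= r1 and c0 <= nc <= c1
            if inside and not mask[nr][nc] and image[nr][nc] == target:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# A 3x4 "image" with three flat regions labelled 0, 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
]
point_mask = toy_segment(image, Prompt(point=(0, 2)))
box_mask = toy_segment(image, Prompt(point=(0, 2), box=(0, 2, 1, 3)))
print(sum(map(sum, point_mask)), sum(map(sum, box_mask)))
```

The same point yields a smaller mask once the box is supplied, which is the kind of ambiguity resolution that combining prompts is meant to provide.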

If this is right

The model can be applied directly to new image types and tasks without collecting new labeled data or retraining.
Prompt-based interaction becomes a practical way to guide segmentation on arbitrary images.
The released dataset supports training or fine-tuning of additional vision models at large scale.
Releasing both the model and data lowers the barrier for researchers to experiment with promptable segmentation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The model-assisted data collection loop may offer a template for building large datasets in other vision domains where annotation is expensive.
Promptable architectures could extend beyond segmentation to tasks like detection or editing with similar zero-shot benefits.
Interactive tools built on this model might reduce the need for per-task model training in applied settings such as medical imaging or content creation.
Performance on video sequences or 3D data would test whether the promptable property holds across temporal and spatial dimensions.

Load-bearing premise

That the zero-shot results reflect genuine generalization to new distributions rather than overfitting to the self-collected data or the evaluation tasks.

What would settle it

A controlled test on a fresh image domain and segmentation task where the model's zero-shot accuracy falls clearly below a model trained with full supervision on that same task.
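One way to make "falls clearly below" concrete is a paired bootstrap over per-image scores from the same test set. The sketch below is an assumption about how such a test could be run, not a protocol from the paper: `paired_bootstrap_gap` is a hypothetical helper, and the score lists are made-up placeholders.

```python
import random
from statistics import mean
from typing import List, Tuple


def paired_bootstrap_gap(
    zero_shot: List[float],
    supervised: List[float],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean of (supervised - zero_shot) with a 95% percentile bootstrap interval.

    Scores must be paired: index i refers to the same test image in both lists.
    """
    if len(zero_shot) != len(supervised) or not zero_shot:
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [s - z for z, s in zip(zero_shot, supervised)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return mean(diffs), lower, upper


# Placeholder per-image IoU scores (illustrative only, not results from the paper).
zero_shot_scores = [0.60, 0.62, 0.58, 0.65, 0.61, 0.59, 0.63, 0.60]
supervised_scores = [0.75, 0.78, 0.74, 0.80, 0.77, 0.73, 0.79, 0.76]

gap, lower, upper = paired_bootstrap_gap(zero_shot_scores, supervised_scores)
# "Clearly below" is read here as: the whole interval lies above zero.
clearly_below = lower > 0.0
print(f"gap={gap:.3f}, 95% interval=({lower:.3f}, {upper:.3f}), clearly_below={clearly_below}")
```

Pairing by image removes per-image difficulty from the comparison, so the interval reflects only the difference between the two models.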

Original abstract

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

SAM scales promptable segmentation to a billion masks via a model-assisted data loop and claims competitive zero-shot results, but the self-referential collection process is the key thing to watch.

The main things to know are the promptable Segment Anything Model and the SA-1B dataset of over a billion masks collected through a staged data engine that starts with humans and shifts to model-driven automation. The model takes points, boxes, or coarse masks as input and produces object masks, with the goal of working zero-shot on new images and tasks without retraining. The public release of both the model and the full dataset is the practical advance that matters most for the field. Earlier interactive segmentation papers existed at much smaller scale, so the combination of prompt flexibility and this dataset size is what is new.

The data engine description is clear and the efficiency gains from moving to automatic stages are well explained. Releasing the weights and the 11M images removes a major barrier for follow-up work.

The soft spot is the circularity the stress-test flags. Because later collection stages rely on the model itself to generate and refine masks, the training distribution is shaped by the model's own priors. This makes it harder to claim that strong benchmark results prove broad generalization rather than alignment with the collection process. The early human stages reduce the problem but do not eliminate it, and any overlap between evaluation images or prompt styles and the data engine would weaken the zero-shot story. The abstract gives no numbers, so the actual strength of the competitiveness claim depends on the tables and ablations in the full paper.

This work is aimed at computer vision researchers building or adapting segmentation tools and at groups studying foundation models. A reader who needs a strong starting point for prompt-based tasks or who wants to experiment with the released data will get immediate value. It deserves peer review because the scale and the public artifacts are substantial even if the generalization evidence needs closer examination by referees.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the Segment Anything (SA) project, comprising a new promptable segmentation task, the Segment Anything Model (SAM), and the SA-1B dataset of over 1 billion masks on 11 million images. The dataset is constructed via a multi-stage data engine that uses an efficient version of the model itself to propose, refine, and automatically generate masks. The central claim is that the resulting promptable model transfers zero-shot to new image distributions and tasks, with performance that is often competitive with or superior to prior fully supervised methods; the model and dataset are released publicly.

Significance. If the zero-shot generalization results are robust, this work would mark a notable advance toward foundation models in computer vision by demonstrating a single promptable model that can handle diverse segmentation tasks without task-specific training or fine-tuning. The unprecedented scale of SA-1B and the open release of both model and data constitute clear strengths that could enable substantial follow-on research.

major comments (2)

[Data Engine section] The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.
[Experiments section] The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.
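As a rough illustration of the per-task reporting requested above, the sketch below computes mask intersection-over-union (IoU) and summarises repeated runs as mean and sample standard deviation. It assumes IoU is the metric of interest, which this page does not confirm; `mask_iou` and `summarise` are hypothetical helper names, and the task names and scores are placeholders.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def mask_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def summarise(scores_by_task: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each task to (mean, sample std); std is 0.0 when only one run exists."""
    summary = {}
    for task, scores in scores_by_task.items():
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[task] = (mean(scores), spread)
    return summary


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = mask_iou(pred, target)  # 2 shared pixels out of 4 covered by either mask

# Placeholder scores from repeated runs (illustrative only).
summary = summarise({"task_a": [0.70, 0.72, 0.74], "task_b": [0.55]})
for task, (avg, spread) in summary.items():
    print(f"{task}: {avg:.3f} +/- {spread:.3f}")
```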

minor comments (2)

[Abstract] The phrase 'numerous tasks' is vague; a short parenthetical list of the primary evaluation benchmarks would improve immediate clarity.
[Model section] Notation: The distinction between the 'efficient' model used in the data engine and the final SAM should be introduced with explicit symbols or subsection headings to avoid reader confusion in later sections.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the data engine and experimental reporting. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.

Referee: [Data Engine section] The staged collection process (particularly stages 2 and 3) invokes the model to generate and refine masks, so the training distribution is shaped by the model's own inductive biases. This creates a circularity risk that could undermine the zero-shot claim; the manuscript must supply a concrete analysis (e.g., ablation on held-out image sources or comparison of mask statistics before/after the automatic stages) showing that reported gains on external benchmarks are not artifacts of distribution alignment.

Authors: We acknowledge the potential circularity concern arising from the model's role in stages 2 and 3 of the data engine. However, the zero-shot evaluations use entirely external benchmarks and image distributions that were never seen during data collection or training. To directly address this, we will add a new analysis subsection in the revised manuscript that includes: (1) mask statistic comparisons (e.g., size, complexity, and diversity metrics) before and after the automatic stages, and (2) performance ablations on held-out image sources excluded from the data engine. These additions will demonstrate that the reported zero-shot gains on external tasks are not artifacts of distribution alignment.
revision: yes

Referee: [Experiments section] The claim that zero-shot performance is 'often competitive with or even superior to prior fully supervised results' is load-bearing, yet the manuscript provides limited detail on exact metrics, chosen baselines, error bars, and statistical tests across the evaluated tasks. Full per-task tables with variance estimates and explicit comparison protocols are required to substantiate the central performance assertion.

Authors: We agree that the central performance claim requires more granular reporting for full substantiation. In the revised manuscript, we will expand the Experiments section with complete per-task tables that include exact metrics for all evaluated tasks, the specific baselines used, error bars or variance estimates (from multiple seeds or cross-validation where feasible), and results of statistical significance tests. We will also add explicit descriptions of the comparison protocols, including how prompts were generated and how zero-shot transfer was measured against fully supervised methods.
revision: yes
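The mask-statistic comparison proposed in the first response could be sketched as follows, under loud assumptions: only mask size is compared (as the fraction of image area covered), the distance between the two collections is a two-sample Kolmogorov-Smirnov statistic, and every number is a placeholder. `area_fraction` and `ks_statistic` are hypothetical helpers, not part of any released tooling.

```python
from bisect import bisect_right
from typing import List, Sequence


def area_fraction(mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels covered by a binary mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no pixels")
    return sum(1 for row in mask for value in row if value) / total


def ks_statistic(a: List[float], b: List[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Placeholder area fractions for masks from an earlier, human-assisted stage
# and from a later automatic stage (illustrative numbers only).
before_auto = [0.10, 0.20, 0.30, 0.40]
after_auto = [0.05, 0.10, 0.15, 0.20]

shift = ks_statistic(before_auto, after_auto)
print(f"KS statistic between stages: {shift:.2f}")
print(f"example area fraction: {area_fraction([[1, 0], [0, 0]]):.2f}")
```

A statistic near 0 would suggest the automatic stages left the size distribution largely unchanged, while a value near 1 would indicate a strong shift. Complexity and diversity metrics would need their own definitions.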

Circularity Check

0 steps flagged

No circularity: empirical zero-shot evaluations remain independent of data engine

The paper's core claim is empirical: a promptable model trained on SA-1B achieves competitive zero-shot results on external tasks and image distributions. The data engine (staged collection using an efficient model variant) is a practical annotation procedure whose outputs are then used for supervised training; the reported evaluations use separate benchmarks whose ground truth and image sources are not generated by the same loop. No equation, uniqueness theorem, or prediction reduces by construction to a fitted parameter or self-generated input. Self-citations are absent from the load-bearing steps, and the architecture's promptability is justified by design and training rather than by renaming prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities beyond standard deep-learning training assumptions and the effectiveness of the promptable design.

axioms (1)

domain assumption: Standard deep learning assumptions on generalization from large-scale training data
The zero-shot transfer claim rests on the model learning general segmentation capabilities from the collected data.

pith-pipeline@v0.9.0 · 5466 in / 1071 out tokens · 36264 ms · 2026-05-11T06:08:52.324361+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
cs.CV 2026-05 conditional novelty 8.0

Current CAC models often count the wrong objects because they misalign text prompts with visual content, as demonstrated by new negative-label and distractor tests on the MUCCA dataset.
SelvaBox: A high-resolution dataset for tropical tree crown detection
cs.CV 2025-06 accept novelty 8.0

SelvaBox is the largest open high-resolution dataset for tropical tree crown detection, with benchmarks showing that higher resolution improves accuracy and models trained on it generalize competitively to other unsee...
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
cs.CL 2023-09 unverdicted novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
Vision Harnessing Agent for Open Ad-hoc Segmentation
cs.CV 2026-05 unverdicted novelty 7.0

VASA is a vision-guided agent for open ad-hoc segmentation that creates and validates masks through planning, tool use, and error recovery, outperforming baselines on the new PARS benchmark and RefCOCOm.
Functionalization via Structure Completion and Motion Rectification
cs.CV 2026-05 unverdicted novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture wi...
MedCore: Boundary-Preserving Medical Core Pruning for MedSAM
cs.CV 2026-05 unverdicted novelty 7.0

MedCore achieves 60% parameter and 58.4% FLOP reduction on MedSAM with Dice 0.9549 and preserved boundary metrics via dual-intervention pruning and a new boundary leverage principle.
Qwen3-VL-Seg: Unlocking Open-World Referring Segmentation with Vision-Language Grounding
cs.CV 2026-05 unverdicted novelty 7.0

Qwen3-VL-Seg decodes MLLM bounding boxes into pixel-level referring segmentation via a lightweight box-guided mask decoder, new SA1B-ORS training data, and ORS-Bench evaluation, showing strong open-world performance.
OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation
cs.RO 2026-05 unverdicted novelty 7.0

OA-WAM uses persistent address vectors and dynamic content vectors in object slots to enable addressable world-action prediction, improving robustness on manipulation benchmarks under scene changes.
Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures
cs.CV 2026-05 unverdicted novelty 7.0

HeadsUp maps multi-view captures to UV-parameterized 3D Gaussians on a template via an encoder-decoder, achieving state-of-the-art quality and generalization after training on more than 10,000 subjects.
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting
cs.CV 2026-05 unverdicted novelty 7.0

Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory
cs.CV 2026-04 unverdicted novelty 7.0

A contract-based multi-agent system maintains a claim-level semantic memory for long videos, enabling targeted corrections that raise VQA accuracy from 0.71 to 0.79 and cut human arbitration cost by 4.8x on VidOR.
A 3D SAM-Based Progressive Prompting Framework for Multi-Task Segmentation of Radiotherapy-induced Normal Tissue Injuries in Limited-Data Settings
cs.CV 2026-04 unverdicted novelty 7.0

A progressive prompting framework on 3D SAM with text, dose-box, and click prompts plus small-target loss achieves reliable multi-task segmentation of osteoradionecrosis, cerebral edema, and cerebral radiation necrosi...
Seg2Change: Adapting Open-Vocabulary Semantic Segmentation Model for Remote Sensing Change Detection
cs.CV 2026-04 conditional novelty 7.0

Seg2Change adapts open-vocabulary segmentation models to open-vocabulary change detection via a category-agnostic change head and new dataset CA-CDD, delivering +9.52 IoU on WHU-CD and +5.50 mIoU on SECOND.
Off-the-shelf Vision Models Benefit Image Manipulation Localization
cs.CV 2026-04 unverdicted novelty 7.0

ReVi adapter enables off-the-shelf vision models to localize image manipulations by separating and enhancing manipulation cues from semantic features without full model retraining.
Training a Student Expert via Semi-Supervised Foundation Model Distillation
cs.CV 2026-04 conditional novelty 7.0

A semi-supervised framework distills vision foundation models into compact instance segmentation experts that outperform their teachers by up to 11.9 AP on Cityscapes and 8.6 AP on ADE20K while being 11 times smaller.
TSegAgent: Zero-Shot Tooth Segmentation via Geometry-Aware Vision-Language Agents
cs.CV 2026-03 unverdicted novelty 7.0

TSegAgent achieves accurate zero-shot tooth segmentation on 3D dental scans via geometry-aware vision-language reasoning without task-specific training.
OPTED: Open Preprocessed Trachoma Eye Dataset Using Zero-Shot SAM 3 Segmentation
cs.CV 2026-03 accept novelty 7.0

OPTED is a publicly released preprocessed trachoma eye image dataset generated via zero-shot SAM 3 segmentation of the tarsal conjunctiva with an optimal text prompt and quality filtering.
PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
cs.RO 2026-02 unverdicted novelty 7.0

PhysMem enables VLM-based robot planners to learn and verify physical properties through test-time interaction and hypothesis testing, raising success on a brick insertion task from 23% to 76%.
Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH)
cs.CV 2026-02 unverdicted novelty 7.0

Releases the DAPWH dataset of 3556 wasp images including 1739 COCO-annotated examples to enable AI models for identifying Ichneumonoidea and associated families.
SAM 2++: Tracking Anything at Any Granularity
cs.CV 2025-10 conditional novelty 7.0

SAM 2++ unifies video tracking across mask, box, and point granularities via task-specific prompts, a unified decoder, task-adaptive memory, and a new multi-granularity dataset, reporting state-of-the-art results.
ASTRA: Let Arbitrary Subjects Transform in Video Editing
cs.CV 2025-10 unverdicted novelty 7.0

ASTRA is a plug-and-play training-free method for precise multi-subject video editing that uses prompt-guided multimodal alignment and prior-based mask retargeting to avoid attention dilution and boundary issues.
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
cs.CV 2025-07 unverdicted novelty 7.0

Presents Reason50K dataset and ReasonBrain framework for hypothetical instruction-based image editing that requires physical, temporal, causal, and story reasoning.
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
cs.AI 2024-10 unverdicted novelty 7.0

PolyMATH is a new 5,000-image benchmark where top MLLMs reach at most 41 percent accuracy on multi-modal mathematical reasoning, with ablation showing minimal gain from text over images.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
cs.RO 2024-09 conditional novelty 7.0

ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms ...
Deep Time Series Models: A Comprehensive Survey and Benchmark
cs.LG 2024-07 unverdicted novelty 7.0

This survey and benchmark of deep time series models using the released TSLib library finds that models with specific structures perform well only on distinct analysis tasks.
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
cs.CV 2024-03 conditional novelty 7.0

MathVerse is a benchmark that tests multi-modal LLMs on visual math by providing each problem in six versions with progressively less diagram and text information to measure true visual understanding.
Project Aria: A New Tool for Egocentric Multi-Modal AI Research
cs.HC 2023-08 accept novelty 7.0

Project Aria presents a new wearable egocentric multi-modal recording device and software tools to accelerate AI research for augmented reality applications.
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
cs.CL 2023-07 unverdicted novelty 7.0

SEED-Bench is a new benchmark of 19K multiple-choice questions for evaluating generative comprehension in multimodal LLMs across 12 image and video dimensions.
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
cs.RO 2023-07 unverdicted novelty 7.0

VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.
Objaverse-XL: A Universe of 10M+ 3D Objects
cs.CV 2023-07 accept novelty 7.0

Objaverse-XL supplies over 10 million diverse 3D objects that, when used to render 100 million views, improve zero-shot novel-view synthesis in models such as Zero123.
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
cs.CV 2023-03 conditional novelty 7.0

LLaMA-Adapter turns frozen LLaMA 7B into a capable instruction follower using only 1.2M new parameters and zero-init attention, matching Alpaca while extending to image-conditioned reasoning on ScienceQA and COCO.
RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video
cs.CV 2026-05 unverdicted novelty 6.0

RiGS decomposes scenes into static, rigid, and transient 4D Gaussians with an object-wise dynamic mask and scene flow guidance to model multi-scale motions and achieve SOTA novel view synthesis.
B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
cs.CV 2026-05 unverdicted novelty 6.0

B-GRTO extends GRPO by reusing rollouts to optimize auxiliary segmentation decoder objectives, yielding substantial gains over plain GRPO on referring segmentation tasks.
SR-Ground: Image Quality Grounding for Super-Resolved Content
cs.CV 2026-05 unverdicted novelty 6.0

The paper releases SR-Ground, a crowdsourced dataset for pixel-level segmentation of six artifact types in super-resolved images, and shows its use for training grounded IQA models and artifact-reducing fine-tuning.
RT-Splatting: Joint Reflection-Transmission Modeling with Gaussian Splatting
cs.CV 2026-05 unverdicted novelty 6.0

RT-Splatting adds a disentangled occupancy-opacity factorization and specular-aware gradient gating to 3D Gaussian Splatting, enabling joint high-fidelity reflection and transmission in real-time novel view synthesis.
ASIP-Planner: Adaptive Planning for UAV Surface Inspection in Partially Known Indoor Environments
cs.RO 2026-05 unverdicted novelty 6.0

ASIP-Planner achieves near-complete surface coverage and shorter trajectories in partially known indoor environments by clustering inspection targets globally and adapting viewing angles locally to handle occlusions.
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
cs.RO 2026-05 unverdicted novelty 6.0

HeteroGenManip decouples grasp localization from interaction planning using task-conditioned foundation models and multi-model diffusion policies, delivering 31% average gains in broad simulation tasks and 36.7% in fo...
HeteroGenManip: Generalizable Manipulation For Heterogeneous Object Interactions
cs.RO 2026-05 unverdicted novelty 6.0

A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.
ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings
cs.CV 2026-05 unverdicted novelty 6.0

A point-Transformer interactive 3D instance segmentation model handles multiple clicks jointly in one pass and reports over 20% mIoU gains versus baselines plus 8-10% cross-dataset improvement for one-click-per-instan...
ClickSeg3D: Few-Click Interactive Segmentation via Semantic Embeddings
cs.CV 2026-05 unverdicted novelty 6.0

ClickSeg3D uses a point Transformer encoder and hierarchical mask decoder with semantic embeddings to enable single-pass multi-object 3D interactive segmentation from sparse points, reporting over 20% mIoU gains versu...
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration Mapping
cs.CV 2026-05 conditional novelty 6.0

Mixing real UAV imagery with 2101 AI-generated image-mask pairs improves semantic segmentation F1 scores for fine-grained forest species by over 15 percentage points overall and up to 30 points for rare classes.
YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts
cs.LG 2026-05 unverdicted novelty 6.0

YOTOnet achieves improved zero-shot cross-domain fault diagnosis on bearing datasets by combining a physics-aware invariant feature distiller with domain-conditioned sparse experts, showing performance scaling as more...
Approaching human parity in the quality of automated organoid image segmentation
cs.CV 2026-05 conditional novelty 6.0

A composite SAM-based method segments organoid images with accuracy matching or approaching inter-observer variability among human annotators.
Learning Equivariant Neural-Augmented Object Dynamics From Few Interactions
cs.RO 2026-05 unverdicted novelty 6.0

PIEGraph augments a spring-mass particle model with an equivariant GNN and novel action representation to predict accurate object dynamics for robotic manipulation from few interactions.
GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning
cs.RO 2026-04 unverdicted novelty 6.0

GS-Playground delivers a high-throughput photorealistic simulator for vision-informed robot learning via parallel physics integrated with batch 3D Gaussian Splatting at 10^4 FPS and an automated Real2Sim workflow for ...
DiffuSAM: Diffusion-Based Prompt-Free SAM2 for Few-Shot and Source-Free Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 6.0

DiffuSAM synthesizes SAM2-compatible mask embeddings via a diffusion prior conditioned on prior slices to enable accurate prompt-free medical image segmentation under SF-UDA and few-shot settings.
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
cs.HC 2026-04 unverdicted novelty 6.0

AgentLens adaptively deploys Full UI, Partial UI, and GenUI modalities with virtual display overlays for mobile GUI agents, yielding 85.7% user preference and best-in-study usability in a 21-participant evaluation.
SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces
cs.RO 2026-04 unverdicted novelty 6.0

SpaceDex achieves 63% success grasping unseen objects in tiered workspaces via VLM spatial planning and arm-hand feature separation, beating a 39% tabletop baseline in 100 real trials.
Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction
cs.RO 2026-04 unverdicted novelty 6.0

COIN provides 50 interactive robotic tasks, a 1000-demonstration dataset collected via AR teleoperation, and metrics showing that CodeAsPolicy, VLA, and H-VLA models fail at causally-dependent interactive reasoning du...
One-Shot Cross-Geometry Skill Transfer through Part Decomposition
cs.RO 2026-04 unverdicted novelty 6.0

Part decomposition with generative shape models allows one-shot robot skill transfer across unfamiliar object geometries in simulation and real settings.
From Boundaries to Semantics: Prompt-Guided Multi-Task Learning for Petrographic Thin-section Segmentation
cs.CV 2026-04 unverdicted novelty 6.0

Petro-SAM adapts SAM via a Merge Block for polarized views plus multi-scale fusion and color-entropy priors to jointly achieve grain-edge and lithology segmentation in petrographic images.
Granularity-Aware Transfer for Tree Instance Segmentation in Synthetic and Real Forests
cs.CV 2026-04 unverdicted novelty 6.0

Granularity-aware distillation improves tree instance segmentation accuracy on real forest images by merging logits and unifying masks from fine-grained synthetic teachers despite coarse real labels.
GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality
cs.CV 2026-04 unverdicted novelty 6.0

GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-on...
Self-supervised Pretraining of Cell Segmentation Models
cs.CV 2026-04 unverdicted novelty 6.0

DINOCell achieves a SEG score of 0.784 on LIVECell by self-supervised domain adaptation of DINOv2, improving 10.42% over SAM-based models and showing strong zero-shot transfer.
Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents
cs.CV 2026-04 unverdicted novelty 6.0

Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.
GESS: Multi-cue Guided Local Feature Learning via Geometric and Semantic Synergy
cs.CV 2026-04 unverdicted novelty 6.0

GESS introduces joint semantic-normal and depth stability prediction heads, the SDAK keypoint mechanism, and the UTCF descriptor fusion module to leverage multi-cue synergy for improved robustness and discriminability.
Moondream Segmentation: From Words to Masks
cs.CV 2026-04 unverdicted novelty 6.0

Moondream Segmentation achieves 80.2% cIoU on RefCOCO by autoregressively decoding paths from referring expressions and using RL to refine masks, plus releases a cleaned RefCOCO-M dataset.
TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization
cs.CV 2026-03 unverdicted novelty 6.0

TrianguLang achieves state-of-the-art feed-forward text-guided 3D localization and segmentation by using predicted geometry to gate cross-view semantic correspondences without ground-truth poses.
FG-TreeSeg: Flow-Guided Tree Crown Segmentation without Instance Annotations
cs.CV 2026-01 unverdicted novelty 6.0

FG-TreeSeg applies Cellpose-SAM flow fields to model tree crowns as star-convex objects and separate overlapping instances without training or instance annotations.
Visual Funnel: Resolving Contextual Blindness in Multimodal Large Language Models
cs.CV 2025-12 unverdicted novelty 6.0

Visual Funnel resolves contextual blindness in MLLMs by constructing an entropy-scaled portfolio of hierarchically structured image crops that preserves both local detail and global context.

Reference graph

Works this paper leans on

196 extracted references · 196 canonical work pages · cited by 128 Pith papers · 10 internal anchors

[1]

On seeing stuff: the perception of materials by humans and machines

Edward H Adelson. On seeing stuff: the perception of materials by humans and machines. Human vision and electronic imaging VI,

work page
[2]

What is an object?

Bogdan Alexe, Thomas Deselaers, and Vittorio Ferrari. What is an object? CVPR, 2010. 4, 10

work page 2010
[3]

Contour detection and hierarchical image segmentation

Pablo Arbeláez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. TPAMI, 2010. 4, 10, 21, 28

work page 2010
[4]

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016. 16

work page internal anchor Pith review Pith/arXiv arXiv 2016
[5]

BEiT: BERT Pre-Training of Image Transformers

Hangbo Bao, Li Dong, and Furu Wei. BEiT: BERT pre-training of image transformers. arXiv:2106.08254, 2021. 17

work page internal anchor Pith review arXiv 2021
[6]

ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes

Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, and Kate Saenko. ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes. CVPR, 2022. 9, 20

work page 2022
[7]

Straehle, Bernhard X

Stuart Berg, Dominik Kutra, Thorben Kroeger, Christoph N. Straehle, Bernhard X. Kausler, Carsten Haubold, Martin Schiegg, Janez Ales, Thorsten Beier, Markus Rudy, Kemal Eren, Jaime I. Cervantes, Buote Xu, Fynn Beuttenmueller, Adrian Wolny, Chong Zhang, Ullrich Koethe, Fred A. Hamprecht, and Anna Kreshuk. ilastik: interactive machine learning for (bio)imag...

work page 2019
[8]

On the Opportunities and Risks of Foundation Models

Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv:2108.07258, 2021. 1, 12

work page internal anchor Pith review Pith/arXiv arXiv 2021
[9]

Iterative interaction training for segmentation editing networks

Gustav Bredell, Christine Tanner, and Ender Konukoglu. Iterative interaction training for segmentation editing networks. MICCAI,

work page
[10]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page 2020
[11]

Cascade R-CNN: Delving into high quality object detection

Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. CVPR, 2018. 10

work page 2018
[12]

Nucleus segmentation across imaging experiments: the 2018 data science bowl

Juan C. Caicedo, Allen Goodman, Kyle W. Karhohs, Beth A. Cimini, Jeanelle Ackerman, Marzieh Haghighi, CherKeng Heng, Tim Becker, Minh Doan, Claire McQuin, Mohammad Rohban, Shantanu Singh, and Anne E. Carpenter. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nature Methods,

work page 2018
[13]

A computational approach to edge detection

John Canny. A computational approach to edge detection. TPAMI,

work page
[14]

End-to-end object detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with Transformers. ECCV, 2020. 5, 16, 17

work page 2020
[15]

Automatic image colorization via multimodal predictions

Guillaume Charpiat, Matthias Hofmann, and Bernhard Schölkopf. Automatic image colorization via multimodal predictions. ECCV,

work page
[16]

Object-proposal evaluation protocol is 'gameable'

Neelima Chavali, Harsh Agrawal, Aroma Mahendru, and Dhruv Batra. Object-proposal evaluation protocol is 'gameable'. CVPR,

work page
[17]

3D instance segmentation of MVS buildings

Jiazhou Chen, Yanghui Xu, Shufang Lu, Ronghua Liang, and Liangliang Nan. 3D instance segmentation of MVS buildings. IEEE Transactions on Geoscience and Remote Sensing, 2022. 9, 19, 20, 23, 24

work page 2022
[18]

FocalClick: towards practical interactive image segmentation

Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, and Hengshuang Zhao. FocalClick: towards practical interactive image segmentation. CVPR, 2022. 8, 9, 12, 19

work page 2022
[19]

Masked-attention mask transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kirillov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. CVPR, 2022. 4

work page 2022
[20]

Per-pixel classification is not all you need for semantic segmentation

Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per-pixel classification is not all you need for semantic segmentation. NeurIPS, 2021. 5, 16, 17

work page 2021
[21]

PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling language modeling with pathways. arXiv:2204.02311, 2022. 1

work page internal anchor Pith review Pith/arXiv arXiv 2022
[22]

Domain adaptation for traffic density estimation

Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Domain adaptation for traffic density estimation. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2021. 9, 20

work page 2021
[23]

Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Night and day instance segmented park (NDIS-Park) dataset: a collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas. Zenodo, 2022. 9, 20

work page 2022
[24]

Semantic segmentation in art paintings

Nadav Cohen, Yael Newman, and Ariel Shamir. Semantic segmentation in art paintings. Computer Graphics Forum, 2022. 9, 19, 20, 23, 24

work page 2022
[25]

The Cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. CVPR, 2016. 9, 19, 20

work page 2016
[26]

Learning parameterized skills

Bruno da Silva, George Konidaris, and Andrew Barto. Learning parameterized skills. ICML, 2012. 4

work page 2012
[27]

Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100

Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100. IJCV, 2022. 9, 20, 23, 24

work page 2022
[28]

EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen. EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations. NeurIPS, 2022. 9, 19, 20, 23, 24

work page 2022
[29]

Does object recognition work for everyone?

Terrance De Vries, Ishan Misra, Changhan Wang, and Laurens Van der Maaten. Does object recognition work for everyone? CVPR workshops, 2019. 18

work page 2019
[30]

CrowdWorkSheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation

Mark Díaz, Ian Kivlichan, Rachel Rosen, Dylan Baker, Razvan Amironesei, Vinodkumar Prabhakaran, and Emily Denton. CrowdWorkSheets: Accounting for individual and collective identities underlying crowdsourced dataset annotation. ACM Conference on Fairness, Accountability, and Transparency, 2022. 25

work page 2022
[31]

PhraseClick: toward achieving flexible interactive segmentation by phrase and click

Henghui Ding, Scott Cohen, Brian Price, and Xudong Jiang. PhraseClick: toward achieving flexible interactive segmentation by phrase and click. ECCV, 2020. 11

work page 2020
[32]

Fast edge detection using structured forests

Piotr Dollár and C Lawrence Zitnick. Fast edge detection using structured forests. TPAMI, 2014. 21

work page 2014
[33]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021. 5, 8, 16

work page 2021
[34]

Alireza Fathi, Xiaofeng Ren, and James M. Rehg. Learning to recognize objects in egocentric activities. CVPR, 2011. 9, 19, 20

work page 2011
[35]

Efficient graph-based image segmentation

Pedro F Felzenszwalb and Daniel P Huttenlocher. Efficient graph-based image segmentation. IJCV, 2004. 10

work page 2004
[36]

The validity and practicality of sun-reactive skin types I through VI

Thomas B. Fitzpatrick. The validity and practicality of sun-reactive skin types I through VI. Archives of Dermatology, 1988. 8

work page 1988
[37]

Getting to 99% accuracy in interactive segmentation

Marco Forte, Brian Price, Scott Cohen, Ning Xu, and François Pitié. Getting to 99% accuracy in interactive segmentation. arXiv:2003.07932, 2020. 5, 17

work page arXiv 2003
[38]

Instance segmentation for autonomous log grasping in forestry operations

Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, François Pomerleau, and Philippe Giguère. Instance segmentation for autonomous log grasping in forestry operations. IROS, 2022. 9, 20

work page 2022
[39]

Datasheets for datasets

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM,

work page
[40]

Simple copy-paste is a strong data augmentation method for instance segmentation

Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation. CVPR,

work page
[41]

Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014. 10

work page 2014
[42]

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017. 17

work page internal anchor Pith review arXiv 2017
[43]

Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent C...

work page 2022
[44]

LVIS: A dataset for large vocabulary instance segmentation

Agrim Gupta, Piotr Dollar, and Ross Girshick. LVIS: A dataset for large vocabulary instance segmentation. CVPR, 2019. 2, 6, 7, 9, 10, 11, 19, 20, 21, 24

work page 2019
[45]

Multiple choice learning: Learning to produce multiple structured outputs

Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs. NeurIPS, 2012. 5, 17

work page 2012
[46]

SOCRATES: Introducing depth in visual wildlife monitoring using stereo vision

Timm Haucke, Hjalmar S. Kühl, and Volker Steinhage. SOCRATES: Introducing depth in visual wildlife monitoring using stereo vision. Sensors, 2022. 9, 20

work page 2022
[47]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. CVPR, 2022. 5, 8, 12, 16, 17

work page 2022
[48]

Mask R-CNN

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. ICCV, 2017. 10

work page 2017
[49]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016. 16

work page 2016
[50]

Gaussian Error Linear Units (GELUs)

Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv:1606.08415, 2016. 16

work page internal anchor Pith review Pith/arXiv arXiv 2016
[51]

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv:2203.15556, 2022. 1

work page internal anchor Pith review Pith/arXiv arXiv 2022
[52]

TrashCan: A semantically-segmented dataset towards visual detection of marine debris

Jungseok Hong, Michael Fulton, and Junaed Sattar. TrashCan: A semantically-segmented dataset towards visual detection of marine debris. arXiv:2007.08097, 2020. 9, 19, 20

work page arXiv 2007
[53]

Deep networks with stochastic depth

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. ECCV, 2016. 17

work page 2016
[54]

Oneformer: One transformer to rule universal image segmentation

Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. Oneformer: One transformer to rule universal image segmentation. arXiv:2211.06220, 2022. 4

work page arXiv 2022
[55]

Scaling up visual and vision-language representation learning with noisy text supervision

Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. ICML, 2021. 1

work page 2021
[56]

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020. 1

work page internal anchor Pith review Pith/arXiv arXiv 2001
[57]

Snakes: Active contour models

Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. IJCV, 1988. 4

work page 1988
[58]

Learning open-world object proposals without learning to classify

Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, and Weicheng Kuo. Learning open-world object proposals without learning to classify. IEEE Robotics and Automation Letters, 2022. 21

work page 2022
[59]

Panoptic segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. CVPR, 2019. 4

work page 2019
[60]

The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale

Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV, 2020. 2, 6, 7, 18, 19

work page 2020
[61]

Quantifying the Carbon Emissions of Machine Learning

Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the carbon emissions of machine learning. arXiv:1910.09700, 2019. 28

work page internal anchor Pith review arXiv 1910
[62]

Exploring plain vision transformer backbones for object detection

Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. ECCV,

work page
[63]

5, 10, 11, 16, 21, 23, 24

work page
[64]

Yin Li, Zhefan Ye, and James M. Rehg. Delving into egocentric actions. CVPR, 2015. 9, 20

work page 2015
[65]

Interactive image segmentation with latent diversity

Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Interactive image segmentation with latent diversity. CVPR, 2018. 5, 17, 19

work page 2018
[66]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. ICCV, 2017. 5, 17

work page 2017
[67]

Microsoft COCO: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014. 2, 4, 6, 7, 11, 18, 19, 20

work page 2014
[68]

SimpleClick: Interactive image segmentation with simple vision transformers

Qin Liu, Zhenlin Xu, Gedas Bertasius, and Marc Niethammer. SimpleClick: Interactive image segmentation with simple vision transformers. arXiv:2210.11006, 2022. 8, 9, 12, 19

work page arXiv 2022
[69]

Decoupled weight decay regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. ICLR, 2019. 17

work page 2019
[70]

Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers

Cathy H Lucas, Daniel OB Jones, Catherine J Hollyhead, Robert H Condon, Carlos M Duarte, William M Graham, Kelly L Robinson, Kylie A Pitt, Mark Schildhauer, and Jim Regetz. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecology and Biogeography, 2014. 20

work page 2014
[71]

Iteratively trained interactive segmentation

Sabarinath Mahadevan, Paul Voigtlaender, and Bastian Leibe. Iteratively trained interactive segmentation. BMVC, 2018. 4, 17

work page 2018
[72]

Deep extreme cut: From extreme points to object segmentation

Kevis-Kokitsi Maninis, Sergi Caelles, Jordi Pont-Tuset, and Luc Van Gool. Deep extreme cut: From extreme points to object segmentation. CVPR, 2018. 6

work page 2018
[73]

A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. ICCV, 2001. 10, 21, 28

work page 2001
[74]

V-Net: Fully convolutional neural networks for volumetric medical image segmentation

Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 3DV, 2016. 5, 17

work page 2016
[75]

Finely-grained annotated datasets for image-based plant phenotyping

Massimo Minervini, Andreas Fischbach, Hanno Scharr, and Sotirios A. Tsaftaris. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognition Letters, 2016. 9, 20

work page 2016
[76]

Model cards for model reporting

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. Proceedings of the conference on fairness, accountability, and transparency, 2019. 25, 28

work page 2019
[77]

Extreme clicking for efficient object annotation

Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. Extreme clicking for efficient object annotation. ICCV,

work page
[78]

Carbon Emissions and Large Neural Network Training

David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. Carbon emissions and large neural network training. arXiv:2104.10350, 2021. 28

work page internal anchor Pith review arXiv 2021
[79]

Semi-supervised sequence tagging with bidirectional language models

Matthew E Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017. 18

work page 2017
[80]

EDTER: Edge detection with transformer

Mengyang Pu, Yaping Huang, Yuming Liu, Qingji Guan, and Haibin Ling. EDTER: Edge detection with transformer. CVPR,

work page

Showing first 80 references.

[1] [1]

On seeing stuff: the perception of materials by humans and machines

Edward H Adelson. On seeing stuff: the perception of materials by humans and machines. Human vision and electronic imaging VI ,

work page

[2] [2]

What is an object? CVPR, 2010

Bogdan Alexe, Thomas Deselaers, and Vittorio Ferrari. What is an object? CVPR, 2010. 4, 10

work page 2010

[3] [3]

Contour detection and hierarchical image segmentation

Pablo Arbel ´aez, Michael Maire, Charless Fowlkes, and Jitendra Malik. Contour detection and hierarchical image segmentation. TPAMI, 2010. 4, 10, 21, 28

work page 2010

[4] [4]

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016. 16

work page internal anchor Pith review Pith/arXiv arXiv 2016

[5] [5]

BEiT: BERT Pre-Training of Image Transformers

Hangbo Bao, Li Dong, and Furu Wei. BEiT: BERT pre-training of image transformers. arXiv:2106.08254, 2021. 17

work page internal anchor Pith review arXiv 2021

[6] [6]

ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes

Dina Bashkirova, Mohamed Abdelfattah, Ziliang Zhu, James Akl, Fadi Alladkani, Ping Hu, Vitaly Ablavsky, Berk Calli, Sarah Adel Bargal, and Kate Saenko. ZeroWaste dataset: Towards deformable object segmentation in cluttered scenes. CVPR, 2022. 9, 20

work page 2022

[7] [7]

Straehle, Bernhard X

Stuart Berg, Dominik Kutra, Thorben Kroeger, Christoph N. Straehle, Bernhard X. Kausler, Carsten Haubold, Martin Schiegg, Janez Ales, Thorsten Beier, Markus Rudy, Kemal Eren, Jaime I. Cervantes, Buote Xu, Fynn Beuttenmueller, Adrian Wolny, Chong Zhang, Ullrich Koethe, Fred A. Hamprecht, and Anna Kreshuk. ilastik: interactive machine learning for (bio)imag...

work page 2019

[8] [8]

On the Opportunities and Risks of Foundation Models

Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportu- nities and risks of foundation models. arXiv:2108.07258, 2021. 1, 12

work page internal anchor Pith review Pith/arXiv arXiv 2021

[9] [9]

Iterative interaction training for segmentation editing networks

Gustav Bredell, Christine Tanner, and Ender Konukoglu. Iterative interaction training for segmentation editing networks. MICCAI,

work page

[10] [10]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page 2020

[11] [11]

Cascade R-CNN: Delving into high quality object detection

Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object detection. CVPR, 2018. 10

work page 2018

[12] [12]

Caicedo, Allen Goodman, Kyle W

Juan C. Caicedo, Allen Goodman, Kyle W. Karhohs, Beth A. Ci- mini, Jeanelle Ackerman, Marzieh Haghighi, CherKeng Heng, Tim Becker, Minh Doan, Claire McQuin, Mohammad Rohban, Shan- tanu Singh, and Anne E. Carpenter. Nucleus segmentation across imaging experiments: the 2018 data science bowl. Nature Methods,

work page 2018

[13] [13]

A computational approach to edge detection

John Canny. A computational approach to edge detection. TPAMI,

work page

[14] [14]

End-to-end object detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with Transformers. ECCV, 2020. 5, 16, 17

work page 2020

[15] [15]

Automatic image colorization via multimodal predictions

Guillaume Charpiat, Matthias Hofmann, and Bernhard Sch ¨olkopf. Automatic image colorization via multimodal predictions. ECCV,

work page

[16] [16]

Object-proposal evaluation protocol is’ gameable’

Neelima Chavali, Harsh Agrawal, Aroma Mahendru, and Dhruv Batra. Object-proposal evaluation protocol is’ gameable’. CVPR,

work page

[17] [17]

3D instance segmentation of MVS buildings

Jiazhou Chen, Yanghui Xu, Shufang Lu, Ronghua Liang, and Lian- gliang Nan. 3D instance segmentation of MVS buildings. IEEE Transactions on Geoscience and Remote Sensing, 2022. 9, 19, 20, 23, 24

work page 2022

[18] [18]

FocalClick: towards practical interactive image segmentation

Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, and Hengshuang Zhao. FocalClick: towards practical interactive image segmentation. CVPR, 2022. 8, 9, 12, 19

work page 2022

[19] [19]

Masked-attention mask transformer for universal image segmentation

Bowen Cheng, Ishan Misra, Alexander G Schwing, Alexander Kir- illov, and Rohit Girdhar. Masked-attention mask transformer for universal image segmentation. CVPR, 2022. 4

work page 2022

[20] [20]

Per- pixel classiﬁcation is not all you need for semantic segmentation

Bowen Cheng, Alex Schwing, and Alexander Kirillov. Per- pixel classiﬁcation is not all you need for semantic segmentation. NeurIPS, 2021. 5, 16, 17

work page 2021

[21] [21]

PaLM: Scaling Language Modeling with Pathways

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling language modeling with pathways. arXiv:2204.02311, 2022. 1

work page internal anchor Pith review Pith/arXiv arXiv 2022

[22] [22]

Domain adaptation for trafﬁc density estimation

Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Domain adaptation for trafﬁc density estimation. International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2021. 9, 20

work page 2021

[23] [23]

Luca Ciampi, Carlos Santiago, Joao Costeira, Claudio Gennaro, and Giuseppe Amato. Night and day instance segmented park (NDIS- Park) dataset: a collection of images taken by day and by night for vehicle detection, segmentation and counting in parking areas.Zen- odo, 2022. 9, 20

work page 2022

[24] [24]

Semantic segmen- tation in art paintings

Nadav Cohen, Yael Newman, and Ariel Shamir. Semantic segmen- tation in art paintings. Computer Graphics Forum, 2022. 9, 19, 20, 23, 24

work page 2022

[25] [25]

The Cityscapes dataset for semantic urban scene understanding

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes dataset for semantic urban scene understanding. CVPR, 2016. 9, 19, 20

work page 2016

[26] [26]

Learning parameterized skills

Bruno da Silva, George Konidaris, and Andrew Barto. Learning parameterized skills. ICML, 2012. 4

work page 2012

[27] [27]

Rescaling egocentric vision: Collection, pipeline and challenges for EPIC- KITCHENS-100

Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. Rescaling egocentric vision: Collection, pipeline and challenges for EPIC- KITCHENS-100. IJCV, 2022. 9, 20, 23, 24

work page 2022

[28] [28]

EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, and Dima Damen. EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations. NeurIPS, 2022. 9, 19, 20, 23, 24

work page 2022

[29] [29]

Does object recognition work for everyone?CVPR workshops, 2019

Terrance De Vries, Ishan Misra, Changhan Wang, and Laurens Van der Maaten. Does object recognition work for everyone?CVPR workshops, 2019. 18

work page 2019

[30] [30]

Crowd- WorkSheets: Accounting for individual and collective identities un- derlying crowdsourced dataset annotation

Mark D ´ıaz, Ian Kivlichan, Rachel Rosen, Dylan Baker, Razvan Amironesei, Vinodkumar Prabhakaran, and Emily Denton. Crowd- WorkSheets: Accounting for individual and collective identities un- derlying crowdsourced dataset annotation. ACM Conference on Fairness, Accountability, and Transparency, 2022. 25

work page 2022

[31] [31]

PhraseClick: toward achieving ﬂexible interactive segmentation by phrase and click

Henghui Ding, Scott Cohen, Brian Price, and Xudong Jiang. PhraseClick: toward achieving ﬂexible interactive segmentation by phrase and click. ECCV, 2020. 11

work page 2020

[32] [32]

Fast edge detection using structured forests

Piotr Doll ´ar and C Lawrence Zitnick. Fast edge detection using structured forests. TPAMI, 2014. 21

work page 2014

[33] [33]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa De- hghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021. 5, 8, 16

work page 2021

[34] [34]

Alireza Fathi, Xiaofeng Ren, and James M. Rehg. Learning to rec- ognize objects in egocentric activities. CVPR, 2011. 9, 19, 20

work page 2011

[35] [35]

Efﬁcient graph- based image segmentation

Pedro F Felzenszwalb and Daniel P Huttenlocher. Efﬁcient graph- based image segmentation. IJCV, 2004. 10

work page 2004

[36] [36]

Fitzpatrick

Thomas B. Fitzpatrick. The validity and practicality of sun-reactive skin types i through vi. Archives of Dermatology, 1988. 8

work page 1988

[37] [37]

Getting to 99% accuracy in interactive seg- mentation

Marco Forte, Brian Price, Scott Cohen, Ning Xu, and Franc ¸ois Piti´e. Getting to 99% accuracy in interactive segmentation. arXiv:2003.07932, 2020. 5, 17

work page arXiv 2003

[38] [38]

Instance segmentation for au- tonomous log grasping in forestry operations

Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, Franc ¸ois Pomerleau, and Philippe Gigu `ere. Instance segmentation for au- tonomous log grasping in forestry operations. IROS, 2022. 9, 20 13

work page 2022

[39] [39]

Datasheets for datasets

Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jen- nifer Wortman Vaughan, Hanna Wallach, Hal Daum´e Iii, and Kate Crawford. Datasheets for datasets. Communications of the ACM ,

work page

[40] [40]

Simple copy-paste is a strong data augmentation method for instance segmentation.CVPR,

Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D Cubuk, Quoc V Le, and Barret Zoph. Simple copy-paste is a strong data augmentation method for instance segmentation.CVPR,

work page

[41] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014. 10

[42] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677, 2017. 17

[43] Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent C...

[44] Agrim Gupta, Piotr Dollár, and Ross Girshick. LVIS: A dataset for large vocabulary instance segmentation. CVPR, 2019. 2, 6, 7, 9, 10, 11, 19, 20, 21, 24

[45] Abner Guzman-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs. NeurIPS, 2012. 5, 17

[46] Timm Haucke, Hjalmar S. Kühl, and Volker Steinhage. SOCRATES: Introducing depth in visual wildlife monitoring using stereo vision. Sensors, 2022. 9, 20

[47] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. CVPR, 2022. 5, 8, 12, 16, 17

[48] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. ICCV, 2017. 10

[49] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CVPR, 2016. 16

[50] Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv:1606.08415, 2016. 16

[51] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv:2203.15556, 2022. 1

[52] Jungseok Hong, Michael Fulton, and Junaed Sattar. TrashCan: A semantically-segmented dataset towards visual detection of marine debris. arXiv:2007.08097, 2020. 9, 19, 20

[53] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. Deep networks with stochastic depth. ECCV, 2016. 17

[54] Jitesh Jain, Jiachen Li, MangTik Chiu, Ali Hassani, Nikita Orlov, and Humphrey Shi. OneFormer: One transformer to rule universal image segmentation. arXiv:2211.06220, 2022. 4

[55] Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. Scaling up visual and vision-language representation learning with noisy text supervision. ICML, 2021. 1

[56] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv:2001.08361, 2020. 1

[57] Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. IJCV, 1988. 4

[58] Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, and Weicheng Kuo. Learning open-world object proposals without learning to classify. IEEE Robotics and Automation Letters, 2022. 21

[59] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. CVPR, 2019. 4

[60] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, Tom Duerig, and Vittorio Ferrari. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. IJCV, 2020. 2, 6, 7, 18, 19

[61] Alexandre Lacoste, Alexandra Luccioni, Victor Schmidt, and Thomas Dandres. Quantifying the carbon emissions of machine learning. arXiv:1910.09700, 2019. 28

[62] Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. Exploring plain vision transformer backbones for object detection. ECCV,

[63] 5, 10, 11, 16, 21, 23, 24

[64] Yin Li, Zhefan Ye, and James M. Rehg. Delving into egocentric actions. CVPR, 2015. 9, 20

[65] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Interactive image segmentation with latent diversity. CVPR, 2018. 5, 17, 19

[66] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. ICCV, 2017. 5, 17

[67] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014. 2, 4, 6, 7, 11, 18, 19, 20

[68] Qin Liu, Zhenlin Xu, Gedas Bertasius, and Marc Niethammer. SimpleClick: Interactive image segmentation with simple vision transformers. arXiv:2210.11006, 2022. 8, 9, 12, 19

[69] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. ICLR, 2019. 17

[70] Cathy H Lucas, Daniel OB Jones, Catherine J Hollyhead, Robert H Condon, Carlos M Duarte, William M Graham, Kelly L Robinson, Kylie A Pitt, Mark Schildhauer, and Jim Regetz. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecology and Biogeography, 2014. 20

[71] Sabarinath Mahadevan, Paul Voigtlaender, and Bastian Leibe. Iteratively trained interactive segmentation. BMVC, 2018. 4, 17

[72] Kevis-Kokitsi Maninis, Sergi Caelles, Jordi Pont-Tuset, and Luc Van Gool. Deep extreme cut: From extreme points to object segmentation. CVPR, 2018. 6

[73] David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. ICCV, 2001. 10, 21, 28

[74] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 3DV, 2016. 5, 17

[75] Massimo Minervini, Andreas Fischbach, Hanno Scharr, and Sotirios A. Tsaftaris. Finely-grained annotated datasets for image-based plant phenotyping. Pattern Recognition Letters, 2016. 9, 20

[76] Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019. 25, 28

[77] Dim P Papadopoulos, Jasper RR Uijlings, Frank Keller, and Vittorio Ferrari. Extreme clicking for efficient object annotation. ICCV,

[78] David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean. Carbon emissions and large neural network training. arXiv:2104.10350, 2021. 28

[79] Matthew E Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017. 18

[80] Mengyang Pu, Yaping Huang, Yuming Liu, Qingji Guan, and Haibin Ling. EDTER: Edge detection with transformer. CVPR,