ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

Hyejin Park; Jungseul Ok; Junhyuk Kwon; Kyle Min; Seungjoon Lee

arxiv: 2605.06223 · v3 · pith:6J5G7XOFnew · submitted 2026-05-07 · 💻 cs.AI · cs.RO

Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok

Pith reviewed 2026-05-19 16:56 UTC · model grok-4.3

classification 💻 cs.AI cs.RO

keywords: instance navigation, ambiguous queries, comparative judgment, candidate pruning, binary questions, proactive agent

The pith

ProCompNav resolves ambiguous navigation queries by asking binary questions that split candidate pools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ProCompNav to help navigation agents handle unclear user requests for specific items among similar ones. Instead of asking for detailed descriptions or guessing early, it builds a set of possible candidates and then asks yes-or-no questions about attributes that best separate those candidates. Each answer removes all candidates that do not match, quickly narrowing to the target. This leads to higher success rates on test benchmarks and shorter replies from users compared to prior approaches that either stop too soon or require more input.

Core claim

The core discovery is that reframing disambiguation as pool-level discriminative questioning, where each binary query is chosen to split the current candidate set, allows an agent to identify the target instance more reliably and with less user effort than methods relying on individual candidate attributes or upfront detailed descriptions.

What carries the argument

The two-stage framework consisting of candidate pool construction followed by iterative selection of splitting attribute-value pairs for binary questioning and immediate pruning of inconsistent candidates.
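The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: candidates are modelled as hypothetical attribute dictionaries, and the question is picked with a simple balanced-split heuristic chosen here for illustration (the paper instead selects an attribute by contrasting a core set with a remainder set). The names `pick_splitting_pair` and `disambiguate` are invented for this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidate = Dict[str, str]  # hypothetical representation: attribute -> value


def pick_splitting_pair(pool: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Return the (attribute, value) pair whose yes/no split is most balanced.

    Assumption: balance is used as the selection criterion only for
    illustration. Returns None when no pair separates the pool.
    """
    best_pair, best_gap = None, None
    pairs = sorted({(a, v) for c in pool for a, v in c.items()})
    for attr, value in pairs:
        yes = sum(1 for c in pool if c.get(attr) == value)
        if yes == 0 or yes == len(pool):
            continue  # every candidate gives the same answer: no split
        gap = abs(2 * yes - len(pool))  # 0 means a perfect half/half split
        if best_gap is None or gap < best_gap:
            best_pair, best_gap = (attr, value), gap
    return best_pair


def disambiguate(
    pool: List[Candidate],
    answer: Callable[[str, str], bool],
    max_rounds: int = 10,
) -> Tuple[List[Candidate], int]:
    """Ask binary questions and prune inconsistent candidates each round."""
    questions = 0
    while len(pool) > 1 and questions < max_rounds:
        pair = pick_splitting_pair(pool)
        if pair is None:
            break  # remaining candidates are indistinguishable
        attr, value = pair
        said_yes = answer(attr, value)
        # Keep only candidates consistent with the user's yes/no answer.
        pool = [c for c in pool if (c.get(attr) == value) == said_yes]
        questions += 1
    return pool, questions


if __name__ == "__main__":
    chairs = [
        {"color": "red", "material": "wood"},
        {"color": "red", "material": "metal"},
        {"color": "blue", "material": "wood"},
        {"color": "blue", "material": "metal"},
    ]
    target = chairs[1]
    remaining, asked = disambiguate(chairs, lambda a, v: target.get(a) == v)
    print(remaining, asked)
```

With a truthful user and attributes that fully separate the candidates, each answer removes every candidate on the wrong side of the split, so the pool can only shrink.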

Load-bearing premise

That attribute-value pairs can be extracted reliably from candidates, and that the user's binary answers will accurately and fully prune the candidate pool without introducing new ambiguities.

What would settle it

Cases where the extracted attributes fail to distinguish key distractors, or where inconsistent answers cause pruning to eliminate the true target, would count as evidence against the effectiveness of the comparative judgment approach.

Figures

Figures reproduced from arXiv: 2605.06223 by Hyejin Park, Jungseul Ok, Junhyuk Kwon, Kyle Min, Seungjoon Lee.

**Figure 1.** Three strategies for instance navigation under an ambiguous user query. (a) …

**Figure 2.** Recursive Comparative Judgment. At iteration $t$, ProCompNav splits the candidate pool $U_t$ into a core set $G_c$ and a remainder set $G_r$ by similarity. It identifies a discriminative attribute $a_t^*$ that is common in $G_c$ but not in $G_r$. Finally, it asks whether the target has $a_t^*$ and prunes the pool to obtain the next candidate pool $U_{t+1}$ based on the user's response. Because distractors $D$ and $T^*$ share many a…

**Figure 3.** Termination-step analysis of AIUTA and ProCompNav. The x-axis shows termination steps in 100-step bins, except the max exploration step; bars (left y-axis) show number of terminated episodes, and lines (right y-axis) show cumulative number of successful episodes. To demonstrate the advantage of our collect-then-compare strategy, we compare the episode termination steps and success rates of AIUTA and Pro…

**Figure 4.** Qualitative comparison of Independent Matching and Comparative Judgment under a …

**Figure 5.** TextNav adaptation of the Recursive Comparison Stage. In TextNav, ProCompNav pre…

**Figure 6.** Examples of multi-view candidates produced by the Pool Construction Stage. For each …

**Figure 7.** Effect of the candidate pool size threshold …

Original abstract

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors. Code is available at https://github.com/tree-jhk/procompnav.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ProCompNav turns ambiguous navigation into a loop of binary pool-splitting questions rather than per-object descriptions, and the reported gains on CoIN-Bench and TextNav rest on that shift.

The paper's core move is to build a candidate pool first, then repeatedly pick an attribute-value pair that splits the pool, ask a yes/no question, and drop every inconsistent candidate at once. This replaces open-ended target descriptions or single-candidate attribute queries with comparative questions chosen to shrink the set. That reframing is the clearest difference from the baselines mentioned in the abstract, and it directly targets the problem of lengthy or premature user responses in instance navigation among similar objects.

The reported results show higher success rates than both minimal-input interactive methods and detailed-description non-interactive ones on CoIN-Bench, plus state-of-the-art on TextNav, with noticeably shorter user responses. Releasing the code is useful for anyone who wants to inspect the extraction and pruning steps. The approach is simple enough that the gains, if they hold, would matter for practical robot or assistant deployments where users dislike long clarifications.

The main soft spot is the untested assumption that attribute extraction stays accurate and that each binary answer cleanly removes only the right subset without false negatives or lingering ambiguity. The abstract gives no error rates on extraction, no ablations on noisy answers, and no description of what happens when the pool empties or stays tied. If those steps introduce moderate noise, the claimed advantage over baselines would shrink.

This work is aimed at researchers building interactive navigation agents or studying efficient disambiguation dialogues. Readers who care about reducing user burden in reference resolution will find the comparative-judgment loop worth examining. It deserves peer review because the algorithmic idea is concrete, the benchmarks are relevant, and the practical motivation is clear, even though more analysis of failure modes would help.

Referee Report

2 major / 1 minor

Summary. The paper proposes ProCompNav, a two-stage framework for instance navigation under ambiguous queries. It first builds a candidate pool from minimal user input and then iteratively extracts attribute-value pairs from the pool, poses binary yes/no questions, and prunes inconsistent candidates until the target is isolated. The authors claim higher Success Rate than both interactive baselines (same minimal input) and non-interactive baselines (detailed descriptions) on CoIN-Bench, substantially shorter Response Length, and state-of-the-art Success Rate on TextNav.

Significance. If the empirical results are robust, the comparative-judgment loop offers a principled way to reduce user burden in disambiguation tasks by replacing open-ended descriptions with pool-level discriminative questions. The public code release is a clear strength that enables direct reproduction and extension.

major comments (2)

[Experiments] Experiments section: the reported Success Rate gains and Response Length reductions on CoIN-Bench (and SOTA on TextNav) are presented without error bars, dataset statistics, ablation tables, or quantitative measurements of attribute-extraction accuracy; these omissions leave the central performance claims unsupported.
[Method] Method (§3): the framework rests on the assumption that attribute-value extraction is reliable and that each binary answer correctly and exhaustively prunes only inconsistent candidates without false negatives (discarding the target) or false positives (retaining distractors); no error-rate analysis, noisy-answer ablation, or fallback mechanism is described, which directly undermines the claimed advantage over baselines.

minor comments (1)

[Abstract] Abstract: the phrase 'substantially reducing Response Length' is not accompanied by the precise metric or the exact baseline values being compared.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our work. We address each major comment below and have made revisions to the manuscript to incorporate the suggested improvements.

Referee: [Experiments] Experiments section: the reported Success Rate gains and Response Length reductions on CoIN-Bench (and SOTA on TextNav) are presented without error bars, dataset statistics, ablation tables, or quantitative measurements of attribute-extraction accuracy; these omissions leave the central performance claims unsupported.

Authors: We agree with the referee that the experimental results would benefit from additional statistical rigor and supporting analyses. In the revised version, we have included error bars for the Success Rate and Response Length metrics on CoIN-Bench, calculated across five independent runs with different random seeds. We have also added a table presenting key dataset statistics for both CoIN-Bench and TextNav. Furthermore, we now include an ablation table that evaluates the contribution of each component, including quantitative measurements of attribute-extraction accuracy using precision and recall on a validation set. These additions provide better support for the central performance claims. revision: yes
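The two kinds of statistic promised in this response can be computed as follows. This is a hedged sketch: the function names are hypothetical, the sample standard deviation is assumed as the error-bar definition (the response does not say which is used), and attribute-extraction quality is assumed to be scored over sets of (attribute, value) pairs.

```python
from statistics import mean, stdev
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]  # hypothetical (attribute, value) pair


def summarize_runs(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric across seeds."""
    return mean(values), stdev(values)


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of extracted pairs against annotated pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

`stdev` requires at least two values, so a single-seed run cannot produce an error bar with this helper.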
Referee: [Method] Method (§3): the framework rests on the assumption that attribute-value extraction is reliable and that each binary answer correctly and exhaustively prunes only inconsistent candidates without false negatives (discarding the target) or false positives (retaining distractors); no error-rate analysis, noisy-answer ablation, or fallback mechanism is described, which directly undermines the claimed advantage over baselines.

Authors: The referee correctly identifies that our framework relies on the reliability of attribute-value extraction and the correctness of the pruning process. To strengthen the method section, we have expanded §3 with a discussion of these assumptions. We now report the accuracy of the attribute extraction module on a held-out portion of the data. Additionally, we present results from a noisy-answer ablation study, where we introduce simulated errors in user responses at varying rates and measure the impact on success rate. We also describe a fallback strategy: if the candidate pool size does not decrease below a threshold after a fixed number of questions, the system requests a more detailed description from the user. These revisions address the potential issues of false negatives and false positives and better substantiate the advantages over baselines. revision: yes
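A noisy-answer ablation with a fallback of the kind described in this response might look like the sketch below. Everything here is an assumption made for illustration: candidates are hypothetical attribute dictionaries, noise is modelled as independently flipping each yes/no answer with probability `flip_rate`, the question picker simply takes the first pair that splits the pool, and the defaults `max_questions=5` and `pool_threshold=2` are placeholders rather than values from the paper.

```python
import random
from typing import Dict, List

Candidate = Dict[str, str]  # hypothetical representation: attribute -> value


def noisy_answer(truth: bool, flip_rate: float, rng: random.Random) -> bool:
    """Flip the truthful answer with probability flip_rate."""
    return (not truth) if rng.random() < flip_rate else truth


def run_episode(pool: List[Candidate], target: Candidate, flip_rate: float,
                rng: random.Random, max_questions: int = 5,
                pool_threshold: int = 2) -> str:
    """Return 'success', 'failure', or 'fallback' for one simulated dialogue."""
    pool = list(pool)
    for _ in range(max_questions):
        if len(pool) <= 1:
            break
        # First (attribute, value) pair that actually splits the pool.
        pair = next(
            ((a, v) for c in pool for a, v in sorted(c.items())
             if 0 < sum(1 for d in pool if d.get(a) == v) < len(pool)),
            None,
        )
        if pair is None:
            break  # nothing left that separates the candidates
        attr, value = pair
        said_yes = noisy_answer(target.get(attr) == value, flip_rate, rng)
        pool = [c for c in pool if (c.get(attr) == value) == said_yes]
    if len(pool) >= pool_threshold:
        return "fallback"  # would request a detailed description instead
    # Anything other than exactly the target counts as a failure.
    return "success" if pool == [target] else "failure"


def success_rate(pool: List[Candidate], flip_rate: float,
                 episodes: int = 200, seed: int = 0) -> float:
    """Fraction of episodes that end with exactly the target selected."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        target = rng.choice(pool)
        wins += run_episode(pool, target, flip_rate, rng) == "success"
    return wins / episodes
```

Sweeping `flip_rate` over a grid and plotting `success_rate` gives the shape of curve such an ablation would report. Note that a single flipped answer removes the target permanently in this sketch, which is exactly the false-negative risk the referee raises.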

Circularity Check

0 steps flagged

No circularity: algorithmic procedure evaluated on external benchmarks

The paper describes ProCompNav as a two-stage algorithmic procedure (candidate pool construction followed by iterative attribute-value extraction, binary questioning, and pruning) whose performance is measured against external benchmarks (CoIN-Bench, TextNav) and baselines. No equations, fitted parameters, or self-referential quantities appear in the provided text. Claims of improved Success Rate and reduced Response Length rest on empirical results rather than any derivation that reduces to its own inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked in a manner that creates circularity. The framework is self-contained as a method whose validity is externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method relies on standard assumptions about attribute extraction accuracy and question-answering reliability in navigation agents; no new free parameters, physical entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption: Attribute-value pairs can be extracted from scene candidates with sufficient accuracy to enable reliable pool splitting.
The comparative judgment step depends on this extraction step being effective.

pith-pipeline@v0.9.0 · 6860 in / 1035 out tokens · 47663 ms · 2026-05-19T16:56:40.611075+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear

Paper passage: We compute the pairwise similarity $S(i,j)$ by averaging text and visual similarities... $a_t^* = \arg\max_{a \in A} \left( \mathbb{E}_{i \in G_c}[s(d_i, a)] - \mathbb{E}_{j \in G_r}[s(d_j, a)] \right)$
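The selection rule in the quoted passage can be illustrated with a short sketch. The assumptions are: the "averaging" of text and visual similarity is an unweighted mean, the scores s(d_i, a) are already available as a nested dictionary, and the core and remainder sets are passed in as index lists (how the paper forms them is not shown in the quoted passage). The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def pairwise_similarity(text_sim: Sequence[Sequence[float]],
                        visual_sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """S(i, j) as the unweighted mean of text and visual similarity (assumed)."""
    n = len(text_sim)
    return [[(text_sim[i][j] + visual_sim[i][j]) / 2 for j in range(n)]
            for i in range(n)]


def select_attribute(scores: Dict[int, Dict[str, float]],
                     attributes: Sequence[str],
                     core: Sequence[int],
                     remainder: Sequence[int]) -> str:
    """Attribute maximising mean score in the core minus mean in the remainder."""
    def margin(attr: str) -> float:
        core_mean = sum(scores[i][attr] for i in core) / len(core)
        rest_mean = sum(scores[j][attr] for j in remainder) / len(remainder)
        return core_mean - rest_mean

    # Sorting first makes ties resolve deterministically (alphabetical order).
    return max(sorted(attributes), key=margin)
```

An attribute that every candidate shares scores a margin near zero, so it is never preferred over one that is common in the core set but rare in the remainder.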

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

FirstName Alpher , title =

work page

[2] [2]

Journal of Foo , volume = 13, number = 1, pages =

FirstName Alpher and FirstName Fotheringham-Smythe , title =. Journal of Foo , volume = 13, number = 1, pages =

work page

[3] [3]

Journal of Foo , volume = 14, number = 1, pages =

FirstName Alpher and FirstName Fotheringham-Smythe and FirstName Gamow , title =. Journal of Foo , volume = 14, number = 1, pages =

work page

[4] [4]

FirstName Alpher and FirstName Gamow , title =

work page

[5] [5]

Computer Vision -- ECCV 2022 , year =

work page 2022

[6] [6]

Conference on Robot Learning , pages=

Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners , author=. Conference on Robot Learning , pages=. 2023 , organization=

work page 2023

[7] [7]

Advances in Neural Information Processing Systems , volume=

Introspective planning: Aligning robots' uncertainty with inherent task ambiguity , author=. Advances in Neural Information Processing Systems , volume=

work page

[8] [8]

2022 International Conference on Robotics and Automation (ICRA) , pages=

Interactive robotic grasping with attribute-guided disambiguation , author=. 2022 International Conference on Robotics and Automation (ICRA) , pages=. 2022 , organization=

work page 2022

[9] [9]

Conference on Robot Learning , pages=

Inner Monologue: Embodied Reasoning through Planning with Language Models , author=. Conference on Robot Learning , pages=. 2023 , organization=

work page 2023

[10] [10]

2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=

Robotic task ambiguity resolution via natural language interaction , author=. 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pages=. 2025 , organization=

work page 2025

[11] [11]

arXiv preprint arXiv:2509.15061 , year=

Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue , author=. arXiv preprint arXiv:2509.15061 , year=

work page arXiv

[12] [12]

IEEE Robotics and Automation Letters , volume=

Doro: Disambiguation of referred object for embodied agents , author=. IEEE Robotics and Automation Letters , volume=. 2022 , publisher=

work page 2022

[13] [13]

Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

Collaborative instance object navigation: Leveraging uncertainty-awareness to minimize human-agent dialogues , author=. Proceedings of the IEEE/CVF International Conference on Computer Vision , pages=

work page

[14] [14]

Objectnav revisited: On evaluation of embodied agents navigating to objects.arXiv preprint arXiv:2006.13171, 2020

Objectnav revisited: On evaluation of embodied agents navigating to objects , author=. arXiv preprint arXiv:2006.13171 , year=

work page arXiv 2006

[15] [15]

Advances in Neural Information Processing Systems , volume=

Personalized instance-based navigation toward user-specific objects in realistic environments , author=. Advances in Neural Information Processing Systems , volume=

work page

[16] [16]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Unigoal: Towards universal zero-shot goal-oriented navigation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[17] [17]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

Tango: training-free embodied ai agents for open-world tasks , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

work page

[18] [18]

Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

3D-mem: 3D scene memory for embodied exploration and reasoning , author=. Proceedings of the Computer Vision and Pattern Recognition Conference , pages=

work page

[19] [19]

Goat: Go to any thing,

Goat: Go to any thing , author=. arXiv preprint arXiv:2311.06430 , year=

work page arXiv

[20] [20]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Goat-bench: A benchmark for multi-modal lifelong navigation , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page

[21] Prioritized semantic learning for zero-shot instance navigation. European Conference on Computer Vision, 2024.

[22] Hm3d-ovon: A dataset and benchmark for open-vocabulary object goal navigation. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024.

[23] Vlfm: Vision-language frontier maps for zero-shot semantic navigation. 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024.

[24] Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection. European Conference on Computer Vision, 2024.

[25] DINOv2: Learning Robust Visual Features without Supervision. arXiv preprint arXiv:2304.07193.

[26] Greedy approximation algorithms for finding dense components in a graph. International Workshop on Approximation Algorithms for Combinatorial Optimization, 2000.

[27] On finding dense subgraphs. International Colloquium on Automata, Languages, and Programming, 2009.

[28] A survey on the densest subgraph problem and its variants. ACM Computing Surveys, 2024.

[29] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019.

[30] DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv preprint arXiv:2111.09543.

[31] Less annotating, more classifying: Addressing the data scarcity issue of supervised machine learning with deep transfer learning and BERT-NLI. Political Analysis, 2024.

[32] Faster Segment Anything: Towards Lightweight SAM for Mobile Applications. arXiv preprint arXiv:2306.14289.

[33] Beliefmapnav: 3D voxel-based belief map for zero-shot object navigation. arXiv preprint arXiv:2506.06487.

[34] Instructnav: Zero-shot system for generic instruction navigation in unexplored environment. arXiv preprint arXiv:2406.04882, 2024.

[35] Sg-nav: Online 3D scene graph prompting for LLM-based zero-shot object navigation. Advances in Neural Information Processing Systems.

[36] Openfmnav: Towards open-set zero-shot object navigation via vision-language foundation models. Findings of the Association for Computational Linguistics: NAACL 2024.

[37] Gamap: Zero-shot object goal navigation with multi-scale geometric-affordance guidance. Advances in Neural Information Processing Systems.

[38] Voronav: Voronoi-based zero-shot object navigation with large language model. arXiv preprint arXiv:2401.02695.

[39] Imaginenav: Prompting vision-language models as embodied navigator through scene imagination. arXiv preprint arXiv:2410.09874.

[40] L3mvn: Leveraging large language models for visual target navigation. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023.

[41] Cows on pasture: Baselines and benchmarks for language-driven zero-shot object navigation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Esc: Exploration with soft commonsense constraints for zero-shot object navigation. International Conference on Machine Learning, 2023.

[43] Relative attributes. 2011 International Conference on Computer Vision, 2011.

[44] Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics.

[45] Experience-based Knowledge Correction for Robust Planning in Minecraft. The Fourteenth International Conference on Learning Representations (ICLR).

[46] Qwen3-VL Technical Report. arXiv preprint arXiv:2511.21631.

[47] Active learning with real annotation costs. Proceedings of the NIPS Workshop on Cost-Sensitive Learning, 2008.

[48] Semi-supervised active learning for sequence labeling. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP.

[49] Context-Nav: Context-Driven Exploration and Viewpoint-Aware 3D Spatial Reasoning for Instance Navigation. arXiv preprint arXiv:2603.09506.