Recognition: 2 theorem links
LLMs Know When They Know, but Do Not Act on It: A Metacognitive Harness for Test-time Scaling
Pith reviewed 2026-05-15 04:46 UTC · model grok-4.3
The pith
Large language models can use their own pre- and post-solution self-assessments to control inference and raise accuracy on reasoning tasks without any training or fine-tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Inspired by the Nelson-Narens theory of metacognition, the work demonstrates that LLMs possess latent metacognitive ability in the form of feeling-of-knowing signals before solving and judgment-of-learning signals after each attempt. By separating these monitoring signals from the reasoning process and using them to guide decisions on trust, retry with compact feedback, and aggregation, the metacognitive harness improves a fixed base model across diverse benchmarks without parameter updates.
What carries the argument
The metacognitive harness, which elicits pre-solve feeling-of-knowing (FOK) and post-solve judgment-of-learning (JOL) signals from the LLM and uses them as control inputs to decide trust, retry, or aggregate.
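The trust/retry/aggregate decision logic described above can be sketched as a small control loop. Everything below (the threshold name TRUST_JOL, the retry budget, the stub signal functions) is this sketch's assumption, not the paper's actual implementation.

```python
# Illustrative control loop for a metacognitive harness: elicit a pre-solve
# FOK signal, attempt a solve, elicit a post-solve JOL signal, and either
# trust, retry with compact feedback, or aggregate all attempts at the end.

TRUST_JOL = 0.9      # assumed: a JOL above this means "trust and stop"
MAX_ATTEMPTS = 4     # assumed retry budget

def metacognitive_harness(problem, ask_fok, solve, ask_jol, aggregate):
    fok = ask_fok(problem)                 # pre-solve feeling-of-knowing
    attempts = []                          # history of (answer, jol, jol_reason)
    for _ in range(MAX_ATTEMPTS):
        # feedback is compact: prior (jol, jol_reason) pairs, not full reasoning
        answer = solve(problem, fok, feedback=[a[1:] for a in attempts])
        jol, jol_reason = ask_jol(problem, answer)  # post-solve judgment-of-learning
        attempts.append((answer, jol, jol_reason))
        if jol >= TRUST_JOL:               # monitoring says: trust this solution
            return answer
    return aggregate(problem, attempts)    # still unsure: pass all attempts on

# Toy run with stubbed signals: the second attempt clears the trust threshold.
trace = iter([(0.4, "unsure"), (0.95, "checked")])
result = metacognitive_harness(
    "q",
    ask_fok=lambda p: 0.6,
    solve=lambda p, fok, feedback: f"attempt-{len(feedback)}",
    ask_jol=lambda p, a: next(trace),
    aggregate=lambda p, atts: max(atts, key=lambda t: t[1])[0],
)
```

The point of the separation is that `ask_fok` and `ask_jol` are monitoring calls, kept apart from the `solve` reasoning call they steer.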
If this is right
- Substantially improves accuracy on text, code, and multimodal reasoning benchmarks using a fixed model.
- Raises pooled accuracy from 48.3% to 56.9% on public benchmark snapshots.
- Exceeds strongest leaderboard entries on HLE-Verified, LiveCodeBench v6, and R-Bench-V.
- Requires no parameter updates or benchmark-specific fine-tuning.
Where Pith is reading between the lines
- Similar harnesses could be applied to other base models to test if metacognitive signals are a general property of strong LLMs.
- The separation of monitoring from reasoning may enable more robust self-correction loops in future systems.
- Explicit control interfaces might unlock further test-time scaling beyond current single-pass or simple sampling methods.
Load-bearing premise
The feeling-of-knowing and judgment-of-learning signals produced by the LLM are reliable, consistent, and free from systematic bias so that they can effectively direct the control decisions.
What would settle it
Observing that the harness produces lower accuracy than the base model on a held-out set of problems where the self-monitoring signals do not correlate with actual success would falsify the claim that these signals can be harnessed effectively.
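A minimal version of that falsification check is to ask whether the self-monitoring signal discriminates correct from incorrect attempts at all on a held-out set. The function below and the scores fed to it are purely illustrative.

```python
# Discrimination check for a self-monitoring signal (e.g. JOL): if the score of
# a random correct attempt is no more likely to exceed the score of a random
# incorrect attempt than chance (0.5), there is no signal for a harness to use.

def signal_auc(scores_correct, scores_incorrect):
    """Probability that a random correct attempt gets a higher score than a
    random incorrect one (ties count half); 0.5 means no usable signal."""
    wins = 0.0
    for c in scores_correct:
        for w in scores_incorrect:
            wins += 1.0 if c > w else (0.5 if c == w else 0.0)
    return wins / (len(scores_correct) * len(scores_incorrect))

# Hypothetical held-out JOL scores, grouped by whether the answer was right.
auc = signal_auc([0.9, 0.8, 0.85], [0.3, 0.6, 0.4])
```

An AUC near 0.5 on held-out problems, combined with the harness underperforming the base model there, would be the falsifying observation described above.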
read the original abstract
Large language models (LLMs) often expose useful signals of self-monitoring: before solving a problem, they can estimate whether they are likely to succeed, and after solving it, they can judge whether their answer is likely to be correct. However, these signals are typically measured or elicited in isolation, rather than used to control inference. In this work, we ask whether LLMs possess latent metacognitive ability that can be turned into effective test-time control. Inspired by the Nelson–Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning. For each problem, the model first reports a pre-solve feeling-of-knowing (FOK) signal; after each solve attempt, it reports a post-solve judgment-of-learning (JOL) signal. Rather than treating these signals as passive confidence estimates, the harness turns them into an explicit control interface for reasoning: it decides when to trust the current solution, when to retry with compact metacognitive feedback, and when to pass multiple attempts to a final aggregator. Across text, code, and multimodal reasoning benchmarks, our harness substantially improves a fixed Claude Sonnet-4.6 base model without parameter updates or benchmark-specific fine-tuning. On the evaluated public benchmark snapshots, it raises pooled accuracy from 48.3 to 56.9 and exceeds the strongest listed leaderboard entries on the three primary evaluation settings: HLE-Verified, LiveCodeBench v6, and R-Bench-V. These results suggest that strong LLMs may already possess useful metacognitive ability, but require an explicit control harness to act on it during reasoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a metacognitive harness, inspired by Nelson-Narens theory, that elicits pre-solve feeling-of-knowing (FOK) and post-solve judgment-of-learning (JOL) signals from an LLM to control test-time decisions: whether to trust a solution, retry with metacognitive feedback, or aggregate multiple attempts. Applied without parameter updates or fine-tuning to a fixed Claude Sonnet-4.6 model, the harness is reported to raise pooled accuracy from 48.3% to 56.9% across text, code, and multimodal benchmarks and to exceed listed leaderboard entries on HLE-Verified, LiveCodeBench v6, and R-Bench-V.
Significance. If the FOK/JOL signals demonstrably supply control value beyond equivalent multi-attempt compute, the result would show that strong LLMs already possess latent, actionable metacognitive monitoring that can be turned into a general test-time scaling mechanism without training. The parameter-free, cross-domain nature of the harness is a strength, but the current evidence does not yet isolate the metacognitive component from generic ensembling.
major comments (2)
- [Abstract] Abstract: the reported lift from 48.3% to 56.9% pooled accuracy is presented as evidence that the metacognitive harness supplies unique control value, yet no ablation is described against a signal-agnostic baseline that performs the same average number of model calls and applies the identical final aggregator; without this comparison the attribution to FOK/JOL control rather than multi-attempt scaling remains unsupported.
- [Abstract] Abstract and results: decision thresholds, prompt templates for eliciting FOK and JOL, and statistical controls (e.g., variance across runs, significance tests) are not specified, leaving the link between the elicited signals and the observed accuracy gains only partially supported and difficult to reproduce.
minor comments (1)
- [Abstract] Abstract: the three primary evaluation settings are named but no per-benchmark breakdown or error analysis is provided to show where the harness helps or hurts.
Simulated Author's Rebuttal
We thank the referee for the constructive and precise comments. The distinction between metacognitive control and generic multi-attempt scaling is central to the paper's claim, and we appreciate the opportunity to strengthen the evidence for it. We address each major comment below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the reported lift from 48.3% to 56.9% pooled accuracy is presented as evidence that the metacognitive harness supplies unique control value, yet no ablation is described against a signal-agnostic baseline that performs the same average number of model calls and applies the identical final aggregator; without this comparison the attribution to FOK/JOL control rather than multi-attempt scaling remains unsupported.
Authors: We agree that the current manuscript does not contain a direct ablation against a compute-matched, signal-agnostic baseline, which leaves the unique contribution of the FOK/JOL-driven decisions incompletely isolated. In the revised manuscript we will add exactly this comparison: for each benchmark we will run the identical average number of model calls per problem, apply the same final aggregator, but replace the metacognitive decision logic (trust/retry/aggregate based on FOK and JOL) with either (a) random selection among attempts or (b) a fixed policy that always aggregates the maximum number of attempts. The resulting accuracy will be reported alongside the harness results. We expect the metacognitive policy to retain a measurable advantage; if it does not, we will revise the claims accordingly. revision: yes
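The promised baseline (a) amounts to spending the same call budget while ignoring the signals. A sketch, where the function name, call budget, and stub solver are hypothetical:

```python
import random

# Signal-agnostic ablation baseline: same average number of model calls per
# problem as the harness, identical final aggregation slot, but the FOK/JOL
# decision logic is replaced by random selection among attempts.

def signal_agnostic_baseline(problem, solve, n_calls, rng):
    attempts = [solve(problem, i) for i in range(n_calls)]  # matched call budget
    return rng.choice(attempts)        # variant (a): random selection

rng = random.Random(0)
pick = signal_agnostic_baseline("q", lambda p, i: f"ans-{i}", n_calls=4, rng=rng)
```

Any accuracy gap between this baseline and the harness at matched compute, with the same aggregator, is what isolates the FOK/JOL contribution.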
-
Referee: [Abstract] Abstract and results: decision thresholds, prompt templates for eliciting FOK and JOL, and statistical controls (e.g., variance across runs, significance tests) are not specified, leaving the link between the elicited signals and the observed accuracy gains only partially supported and difficult to reproduce.
Authors: We acknowledge the omissions. In the revision we will add: (1) the complete prompt templates used to elicit FOK (pre-solve) and JOL (post-solve) in an appendix; (2) the exact numerical thresholds and decision rules that map the elicited signals to the actions trust/retry/aggregate, including how thresholds were selected; (3) standard deviations and full per-run accuracies across at least three independent executions for all reported figures; and (4) statistical significance tests (paired t-tests on accuracy and McNemar's test on per-problem correctness) comparing the harness to the base model and to the new signal-agnostic baseline. These additions will make the causal link between the metacognitive signals and the observed gains fully reproducible. revision: yes
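McNemar's test on per-problem correctness depends only on the discordant pairs: problems the harness fixes versus problems it breaks. A short exact version makes this concrete; the counts below are invented for illustration, not the paper's data.

```python
from math import comb

# Exact (binomial) McNemar test on paired correctness outcomes.
# b = problems the harness gets right and the base model gets wrong;
# c = problems the base model gets right and the harness gets wrong.

def mcnemar_exact_p(b, c):
    """Two-sided exact binomial p-value for discordant counts b and c."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical: harness fixes 20 problems the base model missed, breaks 5.
p = mcnemar_exact_p(b=20, c=5)
```

Equal discordant counts give no evidence of a difference, while a lopsided split like 20 vs 5 is significant at conventional levels.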
Circularity Check
No circularity in derivation chain
full rationale
The paper describes an empirical procedure: elicit pre-solve FOK and post-solve JOL signals from a fixed LLM, then apply an external rule-based harness to decide trust/retry/aggregate. No equations, fitted parameters, or self-citations are load-bearing; the claimed accuracy lift (48.3 to 56.9) is presented as an observed outcome on public benchmarks rather than a quantity that reduces to the inputs by construction. The control logic is independent of the evaluation data and does not rename or smuggle in prior results.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs produce reliable feeling-of-knowing and judgment-of-learning signals when explicitly prompted.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
"the harness turns them into an explicit control interface for reasoning: it decides when to trust the current solution, when to retry with compact metacognitive feedback, and when to pass multiple attempts to a final aggregator"
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
"Inspired by the Nelson–Narens theory from cognitive psychology, we propose a metacognitive harness that separates monitoring from reasoning"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Ryan Lopopolo. Harness engineering: Leveraging codex in an agent-first world. https://openai.com/index/harness-engineering/, February 2026. OpenAI Engineering Blog. Accessed 2026-05-04.
- [2] Josh Achiam et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- [3] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems, volume 35, pages 24824–24837, 2022.
- [4] Charlie Snell, Jaeho Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling model parameters for reasoning. In International Conference on Learning Representations, 2025.
- [5] Baptiste Rozière et al. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023.
- [6] Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra. Solving quantitative reasoning problems with language models. arXiv preprint arXiv:2206.14858, 2022.
- [7] Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. PAL: Program-aided language models. arXiv preprint arXiv:2211.10435, 2023.
- [8] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023.
- [9] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761, 2023.
- [10] Anthropic. Effective harnesses for long-running agents. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents, November 2025. Anthropic Engineering Blog.
- [11] Vivek Trivedy. The anatomy of an agent harness. https://www.langchain.com/blog/the-anatomy-of-an-agent-harness, March 2026. LangChain Blog.
- [12] LangChain. Harness capabilities. https://docs.langchain.com/oss/python/deepagents/harness, 2026. LangChain Documentation.
- [13] Saurav Kadavath, Thomas Conerly, Amanda Askell, Tom Henighan, Dawn Drain, Ethan Perez, Nicholas Schiefer, Zac Hatfield-Dodds, Nova DasSarma, Eli Tran-Johnson, et al. Language models (mostly) know what they know. arXiv preprint arXiv:2207.05221, 2022.
- [14] Miao Xiong, Zhixiong Hu, Xuming Lu, Yufei Li, Jiaxin Fu, Jiazhen He, and Bryan Hooi. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. In International Conference on Learning Representations, 2024.
- [15] Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, and Sanjeev Arora. Metacognitive capabilities of LLMs: An exploration in mathematical problem solving. In Advances in Neural Information Processing Systems, volume 37, 2024.
- [16] Yifan Yuan, Bin Xu, Hao Tan, Fei Sun, Tong Xiao, Wen Li, Hua Shen, and Xueqi Cheng. Fact-level confidence calibration and self-correction. arXiv preprint arXiv:2411.13343, 2024.
- [17] Thomas O. Nelson and Louis Narens. Metamemory: A theoretical framework and new findings. In Gordon H. Bower, editor, Psychology of Learning and Motivation, volume 26, pages 125–173. Academic Press, 1990.
- [18] John H. Flavell. Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10):906–911, 1979.
- [19] John T. Hart. Memory and the feeling-of-knowing experience. Journal of Educational Psychology, 56(4):208–216, 1965.
- [20] Lynne M. Reder and Frank E. Ritter. What determines initial feeling of knowing? Familiarity with question terms, not with the answer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3):435–451, 1992.
- [21] Janet Metcalfe and Arthur P. Shimamura, editors. Metacognition: Knowing about Knowing. MIT Press, 1994.
- [22] Dong Yang, Yu-Hsuan H. Tsai, and Makoto Yamada. On verbalized confidence scores for LLMs. arXiv preprint arXiv:2412.14737, 2024.
- [23] Zhen Tan, Jie Peng, Song Wang, Lijie Hu, Tianlong Chen, and Huan Liu. Tuning-free accountable intervention for LLM deployment: A metacognitive approach. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 25237–25245, 2025.
- [24] Guoqing Wang, Wen Wu, Guangze Ye, Zhenxiao Cheng, Xi Chen, and Hong Zheng. Decoupling metacognition from cognition: A framework for quantifying metacognitive ability in LLMs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 25353–25361, 2025.
- [25] Ziyang Ma, Qingyue Yuan, Zhenglin Wang, and Deyu Zhou. Large language models have intrinsic meta-cognition, but need a good lens. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 3460–3477, 2025.
- [26] Aman Madaan, Niket Tandon, Prakhar Gupta, Sean Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [27] Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, volume 36, 2023.
- [28] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
- [29] Jonathan Uesato, Nate Kushman, Ramana Kumar, H. Francis Song, Noah Yamamoto Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, and Irina Higgins. Solving math word problems with process- and outcome-based feedback. arXiv preprint arXiv:2211.14275, 2022.
- [30] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let's verify step by step. In International Conference on Learning Representations, 2024.
- [31]
- [32] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Fei-Fei Li, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, and Tatsunori Hashimoto. s1: Simple test-time scaling. arXiv preprint arXiv:2501.19393, 2025.
- [33] Chris Yuhao Liu, Liang Zeng, Yuzhen Xiao, Jujie He, Jiacai Liu, Chaojie Wang, Rui Yan, Wei Shen, Fuxiang Zhang, Jiacheng Xu, Yang Liu, and Yahui Zhou. Skywork-Reward-V2: Scaling preference data curation via human-AI synergy. arXiv preprint arXiv:2507.01352, 2025.
- [34] Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, and Wenhai Wang. VisualPRM: An effective process reward model for multimodal reasoning. arXiv preprint arXiv:2503.10291, 2025.
- [35] Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, and Eunsol Choi. Learning to reason across parallel samples for LLM reasoning. arXiv preprint arXiv:2506.09014, 2025.
- [36] Anthropic. Claude Sonnet 4.6. https://www.anthropic.com/claude/sonnet, 2026. Accessed 2026-05-03.
- [37] Anthropic. Claude Sonnet 4.6 system card. https://www.anthropic.com/claude-sonnet-4-6-system-card, February 2026. Accessed 2026-05-03.
- [38] OpenAI. All models. https://developers.openai.com/api/docs/models/all, 2026. Accessed 2026-05-03.
- [39] OpenAI. GPT-5.2 model. https://developers.openai.com/api/docs/models/gpt-5.2, 2026. Accessed 2026-05-03.
- [40] OpenAI. GPT-5 mini model. https://developers.openai.com/api/docs/models/gpt-5-mini, 2026. Accessed 2026-05-03.
- [41]
- [42] OpenAI. o4-mini model. https://developers.openai.com/api/docs/models/o4-mini, 2026. Accessed 2026-05-03.
- [43] Anthropic. Claude Opus 4.6 system card. https://www.anthropic.com/claude-opus-4-6-system-card, February 2026. Accessed 2026-05-03.
- [44] Anthropic. Model system cards. https://www.anthropic.com/system-cards, 2026. Accessed 2026-05-03.
- [45] Google AI for Developers. Gemini 3 developer guide. https://ai.google.dev/gemini-api/docs/gemini-3, 2026. Accessed 2026-05-03.
- [46] Google AI for Developers. Gemini 2.5 Pro. https://ai.google.dev/gemini-api/docs/models/gemini-2.5-pro, 2026. Accessed 2026-05-03.
- [47] Google AI for Developers. Gemini API models. https://ai.google.dev/gemini-api/docs/models, 2026. Accessed 2026-05-03.
- [48] Qwen Team. Qwen2.5-VL-72B-Instruct. https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct, 2025. Accessed 2026-05-03.
- [49] Qwen Team. Qwen2.5-VL. https://qwenlm.github.io/blog/qwen2.5-vl/, January 2025. Accessed 2026-05-03.
- [50] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-VL technical report. a...
Pipeline stages
- FOK: metacognitive assessor, forbidden from solving. Input: {problem, image}. Output: {domain, FOK, FOK_reason}.
- Solve: same agent, now solving. Input: {problem, image, FOK, FOK_reason}. Output: {reasoning, answer}.
- JOL: same agent, post-hoc self-rating. Input: {problem, image, FOK, FOK_reason, own reasoning, own answer} (all in-call). Output: {JOL_score, JOL_reason}.
- Retry_k: same agent, instructed to "try a DIFFERENT approach and address the concerns raised in previous JOL reasons". Input: {problem, image, FOK, FOK_reason, history of (answer, JOL, JOL_reason)} without previous reasoning chains. Output: new {reasoning, answer, JOL_score, JOL_reason}.
- Select: independent judge agent, forbidden from producing new answers. Input: {problem, image, shuffled list of (answer, reasoning) for all attempts}. Output: {selected_index, justification}.
A few design choices are worth highlighting:
- No reasoning leakage from Stage 1 to Stage 2. The FOK_reason from Stage 1 is short and intuition-level by construction (the system pro...
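One way to read the stage table is as a set of message schemas passed between calls. A hypothetical sketch: the field names mirror the table's payloads, but the types and classes are this sketch's assumption.

```python
from dataclasses import dataclass, field

# Hypothetical payload schemas for the harness stages described above.

@dataclass
class FokReport:
    domain: str
    fok: float          # pre-solve feeling-of-knowing, assumed in [0, 1]
    fok_reason: str     # short and intuition-level by construction

@dataclass
class Attempt:
    reasoning: str
    answer: str
    jol_score: float    # post-solve judgment-of-learning, assumed in [0, 1]
    jol_reason: str

@dataclass
class RetryContext:
    fok: FokReport
    # history carries (answer, JOL_score, JOL_reason) but NOT prior reasoning
    # chains, matching the "no reasoning leakage" design choice above
    history: list = field(default_factory=list)

ctx = RetryContext(fok=FokReport("physics", 0.55, "familiar topology"))
ctx.history.append(("4R", 0.72, "topology assumption shaky"))
```

Keeping full reasoning chains out of `RetryContext.history` is what makes the retry feedback "compact" in the paper's sense.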
- Attempt 1: Nghe An (incorrect). The solver assumes the defeat occurred during the late Lam Son uprising (c. 1424–1425) and guesses the province where those campaigns were concentrated.
- Attempt 2: Nam Dinh (correct). The solver re-anchors on the earlier phase of the occupation, identifies the Battle of Bo Co (1408) as Mu Sheng's first major defeat, and correctly locates the engagement in the coastal Red River delta area corresponding to modern Nam Dinh.
- Attempt 3: Ninh Binh (incorrect). Now the solver fixes on the Battle of Bo Co but reasons geographically from the Day River estuary, which it places in Ninh Binh.
- Attempt 4: Thai Binh (incorrect). The solver explicitly notes that the previous three attempts disagreed and tries yet another distributary of the Red River delta.
Aggregation and outcome. Because the four candidates are split four ways, string-consensus does not fire and the question is not a code task, so the hybrid aggregator falls through to the select-...
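The fall-through behaviour just described can be sketched as follows. The majority rule, the code-task branch, and the stub judge are assumptions of this sketch, not the paper's exact logic.

```python
from collections import Counter

# Hybrid aggregation sketch: exact string consensus fires first; code tasks
# would use their own check (elided here); otherwise the shuffled attempts go
# to an independent select judge that may not produce new answers.

def hybrid_aggregate(answers, is_code_task, select_judge):
    counts = Counter(answers)
    top, n = counts.most_common(1)[0]
    if n > len(answers) // 2:        # string consensus: strict majority agrees
        return top
    if is_code_task:
        raise NotImplementedError("code tasks use a separate check")
    return select_judge(answers)     # e.g. four-way split: defer to the judge

# Four-way split, as in the example above: consensus cannot fire.
final = hybrid_aggregate(
    ["Nghe An", "Nam Dinh", "Ninh Binh", "Thai Binh"],
    is_code_task=False,
    select_judge=lambda ans: ans[1],   # stub judge picks the second attempt
)
```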
- Attempt 1: Req = 4R (incorrect). The solver misreads the diagram as a 4×2 grid of diamond cells and applies a series-of-bridges decomposition. The post-solve JOL (0.72) is below the high-confidence regime, signalling that the topology assumption is shaky.
- Attempt 2: Req = 2R (correct). The solver re-counts and now reads a 3×2 diamond lattice, exploits top-bottom symmetry to merge equipotential nodes, and computes Req = 2R. Post-solve JOL rises sharply, the retry signal drops below τ, and the harness stops.
Outcome. Final answer 2R, correct, two attempts.
Takeaway. This is the directed-retry regime: the controller ...
discussion (0)