Segment the horses from the rest of the image and generate a new image where the horse regions are white and the other regions are black

Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi · 2022 · arXiv 2211.09066

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

cs.AI · 2025-06-07 · unverdicted · novelty 6.0

LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.

Teaching Large Language Models to Self-Debug

cs.CL · 2023-04-11 · unverdicted · novelty 6.0

Self-Debugging teaches LLMs to identify and fix their own code errors through rubber-duck-style natural language explanations and execution feedback, delivering 2-12% gains over baselines on Spider, TransCoder, and MBPP.

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.

A Survey on In-context Learning

cs.CL · 2022-12-31 · unverdicted · novelty 3.0

The paper surveys definitions, techniques, applications, and challenges in in-context learning for large language models.

When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models

cs.CL · 2026-05-01

citing papers explorer

Showing 5 of 5 citing papers.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity cs.AI · 2025-06-07 · unverdicted · none · ref 17
LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.
Teaching Large Language Models to Self-Debug cs.CL · 2023-04-11 · unverdicted · none · ref 143
Self-Debugging teaches LLMs to identify and fix their own code errors through rubber-duck-style natural language explanations and execution feedback, delivering 2-12% gains over baselines on Spider, TransCoder, and MBPP.
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem cs.AI · 2026-05-07 · unverdicted · none · ref 55
Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.
A Survey on In-context Learning cs.CL · 2022-12-31 · unverdicted · none · ref 13
The paper surveys definitions, techniques, applications, and challenges in in-context learning for large language models.
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models cs.CL · 2026-05-01 · unreviewed · ref 22

Segment the horses from the rest of the image and generate a new image where the horse regions are white and the other regions are black

fields

years

verdicts

representative citing papers

citing papers explorer