3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
citing papers explorer
- Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
  Chat2Workflow benchmark shows that state-of-the-art LLMs often grasp high-level intent for visual workflow generation but fail to produce correct, stable, executable outputs, with an agentic framework delivering only modest 5.34% gains.
- Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models
  A 3B model with few-shot prompting reaches 79.7% of GPT-5 tool-use performance while a hypernetwork adaptation adds zero measurable benefit across four benchmarks.
- Towards Enabling An Artificial Self-Construction Software Life-cycle via Autopoietic Architectures
  Proposes autopoietic architectures for self-constructing software as a fundamental shift in the SDLC, leveraging foundation models for autonomous evolution and maintenance.