Large language models can be easily distracted by irrelevant context, 2023
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Benchmarking LLM-Driven Network Configuration Repair
  Cornetto is the first benchmark that synthesizes 231 network misconfiguration problems across topologies of 20-754 nodes and uses formal verification to show that nine state-of-the-art LLMs often introduce regressions and degrade at scale.
- GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)
  GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.