Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
4 Pith papers cite this work.
fields: cs.CL (4)
years: 2026 (4)
citing papers
- OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving
  OptiVerse is a new benchmark spanning neglected optimization domains that shows LLMs suffer sharp accuracy drops on hard problems due to modeling and logic errors, with a Dual-View Auditor Agent proposed to improve performance.
- Test-Time Speculation
  TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.
- Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models
  Qwen-Scope provides open-source sparse autoencoders for Qwen models that function as practical interfaces for steering, evaluation, data workflows, and optimization of large language models.
- Dual-Cluster Memory Agent: Resolving Multi-Paradigm Ambiguity in Optimization Problem Solving