Collie: Systematic construction of constrained text generation tasks.arXiv preprint arXiv:2307.08689

Shunyu Yao, Howard Chen, Austin W Hanjie, Runzhe Yang, Karthik Narasimhan · 2023 · arXiv 2307.08689

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1 method 1

citation-polarity summary

background 1 use method 1

representative citing papers

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Seed1.5-VL Technical Report

cs.CV · 2025-05-11 · unverdicted · novelty 4.0

Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

Phoenix-VL 1.5 Medium Technical Report

cs.CL · 2026-05-11 · unverdicted · novelty 3.0

Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

SAGE: A Service Agent Graph-guided Evaluation Benchmark cs.AI · 2026-04-10 · unverdicted · none · ref 60
SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.
Seed1.5-VL Technical Report cs.CV · 2025-05-11 · unverdicted · none · ref 162
Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.
Phoenix-VL 1.5 Medium Technical Report cs.CL · 2026-05-11 · unverdicted · none · ref 26
Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.

Collie: Systematic construction of constrained text generation tasks.arXiv preprint arXiv:2307.08689

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer