pith. sign in

Collie: Systematic construction of constrained text generation tasks.arXiv preprint arXiv:2307.08689

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

citation-role summary

background 1 method 1

citation-polarity summary

years

2026 2 2025 1

verdicts

UNVERDICTED 3

representative citing papers

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Seed1.5-VL Technical Report

cs.CV · 2025-05-11 · unverdicted · novelty 4.0

Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

Phoenix-VL 1.5 Medium Technical Report

cs.CL · 2026-05-11 · unverdicted · novelty 3.0

Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

  • SAGE: A Service Agent Graph-guided Evaluation Benchmark cs.AI · 2026-04-10 · unverdicted · none · ref 60

    SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

  • Seed1.5-VL Technical Report cs.CV · 2025-05-11 · unverdicted · none · ref 162

    Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.

  • Phoenix-VL 1.5 Medium Technical Report cs.CL · 2026-05-11 · unverdicted · none · ref 26

    Phoenix-VL 1.5 Medium is a 123B-parameter natively multimodal model that reaches state-of-the-art results on Singapore multimodal, legal, and policy benchmarks after localized training on 1T+ tokens while staying competitive on global benchmarks.