Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST)

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

CoCoDA co-evolves a typed compositional DAG of primitive and composite tools with the agent planner, using signature-based retrieval and a size-based reward to scale libraries efficiently and let an 8B model match or beat a 32B model on math and code benchmarks.

SCENE: Recognizing Social Norms and Sanctioning in Group Chats

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

SCENE is a new benchmark for testing LLMs on recognizing implicit social norms and adapting to sanctions in multi-party group chats.

citing papers explorer

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · none · ref 36

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

cs.AI · 2026-05-08 · unverdicted · none · ref 29

SCENE: Recognizing Social Norms and Sanctioning in Group Chats

cs.CL · 2026-05-08 · unverdicted · none · ref 14
