
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

11 Pith papers cite this work. Polarity classification is still indexing.


citation summary

roles: background (4)
polarities: background (4)
years: 2026 (11)

representative citing papers

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents

cs.AI · 2026-05-02 · unverdicted · novelty 7.0

EO-Gym supplies an executable multimodal environment and a 9k-trajectory benchmark that turn Earth Observation into a tool-using, multi-step reasoning task. It shows that current VLMs struggle with temporal and cross-sensor workflows, and that fine-tuning raises Pass@3 from 0.49 to 0.74.

PersonalHomeBench: Evaluating Agents in Personalized Smart Homes

cs.AI · 2026-04-18 · unverdicted · novelty 7.0

PersonalHomeBench is a new benchmark showing that AI agents suffer systematic performance drops in personalized smart homes as task complexity rises, especially on counterfactual reasoning and under partial observability.

Sema: Semantic Transport for Real-Time Multimodal Agents

cs.MM · 2026-04-22 · unverdicted · novelty 5.0

Sema reduces uplink bandwidth by 64x for audio and by 130-210x for screenshots, while keeping multimodal agent task accuracy within 0.7 percentage points of raw-input baselines in WAN simulations.
