{"work":{"id":"fa436f46-ec0a-4d2e-a0ff-e697def4a7be","openalex_id":null,"doi":null,"arxiv_id":"2010.03768","raw_key":null,"title":"ALFWorld: Aligning Text and Embodied Environments for Interactive Learning","authors":null,"authors_text":"Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht","year":2020,"venue":"cs.CL","abstract":"Given a simple request like Put a washed apple in the kitchen fridge, humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstract, text based policies in TextWorld (Côté et al., 2018) and then execute goals from the ALFRED benchmark (Shridhar et al., 2020) in a rich visual environment. ALFWorld enables the creation of a new BUTLER agent whose abstract knowledge, learned in TextWorld, corresponds directly to concrete, visually grounded actions. In turn, as we demonstrate empirically, this fosters better agent generalization than training only in the visually grounded environment. 
BUTLER's simple, modular design factors the problem to allow researchers to focus on models for improving every piece of the pipeline (language understanding, planning, navigation, and visual scene understanding).","external_url":"https://arxiv.org/abs/2010.03768","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-22T22:55:12.238262+00:00","pith_arxiv_id":"2010.03768","created_at":"2026-05-09T05:45:22.212310+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"ALFWorld: Aligning Text and Embodied Environments for Interactive Learning","render_title":"ALFWorld: Aligning Text and Embodied Environments for Interactive Learning"},"hub":{"state":{"work_id":"fa436f46-ec0a-4d2e-a0ff-e697def4a7be","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":66,"external_cited_by_count":null,"distinct_field_count":9,"first_pith_cited_at":"2023-02-03T06:06:27+00:00","last_pith_cited_at":"2026-05-21T16:55:40+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-10T03:56:02.971776+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":12},{"context_role":"dataset","n":8},{"context_role":"method","n":2},{"context_role":"baseline","n":1}],"polarity_counts":[{"context_polarity":"background","n":12},{"context_polarity":"use_dataset","n":8},{"context_polarity":"baseline","n":1},{"context_polarity":"unclear","n":1},{"context_polarity":"use_method","n":1}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T18:26:43.870083+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"DeepSeekMath: Pushing the Limits of Mathematical 
Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":14},{"title":"Group-in-Group Policy Optimization for LLM Agent Training","work_id":"bc65d492-e6ba-4522-874c-43d2f4fc5191","shared_citers":12},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":10},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":8},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"A-MEM: Agentic Memory for LLM Agents","work_id":"3b98feb2-fdb1-479a-bbe4-2c298a4592e2","shared_citers":7},{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","work_id":"64019d00-0b11-4bbd-b173-b46c8fad0157","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"AI2-THOR: An Interactive 3D Environment for Visual AI","work_id":"9c86ed28-ea70-424c-bd56-34f59dcad861","shared_citers":6},{"title":"AgentBench: Evaluating LLMs as Agents","work_id":"a37549b4-4c94-412d-acc4-4efeb08509be","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":5},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":5},{"title":"Mem0: Building Production-Ready AI Agents with Scalable Long-Term 
Memory","work_id":"a5aed26c-a248-48b6-a59e-f7693fcb180a","shared_citers":5},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":5},{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":5},{"title":"A Survey of Large Language Models","work_id":"de1b42b5-4a0a-4b1f-8c78-1f7fe21be6c9","shared_citers":4},{"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","shared_citers":4},{"title":"RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning","work_id":"b96383ee-f8dc-471f-aba4-bc5ce9b0b632","shared_citers":4},{"title":"","work_id":"e26f5a00-c007-439d-83f6-7900f5687b6b","shared_citers":3},{"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","shared_citers":3},{"title":"Advances in neural information processing systems , volume=","work_id":"c25e8154-fab2-455c-8a26-56e40aed5d2b","shared_citers":3},{"title":"arXiv preprint arXiv:2506.07398 , year=","work_id":"e5c4ec9c-3e43-42d1-a3d7-0deef9c44093","shared_citers":3}],"time_series":[{"n":2,"year":2024},{"n":30,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T18:29:26.215713+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T18:26:35.645539+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"ALFWorld: Aligning Text and Embodied Environments for 
Interactive Learning","claims":[{"claim_text":"Given a simple request like Put a washed apple in the kitchen fridge, humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstr","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks ALFWorld: Aligning Text and Embodied Environments for Interactive Learning because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T18:29:55.965385+00:00"}},"summary":{"title":"ALFWorld: Aligning Text and Embodied Environments for Interactive Learning","claims":[{"claim_text":"Given a simple request like Put a washed apple in the kitchen fridge, humans can reason in purely abstract terms by imagining action sequences and scoring their likelihood of success, prototypicality, and efficiency, all without moving a muscle. Once we see the kitchen in question, we can update our abstract plans to fit the scene. Embodied agents require the same abilities, but existing work does not yet provide the infrastructure necessary for both reasoning abstractly and executing concretely. 
We address this limitation by introducing ALFWorld, a simulator that enables agents to learn abstr","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks ALFWorld: Aligning Text and Embodied Environments for Interactive Learning because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":14},{"title":"Group-in-Group Policy Optimization for LLM Agent Training","work_id":"bc65d492-e6ba-4522-874c-43d2f4fc5191","shared_citers":12},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":10},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":8},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"A-MEM: Agentic Memory for LLM Agents","work_id":"3b98feb2-fdb1-479a-bbe4-2c298a4592e2","shared_citers":7},{"title":"DAPO: An Open-Source LLM Reinforcement Learning System at Scale","work_id":"64019d00-0b11-4bbd-b173-b46c8fad0157","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"AI2-THOR: An Interactive 3D Environment for Visual AI","work_id":"9c86ed28-ea70-424c-bd56-34f59dcad861","shared_citers":6},{"title":"AgentBench: Evaluating LLMs as Agents","work_id":"a37549b4-4c94-412d-acc4-4efeb08509be","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"Gemini 2.5: Pushing the Frontier with Advanced Reasoning, 
Multimodality, Long Context, and Next Generation Agentic Capabilities","work_id":"008df105-2fdd-45d8-857a-8e35868aecb6","shared_citers":5},{"title":"Gemini: A Family of Highly Capable Multimodal Models","work_id":"83f7c85b-3f11-450f-ac0c-64d9745220b2","shared_citers":5},{"title":"Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory","work_id":"a5aed26c-a248-48b6-a59e-f7693fcb180a","shared_citers":5},{"title":"OpenAI o1 System Card","work_id":"68d3c334-0fc9-49e3-b7b0-a69afae933e2","shared_citers":5},{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":5},{"title":"A Survey of Large Language Models","work_id":"de1b42b5-4a0a-4b1f-8c78-1f7fe21be6c9","shared_citers":4},{"title":"Do As I Can, Not As I Say: Grounding Language in Robotic Affordances","work_id":"037320f1-b0a9-4cbe-a639-bfb25409ce71","shared_citers":4},{"title":"RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning","work_id":"b96383ee-f8dc-471f-aba4-bc5ce9b0b632","shared_citers":4},{"title":"","work_id":"e26f5a00-c007-439d-83f6-7900f5687b6b","shared_citers":3},{"title":"$\\pi_0$: A Vision-Language-Action Flow Model for General Robot Control","work_id":"f790abdc-a796-482f-a40d-f8ee035ecfc2","shared_citers":3},{"title":"Advances in neural information processing systems , volume=","work_id":"c25e8154-fab2-455c-8a26-56e40aed5d2b","shared_citers":3},{"title":"arXiv preprint arXiv:2506.07398 , year=","work_id":"e5c4ec9c-3e43-42d1-a3d7-0deef9c44093","shared_citers":3}],"time_series":[{"n":2,"year":2024},{"n":30,"year":2026}],"dependency_candidates":[]},"authors":[]}}