HiMAP: Learning Heuristics-Informed Policies for Large-Scale Multi-Agent Pathfinding

Chuanbo Hua; Federico Berto; Huijie Tang; Jinkyoo Park; Kyuree Ahn; Zihan Ma

arXiv: 2402.15546 · v1 · pith:FTZXWCAQnew · submitted 2024-02-23 · 💻 cs.MA · cs.AI · cs.LG · cs.RO

Huijie Tang, Federico Berto, Zihan Ma, Chuanbo Hua, Kyuree Ahn, Jinkyoo Park

keywords: mapf, pathfinding, himap, learning, multi-agent, scalability, heuristic, heuristics-informed

Large-scale multi-agent pathfinding (MAPF) presents significant challenges in several areas. As systems grow in complexity, with many autonomous agents operating simultaneously, efficient and collision-free coordination becomes paramount. Traditional algorithms often fall short in scalability, especially in intricate scenarios. Reinforcement learning (RL) has shown potential to address the intricacies of MAPF; however, it has also been shown to struggle with scalability: it demands intricate implementation and lengthy training and often exhibits unstable convergence, which limits its practical application. In this paper, we introduce Heuristics-Informed Multi-Agent Pathfinding (HiMAP), a novel scalable approach that employs imitation learning with heuristic guidance in a decentralized manner. We train on small-scale instances using a heuristic policy as a teacher that maps each agent's observation to an action probability distribution. During pathfinding, we adopt several inference techniques to improve performance. With a simple training scheme and implementation, HiMAP demonstrates competitive success rate and scalability among imitation-learning-only MAPF methods, showing the potential of imitation-learning-only MAPF equipped with inference techniques.
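
The abstract does not specify the observation encoding, the network architecture, or the teacher, so the following is only a minimal sketch of the general idea of imitation learning from a heuristic teacher, not the HiMAP implementation. Everything in it is an assumption made for illustration: the observation (the sign of the goal offset along each axis), the teacher (a greedy Manhattan-distance rule), and the policy (a tabular softmax trained with a cross-entropy gradient step). Obstacles and other agents are left out entirely.

```python
import math

# Hypothetical action set: stay, up, down, left, right (as (dx, dy) moves).
ACTIONS = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0)]


def sign(v):
    return (v > 0) - (v < 0)


def observe(pos, goal):
    """Assumed local observation: direction to the goal along each axis."""
    return (sign(goal[0] - pos[0]), sign(goal[1] - pos[1]))


def teacher_action(pos, goal):
    """Assumed heuristic teacher: greedy move that minimizes Manhattan distance."""
    def dist_after(a):
        dx, dy = ACTIONS[a]
        return abs(goal[0] - pos[0] - dx) + abs(goal[1] - pos[1] - dy)
    # Ties are broken by the lowest action index.
    return min(range(len(ACTIONS)), key=dist_after)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(epochs=200, lr=0.5, radius=3):
    """Behavior cloning: fit softmax logits per observation to the teacher's action."""
    logits = {}
    goal = (0, 0)
    for _ in range(epochs):
        for x in range(-radius, radius + 1):
            for y in range(-radius, radius + 1):
                obs = observe((x, y), goal)
                target = teacher_action((x, y), goal)
                row = logits.setdefault(obs, [0.0] * len(ACTIONS))
                probs = softmax(row)
                # Gradient step on the cross-entropy loss: (one-hot - probs).
                for a in range(len(ACTIONS)):
                    row[a] += lr * ((1.0 if a == target else 0.0) - probs[a])
    return logits


def policy(logits, obs):
    """Learned policy: maps one agent's observation to an action distribution."""
    return softmax(logits[obs])


if __name__ == "__main__":
    learned = train()
    print(policy(learned, observe((2, 0), (0, 0))))
```

In a decentralized setting, each agent would evaluate the same learned `policy` on its own observation; how the resulting distributions are used at inference time is not described in the abstract and is not modeled here.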

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models
cs.RO 2026-05 unverdicted novelty 6.0

EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process and claims better policies than hand-crafted or prior autom...