arxiv: 1906.10971 · v1 · pith:7BBJS5TQnew · submitted 2019-06-26 · 💻 cs.RO · cs.SY · eess.SY

NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles

Pith reviewed 2026-05-25 15:50 UTC · model grok-4.3

classification 💻 cs.RO · cs.SY · eess.SY
keywords neuroevolution · autonomous driving · trajectory planning · genetic algorithms · multi-objective optimization · deep neural networks · state estimation

The pith

Genetic algorithms train deep networks to output sequences of future vehicle states by optimizing a three-part fitness vector for travel path, lateral velocity and longitudinal speed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method that evolves a population of perception-planning neural networks using genetic algorithms rather than gradient descent. Each candidate network is scored on a fitness vector whose three elements measure how well the predicted trajectory matches desired travel path, lateral velocity and longitudinal speed. The result is a Pareto front of networks that output an entire sequence of future states over a prediction horizon. This setup is intended to sidestep the single-objective loss used in standard backpropagation and the reward-shaping demands of deep reinforcement learning. The same network architecture can be trained on either synthetic or real driving sequences and is evaluated against a Dynamic Window Approach baseline and an End2End supervised learner.
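The Pareto front mentioned above can be illustrated with a small non-dominated filter. This is a minimal sketch, not the paper's code: it assumes every element of the fitness vector has been expressed as a cost to minimise (for example by negating a speed reward), and the names `dominates`, `pareto_front` and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (path_cost, lateral_velocity_cost, speed_cost), all minimised.
Fitness = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Fitness]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up fitness vectors for four candidate networks.
scores = [
    (1.0, 0.2, 0.5),
    (0.8, 0.3, 0.6),
    (1.2, 0.4, 0.7),  # worse than the first candidate on all three objectives
    (0.9, 0.1, 0.9),
]
front = pareto_front(scores)
```

Here the third candidate is filtered out, while the other three each win on at least one objective and therefore stay on the front.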

Core claim

The desired state trajectory of the ego-vehicle is estimated over a finite prediction horizon by a perception-planning deep neural network trained with genetic algorithms on a multi-objective fitness vector composed of travel path, lateral velocity and longitudinal speed.

What carries the argument

Genetic-algorithm evolution of a population of deep neural networks evaluated on a three-element fitness vector to produce a Pareto front of trajectory estimators.
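A toy version of such an evolutionary loop is sketched below. It is an illustration under strong assumptions rather than the authors' procedure: each "network" is reduced to a flat list of weights, the only variation operator is Gaussian mutation, selection simply keeps the non-dominated individuals, and `toy_evaluate` is a hypothetical stand-in for scoring a real network on driving data.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]                  # stand-in for a network's flattened weights
Fitness = Tuple[float, float, float]  # three costs, all assumed to be minimised


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(fits: List[Fitness]) -> List[int]:
    return [i for i, f in enumerate(fits)
            if not any(dominates(g, f) for j, g in enumerate(fits) if j != i)]


def mutate(genome: Genome, rng: random.Random, sigma: float = 0.1) -> Genome:
    # Gaussian perturbation of every weight (an assumed operator).
    return [w + rng.gauss(0.0, sigma) for w in genome]


def evolve(evaluate: Callable[[Genome], Fitness], genome_len: int,
           pop_size: int = 20, generations: int = 30,
           seed: int = 0) -> List[Tuple[Genome, Fitness]]:
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        fits = [evaluate(g) for g in population]
        # Keep the current non-dominated individuals, capped at pop_size.
        elites = [population[i] for i in non_dominated(fits)][:pop_size]
        # Refill the remaining slots with mutated copies of random elites.
        children = [mutate(rng.choice(elites), rng)
                    for _ in range(pop_size - len(elites))]
        population = elites + children
    fits = [evaluate(g) for g in population]
    return [(population[i], fits[i]) for i in non_dominated(fits)]


def toy_evaluate(g: Genome) -> Fitness:
    # Three competing quadratic costs standing in for the three fitness elements.
    return (sum((w - 1.0) ** 2 for w in g),
            sum(w ** 2 for w in g),
            sum((w + 1.0) ** 2 for w in g))


front = evolve(toy_evaluate, genome_len=3)
```

The loop returns a set of mutually non-dominated weight vectors instead of a single best one, which is the property the multi-objective formulation relies on.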

If this is right

  • The identical network structure works on both synthetic and real-world data sequences.
  • The output is a sequence of states that downstream motion controllers can use directly, rather than single-step actions.
  • Training avoids backpropagation on a single aggregated loss by maintaining a multi-objective Pareto front.
  • The method is positioned as an alternative to both decoupled perception-planning pipelines and End2End action mapping.
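To make the contrast with a single aggregated loss concrete, the sketch below scalarises three made-up fitness vectors with a weighted sum. The weights and scores are hypothetical and all objectives are assumed to be costs; the point is only that a scalar loss commits to one trade-off in advance, whereas a Pareto front keeps every non-dominated candidate.

```python
from typing import List, Sequence, Tuple

Fitness = Tuple[float, float, float]


def weighted_sum(fitness: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a fitness vector into one scalar loss."""
    return sum(w * f for w, f in zip(weights, fitness))


def best_by_weighted_sum(scores: List[Fitness], weights: Sequence[float]) -> int:
    """Index of the candidate with the lowest aggregated loss."""
    return min(range(len(scores)), key=lambda i: weighted_sum(scores[i], weights))


# Three candidates, none of which dominates another.
scores = [(0.2, 0.9, 0.5), (0.9, 0.2, 0.5), (0.5, 0.5, 0.5)]

pick_a = best_by_weighted_sum(scores, (1.0, 0.1, 0.1))  # emphasises the first objective
pick_b = best_by_weighted_sum(scores, (0.1, 1.0, 0.1))  # emphasises the second objective
```

Changing the weights changes which candidate wins, so the choice of weights quietly decides the outcome of single-objective training.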

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Because the output is a state sequence, the approach may integrate more cleanly with model-predictive controllers that already reason over future states.
  • Adding further objectives such as minimum safety distance to the fitness vector could be done without altering the overall training loop.
  • The Pareto front could support runtime switching among networks when traffic conditions change.
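The runtime-switching idea in the last bullet could look roughly like the sketch below. Everything in it is hypothetical: the context labels, the mapping from context to prioritised objective, and the member names are invented for illustration, and the fitness elements are again assumed to be costs.

```python
from typing import Dict, List, NamedTuple, Tuple


class FrontMember(NamedTuple):
    name: str
    fitness: Tuple[float, float, float]  # (path, lateral velocity, speed) costs


# Hypothetical mapping from driving context to the objective index to prioritise.
PRIORITY: Dict[str, int] = {"parking": 0, "inner_city": 1, "highway": 2}


def select_member(front: List[FrontMember], context: str) -> FrontMember:
    """Pick the front member with the lowest cost on the context's priority objective."""
    idx = PRIORITY[context]
    return min(front, key=lambda m: m.fitness[idx])


front = [
    FrontMember("net_a", (0.2, 0.6, 0.7)),
    FrontMember("net_b", (0.5, 0.1, 0.8)),
    FrontMember("net_c", (0.6, 0.5, 0.2)),
]
city_choice = select_member(front, "inner_city")
highway_choice = select_member(front, "highway")
```

A real system would also need hysteresis or similar safeguards so that the active network does not flip back and forth between consecutive frames.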

Load-bearing premise

Scoring networks solely on the three-element fitness vector of path, lateral velocity and longitudinal speed via genetic search will produce trajectory estimators that are practically better than those from single-objective gradient descent without creating new failure modes.

What would settle it

On a fixed autonomous-driving benchmark, if the neuroevolutionary networks produce higher collision or lane-departure rates than a well-tuned reinforcement-learning baseline trained on the same data, the claimed advantage collapses.

Figures

Figures reproduced from arXiv: 1906.10971 by Andrei Vasilcoi, Bogdan Trasnea, Liviu Marina, Sorin Grigorescu, Tiberiu Cocias.

Figure 1: From a modular pipeline to a perception-planning deep neural network approach for autonomous vehicles. Green symbolizes learning components. (a) Mapping sensors to actuators using a traditional pipeline. The output of each module provides input to the adjoining component. (b) Monolithic deep network for direct mapping of sensory data to control actions. (c) Perception-Planning deep neural network (our appr…
Figure 3: Examples of synthetic (a) GridSim and (b) real-world occupancy grids. The top images in each group show a snapshot of the driving environment together with its respective OG and activations of the first convolutional layer of the deep neural network in …
Figure 2: Local state trajectory estimation for autonomous driving. Given the current position of the ego-vehicle $p_{ego}^{<t>}$, a desired destination $p_{dest}^{<t+\tau_o>}$ and an input sequence of occupancy grids $X^{<t-\tau_i,t>} = [x^{<t-\tau_i>}, \dots, x^{<t>}]$, the goal is to estimate a driving trajectory $Y^{<t+1,t+\tau_o>} = [y^{<t+1>}, \dots, y^{<t+\tau_o>}]$, where each element in the output sequence $Y$ represents the desired position of the ego-vehicle…
Figure 4: Deep neural network architecture for estimating local driving trajectories. The training data and labels consist of synthetic $(\hat{X}^{<t-\tau_i,t>}, \hat{Y}^{<t+1,t+\tau_o>})$ or real-world $(X^{<t-\tau_i,t>}, Y^{<t+1,t+\tau_o>})$ OG sequences, together with their future trajectory labels. Both synthetic and real-world OG streams are passed through a convolutional neural network, followed by two fully connected layers of 1024 and 512 units…
Figure 6: Mapping of solution vectors $\Theta$ from the decision space $S$ to objective space $L$. Each solution $\Theta$ in decision space corresponds to a coordinate in objective space. The red marked coordinates are the set of Pareto optimal solutions $\Theta^*$ for a multi-objective minimization problem, located on the Pareto front drawn with a thick black line. 2D decision and objective space is shown in …
Figure 7: Evolution of the fitness vector during training. With each training generation, the traveled path decreases, while the longitudinal velocity increases. The lateral velocity increases together with the longitudinal velocity, but with a much smaller gradient, meaning that the vehicle is learning to avoid hazardous motions and passenger discomfort, although the longitudinal velocity is high. The red dots sho…
Figure 8: Autonomous test vehicle used for real-world data acquisition. The car is equipped with a front Continental MFC430 camera, two front and rear Quanergy M8 Lidars and six front, rear and side Continental ARS430 radars. … the weights by reaching maximum or minimum values in the sigmoid activation function. The new population is evaluated and the process repeats itself for a given number of training generations, …
Figure 10
Figure 9: GridSim evaluation routes. The virtual test field is defined on 30 km² of the Stockholm inner-city area (upper image). The ego-vehicle perceives the driving environment using an OG structure; the predicted trajectory is shown in yellow (lower image). As performance metric, we use the Root Mean Square Error (RMSE) between the estimated and a human driven trajectory in the 2D driving plane: $\mathrm{RMSE} = \sqrt{\frac{1}{\tau_o} \sum^{\tau_o} \dots}$ …
Figure 11: Median and variance of RMSE for the three testing scenarios. The errors in simulation and highway driving are similar. The unstructured nature of inner-city driving introduces higher errors, as well as a higher RMSE variance. … or End2End, would scale without any customizations on raw sensory information (e.g. video streams, radar, Lidar, etc.). Additionally, the jittering effect of End2End can be a side ef…
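The Figure 9 caption names RMSE between the estimated and the human-driven trajectory in the 2D driving plane as the performance metric, but the formula is cut off in this rendering. The sketch below assumes the common form, the square root of the mean squared Euclidean position error over the prediction horizon, which may differ in detail from the paper's definition. The function name and sample points are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in the 2D driving plane


def trajectory_rmse(estimated: List[Point], reference: List[Point]) -> float:
    """RMSE of Euclidean position error between two equally long trajectories."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_error = sum(
        (ex - rx) ** 2 + (ey - ry) ** 2
        for (ex, ey), (rx, ry) in zip(estimated, reference)
    )
    return math.sqrt(squared_error / len(estimated))


# Made-up example: the estimate deviates by 1 m at the middle waypoint only.
estimated = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
human = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
error = trajectory_rmse(estimated, human)
```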
Original abstract

Autonomous vehicles are controlled today either based on sequences of decoupled perception-planning-action operations, or based on End2End or Deep Reinforcement Learning (DRL) systems. Current deep learning solutions for autonomous driving are subject to several limitations (e.g. they estimate driving actions through a direct mapping of sensors to actuators, or require complex reward shaping methods). Although the cost function used for training can aggregate multiple weighted objectives, the gradient descent step is computed by the backpropagation algorithm using a single-objective loss. To address these issues, we introduce NeuroTrajectory, which is a multi-objective neuroevolutionary approach to local state trajectory learning for autonomous driving, where the desired state trajectory of the ego-vehicle is estimated over a finite prediction horizon by a perception-planning deep neural network. In comparison to DRL methods, which predict optimal actions for the upcoming sampling time, we estimate a sequence of optimal states that can be used for motion control. We propose an approach which uses genetic algorithms for training a population of deep neural networks, where each network individual is evaluated based on a multi-objective fitness vector, with the purpose of establishing a so-called Pareto front of optimal deep neural networks. The performance of an individual is given by a fitness vector composed of three elements. Each element describes the vehicle's travel path, lateral velocity and longitudinal speed, respectively. The same network structure can be trained on synthetic, as well as on real-world data sequences. We have benchmarked our system against a baseline Dynamic Window Approach (DWA), as well as against an End2End supervised learning method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces NeuroTrajectory, a multi-objective neuroevolutionary approach for local state trajectory learning in autonomous vehicles. A population of deep neural networks is trained via genetic algorithms to estimate a sequence of optimal ego-vehicle states over a finite prediction horizon; each network is evaluated on a three-element fitness vector (travel path, lateral velocity, longitudinal speed) to establish a Pareto front. The method is positioned as an alternative to decoupled perception-planning pipelines and to End2End or DRL systems, and is benchmarked against a Dynamic Window Approach (DWA) baseline and an End2End supervised learner. The same network structure is claimed to work on both synthetic and real-world data.

Significance. If the quantitative benchmarks hold and the learned trajectories remain safe under omitted constraints, the approach would demonstrate a practical advantage of Pareto-front GA training over single-objective gradient descent for multi-objective trajectory estimation, avoiding reward-shaping difficulties common in DRL. The explicit separation of perception-planning from low-level control via state-sequence output is a clear architectural contribution.

major comments (1)
  1. [Abstract and Section 3] Abstract and the description of the fitness vector (Section 3): performance is defined solely by the three-element vector of travel path, lateral velocity, and longitudinal speed. No term for obstacle clearance, collision avoidance, or kinematic feasibility appears. This omission is load-bearing for the central claim that the GA-evolved networks produce practically usable trajectory estimators, because trajectories can satisfy the reported fitness while violating real-world safety constraints—the exact failure mode flagged by the weakest assumption.
minor comments (1)
  1. [Abstract] The abstract supplies no quantitative error metrics, ablation results, or statistical significance tests, which makes it impossible to judge whether the data support the superiority claim over DWA and End2End.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to strengthen the manuscript. The major comment raises a valid point about the fitness function, which we address below by agreeing to revise the relevant sections for clarity and completeness.

  1. Referee: [Abstract and Section 3] Abstract and the description of the fitness vector (Section 3): performance is defined solely by the three-element vector of travel path, lateral velocity, and longitudinal speed. No term for obstacle clearance, collision avoidance, or kinematic feasibility appears. This omission is load-bearing for the central claim that the GA-evolved networks produce practically usable trajectory estimators, because trajectories can satisfy the reported fitness while violating real-world safety constraints—the exact failure mode flagged by the weakest assumption.

    Authors: We agree that the fitness vector, as currently described, consists only of the three elements (travel path, lateral velocity, longitudinal speed) and does not explicitly incorporate terms for obstacle clearance, collision avoidance, or kinematic feasibility. This is a limitation in the presentation of the method. The approach relies on training data (synthetic and real-world sequences) that are presumed to reflect feasible and safe trajectories, with the neuroevolutionary process optimizing within that distribution; safety is intended to be enforced by downstream modules in the overall autonomous driving pipeline rather than inside the fitness evaluation itself. Nevertheless, the referee correctly identifies that this assumption is not stated explicitly and could allow unsafe trajectories to score well on the reported objectives. In the revised manuscript we will (i) expand the description in Section 3 to articulate the assumptions about the training data and the separation of concerns between trajectory estimation and safety enforcement, (ii) add a discussion of kinematic feasibility and the potential need for post-processing or additional constraints, and (iii) qualify the claims in the abstract and introduction accordingly. These changes will be made without altering the core technical contribution. revision: yes
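The post-processing step the rebuttal alludes to is not specified. One simple possibility, shown purely as an assumption, is to reject state sequences whose implied speed or speed change between consecutive waypoints exceeds a limit. The function name, the fixed sampling time `dt` and the numeric limits are all hypothetical, and the check covers longitudinal plausibility only, not curvature or obstacle clearance.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres


def is_kinematically_plausible(points: List[Point], dt: float,
                               max_speed: float, max_accel: float) -> bool:
    """Check implied speed and speed change between consecutive waypoints."""
    # Speed implied by each pair of consecutive waypoints.
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    if any(v > max_speed for v in speeds):
        return False
    # Magnitude of the change in speed between consecutive segments.
    accels = [abs(v1 - v0) / dt for v0, v1 in zip(speeds, speeds[1:])]
    return all(a <= max_accel for a in accels)


smooth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
jumpy = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

ok = is_kinematically_plausible(smooth, dt=0.1, max_speed=15.0, max_accel=3.0)
bad = is_kinematically_plausible(jumpy, dt=0.1, max_speed=15.0, max_accel=3.0)
```

A filter like this sits outside the fitness evaluation, which matches the separation of concerns described in the response. It does not answer the referee's point that unsafe trajectories can still score well during training.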

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained

The paper presents a neuroevolutionary training procedure in which a population of DNNs is evolved via genetic algorithms to optimize a three-element fitness vector (travel path, lateral velocity, longitudinal speed). The claimed output—a sequence of states over a prediction horizon—is the direct inference result of the trained network, not a quantity defined by or equivalent to the fitness vector itself. No equations, self-citations, uniqueness theorems, or ansatzes are invoked in the abstract or described method that would reduce the result to a renaming or re-fitting of the inputs. External baselines (DWA, End2End) are referenced for comparison, confirming the evaluation chain remains independent of the training fitness definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Assessment is limited to the abstract; no explicit free parameters, new axioms, or invented entities are described beyond standard assumptions of neuroevolution and neural network function approximation.

axioms (1)
  • domain assumption: Deep neural networks can be effectively optimized by genetic algorithms on vector-valued fitness functions for control tasks.
    The training procedure rests on this established premise of neuroevolution.

pith-pipeline@v0.9.0 · 5836 in / 1203 out tokens · 26455 ms · 2026-05-25T15:50:12.525346+00:00 · methodology
