Deep Multi-Task Learning for Anomalous Driving Detection Using CAN Bus Scalar Sensor Data

Dario Pompili; Teruhisa Misu; Vidyasagar Sadhu

arxiv: 1907.00749 · v1 · pith:GACZECQ2new · submitted 2019-06-28 · 💻 cs.LG · cs.CV· cs.RO· stat.ML

Deep Multi-Task Learning for Anomalous Driving Detection Using CAN Bus Scalar Sensor Data

Vidyasagar Sadhu , Teruhisa Misu , Dario Pompili This is my paper

Pith reviewed 2026-05-25 13:57 UTC · model grok-4.3

classification 💻 cs.LG cs.CVcs.ROstat.ML

keywords anomaly detectionmulti-task learningdriving dataCAN bussemi-supervised learningmaneuver classificationimbalanced data

0 comments

The pith

A multi-task model that classifies driving maneuvers as an auxiliary task improves anomaly detection on imbalanced real-world data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that adding maneuver classification as a second task in a neural network helps anomaly detection when normal driving situations are heavily imbalanced. A sympathetic reader would care because standard anomaly detectors struggle when some normal maneuvers occur as rarely as true anomalies, leaving corner cases undetected in safety-critical driving systems. The approach uses known maneuver labels to supply domain knowledge during training even though anomaly labels remain scarce. The authors test the idea on 150 hours of recorded CAN bus data and report gains over baseline detectors.

Core claim

The authors present a novel multi-task learning based approach that leverages domain-knowledge (maneuver labels) for anomaly detection in driving data and show improved performance over baseline approaches on 150 hours of real-world driving data.

What carries the argument

A shared deep feature extractor trained jointly on maneuver classification and anomaly detection heads.

If this is right

The model can flag anomalous driving more reliably when some normal maneuvers are rare.
Semi-supervised anomaly detection benefits from auxiliary supervision on known normal classes.
The same joint-training structure can be applied to other sensor streams collected during driving.
Detected anomalies can trigger separate planning modules in an autonomous vehicle stack.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If maneuver labels prove expensive to collect, future versions could replace them with cheaper weak labels or self-supervised signals.
The approach may transfer to other domains that contain many normal subclasses and few anomalies, such as network traffic or medical sensor streams.
Combining the detector with downstream planning would require showing that flagged anomalies actually lead to safer vehicle responses.

Load-bearing premise

Maneuver labels are available, accurate, and supply useful domain knowledge that multi-task learning can turn into better anomaly detection.

What would settle it

Train and test the same architecture on the identical driving data but with maneuver labels removed or replaced by random labels; if anomaly detection performance falls back to baseline levels, the claim holds.

Figures

Figures reproduced from arXiv: 1907.00749 by Dario Pompili, Teruhisa Misu, Vidyasagar Sadhu.

**Figure 2.** Figure 2: LSTM Autoencoder—the encoder cells encode the input data into a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: (Top) After the model is trained, we fit the reconstruction errors to [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 7.** Figure 7: Scaled anomaly scores that leverages the maneuver predictions of [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗

**Figure 6.** Figure 6: Greedy symbol decoder for task B (maneuver predictor) in Fig. 1. [PITH_FULL_IMAGE:figures/full_fig_p005_6.png] view at source ↗

**Figure 8.** Figure 8: Comparison of performance on test data between multi-task learning (our approach) and standalone autoencoder or symbol predictor: (a) [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 9.** Figure 9: Reconstruction performance of turn data between multi-task learning (ours) and standalone autoencoder: (a) Left turn; (b) Right turn; (c) U-turn. [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗

**Figure 10.** Figure 10: (Top) Comparison of eval data MSE reconstruction loss between [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

read the original abstract

Corner cases are the main bottlenecks when applying Artificial Intelligence (AI) systems to safety-critical applications. An AI system should be intelligent enough to detect such situations so that system developers can prepare for subsequent planning. In this paper, we propose semi-supervised anomaly detection considering the imbalance of normal situations. In particular, driving data consists of multiple positive/normal situations (e.g., right turn, going straight), some of which (e.g., U-turn) could be as rare as anomalous situations. Existing machine learning based anomaly detection approaches do not fare sufficiently well when applied to such imbalanced data. In this paper, we present a novel multi-task learning based approach that leverages domain-knowledge (maneuver labels) for anomaly detection in driving data. We evaluate the proposed approach both quantitatively and qualitatively on 150 hours of real-world driving data and show improved performance over baseline approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper uses multi-task learning with maneuver labels to address imbalance in anomaly detection for CAN bus driving data, but lacks details needed to verify the results.

read the letter

The paper's key move is to frame anomaly detection in driving data as a multi-task problem, where predicting the maneuver type helps the model handle the fact that some normal maneuvers are rare and look like anomalies. This is a sensible use of domain knowledge to tackle imbalance in semi-supervised settings for CAN bus scalar data. It does well at describing the practical issue in safety-critical driving AI and at collecting 150 hours of real CAN bus data for evaluation. That scale is a plus compared to many smaller studies. The soft spots are more significant. The abstract provides no information on the model architecture, the loss functions for the two tasks, how the data is split, the specific metrics, or any statistical significance. Without those, the claim of improved performance over baselines cannot be evaluated. The evaluation protocol is completely opaque from what is given. The approach also depends on having accurate maneuver labels, which the paper treats as given but may not be cheap or reliable in all settings. This work is aimed at researchers in machine learning for robotics and autonomous systems who deal with anomaly detection in time-series sensor data. Someone looking for ideas on incorporating domain knowledge into semi-supervised methods might find it useful, but only if the full paper fills in the gaps. It is worth sending to peer review so that the methods and results can be properly scrutinized.

Referee Report

1 major / 0 minor

Summary. The paper proposes a semi-supervised anomaly detection method for driving data that uses deep multi-task learning to incorporate maneuver labels as domain knowledge, addressing the challenge of imbalanced normal situations (e.g., rare maneuvers like U-turns being as infrequent as anomalies). It evaluates the approach quantitatively and qualitatively on 150 hours of real-world CAN bus scalar sensor data and claims improved performance over baseline approaches.

Significance. If the empirical results are robust, the work could meaningfully advance anomaly detection for safety-critical applications by showing how auxiliary domain-knowledge tasks can mitigate imbalance issues in real driving data. The focus on CAN bus scalar data and real-world collection adds practical value, though the absence of architecture, loss, split, metric, and statistical details in the abstract prevents full assessment of whether the central claim holds.

major comments (1)

[Abstract] Abstract: the central claim of 'improved performance over baseline approaches' on 150 hours of data is stated without any quantitative metrics, baseline definitions, data-split protocol, loss formulation, or statistical tests; this directly undermines the ability to evaluate the soundness of the multi-task improvement.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on the abstract. We agree that the current abstract is too high-level and does not provide sufficient quantitative or methodological detail to allow immediate evaluation of the central claim. We will revise the abstract accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of 'improved performance over baseline approaches' on 150 hours of data is stated without any quantitative metrics, baseline definitions, data-split protocol, loss formulation, or statistical tests; this directly undermines the ability to evaluate the soundness of the multi-task improvement.

Authors: We agree with the referee that the abstract, in its present form, lacks the requested quantitative and methodological details. While the full manuscript (Sections 4 and 5) reports AUC/F1 scores, baseline definitions (e.g., isolation forest, autoencoder), the 70/15/15 temporal split, the multi-task loss formulation, and statistical significance via paired t-tests, these elements are not summarized in the abstract. In the revised manuscript we will expand the abstract to include the key performance deltas, baseline names, and a brief statement of the evaluation protocol so that the central claim can be assessed from the abstract alone. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central claim is an empirical result: a multi-task learning architecture that incorporates maneuver labels as auxiliary supervision improves semi-supervised anomaly detection on imbalanced CAN-bus data, demonstrated via quantitative and qualitative evaluation on 150 hours of real-world driving data against baselines. No derivation chain, first-principles prediction, or mathematical reduction is presented that collapses to fitted parameters or self-citations by construction. The approach is described as a novel application of existing multi-task techniques to domain-specific data; the assumption that maneuver labels are available is stated explicitly as an input rather than derived. This is the common case of a self-contained empirical ML paper with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Based on abstract only; the approach relies on domain assumptions about label availability but no explicit free parameters or invented entities are described.

free parameters (1)

Deep learning hyperparameters and architecture choices
Standard in neural network training; not specified in abstract.

axioms (1)

domain assumption Maneuver labels are available and accurate for training the multi-task model
Central to the proposed semi-supervised multi-task approach.

pith-pipeline@v0.9.0 · 5692 in / 1261 out tokens · 32544 ms · 2026-05-25T13:57:37.442748+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We present a novel multi-task learning based approach that leverages domain-knowledge (maneuver labels) for anomaly detection in driving data... convolutional bi-directional LSTM (Bi-LSTM) based autoencoder and a convolutional Bi-LSTM based sequence-to-sequence (seq2seq) symbol predictor
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the network is trained by minimizing the difference, |x−a|² ... overall loss LO = wA LA + wB LB + wR LR

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

V olvo to Release Level 4 Autonomous XC90 in 2021,

Digital Trends, “V olvo to Release Level 4 Autonomous XC90 in 2021,” https://www.digitaltrends.com/cars/volvo-xc-90-level-4- autonomy/, 2018

work page 2021
[2]

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning,

V . Ramanishka, Y .-T. Chen, T. Misu, and K. Saenko, “Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2018

work page 2018
[3]

LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection,

P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection,” in Anomaly Detection Workshop, International Conference on Machine Learning (ICML) , New York, NY , USA, 2016

work page 2016
[4]

Contextual anomaly detection framework for big sensor data,

M. A. Hayes and M. A. Capretz, “Contextual anomaly detection framework for big sensor data,” Journal of Big Data , vol. 2, no. 1, p. 2, 12 2015

work page 2015
[5]

Fault detection analysis using data mining techniques for a cluster of smart ofﬁce buildings,

A. Capozzoli, F. Lauro, and I. Khan, “Fault detection analysis using data mining techniques for a cluster of smart ofﬁce buildings,” Expert Systems with Applications , vol. 42, no. 9, pp. 4324–4338, 6 2015

work page 2015
[6]

Introducing practical and robust anomaly detection in a time series,

Twitter, “Introducing practical and robust anomaly detection in a time series,” 2015. [Online]. Available: https://blog.twitter.com/engineering/en_us/a/2015/ introducing-practical-and-robust-anomaly-detection-in-a-time-series. html

work page 2015
[7]

RAD—Outlier Detection on Big Data,

Netﬂix, “RAD—Outlier Detection on Big Data,” http://techblog.netﬂix.com/2015/02/rad-outlier-detection-on-big- data.html, 2015

work page 2015
[8]

Real-time anomaly detection system for time series at scale,

M. Toledano, I. Cohen, Y . Ben-Simhon, and I. Tadeski, “Real-time anomaly detection system for time series at scale,” in Proceedings of the KDD: Workshop on Anomaly Detection in Finance , ser. Proceed- ings of Machine Learning Research, vol. 71, 2018, pp. 56–65

work page 2018
[9]

An ensemble learning framework for anomaly detection in building energy consumption,

D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz, and G. Bit- suamlak, “An ensemble learning framework for anomaly detection in building energy consumption,” Energy and Buildings , vol. 144, pp. 191–206, 6 2017

work page 2017
[10]

Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks,

A. Taylor, S. Leblanc, and N. Japkowicz, “Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks,” in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) . IEEE, 10 2016, pp. 130–139

work page 2016
[11]

Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data,

D. Hallac, S. Bhooshan, M. Chen, K. Abida, R. Sosic, and J. Leskovec, “Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data,” in 2018 21st International Conference on Intelligent Trans- portation Systems (ITSC) . IEEE, 11 2018, pp. 3233–3238

work page 2018
[12]

Long Short Term Memory Networks for Anomaly Detection in Time Series,

P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long Short Term Memory Networks for Anomaly Detection in Time Series,” in Eu- ropean Symposium on Artiﬁcial Neural Networks , Bruges Belgium, 2015

work page 2015
[13]

Adam: A Method for Stochastic Optimiza- tion,

D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimiza- tion,” in International Conference on Learning Representations , San Diego, CA, USA, 5 2015. 8

work page 2015

[1] [1]

V olvo to Release Level 4 Autonomous XC90 in 2021,

Digital Trends, “V olvo to Release Level 4 Autonomous XC90 in 2021,” https://www.digitaltrends.com/cars/volvo-xc-90-level-4- autonomy/, 2018

work page 2021

[2] [2]

Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning,

V . Ramanishka, Y .-T. Chen, T. Misu, and K. Saenko, “Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2018

work page 2018

[3] [3]

LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection,

P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, “LSTM-based Encoder-Decoder for Multi-sensor Anomaly Detection,” in Anomaly Detection Workshop, International Conference on Machine Learning (ICML) , New York, NY , USA, 2016

work page 2016

[4] [4]

Contextual anomaly detection framework for big sensor data,

M. A. Hayes and M. A. Capretz, “Contextual anomaly detection framework for big sensor data,” Journal of Big Data , vol. 2, no. 1, p. 2, 12 2015

work page 2015

[5] [5]

Fault detection analysis using data mining techniques for a cluster of smart ofﬁce buildings,

A. Capozzoli, F. Lauro, and I. Khan, “Fault detection analysis using data mining techniques for a cluster of smart ofﬁce buildings,” Expert Systems with Applications , vol. 42, no. 9, pp. 4324–4338, 6 2015

work page 2015

[6] [6]

Introducing practical and robust anomaly detection in a time series,

Twitter, “Introducing practical and robust anomaly detection in a time series,” 2015. [Online]. Available: https://blog.twitter.com/engineering/en_us/a/2015/ introducing-practical-and-robust-anomaly-detection-in-a-time-series. html

work page 2015

[7] [7]

RAD—Outlier Detection on Big Data,

Netﬂix, “RAD—Outlier Detection on Big Data,” http://techblog.netﬂix.com/2015/02/rad-outlier-detection-on-big- data.html, 2015

work page 2015

[8] [8]

Real-time anomaly detection system for time series at scale,

M. Toledano, I. Cohen, Y . Ben-Simhon, and I. Tadeski, “Real-time anomaly detection system for time series at scale,” in Proceedings of the KDD: Workshop on Anomaly Detection in Finance , ser. Proceed- ings of Machine Learning Research, vol. 71, 2018, pp. 56–65

work page 2018

[9] [9]

An ensemble learning framework for anomaly detection in building energy consumption,

D. B. Araya, K. Grolinger, H. F. ElYamany, M. A. Capretz, and G. Bit- suamlak, “An ensemble learning framework for anomaly detection in building energy consumption,” Energy and Buildings , vol. 144, pp. 191–206, 6 2017

work page 2017

[10] [10]

Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks,

A. Taylor, S. Leblanc, and N. Japkowicz, “Anomaly Detection in Automobile Control Network Data with Long Short-Term Memory Networks,” in 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) . IEEE, 10 2016, pp. 130–139

work page 2016

[11] [11]

Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data,

D. Hallac, S. Bhooshan, M. Chen, K. Abida, R. Sosic, and J. Leskovec, “Drive2Vec: Multiscale State-Space Embedding of Vehicular Sensor Data,” in 2018 21st International Conference on Intelligent Trans- portation Systems (ITSC) . IEEE, 11 2018, pp. 3233–3238

work page 2018

[12] [12]

Long Short Term Memory Networks for Anomaly Detection in Time Series,

P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long Short Term Memory Networks for Anomaly Detection in Time Series,” in Eu- ropean Symposium on Artiﬁcial Neural Networks , Bruges Belgium, 2015

work page 2015

[13] [13]

Adam: A Method for Stochastic Optimiza- tion,

D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimiza- tion,” in International Conference on Learning Representations , San Diego, CA, USA, 5 2015. 8

work page 2015