pith. machine review for the scientific record.

arxiv: 2602.15089 · v2 · submitted 2026-02-16 · 💻 cs.LG · stat.ML

Recognition: unknown

Triplet Feature Fusion for Equipment Anomaly Prediction: An Open-Source Methodology Using Small Foundation Models

Authors on Pith: no claims yet
classification 💻 cs.LG · stat.ML
keywords: triplet · embeddings · equipment · models · open-source · percent · pipeline · anomalies
abstract

Predicting equipment anomalies before they escalate into failures is a critical challenge in industrial facility management. Existing approaches rely either on hand-crafted threshold rules, which lack generalizability, or on large neural models that are impractical for on-site, air-gapped deployments. We present an industrial methodology that resolves this tension by combining open-source small foundation models into a unified 1,116-dimensional Triplet Feature Fusion pipeline. This pipeline integrates: (1) statistical features ($x \in \mathbb{R}^{28}$) derived from 90-day sensor histories, (2) time-series embeddings ($y \in \mathbb{R}^{64}$) from a LoRA-adapted IBM Granite TinyTimeMixer (TTM, 133K parameters), and (3) multilingual text embeddings ($z \in \mathbb{R}^{1024}$) extracted from Japanese equipment master records via multilingual-e5-large. The concatenated triplet $h = [x; y; z]$ is processed by a LightGBM classifier (< 3 MB) trained to predict anomalies at 30-, 60-, and 90-day horizons. All components use permissive open-source licenses (Apache 2.0 / MIT). The inference-time pipeline runs entirely on CPU in under 2 ms, enabling edge deployment on co-located hardware without cloud dependency. On a dataset of 64 HVAC units comprising 67,045 samples, the triplet model achieves Precision = 0.992, F1 = 0.958, and ROC-AUC = 0.998 at the 30-day horizon. Crucially, it reduces the False Positive Rate from 0.6 percent (baseline) to 0.1 percent, an 83 percent reduction attributable to equipment-type conditioning via the text embedding $z$. Cluster analysis reveals that the embeddings align time-series signatures with distinct fault archetypes, explaining how compact multilingual representations improve discrimination without explicit categorical encoding.
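The fusion step the abstract describes is a plain concatenation of three feature blocks. A minimal sketch, assuming placeholder feature values and a hypothetical `fuse_triplet` helper (only the dimensions 28, 64, 1024 and the reported false-positive rates come from the abstract; the actual feature extraction and LightGBM training are not shown):

```python
def fuse_triplet(x, y, z):
    """Concatenate the three blocks into one vector h = [x; y; z]."""
    assert len(x) == 28 and len(y) == 64 and len(z) == 1024
    return list(x) + list(y) + list(z)

x = [0.0] * 28    # statistical features from 90-day sensor histories
y = [0.0] * 64    # TTM time-series embedding
z = [0.0] * 1024  # multilingual-e5-large text embedding

h = fuse_triplet(x, y, z)
print(len(h))  # 1116-dimensional input to the LightGBM classifier

# Sanity check on the reported False Positive Rate drop:
# 0.6 percent -> 0.1 percent is a (0.6 - 0.1) / 0.6 relative reduction.
reduction = (0.006 - 0.001) / 0.006
print(round(reduction * 100))  # ~83 percent, matching the abstract
```

In the paper's setting the concatenated vector would then be fed to a trained `lightgbm.LGBMClassifier`; the arithmetic at the end simply confirms that 0.6% to 0.1% corresponds to the quoted 83 percent reduction.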

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Heterogeneous Variational Inference for Markov Degradation Hazard Models: Discretized Mixture with Interpretable Clusters

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    A discretized finite mixture model with ADVI identifies interpretable low- and high-risk clusters in Markov degradation hazard models for 280 industrial pumps, achieving 84x speedup over NUTS while enforcing stability...