Generalizability of Learning-based Occupancy Detection in Residential Buildings (extended version)

Albin Apell; Angela Fontan; Karl Henrik Johansson; Katayoun Eshkofti; Mahsa Farjadnia; Marco Molinari; Tilde Hjalmarsson

arxiv: 2604.14841 · v2 · submitted 2026-04-16 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-10 11:14 UTC · model grok-4.3

keywords: occupancy detection, environmental sensors, machine learning, LSTM, generalizability, cross-apartment, residential buildings, logistic regression

The pith

Three machine learning models detect residential occupancy from environmental sensors with comparable same-apartment accuracy, and the attention-enhanced LSTM generalizes best across apartments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares three machine learning approaches to detect whether an apartment is occupied using only data from temperature, humidity, CO2 and related environmental sensors. It measures performance both inside the apartment used for training and on data from completely separate apartments, plus additional data from a calibrated digital model. All models reach roughly 0.83 accuracy and 0.86 F1 score on same-apartment tests, yet the attention-enhanced LSTM holds steady at 0.84 accuracy and 0.85 F1 on cross-apartment tests while logistic regression remains competitive when generalization is not required. This matters because it shows a practical path to non-intrusive, lower-cost occupancy sensing that could function in varied homes without retraining from scratch each time.

Core claim

All three models achieve comparable performance on the same-apartment test data with accuracy of approximately 0.83 and F1 score of approximately 0.86. When assessed on cross-apartment data, the LSTM model demonstrates the strongest generalization capability with accuracy of 0.84 and F1 score of 0.85, while logistic regression provides a competitive, low-complexity alternative for applications that do not require cross-apartment generalization. Hyperparameters for SVM and LSTM are optimized via Bayesian optimization, and evaluation includes both real sensor data from the KTH Live-In Lab and data generated from its calibrated digital model.
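The two metrics quoted above can be made concrete with a small sketch. The function names and the toy labels below are illustrative assumptions, not the paper's code or data; the sketch only shows how accuracy and F1 are derived from binary occupancy predictions (1 = occupied, 0 = unoccupied).

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, with 1 = occupied."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of time steps whose occupancy label is predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the occupied class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (hypothetical, not taken from the KTH Live-In Lab data).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Because F1 ignores true negatives, it can sit above accuracy when occupied periods dominate the data, which is one plausible reading of the 0.83 versus 0.86 gap reported for the same-apartment tests.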

What carries the argument

Cross-apartment evaluation of logistic regression, SVM, and attention-enhanced LSTM models on environmental sensor data from multiple apartments, supplemented by a calibrated digital model for testing.

If this is right

All models perform similarly when tested on data from the apartment used in training.
The LSTM model maintains its accuracy when applied to sensor data from apartments not included in training.
Logistic regression supplies a low-complexity option for single-apartment settings where cross-apartment generalization is unnecessary.
The accuracy versus sensor-count trade-off remains usable for practical deployment in residential buildings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Pre-trained LSTM models could be installed in new buildings with only light fine-tuning rather than full retraining.
Simulation data from calibrated digital models might reduce the amount of real labeled data needed before deployment.
Non-camera methods could support privacy-sensitive energy or security systems in rental or multi-tenant housing.
Testing the same pipeline in regions with different climate or building standards would reveal the limits of the reported generalization.

Load-bearing premise

The environmental sensor data and occupancy patterns collected from the KTH Live-In Lab apartments, together with the calibrated digital model, are representative of typical residential buildings.

What would settle it

Collecting occupancy labels and the same environmental sensor streams from apartments outside the KTH testbed that have different layouts, insulation, or resident schedules, then finding that all three models drop below 0.75 accuracy on that new data.

Figures

Figures reproduced from arXiv: 2604.14841 by Albin Apell, Angela Fontan, Karl Henrik Johansson, Katayoun Eshkofti, Mahsa Farjadnia, Marco Molinari, Tilde Hjalmarsson.

**Figure 2.** Boxplots of monitored variables in Apartment 2 during occupied and unoccupied periods.

**Figure 3.** Distribution of occupancy durations recorded in

**Figure 4.** Correlation matrices of environmental variables

**Figure 5.** Row-normalized confusion matrices for the models

**Figure 6.** Testbed KTH model developed in IDA ICE.

**Figure 7.** Row-normalized confusion matrices for the models
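Several of the figures report row-normalized confusion matrices. As a minimal sketch of what that normalization means, assuming rows index the true label and columns the predicted label, each row of raw counts is divided by its own sum so that it reads as per-class rates. The counts below are hypothetical and are not values from the paper.

```python
from typing import List


def row_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Divide every row by its sum so each non-empty row adds up to 1.0."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # An all-zero row (a class with no samples) is left as zeros.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Hypothetical counts: rows = true label (occupied, unoccupied),
# columns = predicted label (occupied, unoccupied).
counts = [[30, 10],
          [5, 55]]

for row in row_normalize(counts):
    print(["{:.1%}".format(value) for value in row])
```

With this convention the diagonal entries are the per-class recall values, so a row-normalized matrix is not distorted by an imbalance between occupied and unoccupied samples.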

Original abstract

This paper investigates non-intrusive occupancy detection methods for residential buildings using environmental sensor data from the KTH Live-In Lab in Stockholm, Sweden. Three machine learning approaches, namely, logistic regression (LR), support vector machines (SVM), and long short-term memory (LSTM) network enhanced with an attention mechanism, are evaluated in terms of predictive performance and computational complexity. The analysis considers the trade-off between sensor availability (investment cost) and prediction accuracy in real applications, as well as the models' cross-apartment generalizability. Hyperparameters for both the SVM and LSTM models are optimized using Bayesian optimization. All three models are evaluated on data collected from apartments not used during training, and on data generated from a calibrated digital model of the testbed. Results show that all models achieve comparable performance on the same-apartment test data (accuracy of approximately 0.83, F1 score of approximately 0.86). When assessed on cross-apartment data, the LSTM model demonstrates the strongest generalization capability (accuracy of 0.84, F1 score of 0.85), while LR provides a competitive, low-complexity alternative for applications that do not require cross-apartment generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LSTM edges ahead of the other models on cross-apartment tests while LR stays competitive on same-apartment data, but both claims rest on a single Stockholm lab plus its own digital twin.

The main thing to know is that the three models land at roughly the same accuracy and F1 on held-out data from the training apartments, but LSTM with attention pulls ahead when the test data comes from the other apartments, while logistic regression remains a low-complexity option that does not require that extra generalization step. The paper also checks performance on data generated from a calibrated digital model of the same testbed and notes the usual trade-offs between sensor count and accuracy.

Referee Report

3 major / 2 minor

Summary. The paper evaluates three machine learning models—logistic regression (LR), support vector machines (SVM), and LSTM with attention—for non-intrusive occupancy detection in residential buildings using environmental sensor data (CO2, temperature, humidity) from the KTH Live-In Lab in Stockholm. It compares performance on same-apartment test data (reporting ~0.83 accuracy and ~0.86 F1 across models) versus cross-apartment generalization, where LSTM performs best (~0.84 accuracy, ~0.85 F1), and also tests on data from a calibrated digital model of the testbed. The work examines trade-offs with sensor availability and model complexity, using Bayesian optimization for hyperparameters.

Significance. If the empirical comparisons hold, the results provide practical guidance for choosing low-complexity models like LR when cross-apartment generalization is not required, while highlighting LSTM's relative strength for transfer across apartments. The use of both real sensor traces and a digital twin is a positive aspect for controlled testing, but the single-lab scope restricts claims about residential buildings in general.

major comments (3)

[Results] The central generalizability claim (LSTM best on cross-apartment data) rests on tests across only three apartments in one Stockholm lab; no external datasets, different climate zones, or building types are evaluated, so the reported accuracy/F1 numbers may be testbed-specific rather than broadly applicable (see Results section and abstract performance claims).
[Methods] Details on experimental setup are insufficient to support the reported metrics: data preprocessing steps, exact train/test splits (e.g., how cross-apartment partitions were formed), sensor feature engineering, and cross-validation procedures are not fully specified, preventing reproduction or assessment of the ~0.83 same-apartment accuracy (see Methods/Experimental Setup section).
[Digital Model] The calibrated digital model is used to generate additional test data supporting the claims, but no quantitative calibration metrics (e.g., error between simulated and real sensor traces) or sensitivity analysis to parameters like ventilation rates are provided, which is load-bearing for using simulated data to bolster the generalization results (see Digital Model section).
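The second major comment asks how the cross-apartment partitions were formed. One common way to form them is a leave-one-apartment-out scheme, sketched below under stated assumptions: the function name, the apartment keys, and the sample values are hypothetical, and the paper's actual split procedure is not specified in the text reviewed here.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_apartment_out(
    samples_by_apartment: Dict[str, List],
) -> Iterator[Tuple[str, List, List]]:
    """Yield (held_out_apartment, train_samples, test_samples) for each apartment.

    Every fold trains on all apartments except one and tests on the
    held-out apartment, so no test sample comes from a training apartment.
    """
    apartments = sorted(samples_by_apartment)
    for held_out in apartments:
        train = [
            sample
            for apartment in apartments
            if apartment != held_out
            for sample in samples_by_apartment[apartment]
        ]
        test = list(samples_by_apartment[held_out])
        yield held_out, train, test


# Hypothetical placeholder samples; real inputs would be sensor feature windows.
data = {"apt1": ["a1", "a2"], "apt2": ["b1"], "apt3": ["c1", "c2"]}
for held_out, train, test in leave_one_apartment_out(data):
    print(held_out, len(train), len(test))
```

Splitting by apartment rather than by time step keeps readings from the same home out of both sides of a fold, which is the property a cross-apartment claim depends on.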

minor comments (2)

[Abstract] The abstract states 'approximately 0.83' and 'approximately 0.86' without per-model breakdowns; adding a table of exact values for LR/SVM/LSTM would improve clarity.
[Introduction] Ensure consistent notation for model names and metrics (e.g., always spell out LSTM on first use in each section) and include references to prior occupancy detection surveys for context.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each of the major comments below, indicating the revisions we will make to strengthen the manuscript.

Referee: The central generalizability claim (LSTM best on cross-apartment data) rests on tests across only three apartments in one Stockholm lab; no external datasets, different climate zones, or building types are evaluated, so the reported accuracy/F1 numbers may be testbed-specific rather than broadly applicable (see Results section and abstract performance claims).

Authors: We agree that the evaluation is limited to three apartments within the KTH Live-In Lab in Stockholm, and broader validation across different buildings and climates would be ideal. However, the manuscript's claims are grounded in this specific dataset and testbed, with the title and abstract referring to generalizability in the context of cross-apartment transfer within similar residential settings. The results highlight LSTM's advantage in this scenario, while noting LR's competitiveness for same-apartment use. We will revise the discussion section to more explicitly state the limitations of the single-lab scope and avoid overgeneralization in the abstract. No additional external data is available for this study.
Revision: partial

Referee: Details on experimental setup are insufficient to support the reported metrics: data preprocessing steps, exact train/test splits (e.g., how cross-apartment partitions were formed), sensor feature engineering, and cross-validation procedures are not fully specified, preventing reproduction or assessment of the ~0.83 same-apartment accuracy (see Methods/Experimental Setup section).

Authors: We appreciate this feedback and acknowledge that more details are needed for reproducibility. In the revised version, we will expand the Experimental Setup subsection to fully describe: (1) preprocessing steps including handling of missing data, outlier removal, and normalization; (2) the exact train/test split methodology, including how apartments were partitioned for cross-apartment evaluation (e.g., leave-one-apartment-out); (3) the feature engineering from CO2, temperature, and humidity sensors; and (4) the cross-validation approach used for hyperparameter tuning and performance estimation. This will allow readers to reproduce the approximately 0.83 accuracy on same-apartment data.
Revision: yes

Referee: The calibrated digital model is used to generate additional test data supporting the claims, but no quantitative calibration metrics (e.g., error between simulated and real sensor traces) or sensitivity analysis to parameters like ventilation rates are provided, which is load-bearing for using simulated data to bolster the generalization results (see Digital Model section).

Authors: We concur that providing quantitative calibration details would enhance the credibility of the digital model results. In the revision, we will include in the Digital Model section: quantitative metrics such as root mean square error (RMSE) and mean absolute error (MAE) between the simulated and real sensor traces for CO2, temperature, and humidity; and a sensitivity analysis discussing the impact of key parameters like ventilation rates on the simulated data. These additions will support the use of the model for testing generalization.
Revision: yes
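The calibration metrics named in the last response, RMSE and MAE, can be computed as in the sketch below. The function names and the CO2 readings are hypothetical and serve only to illustrate the arithmetic; they are not results from the paper or from its digital model.

```python
import math
from typing import Sequence


def _check(simulated: Sequence[float], measured: Sequence[float]) -> None:
    """Reject empty or mismatched traces before computing an error metric."""
    if len(simulated) != len(measured) or len(simulated) == 0:
        raise ValueError("traces must be non-empty and of equal length")


def rmse(simulated: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error between a simulated and a measured trace."""
    _check(simulated, measured)
    squared = [(s - m) ** 2 for s, m in zip(simulated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def mae(simulated: Sequence[float], measured: Sequence[float]) -> float:
    """Mean absolute error between a simulated and a measured trace."""
    _check(simulated, measured)
    return sum(abs(s - m) for s, m in zip(simulated, measured)) / len(simulated)


# Hypothetical CO2 readings in ppm at three matching time steps.
simulated_co2 = [400.0, 450.0, 500.0]
measured_co2 = [410.0, 440.0, 500.0]
print(round(rmse(simulated_co2, measured_co2), 3), round(mae(simulated_co2, measured_co2), 3))
```

RMSE penalizes large deviations more heavily than MAE, so reporting both gives a rough sense of whether the simulation error is spread evenly or concentrated in a few time steps.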

Circularity Check

0 steps flagged

No significant circularity: purely empirical ML evaluation

The paper performs an empirical comparison of three ML classifiers (LR, SVM, LSTM+attention) trained and tested on environmental sensor traces and occupancy labels from the KTH Live-In Lab apartments, plus data from a calibrated digital twin. Performance metrics (accuracy ~0.83, F1 ~0.86 on same-apartment; LSTM at 0.84/0.85 on cross-apartment) are reported directly from hold-out splits and cross-apartment transfers. No derivation chain, first-principles equations, fitted parameters renamed as predictions, or self-citation load-bearing premises exist; the claims rest on observable data splits and model outputs rather than any reduction to the inputs by construction. The representativeness concern is a validity issue, not a circularity issue.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claims depend on the quality and representativeness of the experimental data from the specific testbed and the assumption that ML models trained on it can be evaluated for generalization.

free parameters (1)

Model hyperparameters
Optimized using Bayesian optimization for SVM and LSTM models to fit the specific dataset.

axioms (2)

Domain assumption: Sensor data from environmental sensors can be used to infer occupancy non-intrusively.
Core premise of the non-intrusive approach described in the abstract.
Domain assumption: The digital twin model accurately represents the physical apartments.
Invoked to generate additional test data for evaluation.
