Dywave: Event-Aligned Dynamic Tokenization for Heterogeneous IoT Sensing Signals

Denizhan Kara; Hongjue Zhao; Jinyang Li; Shengzhong Liu; Tarek Abdelzaher; Tomoyoshi Kimura; Xiaomin Ouyang; Yigong Hu; Yizhuo Chen

arxiv: 2605.14014 · v2 · pith:3XSV7YHWnew · submitted 2026-05-13 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 20:31 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: dynamic tokenization · wavelet decomposition · IoT sensing signals · event alignment · sequence models · activity recognition · computational efficiency · temporal boundaries

The pith

Dywave aligns IoT sensing tokens to semantic events via wavelet decomposition, cutting token lengths by up to 75% while raising accuracy by up to 12%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents Dywave, a dynamic tokenization method designed for the non-stationary and multi-scale signals collected by IoT sensors. Standard fixed or uniform tokenization often fails to respect the natural temporal structures and physical events in these signals, which limits accuracy in tasks such as activity recognition and stress assessment. Dywave instead uses wavelet-based hierarchical decomposition to locate meaningful boundaries, then compresses redundant intervals while keeping temporal coherence intact. The result is shorter input sequences that still carry the essential information, leading to measurable gains in both performance and speed when fed to mainstream sequence models. A sympathetic reader would care because IoT deployments generate continuous heterogeneous data streams, and more efficient tokenization could make real-time analysis practical on devices with limited compute and power.

Core claim

Dywave constructs compact input representations aligned with intrinsic temporal structures and underlying physical events. It leverages wavelet-based hierarchical decomposition to identify meaningful temporal boundaries corresponding to underlying semantic events and adaptively compresses redundant intervals while preserving temporal coherence. Evaluations across five real-world IoT sensing datasets for activity recognition, stress assessment, and nearby object detection show that the resulting tokens improve accuracy by up to 12% and reduce input lengths by up to 75% when used with mainstream sequence models, while also increasing robustness to domain shifts and varying sequence lengths.
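A minimal sketch of what "compressing redundant intervals" can look like once boundaries are known. It is not Dywave's fusion module, which the page describes as learnable and saliency-weighted; plain mean pooling, the function name `pool_segments`, and the toy signal are assumptions made for illustration.

```python
from statistics import fmean


def pool_segments(signal, boundaries):
    """Mean-pool each interval between consecutive boundaries into one token.

    `boundaries` are sample indices where a new segment starts (index 0 is
    implied). Returns a list of (start, end, mean) tuples, one per token.
    """
    inner = sorted({b for b in boundaries if 0 < b < len(signal)})
    edges = [0] + inner + [len(signal)]
    return [
        (start, end, fmean(signal[start:end]))
        for start, end in zip(edges[:-1], edges[1:])
    ]


# Hypothetical signal: quiet, a short burst, then quiet again (100 samples).
signal = [0.0] * 40 + [1.0, 3.0] * 4 + [0.0] * 52
tokens = pool_segments(signal, boundaries=[40, 48])

# Uniform patching with an assumed patch length of 10 samples, for comparison.
fixed_patch_tokens = len(signal) // 10
```

Here the 100-sample toy signal yields 3 event-aligned tokens against 10 uniform patches. Real reductions depend on the signal and on how the boundaries are found.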

What carries the argument

Wavelet-based hierarchical decomposition that detects event-aligned temporal boundaries in heterogeneous sensing signals to enable adaptive compression of redundant intervals.
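The idea can be sketched with an undecimated Haar transform followed by an energy threshold. This is a simplified stand-in, not the paper's module: the Figure 10 caption names MODWT and the rebuttal mentions multi-scale energy thresholding, but the Haar filter, the clamped edge handling, the `ratio` parameter, and the run-merging rule below are all assumptions.

```python
def haar_undecimated(signal, levels):
    """Undecimated (a trous) Haar transform with clamped left edge.

    Returns (details, approx). Every sequence keeps the input length, so
    coefficients stay aligned with the original time axis.
    """
    approx = list(signal)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)  # filter taps spread apart at coarser levels
        prev = approx
        lagged = [prev[max(t - shift, 0)] for t in range(len(prev))]
        details.append([0.5 * (p - q) for p, q in zip(prev, lagged)])
        approx = [0.5 * (p + q) for p, q in zip(prev, lagged)]
    return details, approx


def detect_boundaries(signal, levels=3, ratio=0.5):
    """Return one boundary per contiguous run of high multi-scale energy."""
    details, _ = haar_undecimated(signal, levels)
    energy = [sum(d[t] ** 2 for d in details) for t in range(len(signal))]
    peak = max(energy)
    if peak == 0.0:
        return []  # flat signal: nothing to segment
    threshold = ratio * peak
    boundaries, run = [], []
    for t, e in enumerate(energy + [0.0]):  # sentinel closes a trailing run
        if e >= threshold:
            run.append(t)
        elif run:
            # Keep the strongest sample of the run as the boundary.
            boundaries.append(max(run, key=lambda i: energy[i]))
            run = []
    return boundaries


signal = [0.0] * 32 + [1.0] * 32  # hypothetical step change at sample 32
boundaries = detect_boundaries(signal)
```

On this toy step signal the detail energy peaks exactly at the change point. The open question raised later on this page is whether such energy peaks track semantic events in real sensor data.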

If this is right

Mainstream sequence models fed Dywave tokens achieve up to 12% higher accuracy on activity recognition, stress assessment, and nearby object detection tasks.
Input sequences shrink by up to 75%, directly lowering memory and compute requirements during inference.
The tokenization remains effective under domain shifts and across sequences of different lengths.
Gains appear consistently across multiple real-world IoT datasets and several standard sequence architectures.
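A rough worked example of what the best-case 75% reduction means for compute, assuming a standard Transformer encoder in which self-attention cost grows with the square of the token count $n$ and per-token layers grow linearly. The backbone choice is an assumption; the page only says "mainstream sequence models", and tokenizer overhead is ignored.

```latex
\begin{align*}
n' &= (1 - 0.75)\,n = 0.25\,n \\
\frac{\text{attention cost after}}{\text{attention cost before}}
   &= \frac{(n')^2}{n^2} = 0.25^2 = 0.0625
   \quad\text{(16 times fewer attention scores)} \\
\frac{\text{per-token cost after}}{\text{per-token cost before}}
   &= \frac{n'}{n} = 0.25
   \quad\text{(4 times fewer feed-forward evaluations)}
\end{align*}
```

Because 75% is an upper bound ("up to"), typical savings would be smaller than these figures.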

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The unsupervised boundary detection could extend to other non-stationary time series such as audio or environmental monitoring streams.
Shorter tokenized inputs would lower energy use on battery-powered edge devices that run continuous sensing.
The same alignment principle might be combined with learned boundary predictors for cases where wavelets alone miss subtle events.
Large-scale unlabeled IoT collections could adopt this tokenization without the cost of event annotation.

Load-bearing premise

Wavelet-based hierarchical decomposition can reliably locate temporal boundaries that correspond to semantic events in the signals without any task-specific supervision or labeled annotations.

What would settle it

An IoT dataset in which the boundaries found by wavelet decomposition show no correspondence to actual changes in the underlying physical process, so that the dynamic tokens yield accuracy no higher than standard fixed-length tokenization.

Figures

Figures reproduced from arXiv: 2605.14014 by Denizhan Kara, Hongjue Zhao, Jinyang Li, Shengzhong Liu, Tarek Abdelzaher, Tomoyoshi Kimura, Xiaomin Ouyang, Yigong Hu, Yizhuo Chen.

**Figure 1.** Ego4D (HAR) raw signal examples. Signal events are manually annotated with red bounding boxes. Surrounding text: "… using IMU signals, brief motion gestures (e.g., waving) may occur within a second, while complex activities (e.g., walking) can span tens of seconds and vary in intensity. Moreover, real-world signals exhibit highly irregular information density, with quiescent intervals alternating with short bursts of salient …"

**Figure 2.** Overview of Dywave. Surrounding text: "…different users produce signals that vary greatly in temporal structure and intensity. To illustrate this variability, Figure 1 visualizes 30-second accelerometer samples from the Ego4D human activity recognition dataset (Grauman et al., 2022), comparing signals of cleaning activity across users and time periods, as well as the reading activity. Even within the same activity, signal patte…"

**Figure 3.** Short-context performance vs. different parameters. Surrounding text: "… is a fixed-length time-series segment used for short- and long-context classification. Baselines. We consider 5 baselines compatible with various backbones: PatchTST (Nie et al., 2023), DropPatch (Qiu et al., 2025), MedFormer (Wang et al., 2024), WaveToken (Masserano et al., 2025), and MultiPatch (Naghashi et al., 2025). We evaluate them using two sequen…"

**Figure 4.** Short-context Accuracy vs. Token with the Transformer encoder. Panels: PAMAP2 and RWHAR, each reporting Accuracy and F1 Score. Legend: PatchTST Acc, Dywave Acc, PatchTST Gyr, Dywave Gyr, PatchTST, Dywave.

**Figure 5.** Multimodal classification accuracy. Surrounding text: "…eters, requiring extensive grid search, while Dywave uses learnable, instance-specific segmentation. Moreover, WaveToken performs notably worse than other baselines. Discretizing the input into quantized token IDs appears ill-suited for high-frequency sensing data with rich dynamics, as it disrupts the fine-grained amplitude and temporal coherence essential for signal c…"

**Figure 6.** Long-context classification performance with the Transformer encoder. Panels: (a) MOD - Audio; (b) Ego4D - Accelerometer.

**Figure 7.** Long-context token distribution with the Transformer encoder. Surrounding text: "… on Ego4D, where 30-second sequences contain multiple heterogeneous sub-events (posture transitions, hand-object interactions, environmental perturbations). Fixed-size tokenization mixes unrelated actions and obscures fine-grained transitions, while Dywave dynamically identifies semantic boundaries that align with activity transitions, enabling …"

**Figure 8.** On-device (Raspberry Pi 4) profiling. Surrounding text: "… context settings but sacrifices efficiency with a much higher token count. In long-context settings (MOD audio), it reduces input length but suffers greater accuracy degradation. This implies non-anchor segments contain meaningful cues and should not be discarded. Dynamic fusion is crucial for achieving compact representations without sacrificing accuracy. Using spec…"

**Figure 9.** Inference robustness with random noise injection. Surrounding text: "… with heterogeneous dynamics, Dywave's token compression substantially reduces backbone computation, and the advantage grows with longer context windows or larger encoder models. This makes Dywave particularly well-suited for real-world deployments where signals are long, heterogeneous, and resource constraints are tight. 4.7. Inference Noise Robustness …"

**Figure 10.** Physics-Informed Hierarchical Embedding Module. Surrounding text: "Here, $\widetilde{W}_j$ denote the discrete wavelet coefficients at level $j \in [1, J]$, $A_j$ the corresponding approximations, $\tilde{h}_j$ and $\tilde{g}_j$ the rescaled wavelet and scaling filters, and $L_j$ the effective filter length. Since MODWT is undecimated, both $dX_j$ and $A_j$ preserve the full temporal resolution of the original sequence $L$. The recursive formulation produces a hierarchy of a…"

**Figure 11.** Temporal Anchor Formation Module. Surrounding text: "… capture abrupt, high-frequency transients such as wrist flicks or foot impacts, while the context embedding $E_V$ interprets these as transitions between broader activity phases, such as moving from walking to standing or from wiping to resting. To integrate these complementary views, we fuse the two embeddings into a unified hierarchical embedding: $E_F = E_U \,\|\, E_V$, $E_F \in \mathbb{R}$ …"

**Figure 12.** Temporal Fusion Module. Surrounding text: "… uniform patching wastes computation on such redundant information. Dynamic temporal fusion addresses this by adaptively compressing coherent regions while preserving semantic integrity at event boundaries"

**Figure 13.** Sensitivity analysis on anchor budget. (Panels: RWHAR-Accuracy, RWHAR-#Tokens, PAMAP2-Accuracy, PAMAP2-#Tokens; x-axis: rec, from 0.00 to 1.00.)

**Figure 14.** Sensitivity analysis on reconstruction loss $\lambda_{\mathrm{rec}}$. Surrounding text: "… reduction in input sequence length. The gap is more significant with long-context inputs in Ego4D. In these scenarios, PatchTST produces hundreds of tokens, leading to rapidly increasing latency with sequence length. In contrast, Dywave adaptively compresses long stationary regions into a small number of semantically coherent input tokens with up to an ord…"

**Figure 15.** Ego4D Boundary Visualization. Signal events are manually annotated with red bounding boxes. Surrounding text: "… different users perform the same activity (row 2), with user-dependent rhythms and intensities. Comparing different activities (row 3) further highlights changes in temporal density and dynamic range, reflecting the inherent heterogeneity of real-world motion across the samples. Under such diverse conditions, token…"

**Figure 16.** Example of micro-activity decomposition with Dywave on Ego4D. Surrounding text: "… conducts a qualitative case study of Dywave's capability in mitigating this challenge on the Ego4D dataset (Grauman et al., 2022), which provides synchronized egocentric video and IMU signals during daily activities such as cooking, crafting, and household management. We extract 15-second continuous IMU segments and apply Dywave to the accelero…"

Original abstract

Internet of Things (IoT) systems continuously collect heterogeneous sensing signals from ubiquitous sensors to support intelligent applications such as human activity analysis, emotion monitoring, and environmental perception. These signals are inherently non-stationary and multi-scale, posing unique challenges for standard tokenization techniques. This paper proposes Dywave, a dynamic tokenization framework for IoT sensing signals that constructs compact input representations aligned with intrinsic temporal structures and underlying physical events. Dywave leverages wavelet-based hierarchical decomposition, identifies meaningful temporal boundaries corresponding to underlying semantic events, and adaptively compresses redundant intervals while preserving temporal coherence. Extensive evaluations on five real-world IoT sensing datasets across activity recognition, stress assessment, and nearby object detection demonstrate that Dywave outperforms state-of-the-art methods by up to 12% in accuracy, while improving computational efficiency by reducing input token lengths by up to 75% across mainstream sequence models. Moreover, Dywave exhibits improved robustness to domain shifts and varying sequence lengths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Dywave uses wavelet decomposition to create event-aligned tokens for IoT signals and reports clear efficiency and accuracy gains, but the semantic validity of those boundaries is not yet pinned down.

The main takeaway is that Dywave ties wavelet-based decomposition to detected temporal boundaries in heterogeneous IoT signals, then uses those to build shorter token sequences for activity recognition and similar tasks. It claims up to 12% accuracy gains and 75% shorter inputs across five datasets and standard sequence models. That combination of established signal processing with adaptive compression for non-stationary data is the fresh framing here.

The evaluations give it some weight by covering multiple real-world sensing scenarios and showing the token reduction helps mainstream models without obvious domain-specific tuning. The robustness notes on varying lengths and shifts are also useful in practice.

The central soft spot is the missing link between the wavelet boundaries and actual semantic or physical events. The abstract states they correspond to underlying events without supervision, yet provides no alignment metric, ablation on boundary quality, or check against labeled events. If the detected points are mostly picking up amplitude changes rather than meaningful intervals, the accuracy edge could trace to variable-length compression alone rather than the event alignment. Fair token-budget controls in the baselines would also help rule that out. Minor points include clearer reporting on wavelet parameter choices and whether the gains hold under stricter compute matching.

This paper targets people working on on-device sequence models for sensing streams, especially those already using wavelets or looking to cut memory in activity or stress monitoring pipelines. A reader who needs concrete token-length reductions on public IoT datasets will find testable numbers here. The empirical claims are specific enough and the core idea is grounded in real constraints, so it deserves a serious referee even if the boundary validation needs tightening.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Dywave, a dynamic tokenization framework for heterogeneous IoT sensing signals. It employs wavelet-based hierarchical decomposition to detect temporal boundaries aligned with intrinsic structures and underlying semantic events, adaptively compressing redundant intervals while preserving coherence. On five real-world datasets spanning activity recognition, stress assessment, and object detection, Dywave is reported to improve accuracy by up to 12% and reduce token lengths by up to 75% relative to prior methods when paired with standard sequence models, with added robustness to domain shifts.

Significance. If the claimed semantic-event alignment holds and explains the gains beyond generic compression, the work could meaningfully advance efficient tokenization for non-stationary sensor streams in resource-limited IoT settings. The breadth of datasets and models tested provides a reasonable empirical foundation, though the source of the reported improvements requires clearer isolation.

major comments (1)

[Abstract] The central attribution of up to 12% accuracy gains and 75% token-length reduction to 'meaningful temporal boundaries corresponding to underlying semantic events' identified without supervision is load-bearing. No alignment metric, comparison against ground-truth event labels, or ablation against random or non-semantic adaptive boundaries is described, which leaves open the possibility that the gains derive from variable-length compression alone rather than semantic correspondence.
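One way to build the random-boundary control this comment asks for, sketched under our own assumptions: keep the detector's segment lengths but permute their order, so token count and length distribution match exactly while any alignment with events is destroyed. The function name, the seed handling, and the example boundaries are hypothetical and not taken from the paper.

```python
import random


def shuffled_length_boundaries(boundaries, n_samples, seed=0):
    """Random control: same multiset of segment lengths, permuted order."""
    edges = [0] + sorted(boundaries) + [n_samples]
    lengths = [b - a for a, b in zip(edges[:-1], edges[1:])]
    rng = random.Random(seed)  # seeded so the control is reproducible
    rng.shuffle(lengths)
    control, position = [], 0
    for length in lengths[:-1]:  # the final segment ends at n_samples
        position += length
        control.append(position)
    return control


detected = [40, 48, 90]  # hypothetical detector output on a 100-sample window
control = shuffled_length_boundaries(detected, n_samples=100)
```

If a model trained on `control` boundaries matches one trained on `detected` boundaries, the gains would be attributable to variable-length compression rather than to event alignment.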

minor comments (2)

[Abstract] The abstract refers to 'five real-world IoT sensing datasets' without naming them; explicit dataset citations and characteristics would improve reproducibility.
[Methodology] Clarify the specific wavelet family, decomposition depth selection criterion, and any hyperparameters governing boundary detection to support exact replication.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment below and describe the revisions we will make to strengthen the empirical isolation of our claims.

Referee: [Abstract] The central attribution of up to 12% accuracy gains and 75% token-length reduction to 'meaningful temporal boundaries corresponding to underlying semantic events' identified without supervision is load-bearing. No alignment metric, comparison against ground-truth event labels, or ablation against random or non-semantic adaptive boundaries is described, which leaves open the possibility that the gains derive from variable-length compression alone rather than semantic correspondence.

Authors: We appreciate the referee's observation that the attribution to unsupervised semantic-event alignment is central and requires stronger isolation from generic compression effects. Our wavelet hierarchical decomposition identifies boundaries via multi-scale energy thresholding and coefficient persistence, which we posit capture intrinsic signal transients corresponding to physical events; however, we agree that this requires explicit validation beyond the current results. In the revised manuscript we will add a dedicated ablation section that compares Dywave against (i) random boundary sampling drawn from the same length distribution, (ii) uniform fixed-length segmentation, and (iii) non-wavelet adaptive methods such as entropy-based or change-point detection without semantic priors. All ablations will be evaluated on the same five datasets and downstream models, reporting both accuracy and token-length metrics. Where activity-transition timestamps are available as proxy ground truth (e.g., in the activity-recognition datasets), we will also report boundary alignment metrics such as precision/recall of detected events against annotated intervals. These additions will clarify whether the observed gains exceed those attributable to variable-length compression alone.

Revision: yes
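The boundary alignment metric promised here could take roughly the following form. This is a sketch, not the authors' evaluation code: the greedy one-to-one matching, the `tolerance` window in samples, and the example indices are assumptions.

```python
def boundary_precision_recall(detected, annotated, tolerance):
    """Match each detected boundary to at most one annotated boundary.

    A detection counts as a hit when an unmatched annotation lies within
    `tolerance` samples of it; the nearest such annotation is consumed.
    """
    unmatched = sorted(annotated)
    hits = 0
    for d in sorted(detected):
        candidates = [a for a in unmatched if abs(a - d) <= tolerance]
        if candidates:
            nearest = min(candidates, key=lambda a: abs(a - d))
            unmatched.remove(nearest)
            hits += 1
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical sample indices for detected and annotated transitions.
precision, recall = boundary_precision_recall(
    detected=[32, 75, 140], annotated=[30, 80, 200, 260], tolerance=5
)
```

In this toy case two of three detections land within 5 samples of an annotation, giving precision 2/3 and recall 1/2. The choice of tolerance strongly affects both numbers and would need to be reported alongside them.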

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents Dywave as a wavelet-based dynamic tokenization method evaluated empirically on five external real-world IoT datasets for tasks like activity recognition. No load-bearing steps reduce by construction to fitted parameters from the target data, self-citations, or definitional renaming; performance gains and token reductions are reported as outcomes of the proposed decomposition applied to standard sequence models. The derivation chain remains self-contained against external benchmarks with no evidence of the patterns that would indicate circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are stated. Typical implicit assumptions include that wavelet scales capture semantic events and that compression preserves all task-relevant information.

pith-pipeline@v0.9.0 · 5734 in / 1115 out tokens · 40625 ms · 2026-05-20T20:31:34.091409+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Dywave leverages wavelet-based hierarchical decomposition, identifies meaningful temporal boundaries corresponding to underlying semantic events, and adaptively compresses redundant intervals while preserving temporal coherence.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear

Paper passage: MODWT yields {dX1, …, dXJ, A} … Detail Embedding … Context Embedding … Temporal Anchor Formation … saliency-weighted temporal fusion

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · 3 internal anchors

[1] Ahia, O., Kumar, S., Gonen, H., Kasai, J., Mortensen, D. R., Smith, N. A., and Tsvetkov, Y. Do all languages cost the same? Tokenization in the era of commercial language models.
[2] Alharbi, R., Shahi, S., Cruz, S., Li, L., Sen, S., Pedram, M., Romano, C., Hester, J., Katsaggelos, A. K., and Alshurafa, N. Smokemon: unobtrusive extraction of smoking topography using wearable energy-efficient thermal. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4):1–25, 2023.
[3] Ansari, A. F., Stella, L., Turkmen, A. C., Zhang, X., Mercado, P., Shen, H., Shchur, O., Rangapuram, S. S., Arango, S. P., Kapoor, S., et al. Chronos: Learning the language of time series. Transactions on Machine Learning Research.
[4] Baris, O., Chen, Y., Dong, G., Han, L., Kimura, T., Quan, P., Wang, R., Wang, T., Abdelzaher, T., Bergés, M., et al. Foundation models for cps-iot: Opportunities and challenges. arXiv preprint arXiv:2501.16368, 2025.
[5] Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv e-prints, pp. arXiv-2108, 2021.
[6] Cao, D., Jia, F., Arik, S. O., Pfister, T., Zheng, Y., Ye, W., and Liu, Y. Tempo: Prompt-based generative pre-trained transformer for time series forecasting. In The Twelfth International Conference on Learning Representations.
[7] Cao, Y., Tian, Z., Guo, W., and Liu, X. Mspatch: A multi-scale patch mixing framework for multivariate time series forecasting. Expert Systems with Applications, 273:126849, 2025.
[8] Chang, C., Wang, W.-Y., Peng, W.-C., and Chen, T.-F. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters. ACM Transactions on Intelligent Systems and Technology, 16(3):1–20, 2025.
[9] Chatterjee, S., Moreno, A., Lizotte, S. L., Akther, S., Ertin, E., Fagundes, C. P., Lam, C., Rehg, J. M., Wan, N., Wetter, D. W., et al. Smokingopp: Detecting the smoking 'opportunity' context using mobile sensors. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4(1):1–26, 2020.
[10] Chen, L., Hu, R., Wu, M., and Zhou, X. Hmgan: A hierarchical multi-modal generative adversarial network model for wearable human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1–27, 2023.
[11] Dao, T. and Gu, A. Transformers are ssms: Generalized models and efficient algorithms through structured state space duality. In International Conference on Machine Learning, pp. 10041–10071. PMLR, 2024.
[12] Das, A., Kong, W., Sen, R., and Zhou, Y. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning, 2024.
[13] Deldari, S., Xue, H., Saeed, A., Smith, D. V., and Salim, F. D. Cocoa: Cross modality contrastive learning for sensor data. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3):1–28, 2022.
[14] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2020.
[15] Ekambaram, V., Jati, A., Nguyen, N., Sinthong, P., and Kalagnanam, J. Tsmixer: Lightweight mlp-mixer model for multivariate time series forecasting. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 459–469, 2023.
[16] Englhardt, Z., Ma, C., Morris, M. E., Chang, C.-C., Xu, X. O., Qin, L., McDuff, D., Liu, X., Patel, S., and Iyer, V. From classification to clinical insights: Towards analyzing and reasoning about mobile and behavioral health data with large language models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(2):1-..., 2024.
[17] Gao, Z., Wang, Y., Chen, J., Xing, J., Patel, S., Liu, X., and Shi, Y. Mmtsa: Multi-modal temporal segment attention network for efficient human activity recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1–26, 2023.
[18] Götz, L., Kollovieh, M., Günnemann, S., and Schwinn, L. Byte pair encoding for efficient time series forecasting. arXiv preprint arXiv:2505.14411, 2025.
[19] Graps, A. An introduction to wavelets. IEEE Computational Science and Engineering, 2(2):50–61, 1995.
[20] Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., Liu, X., et al. Ego4d: Around the world in 3,000 hours of egocentric video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18995–19012, 2022.
[21] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[22] Hu, C., Chen, Y., Kara, D., Liu, S., Abdelzaher, T., Wu, F., and Chen, G. Openmae: efficient masked autoencoder for vibration sensing with open-domain data enrichment. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(2):1–29, 2025.
[23] Jin, M., Wang, S., Ma, L., Chu, Z., Zhang, J. Y., Shi, X., Chen, P.-Y., Liang, Y., Li, Y.-F., Pan, S., et al. Time-llm: Time series forecasting by reprogramming large language models. In The Twelfth International Conference on Learning Representations.
[24] Kara, D., Kimura, T., Chen, Y., Li, J., Wang, R., Chen, Y., Wang, T., Liu, S., and Abdelzaher, T. Phymask: An adaptive masking paradigm for efficient self-supervised learning in iot. In Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, pp. 97–111, 2024a.
[25] Kara, D., Kimura, T., Shengzhong, L., Jinyang, L., Dongxin, L., Tianshi, W., Ruijie, W., Yizhuo, C., Yigong, H., and Tarek, A. Freqmae: Frequency-aware masked autoencoder for multi-modal iot sensing. In The World Wide Web Conference, 2024b.
[26] Kawano, H., Okamoto, M., and Murao, K. Estimating sampling rate of human activity data from accelerometer using transformer-based regression model. In Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing, pp. 200–201, 2023.
[27] Kim, G., Yeo, D., Jo, T., Rus, D., and Kim, S. What and when to explain? On-road evaluation of explanations in highly automated vehicles. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3):1–26, 2023.
[28] Kimura, T., Li, J., Wang, T., Chen, Y., Wang, R., Kara, D., Wigness, M., Bhattacharyya, J., Srivatsa, M., Liu, S., et al. Vibrofm: Towards micro foundation models for robust multimodal iot sensing. In 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), pp. 10–18. IEEE, 2024.
[29] Kimura, T., Li, X., Hanna, O., Chen, Y., Chen, Y., Kara, D., Wang, T., Li, J., Ouyang, X., Liu, S., et al. Infomae: Pair-efficient cross-modal alignment for multimodal time-series sensing signals. In Proceedings of the ACM on Web Conference 2025, pp. 3084–3095, 2025.
[30] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[31]

R., Cai, H., and Mostofi, Y

Korany, B., Karanam, C. R., Cai, H., and Mostofi, Y. Xmodal-id: Using wifi for through-wall person identification from candidate video footage. In The 25th Annual International Conference on Mobile Computing and Networking, MobiCom '19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450361699. doi:10.1145/3300061.3345437. URL https...

work page doi:10.1145/3300061.3345437 2019
[32] Kudo, T. and Richardson, J. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. EMNLP 2018, pp. 66, 2018.
[33] Larrubia, L. F., Morettin, P. A., and Chiann, C. The maximal overlap discrete wavelet scattering transform and its application in classification tasks. arXiv preprint arXiv:2506.12039, 2025.
[34] Lee, G., Gommers, R., Waselewski, F., Wohlfahrt, K., and O'Leary, A. Pywavelets: A python package for wavelet analysis. Journal of Open Source Software, 4(36):1237, 2019.
[35] Lina, J.-M. and Mayrand, M. Complex daubechies wavelets. Applied and Computational Harmonic Analysis, 2(3):219–229, 1995.
[36] Liu, S., Kimura, T., Liu, D., Wang, R., Li, J., Diggavi, S., Srivastava, M., and Abdelzaher, T. Focal: Contrastive learning for multimodal time-series sensing signals in factorized orthogonal latent space. Advances in Neural Information Processing Systems, 36, 2023.
[37] Masserano, L., Ansari, A. F., Han, B., Zhang, X., Faloutsos, C., Mahoney, M. W., Wilson, A. G., Park, Y., Rangapuram, S. S., Maddix, D. C., et al. Enhancing foundation models for time series forecasting via wavelet-based tokenization. In Forty-second International Conference on Machine Learning, 2025.
[38] Naghashi, V., Boukadoum, M., and Diallo, A. B. A multiscale model for multivariate time series forecasting. Scientific Reports, 15(1):1565, 2025.
[39] Nath, R. K., Tervonen, J., Närväinen, J., Pettersson, K., and Mäntyjärvi, J. Towards self-supervised learning of ecg signal representation for the classification of acute stress types. In Proceedings of the Great Lakes Symposium on VLSI 2023, pp. 85–90, 2023.
[40] Nawrot, P., Tworkowski, S., Tyrolski, M., Kaiser, Ł., Wu, Y., Szegedy, C., and Michalewski, H. Hierarchical transformers are more efficient language models.
[41] Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023.
[42] Ouyang, X., Shuai, X., Zhou, J., Shi, I. W., Xie, Z., Xing, G., and Huang, J. Cosmo: Contrastive fusion learning with small data for multimodal human activity recognition. In International Conference on Mobile Computing And Networking (MobiCom), 2022.
[43] Percival, D. B. and Walden, A. T. Wavelet methods for time series analysis, volume 4. Cambridge university press, 2000.
[44] Petrov, A., La Malfa, E., Torr, P., and Bibi, A. Language model tokenizers introduce unfairness between languages. Advances in neural information processing systems, 36:36963–36990, 2023.
[45] Piao, X., Chen, Z., Murayama, T., Matsubara, Y., and Sakurai, Y. Fredformer: Frequency debiased transformer for time series forecasting. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pp. 2400–2410, 2024.
[46] Qiu, T., Xie, Y., Niu, H., Xiong, Y., and Gao, X. Enhancing masked time-series modeling via dropping patches. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 20077–20085, 2025.
[47] Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., and Hsieh, C.-J. Dynamicvit: Efficient vision transformers with dynamic token sparsification. Advances in neural information processing systems, 34:13937–13949, 2021.
[48] Reiss, A. and Stricker, D. Introducing a new benchmarked dataset for activity monitoring. In International Symposium on Wearable Computers (ISWC), 2012.
[49] Ren, J., Zheng, R., Zhang, W., She, D., Bai, Y., Jin, Z., and Gao, Y. Motion2press: Cross model learning from imu to plantar pressure for gait analysis. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(3):1–33, 2025.
[50] Ryoo, M., Piergiovanni, A., Arnab, A., Dehghani, M., and Angelova, A. Tokenlearner: Adaptive space-time tokenization for videos. Advances in neural information processing systems, 34:12786–12797, 2021.
[51] Saha, M., Xu, M. A., Mao, W., Neupane, S., Rehg, J. M., and Kumar, S. Pulse-ppg: An open-source field-trained ppg foundation model for wearable applications across lab and field settings. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(3):1–35, 2025.
[52] Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., and Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM international conference on multimodal interaction, pp. 400–408, 2018a.
[53] Schmidt, P., Reiss, A., Dürichen, R., Marberger, C., and Laerhoven, K. V. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In ICMI 2018, pp. 400–408. ACM, 2018b. doi:10.1145/3242969.3242985
[54] Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725, 2016.
[55] Shams, S., Dindar, S. S., Jiang, X., and Mesgarani, N. Ssamba: Self-supervised audio representation learning with mamba state space model. In 2024 IEEE Spoken Language Technology Workshop (SLT), pp. 1053–1059. IEEE, 2024.
[56] Sztyler, T. and Stuckenschmidt, H. On-body localization of wearable devices: An investigation of position-aware activity recognition. In IEEE International Conference on Pervasive Computing and Communications (PerCom), 2016.
[57] Tao, C., Liu, Q., Dou, L., Muennighoff, N., Wan, Z., Luo, P., Lin, M., and Wong, N. Scaling laws with vocabulary: Larger models deserve larger vocabularies. Advances in Neural Information Processing Systems, 37:114147–114179, 2024.
[58] Truong, C., Oudre, L., and Vayatis, N. Selective review of offline change point detection methods. Signal processing, 167:107299, 2020.
[59] Ullah, M. A., Chatterjee, S., Fagundes, C. P., Lam, C., Nahum-Shani, I., Rehg, J. M., Wetter, D. W., and Kumar, S. mrisk: continuous risk estimation for smoking lapse from noisy sensor data with incomplete and positive-only labels. Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies, 6(3):1–29, 2022.
[60] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008, 2017.
[61] Wang, L., Li, W., Sun, K., Zhang, F., Gu, T., Xu, C., and Zhang, D. Loear: Push the range limit of acoustic sensing for vital sign monitoring. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3):1–24, 2022.
[62] Wang, X., Zhang, H., Cao, L., Zeng, K., Li, Q., Li, N., and Feng, L. Contrastive learning of stress-specific word embedding for social media based stress detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5137–5149, 2023a.
[63] Wang, Y., Huang, N., Li, T., Yan, Y., and Zhang, X. Medformer: A multi-granularity patching transformer for medical time-series classification. Advances in Neural Information Processing Systems, 37:36314–36341, 2024.
[64] Wang, Y., Qiu, Y., Chen, P., Shu, Y., Rao, Z., Pan, L., Yang, B., and Guo, C. Lightgts: A lightweight general time series forecasting model. In International Conference on Machine Learning, pp. 64109–64126. PMLR, 2025.
[65] Wang, Z., Wang, Y., Tian, M., and Shen, J. Hearfire: Indoor fire detection via inaudible acoustic sensing. Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies, 6(4):1–25, 2023b.
[66] Yao, S., Hu, S., Zhao, Y., Zhang, A., and Abdelzaher, T. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In International Conference on World Wide Web (WWW), 2017.
[67] Yao, S., Piao, A., Jiang, W., Zhao, Y., Shao, H., Liu, S., Liu, D., Li, J., Wang, T., Hu, S., et al. Stfnets: Learning sensing signals from the time-frequency perspective with short-time fourier neural networks. In The World Wide Web Conference, pp. 2192–2202, 2019.
[68] Yi, K., Zhang, Q., Fan, W., Wang, S., Wang, P., He, H., An, N., Lian, D., Cao, L., and Niu, Z. Frequency-domain mlps are more effective learners in time series forecasting. Advances in Neural Information Processing Systems, 36:76656–76679, 2023.
[69] Yu, H. and Sano, A. Semi-supervised learning for wearable-based momentary stress detection in the wild. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(2):1–23, 2023.
[70] Zakaria, C., Yilmaz, G., Mammen, P. M., Chee, M., Shenoy, P., and Balan, R. Sleepmore: Inferring sleep duration at scale via multi-device wifi sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4):1–32, 2023.
[71] Zhang, X., Zhao, Z., Tsiligkaridis, T., and Zitnik, M. Self-supervised contrastive pre-training for time series via time-frequency consistency. In Neural Information Processing Systems (NeurIPS), 2022.
[72] Zhang, Y., Ayush, K., Qiao, S., Heydari, A. A., Narayanswamy, G., Xu, M. A., Metwally, A., Xu, J., Garrison, J., Xu, X., Althoff, T., Liu, Y., Kohli, P., Zhan, J., Malhotra, M., Patel, S., Mascolo, C., Liu, X., McDuff, D., and Yang, Y. SensorLM: Learning the language of wearable sensors. In The Thirty-ninth Annual Conference on Neural Information Proces...
[73] Zheng, N., Liu, R., Fan, X., Zhang, C., Zhang, L., and Yin, Z. Segall: A unified active learning framework for wireless sensing data segmentation. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(3):1–27, 2025.
[74] Zhong, S., Song, S., Zhuo, W., Li, G., Liu, Y., and Chan, S.-H. G. A multi-scale decomposition mlp-mixer for time series analysis. Proceedings of the VLDB Endowment, 17(7):1723–1736, 2024.
[75] Zhou, T., Niu, P., Sun, L., Jin, R., et al. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems, 36:43322–43355, 2023.
[76] Zou, X., You, C., Zhao, R., Yang, H., and Cheng, X. Scalemixer: A multi-scale mlp-mixer model for long-term time series forecasting. In International Conference on Neural Information Processing, pp. 44–58. Springer, 2024.
[77] Tensorflow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16).
[78] Toward an internet of battlefield things: A resilience perspective. Computer.
[79] Five challenges in cloud-enabled intelligence and control. ACM Transactions on Internet Technology (TOIT).
[80] Understanding of a convolutional neural network. 2017 International Conference on Engineering and Technology (ICET).


[26] [26]

Estimating sampling rate of human activity data from accelerometer using transformer-based regression model

Kawano, H., Okamoto, M., and Murao, K. Estimating sampling rate of human activity data from accelerometer using transformer-based regression model. In Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing, pp.\ 200--201, 2023

work page 2023

[27] [27]

What and when to explain? on-road evaluation of explanations in highly automated vehicles

Kim, G., Yeo, D., Jo, T., Rus, D., and Kim, S. What and when to explain? on-road evaluation of explanations in highly automated vehicles. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7 0 (3): 0 1--26, 2023

work page 2023

[28] [28]

Vibrofm: Towards micro foundation models for robust multimodal iot sensing

Kimura, T., Li, J., Wang, T., Chen, Y., Wang, R., Kara, D., Wigness, M., Bhattacharyya, J., Srivatsa, M., Liu, S., et al. Vibrofm: Towards micro foundation models for robust multimodal iot sensing. In 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), pp.\ 10--18. IEEE, 2024

work page 2024

[29] [29]

Infomae: Pair-efficient cross-modal alignment for multimodal time-series sensing signals

Kimura, T., Li, X., Hanna, O., Chen, Y., Chen, Y., Kara, D., Wang, T., Li, J., Ouyang, X., Liu, S., et al. Infomae: Pair-efficient cross-modal alignment for multimodal time-series sensing signals. In Proceedings of the ACM on Web Conference 2025, pp.\ 3084--3095, 2025

work page 2025

[30] [30]

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[31] [31]

R., Cai, H., and Mostofi, Y

Korany, B., Karanam, C. R., Cai, H., and Mostofi, Y. Xmodal-id: Using wifi for through-wall person identification from candidate video footage. In The 25th Annual International Conference on Mobile Computing and Networking, MobiCom '19, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450361699. doi:10.1145/3300061.3345437. URL https...

work page doi:10.1145/3300061.3345437 2019

[32] [32]

and Richardson, J

Kudo, T. and Richardson, J. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. EMNLP 2018, pp.\ 66, 2018

work page 2018

[33] [33]

F., Morettin, P

Larrubia, L. F., Morettin, P. A., and Chiann, C. The maximal overlap discrete wavelet scattering transform and its application in classification tasks. arXiv preprint arXiv:2506.12039, 2025

work page arXiv 2025

[34] [34]

Pywavelets: A python package for wavelet analysis

Lee, G., Gommers, R., Waselewski, F., Wohlfahrt, K., and O'Leary, A. Pywavelets: A python package for wavelet analysis. Journal of Open Source Software, 4 0 (36): 0 1237, 2019

work page 2019

[35] [35]

and Mayrand, M

Lina, J.-M. and Mayrand, M. Complex daubechies wavelets. Applied and Computational Harmonic Analysis, 2 0 (3): 0 219--229, 1995

work page 1995

[36] [36]

Focal: Contrastive learning for multimodal time-series sensing signals in factorized orthogonal latent space

Liu, S., Kimura, T., Liu, D., Wang, R., Li, J., Diggavi, S., Srivastava, M., and Abdelzaher, T. Focal: Contrastive learning for multimodal time-series sensing signals in factorized orthogonal latent space. Advances in Neural Information Processing Systems, 36, 2023

work page 2023

[37] [37]

F., Han, B., Zhang, X., Faloutsos, C., Mahoney, M

Masserano, L., Ansari, A. F., Han, B., Zhang, X., Faloutsos, C., Mahoney, M. W., Wilson, A. G., Park, Y., Rangapuram, S. S., Maddix, D. C., et al. Enhancing foundation models for time series forecasting via wavelet-based tokenization. In Forty-second International Conference on Machine Learning, 2025

work page 2025

[38] [38]

Naghashi, V., Boukadoum, M., and Diallo, A. B. A multiscale model for multivariate time series forecasting. Scientific Reports, 15 0 (1): 0 1565, 2025

work page 2025

[39] [39]

a rv \"a inen, J., Pettersson, K., and M \

Nath, R. K., Tervonen, J., N \"a rv \"a inen, J., Pettersson, K., and M \"a ntyj \"a rvi, J. Towards self-supervised learning of ecg signal representation for the classification of acute stress types. In Proceedings of the Great Lakes Symposium on VLSI 2023, pp.\ 85--90, 2023

work page 2023

[40] [40]

Hierarchical transformers are more efficient language models

Nawrot, P., Tworkowski, S., Tyrolski, M., Kaiser, ., Wu, Y., Szegedy, C., and Michalewski, H. Hierarchical transformers are more efficient language models

work page

[41] [41]

H., Sinthong, P., and Kalagnanam, J

Nie, Y., Nguyen, N. H., Sinthong, P., and Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[42] [42]

W., Xie, Z., Xing, G., and Huang, J

Ouyang, X., Shuai, X., Zhou, J., Shi, I. W., Xie, Z., Xing, G., and Huang, J. Cosmo: Contrastive fusion learning with small data for multimodal human activity recognition. In International Conference on Mobile Computing And Networking (MobiCom), 2022

work page 2022

[43] [43]

Percival, D. B. and Walden, A. T. Wavelet methods for time series analysis, volume 4. Cambridge university press, 2000

work page 2000

[44] [44]

Language model tokenizers introduce unfairness between languages

Petrov, A., La Malfa, E., Torr, P., and Bibi, A. Language model tokenizers introduce unfairness between languages. Advances in neural information processing systems, 36: 0 36963--36990, 2023

work page 2023

[45] [45]

Fredformer: Frequency debiased transformer for time series forecasting

Piao, X., Chen, Z., Murayama, T., Matsubara, Y., and Sakurai, Y. Fredformer: Frequency debiased transformer for time series forecasting. In Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining, pp.\ 2400--2410, 2024

work page 2024

[46] [46]

Enhancing masked time-series modeling via dropping patches

Qiu, T., Xie, Y., Niu, H., Xiong, Y., and Gao, X. Enhancing masked time-series modeling via dropping patches. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp.\ 20077--20085, 2025

work page 2025

[47] [47]

Dynamicvit: Efficient vision transformers with dynamic token sparsification

Rao, Y., Zhao, W., Liu, B., Lu, J., Zhou, J., and Hsieh, C.-J. Dynamicvit: Efficient vision transformers with dynamic token sparsification. Advances in neural information processing systems, 34: 0 13937--13949, 2021

work page 2021

[48] [48]

and Stricker, D

Reiss, A. and Stricker, D. Introducing a new benchmarked dataset for activity monitoring. In International Symposium on Wearable Computers (ISWC), 2012

work page 2012

[49] [49]

Motion2press: Cross model learning from imu to plantar pressure for gait analysis

Ren, J., Zheng, R., Zhang, W., She, D., Bai, Y., Jin, Z., and Gao, Y. Motion2press: Cross model learning from imu to plantar pressure for gait analysis. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9 0 (3): 0 1--33, 2025

work page 2025

[50] [50]

Tokenlearner: Adaptive space-time tokenization for videos

Ryoo, M., Piergiovanni, A., Arnab, A., Dehghani, M., and Angelova, A. Tokenlearner: Adaptive space-time tokenization for videos. Advances in neural information processing systems, 34: 0 12786--12797, 2021

work page 2021

[51] [51]

A., Mao, W., Neupane, S., Rehg, J

Saha, M., Xu, M. A., Mao, W., Neupane, S., Rehg, J. M., and Kumar, S. Pulse-ppg: An open-source field-trained ppg foundation model for wearable applications across lab and field settings. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9 0 (3): 0 1--35, 2025

work page 2025

[52] [52]

Introducing wesad, a multimodal dataset for wearable stress and affect detection

Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., and Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM international conference on multimodal interaction, pp.\ 400--408, 2018 a

work page 2018

[53] [53]

Schmidt, P., Reiss, A., D \" u richen, R., Marberger, C., and Laerhoven, K. V. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In ICMI 2018, pp.\ 400--408. ACM , 2018 b . doi:10.1145/3242969.3242985

work page doi:10.1145/3242969.3242985 2018

[54] [54]

Neural machine translation of rare words with subword units

Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 1715--1725, 2016

work page 2016

[55] [55]

S., Jiang, X., and Mesgarani, N

Shams, S., Dindar, S. S., Jiang, X., and Mesgarani, N. Ssamba: Self-supervised audio representation learning with mamba state space model. In 2024 IEEE Spoken Language Technology Workshop (SLT), pp.\ 1053--1059. IEEE, 2024

work page 2024

[56] [56]

and Stuckenschmidt, H

Sztyler, T. and Stuckenschmidt, H. On-body localization of wearable devices: An investigation of position-aware activity recognition. In IEEE International Conference on Pervasive Computing and Communications (PerCom), 2016

work page 2016

[57] Tao, C., Liu, Q., Dou, L., Muennighoff, N., Wan, Z., Luo, P., Lin, M., and Wong, N. Scaling laws with vocabulary: Larger models deserve larger vocabularies. Advances in Neural Information Processing Systems, 37: 114147-114179, 2024

[58] Truong, C., Oudre, L., and Vayatis, N. Selective review of offline change point detection methods. Signal Processing, 167: 107299, 2020

[59] Ullah, M. A., Chatterjee, S., Fagundes, C. P., Lam, C., Nahum-Shani, I., Rehg, J. M., Wetter, D. W., and Kumar, S. mrisk: continuous risk estimation for smoking lapse from noisy sensor data with incomplete and positive-only labels. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3): 1-29, 2022

[60] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998-6008, 2017

[61] Wang, L., Li, W., Sun, K., Zhang, F., Gu, T., Xu, C., and Zhang, D. Loear: Push the range limit of acoustic sensing for vital sign monitoring. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(3): 1-24, 2022

[62] Wang, X., Zhang, H., Cao, L., Zeng, K., Li, Q., Li, N., and Feng, L. Contrastive learning of stress-specific word embedding for social media based stress detection. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5137-5149, 2023a

[63] Wang, Y., Huang, N., Li, T., Yan, Y., and Zhang, X. Medformer: A multi-granularity patching transformer for medical time-series classification. Advances in Neural Information Processing Systems, 37: 36314-36341, 2024

[64] Wang, Y., Qiu, Y., Chen, P., Shu, Y., Rao, Z., Pan, L., Yang, B., and Guo, C. Lightgts: A lightweight general time series forecasting model. In International Conference on Machine Learning, pp. 64109-64126. PMLR, 2025

[65] Wang, Z., Wang, Y., Tian, M., and Shen, J. Hearfire: Indoor fire detection via inaudible acoustic sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4): 1-25, 2023b

[66] Yao, S., Hu, S., Zhao, Y., Zhang, A., and Abdelzaher, T. Deepsense: A unified deep learning framework for time-series mobile sensing data processing. In International Conference on World Wide Web (WWW), 2017

[67] Yao, S., Piao, A., Jiang, W., Zhao, Y., Shao, H., Liu, S., Liu, D., Li, J., Wang, T., Hu, S., et al. Stfnets: Learning sensing signals from the time-frequency perspective with short-time fourier neural networks. In The World Wide Web Conference, pp. 2192-2202, 2019

[68] Yi, K., Zhang, Q., Fan, W., Wang, S., Wang, P., He, H., An, N., Lian, D., Cao, L., and Niu, Z. Frequency-domain mlps are more effective learners in time series forecasting. Advances in Neural Information Processing Systems, 36: 76656-76679, 2023

[69] Yu, H. and Sano, A. Semi-supervised learning for wearable-based momentary stress detection in the wild. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(2): 1-23, 2023

[70] Zakaria, C., Yilmaz, G., Mammen, P. M., Chee, M., Shenoy, P., and Balan, R. Sleepmore: Inferring sleep duration at scale via multi-device wifi sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 6(4): 1-32, 2023

[71] Zhang, X., Zhao, Z., Tsiligkaridis, T., and Zitnik, M. Self-supervised contrastive pre-training for time series via time-frequency consistency. In Neural Information Processing Systems (NeurIPS), 2022

[72] Zhang, Y., Ayush, K., Qiao, S., Heydari, A. A., Narayanswamy, G., Xu, M. A., Metwally, A., Xu, J., Garrison, J., Xu, X., Althoff, T., Liu, Y., Kohli, P., Zhan, J., Malhotra, M., Patel, S., Mascolo, C., Liu, X., McDuff, D., and Yang, Y. SensorLM: Learning the language of wearable sensors. In The Thirty-ninth Annual Conference on Neural Information Proces..., 2025

[73] Zheng, N., Liu, R., Fan, X., Zhang, C., Zhang, L., and Yin, Z. Segall: A unified active learning framework for wireless sensing data segmentation. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 9(3): 1-27, 2025

[74] Zhong, S., Song, S., Zhuo, W., Li, G., Liu, Y., and Chan, S.-H. G. A multi-scale decomposition mlp-mixer for time series analysis. Proceedings of the VLDB Endowment, 17(7): 1723-1736, 2024

[75] Zhou, T., Niu, P., Sun, L., Jin, R., et al. One fits all: Power general time series analysis by pretrained lm. Advances in Neural Information Processing Systems, 36: 43322-43355, 2023

[76] Zou, X., You, C., Zhao, R., Yang, H., and Cheng, X. Scalemixer: A multi-scale mlp-mixer model for long-term time series forecasting. In International Conference on Neural Information Processing, pp. 44-58. Springer, 2024

[77] Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)

[78] Toward an internet of battlefield things: A resilience perspective. Computer

[79] Five challenges in cloud-enabled intelligence and control. ACM Transactions on Internet Technology (TOIT)

[80] Understanding of a convolutional neural network. In 2017 International Conference on Engineering and Technology (ICET), 2017