
arxiv: 2509.14968 · v3 · pith:7ADCNWQ4 · submitted 2025-09-18 · 💻 cs.LG · cs.NI

FAWN: A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference

Pith reviewed 2026-05-21 21:47 UTC · model grok-4.3

classification 💻 cs.LG cs.NI
keywords FAWN · Integrated Sensing and Communication · Wi-Fi · 5G · indoor scene inference · passive sensing · transformer network · multi-encoder fusion

The pith

FAWN fuses Wi-Fi and 5G signals to achieve sub-meter accuracy in passive indoor scene inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

FAWN is a transformer-based network that fuses information from Wi-Fi and 5G to enable passive sensing of indoor environments. This integration allows the system to understand physical spaces by reusing existing communication signals without causing interference. The design addresses the accuracy limits of single-technology approaches by combining different wireless spectrums for wider coverage. A real prototype demonstrates the method in practice, with positioning errors staying below 0.6 meters in roughly 84 percent of cases. The work shows how current networks can gain environmental awareness using infrastructure already in place.

Core claim

FAWN, a MultiEncoder Fusion-Attention Wave Network based on the original Transformer architecture, fuses information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication, as shown by a real-scenario prototype with errors below 0.6 m around 84% of the time.

What carries the argument

The MultiEncoder Fusion-Attention Wave Network, a transformer-based architecture that uses multiple encoders and attention mechanisms to integrate signals from different wireless technologies for indoor scene inference.
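The fusion idea can be illustrated with a small, dependency-free sketch. It assumes one encoder per technology (a linear embedding followed by single-head scaled dot-product self-attention, as in Vaswani et al.) and a fusion step in which every Wi-Fi and 5G token attends over the concatenated sequence before mean pooling. The class `Encoder`, the function `fuse`, the feature sizes, and the random weights are all hypothetical; FAWN's actual layer layout, head count, and prediction heads are not described here, so this is not the authors' implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values))
                        for j in range(len(values[0]))])
    return outputs, weights


class Encoder:
    """Hypothetical per-technology encoder: linear embedding + self-attention + residual."""

    def __init__(self, in_dim, d_model, rng):
        self.W = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                  for _ in range(d_model)]

    def __call__(self, frames):
        tokens = [matvec(self.W, f) for f in frames]
        attended, _ = attention(tokens, tokens, tokens)
        # Residual connection keeps the embedded input next to the attended context.
        return [[t + a for t, a in zip(tv, av)] for tv, av in zip(tokens, attended)]


def fuse(wifi_tokens, nr_tokens):
    """Fusion attention: each token attends over the joint Wi-Fi + 5G sequence, then mean-pool."""
    joint = wifi_tokens + nr_tokens
    fused, weights = attention(joint, joint, joint)
    pooled = [sum(tok[j] for tok in fused) / len(fused) for j in range(len(fused[0]))]
    return pooled, weights


rng = random.Random(0)
wifi_encoder = Encoder(in_dim=16, d_model=8, rng=rng)  # assumed 16 CSI features per Wi-Fi snapshot
nr_encoder = Encoder(in_dim=12, d_model=8, rng=rng)    # assumed 12 CSI features per 5G SSB snapshot
wifi_frames = [[rng.random() for _ in range(16)] for _ in range(4)]
nr_frames = [[rng.random() for _ in range(12)] for _ in range(3)]
scene_vector, fusion_weights = fuse(wifi_encoder(wifi_frames), nr_encoder(nr_frames))
print(len(scene_vector), len(fusion_weights), len(fusion_weights[0]))  # 8 7 7
```

Concatenating the two token sequences lets attention weights flow across technologies, so a 5G token can draw on Wi-Fi context and vice versa. A full model would add multi-head attention, feed-forward sublayers, layer normalization, and a task head (for example, position regression or person-versus-robot classification) on top of the pooled vector.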

If this is right

  • Combining Wi-Fi and 5G increases the accuracy reachable for passive indoor sensing beyond single-technology limits.
  • The passive approach reuses existing communications to sense the environment without dedicated hardware or interference.
  • Leveraging different spectrums augments the coverage area for scene inference tasks.
  • Integration into a real prototype confirms the architecture works with current wireless infrastructure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The fusion technique could apply to other wireless standards to further boost sensing performance.
  • Such systems might support practical uses like indoor navigation or smart environment monitoring.
  • Testing in additional building types would help identify where the accuracy gains hold or need adjustment.

Load-bearing premise

A single real-scenario prototype is sufficient to show that fusing Wi-Fi and 5G via this architecture reliably augments coverage and accuracy for general indoor scene inference tasks.

What would settle it

Repeated tests across multiple varied indoor environments where errors exceed 0.6 m in more than 16 percent of cases would indicate the fusion method does not deliver the claimed general reliability.
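The 84% figure and the 16% threshold are both readings of the empirical CDF (ECDF) of position error at 0.6 m, the kind of curve plotted in Figure 3 of the paper. A minimal sketch follows; the helper names are hypothetical and the error values are made up for illustration, not data from the paper. It uses the standard ECDF convention F(x) = fraction of errors ≤ x, so an error of exactly 0.6 m counts as within the threshold and "exceeds" means strictly greater.

```python
from bisect import bisect_right


def ecdf_at(errors, x):
    """Empirical CDF at x: fraction of errors less than or equal to x."""
    ordered = sorted(errors)
    return bisect_right(ordered, x) / len(ordered)


def contradicts_claim(errors, threshold_m=0.6, max_exceed_fraction=0.16):
    """True if more than 16% of the errors exceed 0.6 m."""
    return 1.0 - ecdf_at(errors, threshold_m) > max_exceed_fraction


# Illustrative, made-up per-test-point errors in meters (not from the paper).
errors = [0.12, 0.35, 0.48, 0.59, 0.60, 0.71, 0.22, 0.41, 0.95, 0.30]
print(ecdf_at(errors, 0.6))       # 0.8
print(contradicts_claim(errors))  # True
```

Applied per environment, the same check gives one pass/fail verdict per building, which is the kind of repeated test described above.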

Figures

Figures reproduced from arXiv: 2509.14968 by Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos Barroso-Fernández, Carlos J. Bernardos.

Figure 1: Scheme of indoor laboratory room. Wi-Fi AP and 5G dot signals are …
Figure 2: The flow of how CSI is extracted from the real environment (top left picture) using the USRPs acting as passive receivers of 5G SSBs and Wi-Fi …
Figure 3: ECDFs of the position error in meters. We can see in Table II a comparison of the ability of all the solutions to classify whether the inferred target is a person or a robot, and of the time and energy spent on inference. To evaluate the classification performance, we choose two well-known metrics: the F1-score, where higher values mean the model better balances precision and recall; and accuracy, number of…
Figure 4: Heatmap of the mean error in the different locations of the laboratory
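The two classification metrics named in the Figure 3 excerpt can be computed as follows for the person-versus-robot task. This sketch assumes a binary F1-score with "person" as the positive class; the paper may instead use a macro-averaged F1, and the function name and labels below are illustrative only.

```python
def binary_metrics(y_true, y_pred, positive="person"):
    """Accuracy, precision, recall, and F1 for a two-class labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


y_true = ["person", "person", "robot", "robot", "person"]
y_pred = ["person", "robot", "robot", "person", "person"]
metrics = binary_metrics(y_true, y_pred)
```

With these toy labels, two persons are detected correctly, one robot is labelled a person, and one person is missed, so precision, recall, and F1 all equal 2/3 while accuracy is 3/5 = 0.6. F1 drops sharply when either precision or recall is poor, which is why it complements plain accuracy.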
Original abstract

The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to cost and/or complexity. Integrated Sensing and Communication (ISAC) has taken a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment without interfering with existing communications. Nevertheless, most current solutions are limited to one technology (mostly Wi-Fi or 5G), which constrains the maximum reachable accuracy. As different technologies operate in different spectrum bands, we see a need to integrate more than one technology to augment the coverage area. Hence, we take advantage of ISAC passive sensing to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original Transformer architecture and fuses information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it into a real scenario. Results show errors below 0.6 m around 84% of the time.
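Figure 2 describes extracting channel state information (CSI) with USRPs that passively receive 5G SSBs and Wi-Fi frames. One common way to obtain CSI from known reference symbols (such as the demodulation reference signals carried in an SSB or the long training fields of a Wi-Fi preamble) is a per-subcarrier least-squares estimate, H_k = Y_k / X_k. The sketch below assumes time and frequency synchronization are already done and ignores noise; the function name and the toy channel are hypothetical, and the paper's actual extraction chain may differ.

```python
import cmath


def ls_channel_estimate(received, reference):
    """Per-subcarrier least-squares CSI estimate: H_k = Y_k / X_k."""
    if len(received) != len(reference):
        raise ValueError("expected one received sample per reference subcarrier")
    return [y / x for y, x in zip(received, reference)]


# Toy, made-up channel over 4 subcarriers (illustrative only, not measured data).
true_h = [0.8 * cmath.exp(1j * 0.3 * k) for k in range(4)]
reference = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]         # known QPSK-like reference symbols
received = [h * x for h, x in zip(true_h, reference)]  # noiseless model: Y_k = H_k * X_k

csi = ls_channel_estimate(received, reference)
amplitudes = [abs(h) for h in csi]      # per-subcarrier gain, a typical sensing feature
phases = [cmath.phase(h) for h in csi]  # per-subcarrier phase in radians
```

Because the receiver only listens to signals already on the air, this estimate adds no transmissions, which is what lets passive sensing coexist with ongoing communication.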

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FAWN, a transformer-based MultiEncoder Fusion-Attention Wave Network for passive Integrated Sensing and Communication (ISAC) indoor scene inference. It fuses Wi-Fi and 5G signals to enable environment perception without dedicated hardware or interference with existing communications, claiming augmented coverage and accuracy. A real-scenario prototype is reported to achieve positioning errors below 0.6 m in approximately 84% of cases.

Significance. If the fusion mechanism proves robust, the work could advance multi-technology passive ISAC by demonstrating practical integration of heterogeneous wireless signals for scene inference using existing infrastructure. The prototype provides an initial feasibility demonstration, though broader validation would be needed to establish general impact.

major comments (2)
  1. Abstract and prototype evaluation: the central claim that FAWN reliably augments coverage and accuracy via Wi-Fi/5G fusion rests on results from a single real-scenario prototype. No information is supplied on experimental design (room dimensions, device placements, number of trials or test points, multipath profiles, or material properties), baselines, dataset size, or statistical tests, so it is not possible to determine whether the 84% figure for sub-0.6 m errors generalizes beyond the specific tested geometry and hardware configuration.
  2. Results section: absence of single-technology baselines (Wi-Fi-only or 5G-only) or alternative fusion architectures prevents isolating the contribution of the proposed multi-encoder fusion-attention mechanism from potential benefits of simply using multiple bands.
minor comments (2)
  1. Specify the exact performance metric (e.g., CDF value at 0.6 m) and total number of measurements rather than the phrasing 'around 84% of times'.
  2. Clarify notation for the wave-network components and how the fusion-attention layers combine encoder outputs from the two technologies.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our experimental results and the contribution of the fusion mechanism. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: Abstract and prototype evaluation: the central claim that FAWN reliably augments coverage and accuracy via Wi-Fi/5G fusion rests on results from a single real-scenario prototype. No information is supplied on experimental design (room dimensions, device placements, number of trials or test points, multipath profiles, or material properties), baselines, dataset size, or statistical tests, so it is not possible to determine whether the 84% figure for sub-0.6 m errors generalizes beyond the specific tested geometry and hardware configuration.

    Authors: We agree that additional details on the experimental design are required to support claims of generalizability. The full manuscript describes the prototype setup at a high level, but we will expand the relevant section to include room dimensions, device placements, number of trials and test points, multipath profiles, material properties, dataset size, and any statistical tests. These additions will allow readers to better evaluate the 84% figure for sub-0.6 m errors. revision: yes

  2. Referee: Results section: absence of single-technology baselines (Wi-Fi-only or 5G-only) or alternative fusion architectures prevents isolating the contribution of the proposed multi-encoder fusion-attention mechanism from potential benefits of simply using multiple bands.

    Authors: We acknowledge that single-technology baselines are necessary to isolate the benefit of the proposed fusion-attention mechanism. In the revised manuscript we will add Wi-Fi-only and 5G-only results to the evaluation. For alternative fusion architectures, we will include a discussion of simpler multi-band fusion approaches and, space permitting, comparative results to highlight the specific advantages of the multi-encoder design. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical prototype results are direct measurements

Full rationale

The paper introduces the FAWN multi-encoder fusion-attention architecture (based on transformers) to fuse Wi-Fi and 5G signals for passive ISAC indoor scene inference. It then describes building a prototype and integrating it in a real scenario, reporting empirical error statistics (below 0.6 m in ~84% of cases). No derivation chain, first-principles equations, or predictions are present that reduce by construction to fitted parameters, self-citations, or renamed inputs. Performance numbers are presented as direct outcomes from the built system rather than quantities defined in terms of the model itself. This is the most common honest finding for an empirical systems paper; the central claim rests on external falsifiable measurements, not tautological reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the general assumption that a transformer-based fusion network can be trained to extract scene information from combined radio signals. The central claim rests on the empirical performance of the prototype.

free parameters (1)
  • Network hyperparameters and weights
    The transformer-based multi-encoder model contains numerous learnable parameters fitted during training; specific values or counts are not reported.
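Since the learnable weights are the model's main free parameters, a rough count shows how they scale. For a standard encoder layer of the original Transformer (model width d, feed-forward width d_ff, biases and two LayerNorms included), the count is 4d² + 2·d·d_ff + d_ff + 9d, independent of the number of attention heads. The sketch below applies this to a hypothetical multi-encoder layout (one encoder stack per technology plus shared fusion layers); every size and the per-technology input embedding are assumptions, because the paper's dimensions are not reported.

```python
def encoder_layer_params(d_model, d_ff):
    """Learnable parameters in one standard Transformer encoder layer."""
    attention = 4 * (d_model * d_model + d_model)       # Q, K, V, output projections with biases
    feed_forward = 2 * d_model * d_ff + d_ff + d_model  # two linear layers with biases
    layer_norms = 2 * 2 * d_model                       # two LayerNorms, each with gain and bias
    return attention + feed_forward + layer_norms


def multi_encoder_params(input_dims, d_model, d_ff, layers_per_encoder, fusion_layers):
    """Hypothetical layout: one embedding + encoder stack per technology, then fusion layers."""
    total = 0
    for in_dim in input_dims:
        total += in_dim * d_model + d_model  # assumed linear input embedding
        total += layers_per_encoder * encoder_layer_params(d_model, d_ff)
    total += fusion_layers * encoder_layer_params(d_model, d_ff)
    return total


print(encoder_layer_params(512, 2048))                     # 3152384
print(multi_encoder_params([16, 12], 64, 256, 2, 1))       # 251840
```

The first call reproduces the familiar per-layer count for d = 512 and d_ff = 2048; the second shows that even a small two-technology model carries a quarter of a million fitted values, which is why reporting model size matters for judging the prototype results.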

pith-pipeline@v0.9.0 · 5781 in / 1231 out tokens · 61098 ms · 2026-05-21T21:47:35.772301+00:00 · methodology



Carlos Barroso-Fernández got his M.Sc. in 2022 and is a Ph.D. student at Universidad Carlos III de Madrid. Alejandro Calvillo-Fernandez got his M.Sc. in 2024 and is a Ph.D. student at Universidad Carlos III de Madrid. Antonio de ...