FAWN: A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference

Alejandro Calvillo-Fernandez; Antonio de la Oliva; Carlos Barroso-Fernández; Carlos J. Bernardos

arxiv: 2509.14968 · v3 · pith:7ADCNWQ4new · submitted 2025-09-18 · 💻 cs.LG · cs.NI


Pith reviewed 2026-05-21 21:47 UTC · model grok-4.3


keywords FAWN · Integrated Sensing and Communication · Wi-Fi · 5G · indoor scene inference · passive sensing · transformer network · multi-encoder fusion


The pith

FAWN fuses Wi-Fi and 5G signals to achieve sub-meter accuracy in passive indoor scene inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

FAWN is a transformer-based network that fuses information from Wi-Fi and 5G to enable passive sensing of indoor environments. This integration allows the system to understand physical spaces by reusing existing communication signals without causing interference. The design addresses the accuracy limits of single-technology approaches by combining different wireless spectrums for wider coverage. A real prototype demonstrates the method in practice, with positioning errors staying below 0.6 meters in roughly 84 percent of cases. The work shows how current networks can gain environmental awareness using infrastructure already in place.

Core claim

FAWN, a MultiEncoder Fusion-Attention Wave Network based on the original transformer architecture, fuses information from Wi-Fi and 5G so that the network can understand the physical world without interfering with ongoing communication. A real-scenario prototype shows errors below 0.6 m around 84% of the time.

What carries the argument

The MultiEncoder Fusion-Attention Wave Network, a transformer-based architecture that uses multiple encoders and attention mechanisms to integrate signals from different wireless technologies for indoor scene inference.
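
To make "multiple encoders and attention mechanisms" concrete, here is a minimal sketch in plain Python of one way two encoder outputs could be fused: cross-attention in which Wi-Fi tokens act as queries over 5G tokens. This is an illustration, not the paper's architecture, which this page does not specify. The names `cross_attention`, `wifi_tokens`, and `nr_tokens`, the token shapes, and the toy values are all assumptions, and the learned projection matrices a real transformer would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row is replaced by a
    weighted average of the value rows, weighted by query-key similarity."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return fused


# Hypothetical encoder outputs (toy values, not real CSI features).
wifi_tokens = [[1.0, 0.0], [0.0, 1.0]]            # 2 tokens, dimension 2
nr_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, dimension 2

# Wi-Fi tokens query the 5G tokens; output has one fused row per Wi-Fi token.
fused = cross_attention(wifi_tokens, nr_tokens, nr_tokens)
```

The output keeps the Wi-Fi sequence length while each row mixes in 5G information. A symmetric design would also let the 5G tokens query the Wi-Fi tokens and combine both results.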

If this is right

Combining Wi-Fi and 5G increases the accuracy reachable for passive indoor sensing beyond single-technology limits.
The passive approach reuses existing communications to sense the environment without dedicated hardware or interference.
Leveraging different spectrums augments the coverage area for scene inference tasks.
Integration into a real prototype confirms the architecture works with current wireless infrastructure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The fusion technique could apply to other wireless standards to further boost sensing performance.
Such systems might support practical uses like indoor navigation or smart environment monitoring.
Testing in additional building types would help identify where the accuracy gains hold or need adjustment.

Load-bearing premise

A single real-scenario prototype is sufficient to show that fusing Wi-Fi and 5G via this architecture reliably augments coverage and accuracy for general indoor scene inference tasks.

What would settle it

Repeated tests across multiple varied indoor environments where errors exceed 0.6 m in more than 16 percent of cases would indicate the fusion method does not deliver the claimed general reliability.
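
The check described above can be sketched in a few lines of Python. The function names, the environment labels, and the error values are made up for illustration; only the 0.6 m threshold and the 16 percent limit come from the text.

```python
def exceedance_rate(errors_m, threshold_m=0.6):
    """Fraction of position errors strictly above threshold_m (metres)."""
    if not errors_m:
        raise ValueError("need at least one error sample")
    return sum(1 for e in errors_m if e > threshold_m) / len(errors_m)


def environments_breaking_claim(errors_by_env, threshold_m=0.6, max_rate=0.16):
    """Names of environments where too many errors exceed the threshold."""
    return sorted(
        env
        for env, errors in errors_by_env.items()
        if exceedance_rate(errors, threshold_m) > max_rate
    )


# Illustrative, made-up error samples in metres (not data from the paper).
errors_by_env = {
    "lab": [0.2, 0.3, 0.5, 0.4, 0.7, 0.1, 0.3, 0.2, 0.5, 0.4],
    "corridor": [0.9, 0.3, 0.8, 0.4, 0.7, 0.2, 0.5, 0.6, 0.3, 0.4],
}

failing = environments_breaking_claim(errors_by_env)
```

With these toy samples, one in ten lab errors exceeds 0.6 m while three in ten corridor errors do, so only the corridor is flagged.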

Figures

Figures reproduced from arXiv: 2509.14968 by Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos Barroso-Fernández, Carlos J. Bernardos.

**Figure 2.** The flow of how CSI is extracted from the real environment (top left picture) using the USRPs acting as passive receivers of 5G SSBs and Wi-Fi

**Figure 3.** ECDFs of the position error in meters. We can see in Table II a comparison of the ability of all the solutions to classify if the inferred target is a person or a robot, and the time and energy spent on inference. To evaluate the classification performance, we choose two well-known metrics: the F1-Score, meaning that for higher values, the model balances precision and recall better; and accuracy, number of…

**Figure 4.** Heatmap of the mean error in the different locations of the laboratory
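
The Figure 3 caption text names F1-score and accuracy as the metrics for the person-versus-robot classification. As a reminder of how the two relate, here is a small Python sketch using the standard definitions. The function name `classification_scores` and the labels are illustrative assumptions, not values from the paper's Table II.

```python
def classification_scores(y_true, y_pred, positive="person"):
    """Precision, recall, F1 and accuracy for a two-class problem."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


# Made-up labels for illustration only.
y_true = ["person", "person", "robot", "robot", "person", "robot"]
y_pred = ["person", "robot", "robot", "person", "person", "robot"]

scores = classification_scores(y_true, y_pred)
```

Accuracy counts every correct prediction equally, whereas F1 depends on which class is treated as positive, so the two can diverge when the classes are imbalanced.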

Original abstract

The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to costs and/or complexity. Integrated Sensing and Communication (ISAC) has made a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment, without interfering with existing communications. Nevertheless, the majority of current solutions are limited to one technology (mostly Wi-Fi or 5G), constraining the maximum accuracy reachable. As different technologies work with different spectrums, we see a necessity in integrating more than one technology to augment the coverage area. Hence, we take the advantage of ISAC passive sensing, to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original transformers architecture, to fuse information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it in a real scenario. Results show errors below 0.6 m around 84% of times.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FAWN fuses Wi-Fi and 5G via a multi-encoder transformer for passive ISAC sensing but the single-prototype results leave the general claims under-supported.


The key takeaway is that this paper presents FAWN, a transformer-inspired multi-encoder fusion-attention network that combines Wi-Fi and 5G signals for passive integrated sensing and communication in indoor environments. They report that in a built prototype, the system achieves localization errors below 0.6 meters around 84 percent of the time. The approach aims to improve accuracy and coverage by leveraging multiple existing wireless standards without disrupting communications.

What the work does reasonably well is apply attention mechanisms across separate encoders for each technology to fuse their sensing capabilities. This addresses a real limitation in current ISAC solutions that stick to one radio type, like Wi-Fi alone. By integrating different spectra, it could offer more robust scene inference in varied indoor settings. Building an actual prototype and testing it in a real scenario adds some credibility compared to simulation-only papers.

The main soft spot is the limited experimental validation. Details on the test environment, number of trials, baseline methods, or error distributions are not provided, making it tough to assess how much the fusion contributes. A single scenario raises questions about whether the performance generalizes across different room layouts, device positions, or interference levels, as the stress-test note points out. If the full paper has more on this, it would help, but based on the summary, the results feel preliminary.

This paper would appeal to researchers in wireless communications and sensing, especially those working on 6G and smart spaces where reusing infrastructure for perception is a goal. Someone looking for new architectures in multi-technology ISAC might get value from the design choices, even if they need to verify the claims with their own setups.

I would suggest it goes to peer review. The concept has merit in an emerging field, and feedback from referees could guide improvements in the evaluation to better support the fusion benefits.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FAWN, a transformer-based MultiEncoder Fusion-Attention Wave Network for passive Integrated Sensing and Communication (ISAC) indoor scene inference. It fuses Wi-Fi and 5G signals to enable environment perception without dedicated hardware or interference with existing communications, claiming augmented coverage and accuracy. A real-scenario prototype is reported to achieve positioning errors below 0.6 m in approximately 84% of cases.

Significance. If the fusion mechanism proves robust, the work could advance multi-technology passive ISAC by demonstrating practical integration of heterogeneous wireless signals for scene inference using existing infrastructure. The prototype provides an initial feasibility demonstration, though broader validation would be needed to establish general impact.

major comments (2)

Abstract and prototype evaluation: the central claim that FAWN reliably augments coverage and accuracy via Wi-Fi/5G fusion rests on results from a single real-scenario prototype. No information is supplied on experimental design (room dimensions, device placements, number of trials or test points, multipath profiles, or material properties), baselines, dataset size, or statistical tests, so it is not possible to determine whether the 84% figure for sub-0.6 m errors generalizes beyond the specific tested geometry and hardware configuration.
Results section: absence of single-technology baselines (Wi-Fi-only or 5G-only) or alternative fusion architectures prevents isolating the contribution of the proposed multi-encoder fusion-attention mechanism from potential benefits of simply using multiple bands.

minor comments (2)

Specify the exact performance metric (e.g., CDF value at 0.6 m) and total number of measurements rather than the phrasing 'around 84% of times'.
Clarify notation for the wave-network components and how the fusion-attention layers combine encoder outputs from the two technologies.
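
The first minor comment asks for an exact CDF value and a sample count. One way to report both is the empirical CDF at 0.6 m together with a Wilson score interval, sketched below in Python. The function names are illustrative, the counts in the example are hypothetical (the page does not state the number of measurements), and the interval assumes independent samples, which may not hold for consecutive points along one trajectory.

```python
import math


def ecdf_at(errors_m, threshold_m):
    """Empirical CDF value: fraction of samples at or below threshold_m."""
    if not errors_m:
        raise ValueError("need at least one error sample")
    return sum(1 for e in errors_m if e <= threshold_m) / len(errors_m)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: the same 84% proportion at two sample sizes.
low_100, high_100 = wilson_interval(84, 100)
low_25, high_25 = wilson_interval(21, 25)
```

The same 84 percent proportion yields a noticeably wider interval with 25 samples than with 100, which is why the sample count matters for interpreting the headline figure.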

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our experimental results and the contribution of the fusion mechanism. We address each major comment below and indicate the revisions we will make to the manuscript.


Referee: Abstract and prototype evaluation: the central claim that FAWN reliably augments coverage and accuracy via Wi-Fi/5G fusion rests on results from a single real-scenario prototype. No information is supplied on experimental design (room dimensions, device placements, number of trials or test points, multipath profiles, or material properties), baselines, dataset size, or statistical tests, so it is not possible to determine whether the 84% figure for sub-0.6 m errors generalizes beyond the specific tested geometry and hardware configuration.

Authors: We agree that additional details on the experimental design are required to support claims of generalizability. The full manuscript describes the prototype setup at a high level, but we will expand the relevant section to include room dimensions, device placements, number of trials and test points, multipath profiles, material properties, dataset size, and any statistical tests. These additions will allow readers to better evaluate the 84% figure for sub-0.6 m errors. revision: yes
Referee: Results section: absence of single-technology baselines (Wi-Fi-only or 5G-only) or alternative fusion architectures prevents isolating the contribution of the proposed multi-encoder fusion-attention mechanism from potential benefits of simply using multiple bands.

Authors: We acknowledge that single-technology baselines are necessary to isolate the benefit of the proposed fusion-attention mechanism. In the revised manuscript we will add Wi-Fi-only and 5G-only results to the evaluation. For alternative fusion architectures, we will include a discussion of simpler multi-band fusion approaches and, space permitting, comparative results to highlight the specific advantages of the multi-encoder design. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical prototype results are direct measurements


The paper introduces the FAWN multi-encoder fusion-attention architecture (based on transformers) to fuse Wi-Fi and 5G signals for passive ISAC indoor scene inference. It then describes building a prototype and integrating it in a real scenario, reporting empirical error statistics (below 0.6 m in ~84% of cases). No derivation chain, first-principles equations, or predictions are present that reduce by construction to fitted parameters, self-citations, or renamed inputs. Performance numbers are presented as direct outcomes from the built system rather than quantities defined in terms of the model itself. This is the most common honest finding for an empirical systems paper; the central claim rests on external falsifiable measurements, not tautological reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities beyond the general assumption that a transformer-based fusion network can be trained to extract scene information from combined radio signals. The central claim rests on the empirical performance of the prototype.

free parameters (1)

Network hyperparameters and weights
The transformer-based multi-encoder model contains numerous learnable parameters fitted during training; specific values or counts are not reported.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
FAWN is based on the original transformers architecture, to fuse information from Wi-Fi and 5G... Multi-encoder attention based on transformer architecture.

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Relation between the paper passage and the cited Recognition theorem.
Results show errors below 0.6 m around 84% of times.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1] D. Wen, Y. Zhou, X. Li, Y. Shi, K. Huang, and K. B. Letaief, “A survey on integrated sensing, communication, and computation,” IEEE Communications Surveys & Tutorials, pp. 1–1, 2024
[2] Z. He, X. Zhang, Y. Wang, Y. Lin, G. Gui, and H. Gacanin, “A robust CSI-based Wi-Fi passive sensing method using attention mechanism deep learning,” IEEE Internet of Things Journal, vol. 10, 2023
[3] A. Hussain, Y. Chen, A. Ullah, and S. Zhang, “WiSigPro: Transformer for elevating CSI-based human activity recognition through attention mechanisms,” Expert Systems with Applications, vol. 258, p. 124976, 2024
[4] M. Dwivedi, I. E. L. Hulede, O. Venegas, J. Ashdown, and A. Mukherjee, “5G-based passive radar sensing for human activity recognition using deep learning,” in 2024 Radar Conference (RadarConf24). IEEE, 2024, pp. 1–6
[5] M. Wypich, R. Maksymiuk, and T. P. Zielinski, “5G-based passive radar utilizing channel response estimated via reference signals,” IEEE Transactions on Radar Systems, 2025. [6] IEEE Standard for Information Technology–Telecommunications and Information Exchange Between Systems–Local and Metropolitan Area Networks–Specific Requirements–Part 11: Wireless LAN ...
[6] A. Arun, R. Ayyalasomayajula, W. Hunter, and D. Bharadia, “P2SLAM: Bearing based WiFi SLAM for indoor robots,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3326–3333, 2022
[7] W. Li, R. Gao, J. Xiong, J. Zhou, L. Wang, X. Mao, E. Yi, and D. Zhang, “WiFi-CSI difference paradigm: Achieving efficient doppler speed estimation for passive tracking,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 8, no. 2, pp. 1–29, 2024
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 2002
[9] X. Hu, L. Chu, J. Pei, W. Liu, and J. Bian, “Model complexity of deep learning: A survey,” Knowledge and Information Systems, vol. 63, no. 10, pp. 2585–2619, 2021
[10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017
[11] P. Picazo, M. Groshev, A. Blanco, C. Fiandrino, A. de La Oliva, and J. Widmer, “waveSLAM: Empowering accurate indoor mapping using off-the-shelf millimeter-wave self-sensing,” in IEEE 98th Vehicular Technology Conference (VTC2023-Fall), 2023, pp. 1–7
[12] Y. Chen, J. Zhang, W. Feng, and M.-S. Alouini, “Radio sensing using 5G signals: Concepts, state of the art, and challenges,” IEEE Internet of Things Journal, vol. 9, no. 2, pp. 1037–1052, 2022
[13] N. K. Nataraja, S. Sharma, K. Ali, F. Bai, R. Wang, and A. F. Molisch, “Integrated Sensing and Communication (ISAC) for vehicles: Bistatic radar with 5G-NR signals,” IEEE Transactions on Vehicular Technology, vol. 74, no. 4, pp. 6121–6137, 2025
[14] Ericsson AB, “Ericsson Indoor Planner for iOS,” https://ericsson-indoor-planner-ios.soft112.com/, 2019, accessed: 2025-07-23.

Carlos Barroso-Fernández got his M.Sc. in 2022 and is a Ph.D. student at Universidad Carlos III de Madrid. Alejandro Calvillo-Fernandez got his M.Sc. in 2024 and is a Ph.D. student at Universidad Carlos III de Madrid. Antonio de ...
