SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference under Hard Uplink Budgets

Hyuncheol Park; Inhyeok Choi

arXiv: 2604.19623 · v1 · submitted 2026-04-21 · cs.LG · cs.CV · eess.SP


Pith reviewed 2026-05-10 03:49 UTC · model grok-4.3


keywords: edge-cloud inference, semantic evidence composition, uplink budget constraints, training-free method, importance filtering, embedding diversity sampling, ImageNet-1K classification, hybrid AI systems


The pith

SAGE shows that mixing importance with diversity in transmitted evidence allows edge-cloud systems to achieve near-server accuracy under tight uplink budgets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that standard importance-based selection for what to send from edge to cloud falls short when the number of bits is strictly limited. It finds that including some less important but complementary pieces of evidence improves the server's ability to classify correctly, and that even picking evidence evenly across space works reasonably well without looking at content. From this, SAGE is built as a simple way to first keep important parts then add diverse ones based on their embeddings. This approach reaches most of the possible accuracy while sending much less data, which matters for making hybrid AI systems work on real networks with limited upload speeds.

Core claim

The authors establish that attention-based importance alone is inherently limited under hard uplink budgets because the transmitted set must cover diverse aspects of the input rather than just the individually most salient units. They support this with two observations: swapping in complementary low-importance units raises accuracy, and uniform spatial selection performs competitively at moderate budgets. SAGE implements this insight through a training-free pipeline of importance filtering combined with embedding-diversity sampling, delivering 93 percent of the server-ceiling offloaded accuracy on ImageNet-1K while transmitting fewer than half the available evidence units and clearly beating pure importance-based selection.

What carries the argument

SAGE, the Semantic Attention-Guided Evidence composition method that first applies importance filtering then performs embedding-diversity sampling to build the transmitted evidence set.
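The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's exact filter and sampler are not given on this page, so the sketch assumes a top-fraction importance filter (the `keep_ratio` knob is hypothetical) followed by greedy farthest-point sampling under cosine distance, the metric named in the simulated rebuttal further down. The function names `cosine_distance` and `compose_evidence` are illustrative.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def compose_evidence(importance, embeddings, budget, keep_ratio=0.5):
    """Select at most `budget` unit indices (illustrative, not the paper's code).

    Stage 1 (assumed importance filter): keep the top `keep_ratio` fraction
    of units by importance, but never fewer than `budget` units.
    Stage 2 (assumed diversity sampler): greedy farthest-point selection in
    embedding space, seeded with the most important surviving unit.
    """
    n = len(importance)
    if n != len(embeddings):
        raise ValueError("importance and embeddings must have the same length")
    budget = min(budget, n)
    if budget <= 0:
        return []

    # Stage 1: rank by importance (descending) and keep the candidate pool.
    pool_size = max(budget, math.ceil(keep_ratio * n))
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    pool = ranked[:pool_size]

    # Stage 2: repeatedly add the candidate farthest from the selected set.
    selected = [pool[0]]
    min_dist = {
        i: cosine_distance(embeddings[i], embeddings[pool[0]]) for i in pool[1:]
    }
    while len(selected) < budget and min_dist:
        # Ties resolve toward the more important unit (dict keeps rank order).
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            d = cosine_distance(embeddings[i], embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

With importance scores `[0.9, 0.8, 0.7, 0.1]`, embeddings `[[1, 0], [1, 0.01], [0, 1], [-1, 0]]`, a budget of 2 and `keep_ratio=0.75`, an importance-only rule would send units 0 and 1, which are near-duplicates in embedding space. The sketch instead sends units 0 and 2, trading a little importance for coverage, which is the behaviour the first motivating observation points to.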

If this is right

SAGE achieves 93% of server ceiling accuracy on ImageNet-1K with under half the evidence units.
It outperforms importance-only methods substantially under the same hard budgets.
The method requires no additional training, allowing direct use with existing models.
Both complementary diversity and spatial uniformity contribute independently to better server performance.
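The content-agnostic baseline mentioned in the last point can be sketched as follows. The paper's exact uniform-selection rule is not given on this page, so this is only one plausible reading: pick the centres of an evenly spaced sub-grid over the patch grid. The 14×14 grid used in the usage note (a ViT with 16×16 patches on a 224×224 image) is an assumption, and `uniform_grid_selection` is a hypothetical name.

```python
import math


def uniform_grid_selection(grid_h, grid_w, budget):
    """Return at most `budget` row-major patch indices spread evenly in 2D.

    Illustrative baseline only: it ignores image content entirely and picks
    the centres of an r x c sub-grid with r * c <= budget.
    """
    if budget <= 0 or grid_h <= 0 or grid_w <= 0:
        return []
    # Choose a sub-grid whose aspect ratio roughly follows the patch grid.
    rows = max(1, min(grid_h, math.isqrt(budget * grid_h // grid_w)))
    cols = max(1, min(grid_w, budget // rows))
    # Cell centres; steps are >= 1 patch, so the chosen indices are distinct.
    row_ids = [int((i + 0.5) * grid_h / rows) for i in range(rows)]
    col_ids = [int((j + 0.5) * grid_w / cols) for j in range(cols)]
    return [r * grid_w + c for r in row_ids for c in col_ids]
```

Calling `uniform_grid_selection(14, 14, 49)` returns 49 indices, one from every second row and column, so a quarter of the patches cover the whole image. The function may return fewer than `budget` indices when the budget does not factor into a sub-grid; a real baseline would need a rule for spending the remainder.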

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar selection strategies could extend to video or audio streams where continuous uplink is even more constrained.
The finding that diversity matters more than raw importance may apply to other budgeted communication tasks in distributed AI.
Testing SAGE on different architectures or real-world network traces would confirm its robustness beyond the reported ImageNet results.

Load-bearing premise

The assumption that replacing high-importance evidence units with lower-importance but complementary ones will reliably increase the remote model's accuracy for a wide range of inputs.

What would settle it

A controlled test on a held-out dataset where importance-only selection under the same budget constraint matches or exceeds the accuracy achieved by SAGE's combined filtering and diversity approach.

Figures

Figures reproduced from arXiv: 2604.19623 by Hyuncheol Park, Inhyeok Choi.

**Figure 2.** Fraction of offloaded images whose patch count under cumulative …

**Figure 3.** Offloaded accuracy vs. budget on ImageNet-1K (…

**Figure 4.** Qualitative comparison of patch selection on ImageNet-1K (…

**Figure 5.** Budget–accuracy trade-off on ImageNet-1K under four confidence-gate settings (…

**Figure 6.** SAGE accuracy gain (pp) by client attention entropy tertile on …

**Figure 7.** SAGE accuracy gain (pp) over Attention Prefix across four client–…

**Figure 8.** Accuracy vs. normalized average communication cost on ImageNet-1K. Cost is defined as …

**Figure 9.** Deployment operating points for SAGE. Each line connects budgets …

Original abstract

Edge-cloud hybrid inference offloads difficult inputs to a powerful remote model, but the uplink channel imposes hard per-request constraints on the number of bits that can be transmitted. We show that selecting transmitted content based solely on attention-based importance, the standard approach in collaborative inference, is inherently limited under hard budgets. Two findings support this claim. First, replacing high-importance units with low-importance but complementary ones improves server accuracy. This shows that what matters is not individual importance but how well the transmitted set covers diverse aspects of the input. Second, spatially uniform selection without any content information achieves competitive accuracy at moderate budgets. This confirms that spatial coverage alone carries independent value. Based on this analysis, we propose SAGE (Semantic Attention-Guided Evidence), a principled, training-free method that combines importance filtering with embedding-diversity sampling. SAGE achieves 93% of the server ceiling in offloaded accuracy while transmitting fewer than half of the available evidence units on ImageNet-1K, substantially outperforming importance-only composition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SAGE gives a clean training-free recipe for picking what to send under tight uplink limits by blending importance with embedding diversity, and the ImageNet numbers look usable even if the supporting experiments stay light on controls.


The core contribution is a straightforward way to handle hard per-request bit budgets in edge-cloud setups: first drop low-importance patches via attention, then pick a diverse subset from the remaining embeddings. This beats pure importance selection and gets to 93% of the full server accuracy while sending less than half the evidence units on ImageNet-1K. The two motivating observations (that swapping in complementary low-importance units can help, and that uniform spatial sampling holds its own at moderate budgets) are presented as general principles rather than dataset quirks, which is the part that makes the design feel principled instead of ad hoc. No training is required, which keeps it deployable on existing models. The paper stays focused on the uplink constraint and avoids overclaiming generality beyond vision classification.

The main soft spot is that the abstract and stress-test note leave the replacement and uniform-selection results without reported effect sizes, statistical tests, or cross-model checks, so it is not yet clear how robust the motivation is versus how much it depends on the particular attention extractor or ImageNet statistics. If the full experiments include proper ablations and error bars, that gap closes; otherwise the outperformance could partly be an artifact of the chosen baselines.

The work is aimed at researchers building hybrid inference pipelines for bandwidth-constrained devices. A reader already working on collaborative inference or mobile vision will get immediate value from the method and the concrete numbers. It deserves a serious referee because the problem is real, the approach is simple and reproducible in principle, and the claimed gains are large enough to matter if they hold up under scrutiny.

Referee Report

2 major / 2 minor

Summary. The paper introduces SAGE, a training-free method for semantic evidence composition in edge-cloud hybrid inference under hard per-request uplink budgets. It argues that attention-based importance-only selection is inherently limited, supported by two empirical findings on ImageNet-1K: replacing high-importance units with low-importance but complementary ones improves server accuracy, and spatially uniform selection without content information is competitive at moderate budgets. SAGE combines importance filtering with embedding-diversity sampling and reports achieving 93% of server-ceiling offloaded accuracy while transmitting fewer than half the available evidence units, substantially outperforming importance-only baselines.

Significance. If the results and design rationale hold, the work provides a practical, training-free technique for reducing uplink transmission costs in collaborative inference without sacrificing much accuracy. The use of standard ImageNet-1K benchmarks and emphasis on being parameter-free are positive aspects that could serve as a reproducible baseline for future edge-cloud systems research.

major comments (2)

[Experiments section (around the description of the two findings)] The two supporting findings (complementary replacement and value of uniform selection) are presented as motivating the SAGE design, but the manuscript does not report effect sizes, statistical significance, error bars, or matched-budget controls for these experiments. This leaves the justification for combining importance with diversity under-specified and risks the central 93%-of-ceiling claim being an artifact of the particular attention extractor or dataset statistics rather than a general principle.
[§3 (Method)] The exact mechanism for 'embedding-diversity sampling' and how it interacts with importance filtering (e.g., the diversity metric, sampling procedure, and any thresholds) is not formalized with equations or pseudocode. Without this, it is difficult to verify the claim that SAGE is 'principled' and fully training-free or to reproduce the reported outperformance.
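The error bars and matched-budget comparison requested in the first major comment could be produced along the following lines. This is a generic paired percentile bootstrap, not a procedure taken from the paper: it assumes both selection methods were evaluated on the same offloaded images at the same budget, with one correct/incorrect flag per image, and the name `paired_bootstrap_ci` and its defaults are illustrative.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for accuracy(A) - accuracy(B) on paired images.

    `correct_a[i]` and `correct_b[i]` are truthy if method A / method B
    classified image i correctly at the same uplink budget.
    Returns (point_estimate, ci_low, ci_high).
    """
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-image difference in correctness: +1, 0, or -1.
    diffs = [int(bool(a)) - int(bool(b)) for a, b in zip(correct_a, correct_b)]
    point = sum(diffs) / n

    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    means = []
    for _ in range(n_resamples):
        # Resample images with replacement, keeping the A/B pairing intact.
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        means.append(total / n)
    means.sort()

    low = means[int((alpha / 2) * n_resamples)]
    high = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, low, high
```

An interval that excludes zero at every reported budget would support the claim that the gain over importance-only selection is not sampling noise. Resampling images rather than per-method accuracies keeps the pairing, which matters because both methods see the same inputs.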

minor comments (2)

[Figures and tables] Add error bars, exact baseline definitions, and the precise definition of 'evidence units' to all accuracy-vs-budget plots and tables for clarity.
[Abstract] The abstract states 'substantially outperforming' without quantifying the gap or naming the exact importance-only comparator; a brief numerical comparison would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve statistical rigor and formalization.


Referee: [Experiments section (around the description of the two findings)] The two supporting findings (complementary replacement and value of uniform selection) are presented as motivating the SAGE design, but the manuscript does not report effect sizes, statistical significance, error bars, or matched-budget controls for these experiments. This leaves the justification for combining importance with diversity under-specified and risks the central 93%-of-ceiling claim being an artifact of the particular attention extractor or dataset statistics rather than a general principle.

Authors: We agree that the motivating experiments would be strengthened by reporting effect sizes, error bars from repeated trials, statistical significance tests, and explicit matched-budget controls. In the revised version we will add these elements to the Experiments section. This will provide quantitative backing for the two findings, clarify the rationale for combining importance and diversity sampling, and help demonstrate that the 93% performance holds under controlled conditions rather than being an artifact.
Revision: yes
Referee: [§3 (Method)] The exact mechanism for 'embedding-diversity sampling' and how it interacts with importance filtering (e.g., the diversity metric, sampling procedure, and any thresholds) is not formalized with equations or pseudocode. Without this, it is difficult to verify the claim that SAGE is 'principled' and fully training-free or to reproduce the reported outperformance.

Authors: We concur that the embedding-diversity sampling procedure requires more precise specification. In the revision we will augment §3 with equations that define the diversity metric (embedding cosine distance), the sampling algorithm, the interaction with the importance filter, and any thresholds used. We will also include pseudocode for the complete SAGE procedure. These additions will facilitate verification and reproduction while preserving the method's training-free and parameter-free character; the design remains principled because it is directly derived from the empirical observations reported earlier in the paper.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical motivation independent of method performance


The paper's chain begins with two stated empirical findings (replacement of high-importance units by complementary ones, and competitive performance of uniform selection) drawn from ImageNet-1K experiments, which are then used to motivate the design of SAGE. These findings are presented as observations rather than derived predictions, and the subsequent method is training-free with no fitted parameters or equations that reduce outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to close any loop. The central accuracy claims rest on external benchmarks and direct comparisons, making the derivation self-contained against the data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from attention mechanisms in vision models and the value of embedding diversity for coverage, without introducing new free parameters or invented entities in the abstract description.

axioms (2)

Domain assumption: Attention scores reliably indicate semantic importance of input units for the server model.
Invoked when contrasting SAGE against the standard importance-only approach.
Domain assumption: Embedding diversity provides complementary information independent of individual importance scores.
Basis for the diversity sampling step in SAGE.



[1] [1]

Edge intelligence: Paving the last mile of artificial intelligence with edge computing,

Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,”Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, Aug. 2019

work page 2019

[2] [2]

A survey on mobile edge computing: The communication perspective,

Y . Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,”IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2322–2358, 2017

work page 2017

[3] [3]

Communication-computation trade-off in resource-constrained edge inference,

J. Shao and J. Zhang, “Communication-computation trade-off in resource-constrained edge inference,”IEEE Communications Magazine, vol. 58, no. 12, pp. 20–26, Dec. 2020. 9 0.0 0.1 0.2 0.3 0.4 Normalized average communication cost 0 10 20 30 40 50 60 70 Offloaded Accuracy ( %) (a) Offloaded Accuracy 0.0 0.1 0.2 0.3 0.4 Normalized average communication cost 4...

work page 2020

[4] [4]

Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,

Y . Kang, J. Hauswald, C. Gao, A. Rovinski, T. Mudge, J. Mars, and L. Tang, “Neurosurgeon: Collaborative intelligence between the cloud and mobile edge,” inProceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017, pp. 615–629

work page 2017

[5] [5]

Split computing and early exiting for deep learning applications: Survey and research challenges,

Y . Matsubara, M. Levorato, and F. Restuccia, “Split computing and early exiting for deep learning applications: Survey and research challenges,” ACM Computing Surveys, vol. 55, no. 5, pp. 1–30, 2023

work page 2023

[6] [6]

SPINN: Synergistic progressive inference of neural networks over device and cloud,

S. Laskaridis, S. I. Venieris, M. Almeida, I. Leontiadis, and N. D. Lane, “SPINN: Synergistic progressive inference of neural networks over device and cloud,” inProceedings of the 26th Annual International Conference on Mobile Computing and Networking (MobiCom), 2020, pp. 1–15

work page 2020

[7] [7]

BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems,

J. Shao and J. Zhang, “BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems,” inProceedings of the IEEE International Conference on Communications Workshops (ICC Workshops), 2020, pp. 1–6

work page 2020

[8] [8]

Learning task-oriented communication for edge inference: An information bottleneck approach,

J. Shao, Y . Mao, and J. Zhang, “Learning task-oriented communication for edge inference: An information bottleneck approach,”IEEE Journal on Selected Areas in Communications, vol. 40, no. 1, pp. 197–211, Jan. 2022

work page 2022

[9] [9]

An image is worth 16×16 words: Transformers for image recognition at scale,

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16×16 words: Transformers for image recognition at scale,” inProceedings of the International Conference on Learning Representations (ICLR), 2021

work page 2021

[10] [10]

Training data-efficient image transformers & distillation through attention,

H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. J ´egou, “Training data-efficient image transformers & distillation through attention,” inProceedings of the International Conference on Machine Learning (ICML), 2021

work page 2021

[11] [11]

DynamicViT: Efficient vision transformers with dynamic token sparsification,

Y . Rao, W. Zhao, B. Liu, J. Lu, J. Zhou, and C.-J. Hsieh, “DynamicViT: Efficient vision transformers with dynamic token sparsification,” in Advances in Neural Information Processing Systems (NeurIPS), 2021

work page 2021

[12] [12]

Not all patches are what you need: Expediting vision transformers via token reorganizations,

Y . Liang, C. Ge, Z. Tong, Y . Song, J. Wang, and P. Xie, “Not all patches are what you need: Expediting vision transformers via token reorganizations,” inProceedings of the International Conference on Learning Representations (ICLR), 2022

work page 2022

[13] [13]

A-ViT: Adaptive tokens for efficient vision transformer,

H. Yin, A. Vahdat, J. M. Alvarez, A. Mallya, J. Kautz, and P. Molchanov, “A-ViT: Adaptive tokens for efficient vision transformer,” inProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

work page 2022

[14] [14]

Token merging: Your ViT but faster,

D. Bolya, C.-Y . Fu, X. Dai, P. Zhang, C. Feichtenhofer, and J. Hoffman, “Token merging: Your ViT but faster,” inProceedings of the Interna- tional Conference on Learning Representations (ICLR), 2023

work page 2023

[15] [15]

Beyond attentive tokens: Incorporating token importance and diversity for efficient vision transformers,

S. Long, Z. Zhao, J. Pi, S. Wang, and J. Wang, “Beyond attentive tokens: Incorporating token importance and diversity for efficient vision transformers,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

work page 2023

[16] [16]

Attention-aware semantic communications for collaborative inference,

J. Im, N. Kwon, T. Park, J. Woo, J. Lee, and Y . Kim, “Attention-aware semantic communications for collaborative inference,”IEEE Internet of Things Journal, vol. 11, no. 22, pp. 37 008–37 020, Nov. 2024

work page 2024

[17] [17]

Deep learning enabled semantic communication systems,

H. Xie, Z. Qin, G. Y . Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,”IEEE Transactions on Signal Pro- cessing, vol. 69, pp. 2663–2675, 2021

work page 2021

[18] [18]

Deep joint source- channel coding for wireless image transmission,

E. Bourtsoulatze, D. B. Kurka, and D. G ¨und¨uz, “Deep joint source- channel coding for wireless image transmission,”IEEE Transactions on Cognitive Communications and Networking, vol. 5, no. 3, pp. 567–579, Sep. 2019

work page 2019

[19] D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, Jan. 2023

[20] S. Ranjbar Alvar, G. Singh, M. Akbari, and Y. Zhang, “DivPrune: Diversity-based visual token pruning for large multimodal models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

[21] C. Baek, J. Song, S. Kim, and K. Kong, “AgilePruner: An empirical study of attention and diversity for adaptive visual token pruning in large vision-language models,” in Proceedings of the International Conference on Learning Representations (ICLR), 2026

[22] A. Devoto, S. Petruzzi, J. Pomponi, P. Di Lorenzo, and S. Scardapane, “Adaptive semantic token selection for AI-native goal-oriented communications,” in Proceedings of the IEEE Global Communications Conference Workshops (GLOBECOM Workshops), 2024

[23] J. Park, Y. Oh, Y. Kim, and Y.-S. Jeon, “Vision transformer-based semantic communications with importance-aware quantization,” IEEE Internet of Things Journal, 2025

[24] D. B. Kurka and D. Gündüz, “Bandwidth-agile image transmission with deep joint source-channel coding,” IEEE Transactions on Wireless Communications, vol. 20, no. 12, pp. 8081–8095, Dec. 2021

[25] W. Yang, H. Du, Z. Q. Liew, W. Y. B. Lim, Z. Xiong, D. Niyato, X. Chi, X. Shen, and C. Miao, “Semantic communications for future Internet: Fundamentals, applications, and challenges,” IEEE Communications Surveys & Tutorials, vol. 25, no. 1, pp. 213–250, 2023

[26] C. Chaccour, W. Saad, M. Debbah, Z. Han, and H. V. Poor, “Less data, more knowledge: Building next-generation semantic communication networks,” IEEE Communications Surveys & Tutorials, vol. 27, no. 1, pp. 37–76, 2025

[27] E. C. Strinati and S. Barbarossa, “6G networks: Beyond Shannon towards semantic and goal-oriented communications,” Computer Networks, vol. 190, p. 107930, 2021

[28] E. Heo, O. Simeone, and H. Park, “Optimal fronthaul compression for synchronization in the uplink of cloud radio access networks,” EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, p. 22, Jan. 2017

[29] M. Fayyaz, S. A. Koohpayegani, F. R. Jafari, S. Sengupta, H. R. V. Joze, E. Sommerlade, H. Pirsiavash, and J. Gall, “Adaptive token sampling for efficient vision transformers,” in Proceedings of the European Conference on Computer Vision (ECCV), 2022

[30] H. Zheng, R. Liu, F. Lai, and A. Prakash, “Coverage-centric coreset selection for high pruning rates,” in Proceedings of the International Conference on Learning Representations (ICLR), 2023

[31] H. Touvron, M. Cord, A. Sablayrolles, G. Synnaeve, and H. Jégou, “Going deeper with image transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 32–42.