RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation
Pith reviewed 2026-05-14 22:19 UTC · model grok-4.3
The pith
A benchmark of 21,168 cab-view QA pairs and a three-module collaborative model framework together enable efficient, interpretable visual cognition for automatic train operation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the RailVQA-CoM collaborative large-small model framework substantially improves performance, interpretability, efficiency, and cross-domain generalization on visual cognition tasks for automatic train operation, as evaluated on the new RailVQA-bench dataset. It does so through a transparent three-module architecture for perception, reasoning, and planning, combined with adaptive temporal sampling over video inputs.
What carries the argument
The three-module collaborative large-small model architecture that separates perception, reasoning and decision planning while using adaptive temporal sampling to process video inputs efficiently.
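The paper does not publish its routing logic, but the efficiency argument rests on a familiar pattern: a cheap perception model handles easy cases and escalates only ambiguous ones to the large model. A minimal sketch of that control flow, where `small_perception`, `large_reasoning`, `plan`, and the frame/threshold format are all hypothetical stand-ins, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    objects: list        # detections from the small perception model
    confidence: float    # aggregate detection confidence

def small_perception(frame) -> Observation:
    # Hypothetical stand-in for a lightweight detector (e.g. a YOLO-class model);
    # here detections are read from a dict so the sketch stays self-contained.
    return Observation(objects=frame["objects"], confidence=frame["conf"])

def large_reasoning(obs: Observation, question: str) -> str:
    # Hypothetical stand-in for the large multimodal model: invoked only on
    # escalation, which is where the claimed efficiency gain would come from.
    return f"reasoned answer to '{question}' given {len(obs.objects)} objects"

def plan(answer: str) -> str:
    # Decision-planning module: maps the reasoned answer to an ATO action.
    return "brake" if "obstacle" in answer else "proceed"

def collaborative_step(frame, question, escalation_threshold=0.6):
    """Route clear cases through the small model; escalate the rest to the LMM."""
    obs = small_perception(frame)
    if obs.confidence >= escalation_threshold and not obs.objects:
        return "proceed"  # confidently clear track: no large-model call needed
    return plan(large_reasoning(obs, question))
```

Because each stage produces an inspectable intermediate (detections, a reasoned answer, an action), a routing scheme like this also makes the interpretability claim concrete: every decision can be traced to a module output.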
If this is right
- Automatic train systems can perform reliable high-level visual planning in complex environments at reduced computational cost.
- Reasoning steps become more transparent because each module contributes visibly to the final decision.
- Performance gains appear on both static images and dynamic video sequences from cab views.
- The same framework shows improved results when transferred to other autonomous driving domains.
Where Pith is reading between the lines
- The small-large model pairing could be tested on road-vehicle perception tasks where real-time constraints are equally strict.
- The benchmark dataset offers a ready-made way to measure hallucination rates in multimodal models for transportation.
- Live deployment would require separate checks on actual train routes to confirm that efficiency gains preserve safety margins.
Load-bearing premise
The collected 21,168 QA pairs adequately represent rare, safety-critical railway corner cases, and the three-module design reduces hallucination risk without lowering decision accuracy.
What would settle it
A test set of previously unseen real-world railway corner cases where the collaborative model either hallucinates answers or falls below large-model accuracy while using similar compute would falsify the central claim.
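That falsification criterion can be made operational as two scalar checks over a held-out corner-case set: hallucination rate and accuracy relative to a large-model baseline at similar compute. A sketch of the scoring, where the record fields and the 5% hallucination threshold are illustrative assumptions, not values from the paper:

```python
def score_corner_cases(records):
    """Each record is a dict with keys:
      'pred'         - model answer
      'gold'         - reference answer
      'hallucinated' - True if the answer asserts objects absent from the scene
    Returns (accuracy, hallucination_rate) over the corner-case set."""
    n = len(records)
    correct = sum(r["pred"] == r["gold"] for r in records)
    halluc = sum(r["hallucinated"] for r in records)
    return correct / n, halluc / n

def falsifies_claim(records, baseline_accuracy, max_hallucination=0.05):
    """The central claim would be falsified if, on unseen corner cases, the
    collaborative model hallucinates frequently or falls below the large-model
    baseline accuracy at similar compute (both thresholds hypothetical)."""
    acc, hrate = score_corner_cases(records)
    return hrate > max_hallucination or acc < baseline_accuracy
```

In practice the hallucination label would itself need careful annotation (e.g. object-grounded checks rather than string match), but the decision rule stays this simple.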
Original abstract
As Automatic Train Operation (ATO) advances toward GoA4 and beyond, it increasingly depends on efficient, reliable cab-view visual perception and decision-oriented inference to ensure safe operation in complex and dynamic railway environments. However, existing approaches focus primarily on basic perception and often generalize poorly to rare yet safety-critical corner cases. They also lack the high-level reasoning and planning capabilities required for operational decision-making. Although recent Large Multi-modal Models (LMMs) show strong generalization and cognitive capabilities, their use in safety-critical ATO is hindered by high computational cost and hallucination risk. Meanwhile, reliable domain-specific benchmarks for systematically evaluating cognitive capabilities are still lacking. To address these gaps, we introduce RailVQA-bench, the first VQA benchmark for cab-view visual cognition in ATO, comprising 20,000 single-frame and 1,168 video-based QA pairs to evaluate cognitive generalization and interpretability in both static and dynamic scenarios. Furthermore, we propose RailVQA-CoM, a collaborative large-small model framework that combines small-model efficiency with large-model cognition via a transparent three-module architecture and adaptive temporal sampling, improving perceptual generalization and enabling more efficient reasoning and planning. Experiments demonstrate that the proposed approach substantially improves performance, enhances interpretability, improves efficiency, and strengthens cross-domain generalization in autonomous driving systems. Code and datasets will be available at https://cybereye-bjtu.github.io/RailVQA.html.
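The abstract names adaptive temporal sampling but not its mechanism. A common realization is motion-driven keyframe selection: sample densely when the scene changes and sparsely when it is static. A minimal sketch under that assumption, with the function name, stride, and threshold all hypothetical:

```python
import numpy as np

def adaptive_temporal_sample(frames, base_stride=8, motion_threshold=12.0):
    """Keep every `base_stride`-th frame, plus any frame whose mean absolute
    pixel difference from the last kept frame exceeds `motion_threshold`.

    `frames` is a list of grayscale arrays; returns the kept frame indices.
    """
    kept = [0]
    last = frames[0].astype(np.float32)
    for i in range(1, len(frames)):
        cur = frames[i].astype(np.float32)
        motion = float(np.mean(np.abs(cur - last)))
        # Dense sampling under rapid scene change, sparse when static.
        if motion > motion_threshold or (i - kept[-1]) >= base_stride:
            kept.append(i)
            last = cur
    return kept
```

On a static cab view this keeps roughly one frame in eight, while an abrupt change (e.g. an object entering the track) forces an immediate extra keyframe, which is the efficiency/coverage trade-off the abstract appeals to.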
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RailVQA-bench, the first VQA benchmark for cab-view visual cognition in Automatic Train Operation (ATO), comprising 20,000 single-frame and 1,168 video-based QA pairs to evaluate cognitive generalization and interpretability. It also proposes RailVQA-CoM, a collaborative large-small model framework with a transparent three-module architecture and adaptive temporal sampling that combines small-model efficiency with large-model cognition for perceptual generalization, reasoning, and planning. Experiments are stated to demonstrate substantial improvements in performance, interpretability, efficiency, and cross-domain generalization to autonomous driving systems.
Significance. If the experimental results hold, the benchmark would address the absence of domain-specific resources for high-level reasoning in safety-critical ATO, while the framework could offer a deployable route to using LMMs under computational and reliability constraints, potentially supporting GoA4+ operations.
Major comments (2)
- [Abstract] The statement that experiments demonstrate substantial improvements in performance, interpretability, efficiency, and cross-domain generalization supplies no quantitative metrics, baselines, ablation studies, or error analysis, making it impossible to determine whether the data support the central claims.
- [Abstract] The claim that RailVQA-CoM strengthens cross-domain generalization in autonomous driving systems lacks any supporting evidence: no transfer experiments, domain-adaptation results, or shared-feature analysis between railway and road domains are described, despite the benchmark and framework being constructed exclusively from cab-view railway imagery.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and recognition of the potential significance of RailVQA-bench and RailVQA-CoM. We address the major comments point-by-point below and will revise the manuscript to strengthen the abstract.
Point-by-point responses
- Referee: [Abstract] The statement that experiments demonstrate substantial improvements in performance, interpretability, efficiency, and cross-domain generalization supplies no quantitative metrics, baselines, ablation studies, or error analysis, making it impossible to determine whether the data support the central claims.
Authors: We agree that the abstract should provide key quantitative highlights. The full manuscript contains detailed experimental results with specific metrics (accuracy, efficiency, interpretability scores), baseline comparisons, module ablations, and error analysis demonstrating the claimed improvements. We will revise the abstract to include representative quantitative results from these experiments. Revision: yes.
- Referee: [Abstract] The claim that RailVQA-CoM strengthens cross-domain generalization in autonomous driving systems lacks any supporting evidence: no transfer experiments, domain-adaptation results, or shared-feature analysis between railway and road domains are described, despite the benchmark and framework being constructed exclusively from cab-view railway imagery.
Authors: We acknowledge that the manuscript contains no transfer experiments, domain-adaptation results, or analysis between railway and road domains. The abstract claim was not supported by evidence; in revision it will be removed or qualified to cover only results within the railway ATO domain and the framework's modular design as grounds for potential future generalization. Revision: yes.
Circularity Check
No circularity in benchmark construction or framework claims
Full rationale
The paper introduces RailVQA-bench as a new dataset of 21,168 QA pairs and RailVQA-CoM as a three-module collaborative architecture, then reports empirical results on performance, interpretability, efficiency, and generalization. No equations, fitted parameters, or self-referential definitions appear that would reduce any claimed prediction to the inputs by construction. The cross-domain generalization statement to autonomous driving is an unsupported extension rather than a circular reduction, and no load-bearing self-citations or ansatz smuggling are present. The derivation chain remains self-contained against external benchmarks.
Reference graph
Works this paper leans on
- [1] W.-T. Hong, G. Clifton, and J. D. Nelson, "Railway accident causation analysis: Current approaches, challenges and potential solutions," Accident Analysis & Prevention, vol. 186, p. 107049, 2023.
- [2] R. A. Khalil, Z. Safelnasr, N. Yemane, M. Kedir, A. Shafiqurrahman, and N. Saeed, "Advanced learning technologies for intelligent transportation systems: Prospects and challenges," IEEE Open Journal of Vehicular Technology, vol. 5, pp. 397–427, 2024.
- [3] O. Zendel, M. Murschitz, M. Zeilinger, D. Steininger, S. Abbasi, and C. Beleznai, "RailSem19: A dataset for semantic rail scene understanding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
- [4] R. Tagiew, P. Klasek, R. Tilly, M. Köppel, P. Denzler, P. Neumaier, T. Klockau, M. Boekhoff, and K. Schwalbe, "OSDaR23: Open sensor data for rail 2023," in 2023 8th International Conference on Robotics and Automation Engineering (ICRAE). IEEE, 2023, pp. 270–276.
- [5] R. Tagiew, I. Wunderlich, M. Sastuba, K. Göller, and S. Seitz, "RailGoerl24: Görlitz rail test center CV dataset 2024," in 2025 IEEE Engineering Reliable Autonomous Systems (ERAS). IEEE, 2025, pp. 1–4.
- [6] Z. Liang, Y. Xu, Y. Hong, P. Shang, Q. Wang, Q. Fu, and K. Liu, "A survey of multimodel large language models," in Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, 2024, pp. 405–409.
- [7] C. Corbiere, S. Roburin, S. Montariol, A. Bosselut, and A. Alahi, "Retrieval-based interleaved visual chain-of-thought in real-world driving scenarios," arXiv preprint arXiv:2501.04671, 2025.
- [8] J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds et al., "Flamingo: a visual language model for few-shot learning," Advances in Neural Information Processing Systems, vol. 35, pp. 23716–23736, 2022.
- [9] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. V. Le, and D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," in Advances in Neural Information Processing Systems, vol. 35. Curran Associates, Inc., 2022, pp. 24824–24837.
- [10] H. Liu, W. Xue, Y. Chen, D. Chen, X. Zhao, K. Wang, L. Hou, R. Li, and W. Peng, "A survey on hallucination in large vision-language models," arXiv preprint arXiv:2402.00253, 2024.
- [11] Y. Li, Y. Du, K. Zhou, J. Wang, W. X. Zhao, and J.-R. Wen, "Evaluating object hallucination in large vision-language models," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 292–305.
- [12] Y. Chen, N. Zhu, Q. Wu, C. Wu, W. Niu, and Y. Wang, "MRSI: A multi-modal proximity remote sensing data set for environment perception in rail transit," International Journal of Intelligent Systems, vol. 37, no. 9, pp. 5530–5556, 2022.
- [13] W. Zhangyu, Y. Guizhen, W. Xinkai, L. Haoran, and L. Da, "A camera and lidar data fusion method for railway object detection," IEEE Sensors Journal, vol. 21, no. 12, pp. 13442–13454, 2021.
- [14] Q. Guo and J. Rambach, "SynRailObs: A synthetic dataset for obstacle detection in railway scenarios," arXiv preprint arXiv:2505.10784, 2025.
- [15] H. Salmane, L. Khoudour, and Y. Ruichek, "A video-analysis-based railway–road safety system for detecting hazard situations at level crossings," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 2, pp. 596–609, 2015.
- [16] Z. Šilar and M. Dobrovolný, "The obstacle detection on the railway crossing based on optical flow and clustering," in 2013 36th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013, pp. 755–759.
- [17] Z. Cao, Y. Qin, Z. Xie, Q. Liu, E. Zhang, Z. Wu, and Z. Yu, "An effective railway intrusion detection method using dynamic intrusion region and lightweight neural network," Measurement, vol. 191, p. 110564, 2022.
- [18] Z. Wang and X. Du, "YOLO-Rail: An improved YOLO model for obstacle detection on railway tracks," IEEE Sensors Journal, 2026.
- [19] Z. Zhang, P. Chen, Y. Huang, L. Dai, F. Xu, and H. Hu, "Railway obstacle intrusion warning mechanism integrating YOLO-based detection and risk assessment," Journal of Industrial Information Integration, vol. 38, p. 100571, 2024.
- [20] C. Chen, H. Qin, Y. Qin, and Y. Bai, "Real-time railway obstacle detection based on multitask perception learning," IEEE Transactions on Intelligent Transportation Systems, 2025.
- [21] W. Liu, Y. Wang, G. Yu, Z. Wang, and P. Chen, "RailFusion: A lidar-camera data interaction network for 3-D railway object detection," IEEE Transactions on Intelligent Transportation Systems, 2025.
- [22] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, "VQA: Visual question answering," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2425–2433.
- [23] H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual instruction tuning," Advances in Neural Information Processing Systems, vol. 36, pp. 34892–34916, 2023.
- [24] J. Bai, S. Bai, S. Yang, S. Wang, S. Tan, P. Wang, J. Lin, C. Zhou, and J. Zhou, "Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond," 2023.
- [25] O. Schumann, M. Hahn, N. Scheiner, F. Weishaupt, J. F. Tilly, J. Dickmann, and C. Wöhler, "RadarScenes: A real-world radar point cloud data set for automotive applications," in 2021 IEEE 24th International Conference on Information Fusion (FUSION). IEEE, 2021, pp. 1–8.
- [26] L. Xu, H. Huang, and J. Liu, "SUTD-TrafficQA: A question answering benchmark and an efficient network for video reasoning over traffic events," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9878–9888.
- [27] T. Qian, J. Chen, L. Zhuo, Y. Jiao, and Y.-G. Jiang, "NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4542–4550.
- [28] A.-M. Marcu, L. Chen, J. Hünermann, A. Karnsund, B. Hanotte, P. Chidananda, S. Nair, V. Badrinarayanan, A. Kendall, J. Shotton et al., "LingoQA: Visual question answering for autonomous driving," in European Conference on Computer Vision. Springer, 2024, pp. 252–269.
- [29] H. Tian, K. Reddy, Y. Feng, M. Quddus, Y. Demiris, and P. Angeloudis, "Large (vision) language models for autonomous vehicles: Current trends and future directions," IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 1, pp. 187–210, 2025.
- [30] C. Sima, K. Renz, K. Chitta, L. Chen, H. Zhang, C. Xie, J. Beißwenger, P. Luo, A. Geiger, and H. Li, "DriveLM: Driving with graph visual question answering," in European Conference on Computer Vision. Springer, 2024, pp. 256–274.
- [31] X. Tian, J. Gu, B. Li, Y. Liu, Y. Wang, Z. Zhao, K. Zhan, P. Jia, X. Lang, and H. Zhao, "DriveVLM: The convergence of autonomous driving and large vision-language models," arXiv preprint arXiv:2402.12289, 2024.
- [32] Z. Xu, Y. Zhang, E. Xie, Z. Zhao, Y. Guo, K.-Y. K. Wong, Z. Li, and H. Zhao, "DriveGPT4: Interpretable end-to-end autonomous driving via large language model," IEEE Robotics and Automation Letters, vol. 9, no. 10, pp. 8186–8193, 2024.
- [33] Z. Huang, C. Feng, F. Yan, B. Xiao, Z. Jie, Y. Zhong, X. Liang, and L. Ma, "RoboTron-Drive: All-in-one large multimodal model for autonomous driving," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 8011–8021.
- [34] A. Wu and X. Luo, "Enhancing vision-language models for autonomous driving through task-specific prompting and spatial reasoning," arXiv preprint arXiv:2510.24152, 2025.
- [35] A. Ishaq, J. Lahoud, K. More, O. Thawakar, R. Thawkar, D. Dissanayake, N. Ahsan, Y. Li, F. S. Khan, H. Cholakkal et al., "DriveLMM-o1: A step-by-step reasoning dataset and large multimodal model for driving scenario understanding," in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2025, pp. 20501–20508.
- [36] Y. Gao, M. Piccinini, Y. Zhang, D. Wang, K. Moller, R. Brusnicki, B. Zarrouki, A. Gambi, J. F. Totz, K. Storms et al., "Foundation models in autonomous driving: A survey on scenario generation and scenario analysis," IEEE Open Journal of Intelligent Transportation Systems, 2026.
- [37] Y. Chen, J. Zhao, and H. Han, "A survey on collaborative mechanisms between large and small language models," arXiv preprint arXiv:2505.07460, 2025.
- [38] S. Li, H. Wang, W. Xu, R. Zhang, S. Guo, J. Yuan, X. Zhong, T. Zhang, and R. Li, "Collaborative inference and learning between edge SLMs and cloud LLMs: A survey of algorithms, execution, and open challenges," arXiv preprint arXiv:2507.16731, 2025.
- [39] H. Guo, Y. Wang, Z. Ye, J. Dai, and Y. Xiong, "big.LITTLE vision transformer for efficient visual recognition," arXiv preprint arXiv:2410.10267, 2024.
- [40] C. Kelly, L. Hu, B. Yang, Y. Tian, D. Yang, C. Yang, Z. Huang, Z. Li, J. Hu, and Y. Zou, "VisionGPT: Vision-language understanding agent using generalized multimodal framework," arXiv preprint arXiv:2403.09027, 2024.
- [41] Y. Liu, L. Qin, and S. Wang, "Small drafts, big verdict: Information-intensive visual reasoning via speculation," arXiv preprint arXiv:2510.20812, 2025.
- [42] G. Dai, S. Tang, and Y. Zhuang, "KCM: KAN-based collaboration models enhance pretrained large models," arXiv preprint arXiv:2510.20278, 2025.
- [43] W. Chen, Z. Zhao, J. Yao, Y. Zhang, J. Bu, and H. Wang, "Multi-modal medical diagnosis via large-small model collaboration," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30763–30773.
- [44] Z. Sun, K. Guo, Y. Hu, D. Tian, Q. Gao, J. Wang, J. Gao, Y. Sun, and B. Yin, "Large-small model synergy with multimodal fine-grained heuristics for knowledge-based visual question answering," in Proceedings of the 33rd ACM International Conference on Multimedia, 2025, pp. 935–944.
- [45] M. Foutter, D. Gammelli, J. Kruger, E. Foss, P. Bhoj, T. Guffanti, S. D'Amico, and M. Pavone, "Space-LLaVA: a vision-language model adapted to extraterrestrial applications," arXiv preprint arXiv:2408.05924, 2024.
- [46] S. Bai, Y. Cai, R. Chen, K. Chen, X. Chen, Z. Cheng, L. Deng, W. Ding, C. Gao, C. Ge et al., "Qwen3-VL technical report," arXiv preprint arXiv:2511.21631, 2025.
- [47] L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. Xing et al., "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena," Advances in Neural Information Processing Systems, vol. 36, pp. 46595–46623, 2023.
- [48] Y. Liu, D. Iter, Y. Xu, S. Wang, R. Xu, and C. Zhu, "G-Eval: NLG evaluation using GPT-4 with better human alignment," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 2511–2522.
- [49] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
- [50] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, "ByteTrack: Multi-object tracking by associating every detection box," in European Conference on Computer Vision. Springer, 2022, pp. 1–21.
- [51] Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu et al., "InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 24185–24198.
- [52] J. Chi, U. Karn, H. Zhan, E. Smith, J. Rando, Y. Zhang, K. Plawiak, Z. D. Coudert, K. Upasani, and M. Pasupuleti, "Llama Guard 3 Vision: Safeguarding human-AI image understanding conversations," arXiv preprint arXiv:2411.10414, 2024.
- [53] Z. Zhou, X. Ning, K. Hong, T. Fu, J. Xu, S. Li, Y. Lou, L. Wang, Z. Yuan, X. Li et al., "A survey on efficient inference for large language models," arXiv preprint arXiv:2404.14294, 2024.