
arxiv: 2605.19329 · v1 · pith:TBWQ7MEGnew · submitted 2026-05-19 · 💻 cs.CV · cs.AI

RE-VLM: Event-Augmented Vision-Language Model for Scene Understanding

Pith reviewed 2026-05-20 06:46 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords event cameras · vision-language models · RGB-event fusion · scene understanding · synthetic data generation · multimodal alignment · challenging environments

The pith

RE-VLM fuses RGB images with event camera streams to improve vision-language performance under poor lighting and fast motion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard vision-language models rely on RGB images that lose detail in low light, high contrast, or rapid movement. Event cameras record brightness changes asynchronously and retain motion information where frames fail. RE-VLM runs parallel RGB and event encoders, aligns their features to language through staged training, and generates its own training captions and QA pairs by first building scene graphs from paired streams. Two new datasets support evaluation on illumination-challenged and general scenes. The resulting model exceeds prior RGB-only and event-only baselines on captioning and visual question answering, with the largest margins appearing precisely when conventional images degrade.
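
To make the event modality concrete, here is a minimal sketch of one common way to turn an asynchronous event stream into a dense frame: summing signed polarities per pixel over a time window. The `(x, y, t, polarity)` tuple layout, the function name `accumulate_events`, and the microsecond timestamps are assumptions chosen for illustration. The paper's own event representation may differ.

```python
from typing import Iterable, List, Tuple

# Hypothetical event layout: (x, y, timestamp_us, polarity),
# where polarity is +1 for a brightness increase and -1 for a decrease.
Event = Tuple[int, int, int, int]


def accumulate_events(
    events: Iterable[Event],
    width: int,
    height: int,
    t_start: int,
    t_end: int,
) -> List[List[int]]:
    """Sum event polarities per pixel over the half-open window [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, polarity in events:
        inside_window = t_start <= t < t_end
        inside_sensor = 0 <= x < width and 0 <= y < height
        if inside_window and inside_sensor:
            frame[y][x] += polarity
    return frame


events = [
    (0, 0, 10, 1),    # pixel (0, 0) got brighter
    (0, 0, 20, 1),    # ...and brighter again
    (2, 1, 30, -1),   # pixel (2, 1) got darker
    (1, 1, 999, 1),   # outside the window, ignored
]
frame = accumulate_events(events, width=3, height=2, t_start=0, t_end=100)
print(frame)  # [[2, 0, 0], [0, 0, -1]]
```

Pixels with no brightness change stay at zero, which is why event data is sparse in static regions and dense along moving edges.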

Core claim

RE-VLM is the first dual-stream vision-language model that jointly processes synchronized RGB images and event streams through parallel encoders and progressive cross-modal alignment. A graph-driven pipeline converts the paired visual input into scene graphs, from which synthetic yet verifiable captions and QA pairs are generated. Together these yield consistent gains over RGB-only and event-only models on captioning and VQA benchmarks, especially in challenging conditions.

What carries the argument

Parallel RGB and event encoders whose heterogeneous features are aligned to language via progressive training, plus a graph-driven pipeline that extracts scene graphs from RGB-Event streams to synthesize captions and QA pairs.
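
The graph-to-text idea can be sketched as follows: each caption and QA pair is produced from exactly one graph edge, so every generated sentence can be traced back to a scene fact. The `SceneGraph` class, the `describe` and `synthesize` helpers, and the fixed sentence templates are hypothetical. The paper's pipeline may use a language model or a richer grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    # node id -> attributes, e.g. {"name": "car", "color": "white"}
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # (subject id, relation, object id)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def describe(graph: SceneGraph, node_id: str) -> str:
    """Render a node as its attributes followed by its name."""
    attrs = graph.nodes[node_id]
    extras = " ".join(v for k, v in attrs.items() if k != "name")
    return f"{extras} {attrs['name']}".strip()


def synthesize(graph: SceneGraph) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Emit one templated caption and one QA pair per graph edge."""
    captions: List[str] = []
    qa_pairs: List[Tuple[str, str]] = []
    for subj, relation, obj in graph.edges:
        s, o = describe(graph, subj), describe(graph, obj)
        captions.append(f"A {s} is {relation} a {o}.")
        qa_pairs.append((f"What is the {s} {relation}?", f"A {o}."))
    return captions, qa_pairs


graph = SceneGraph(
    nodes={
        "car": {"name": "car", "color": "white"},
        "road": {"name": "road"},
    },
    edges=[("car", "driving on", "road")],
)
captions, qa_pairs = synthesize(graph)
print(captions)  # ['A white car is driving on a road.']
print(qa_pairs)  # [('What is the white car driving on?', 'A road.')]
```

Because the text is a deterministic function of the graph, any error in the synthesized supervision reduces to an error in the graph itself, which is what the load-bearing premise below turns on.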

If this is right

  • Scene understanding remains reliable when RGB frames suffer from low light, high dynamic range, or fast motion.
  • Synthetic yet verifiable supervision can substitute for scarce human-annotated RGB-Event-Text data.
  • Event streams supply complementary motion cues that standard VLMs currently lack.
  • The dual-stream design scales to additional challenging environments beyond the two new datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph-to-text synthesis method could be applied to other sparse modalities such as lidar or thermal data to bootstrap multimodal models.
  • Performance gains in adverse conditions suggest the approach may reduce the need for specialized hardware in outdoor robotics or surveillance.
  • If the event branch can be optionally disabled at inference time, the model could serve as a drop-in upgrade for existing RGB-only VLMs.

Load-bearing premise

The graph-driven pipeline produces accurate scene graphs and high-quality synthetic captions or QA pairs that faithfully represent real scene content without introducing artifacts that would inflate measured performance.

What would settle it

Train an otherwise identical model on the same real paired RGB-Event streams but without the graph-synthesis step and compare its captioning and VQA scores directly against the full RE-VLM on the held-out portions of PEOD-Chat and RGBE-Chat.

Figures

Figures reproduced from arXiv: 2605.19329 by Chuang Zhu, Donghong Jiang, Endian Lin, Hanqing Liu, Luoping Cui, Mingjie Liu.

Figure 1: Illustration of RGB-Event complementarity in a challeng…
Figure 2: Construction of RE-VLM: data generation pipeline and model. Left: A graph-driven pipeline converts synchronized RGB frames and event streams into a graph, extracts verifiable scene facts, and synthesizes reliable caption and QA supervision. Center: Representative examples from the datasets yielded by the pipeline: PEOD-Chat (illumination-challenged scenes) and RGBE-Chat (general scenarios). Right: The RE-V…
Figure 3: Data generation pipeline overview. From reconstructed event frames and RGB images, two modality-specific graphs are constructed. A degradation-aware fusion then merges them into a single RGB-event graph (nodes: entities, edges: relations). Finally, captions and VQA items are synthesized from the fused graph. (S: subject, P: place, D: direction, T: target; H: hierarchical relation; A: attribute.)
Figure 4: RE-VLM model architecture. Synchronized RGB and event streams are encoded. During…
Figure 5: Training pipeline. Three compact stages: (1) initial event-language alignment, (2) alignment of the event and RGB modalities with STAM, (3) end-to-end instruction tuning. The accompanying text describes a concise three-stage curriculum that first aligns event representations with language, then aligns them with the RGB representation via STAM, and finally performs lightweight instruction tuning on the LLM.
Figure 6: Qualitative VQA comparison in an overexposed traffic…
Original abstract

Conventional vision-language models (VLMs) struggle to interpret scenes captured under adverse conditions (e.g., low light, high dynamic range, or fast motion) because standard RGB images degrade in such environments. Event cameras provide a complementary modality: they asynchronously record per-pixel brightness changes with high temporal resolution and wide dynamic range, preserving motion cues where frames fail. We propose RE-VLM, the first dual-stream vision-language model that jointly leverages RGB images and event streams for robust scene understanding across both normal and challenging conditions. RE-VLM employs parallel RGB and event encoders together with a progressive training strategy that aligns heterogeneous visual features with language. To address the scarcity of RGB-Event-Text supervision, we further propose a graph-driven pipeline that converts synchronized RGB-Event streams into verifiable scene graphs, from which we synthesize captions and question-answer (QA) pairs. To develop and evaluate RE-VLM, we construct two datasets: PEOD-Chat, targeting illumination-challenged scenes, and RGBE-Chat, covering diverse scenarios. On captioning and VQA benchmarks, RE-VLM consistently outperforms state-of-the-art RGB-only and event-only models with comparable parameter counts, with particularly large gains under challenging conditions. These results demonstrate the effectiveness of event-augmented VLMs in achieving robust vision-language understanding across a wide range of real-world environments. Code and datasets are available at https://github.com/bupt-ai-cz/RE-VLM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce RE-VLM, a novel dual-stream vision-language model that integrates RGB and event data for improved scene understanding in normal and challenging conditions. It proposes a graph-driven pipeline to generate scene graphs from RGB-Event streams and synthesize captions and QA pairs to overcome data scarcity, resulting in two new datasets: PEOD-Chat for illumination-challenged scenes and RGBE-Chat for diverse scenarios. Experimental results indicate that RE-VLM outperforms state-of-the-art RGB-only and event-only models on captioning and VQA tasks, with larger gains in adverse conditions, supported by the release of code and datasets.

Significance. Should the empirical results prove robust upon validation of the data generation process, this work would be significant for advancing multimodal AI by incorporating high-temporal-resolution event sensing into VLMs. This could lead to more reliable vision-language systems for real-world applications involving motion, low light, or high dynamic range. The open-sourcing of code and datasets is a commendable strength that enhances the paper's impact and allows for independent verification.

major comments (1)
  1. [Graph-driven pipeline and dataset construction] The description of the graph-driven pipeline for creating verifiable scene graphs and synthesizing captions/QA pairs does not include any quantitative evaluation of graph accuracy (such as node/edge precision or F1 scores) or human evaluation of the quality and faithfulness of the generated text. Since the performance gains are evaluated on PEOD-Chat and RGBE-Chat, which are derived from this pipeline, this omission is critical as it leaves open the possibility that reported improvements stem from artifacts or biases in the synthetic data rather than genuine multimodal advantages.
minor comments (1)
  1. [Abstract] The abstract mentions 'verifiable scene graphs' but does not elaborate on the verification process; this could be clarified for readers.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address the major comment point by point below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: The description of the graph-driven pipeline for creating verifiable scene graphs and synthesizing captions/QA pairs does not include any quantitative evaluation of graph accuracy (such as node/edge precision or F1 scores) or human evaluation of the quality and faithfulness of the generated text. Since the performance gains are evaluated on PEOD-Chat and RGBE-Chat, which are derived from this pipeline, this omission is critical as it leaves open the possibility that reported improvements stem from artifacts or biases in the synthetic data rather than genuine multimodal advantages.

    Authors: We agree that quantitative and human evaluations of the graph-driven pipeline would further strengthen the paper and help rule out potential data artifacts. The pipeline builds on established, off-the-shelf detectors and relation extractors applied to synchronized RGB-Event streams, with scene graphs constructed to be verifiable by design. However, we acknowledge the absence of explicit metrics in the current manuscript. In the revised version, we will add a dedicated subsection reporting node/edge precision, recall, and F1 scores on a manually annotated subset of 500 samples. We will also include results from a human evaluation study (with at least 3 annotators per sample) measuring faithfulness, grammatical correctness, and relevance of the synthesized captions and QA pairs, along with inter-annotator agreement (e.g., Cohen's kappa). These additions will directly address the concern that gains may arise from synthetic data biases rather than the dual-stream architecture.

    Revision: yes
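
The metrics named in this exchange can be sketched in a few lines. The sketch below scores predicted scene-graph edges against a hand-annotated gold set and computes Cohen's kappa between two annotators. The function names and the toy data are illustrative assumptions, not the authors' evaluation code. Note that Cohen's kappa is defined for exactly two raters, so a study with three annotators per sample would need pairwise averaging or a multi-rater statistic such as Fleiss' kappa.

```python
from collections import Counter
from typing import Hashable, Sequence, Set, Tuple


def precision_recall_f1(
    predicted: Set[Hashable], gold: Set[Hashable]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 over graph nodes or edge triples."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: edges are (subject, relation, object) triples.
predicted_edges = {("car", "on", "road"), ("person", "near", "car"), ("dog", "on", "road")}
gold_edges = {("car", "on", "road"), ("person", "near", "car"), ("person", "on", "sidewalk")}
p, r, f1 = precision_recall_f1(predicted_edges, gold_edges)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667

# Two annotators judging whether four captions are faithful.
kappa = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])
print(kappa)  # 0.5
```

Exact-match scoring on triples is strict: a correct relation attached to a slightly mislabelled entity counts as both a false positive and a false negative, so a real evaluation would need to state its matching rule.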

Circularity Check

0 steps flagged

No circularity: model and pipeline are self-contained against external benchmarks


The paper introduces a dual-stream RGB-event VLM architecture with progressive training and a graph-driven synthesis pipeline that generates scene graphs, captions, and QA pairs to create the PEOD-Chat and RGBE-Chat datasets. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the model's own inputs or outputs. Performance claims rest on comparisons to prior RGB-only and event-only models on the newly constructed datasets and standard benchmarks, without self-citation chains, uniqueness theorems, or ansatzes that bear the central result. The approach is externally falsifiable via the released code and datasets rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from VLM training and event-camera literature plus the effectiveness of the proposed graph pipeline; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • Domain assumption: event streams can be meaningfully aligned with language descriptions via progressive training on synthesized scene graphs.
    Invoked in the description of the dual-stream encoders and training strategy.

pith-pipeline@v0.9.0 · 5802 in / 1245 out tokens · 39027 ms · 2026-05-20T06:46:34.063724+00:00 · methodology


