PointQ-Bench: Benchmarking Diagnostic and Interpretable Point Cloud Quality Assessment

Cheng Li; Di Wang; Duanchu Wang; Jing Huang; Junjie Yang; Zhi Gao; ZhuBohong; Zihang Cheng

arxiv: 2605.28241 · v1 · pith:GYTPEV3Gnew · submitted 2026-05-27 · 💻 cs.CV

PointQ-Bench: Benchmarking Diagnostic and Interpretable Point Cloud Quality Assessment

Duanchu Wang , Cheng Li , Junjie Yang , Jing Huang , Zihang Cheng , Zhi Gao , ZhuBohong , Di Wang This is my paper

Pith reviewed 2026-06-29 13:57 UTC · model grok-4.3

classification 💻 cs.CV

keywords point cloud quality assessmentbenchmarkdefect diagnosisvision-language modelsanomaly sensingquality calibration3D perceptionusability grading

0 comments

The pith

PointQ-Bench reveals that current models perceive point cloud defects coarsely but fail at grounded diagnosis and quality calibration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PointQ-Bench, a dataset of 3,083 point clouds with mean opinion scores, issue tags across eight types, expert descriptions, and 12,332 question-answer pairs. It defines four tasks that move quality assessment from scalar scores to anomaly sensing, defect diagnosis, usability grading, and open-ended reporting. Experiments across 14 vision-language models show emerging coarse perception but persistent weaknesses in precise diagnosis and calibration, with stronger 2D models outperforming 3D ones and variable gains from extra views. A new SSFRQ-5D protocol measures free-form descriptions via human-AI agreement. The work matters because real-world 3D inspection needs interpretable defect identification rather than single numbers.

Core claim

PointQ-Bench extends point cloud quality assessment beyond scalar scoring by supplying expert-grounded annotations on authentic, simulated, and generated samples for tasks in anomaly sensing, defect diagnosis, usability grading, and quality reporting; evaluation with the SSFRQ-5D protocol on 14 models establishes a consistent perception-diagnosis gap in which coarse defect perception appears but grounded diagnosis and quality calibration remain weak, and strong 2D MLLMs generally outperform existing 3D VLMs with non-uniform benefits from additional views.

What carries the argument

PointQ-Bench benchmark of 3,083 annotated point clouds together with the SSFRQ-5D five-dimensional protocol for scoring free-form quality descriptions.

If this is right

Targeted training focused on grounded diagnosis is required beyond current coarse perception capabilities.
Strong 2D MLLMs provide better starting points than existing 3D VLMs for point cloud quality tasks.
Benefits from multi-view or point-level inputs vary by task, data source, and model, especially under ambiguous boundaries.
The benchmark supports evaluation of interpretable quality understanding needed for practical 3D inspection workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Consistent outperformance by 2D models suggests projection-based methods may remain preferable until native 3D architectures close the diagnosis gap.
Non-uniform gains from extra views imply that input-selection strategies could be tuned per application using the benchmark results.
The human-AI agreement validation of SSFRQ-5D indicates the protocol could transfer to description-based evaluation in other 3D or multimodal settings.

Load-bearing premise

The expert-grounded descriptions, issue tags, and QA pairs in the 3,083 samples accurately capture human perception and remain consistent enough for reliable model evaluation across tasks.

What would settle it

A follow-up study in which newly trained models achieve high accuracy on the defect diagnosis and usability grading tasks while human raters still disagree with the SSFRQ-5D scores on the same samples would falsify the reported perception-diagnosis gap.

Figures

Figures reproduced from arXiv: 2605.28241 by Cheng Li, Di Wang, Duanchu Wang, Jing Huang, Junjie Yang, Zhi Gao, ZhuBohong, Zihang Cheng.

**Figure 1.** Figure 1: PointQ-Bench: An overview of a unified benchmark for point cloud quality assessment. The benchmark integrates real, synthetic, and AI-generated data with a unified annotation protocol, and organizes quality assessment beyond scalar MOS through explicit defect taxonomy and ambiguity-aware decision splits [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 3.** Figure 3: Demonstrations of PointQ-Bench. findings suggest that future progress should place greater emphasis on grounded defect diagnosis, uncertainty-aware calibration, and source-robust quality reasoning rather than simply scaling input evidence. Overall, PointQ-Bench highlights a critical gap between quality description and reliable 3D quality decision-making, and provides a diagnostic foundation for advancing t… view at source ↗

**Figure 4.** Figure 4: Usability grading across consensus strata. [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

read the original abstract

Point cloud quality plays a critical role in 3D acquisition, reconstruction, rendering, and perception, yet existing point cloud quality assessment (PCQA) research remains largely centered on scalar score prediction. In practical inspection scenarios, quality assessment often involves identifying defects, characterizing dominant issue types, assessing downstream usability, and providing evidence-supported descriptions, which are not explicitly evaluated by current benchmarks. We introduce PointQ-Bench, a benchmark designed to extend PCQA from scalar scoring toward comprehensive quality understanding. PointQ-Bench consists of 3,083 point clouds spanning authentic scans, simulated distortions, and AI-generated content, covering eight major issue types. Each sample is annotated with mean opinion scores (MOS), quality levels, issue tags, expert-grounded descriptions, and 12,332 question-answer pairs. The benchmark supports three perception-oriented tasks: anomaly sensing, defect diagnosis, and usability grading, as well as a cognition-oriented task of open-ended quality reporting. To evaluate free-form quality descriptions, we further propose SSFRQ-5D, a five-dimensional evaluation protocol validated through human-AI agreement analysis. Extensive experiments on 14 vision-language models and traditional PCQA baselines reveal a consistent perception-diagnosis gap: while current models exhibit emerging abilities in coarse defect perception, they struggle with grounded diagnosis and quality calibration. Strong 2D MLLMs generally outperform existing 3D VLMs, and the benefit of additional views or point-level inputs is non-uniform, varying across tasks, data sources, and models, particularly under boundary-ambiguous conditions. Overall, PointQ-Bench provides a diagnostic testbed for advancing reliable and interpretable point cloud quality understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PointQ-Bench shifts PCQA to diagnostic tasks with a new protocol and model tests, but missing inter-annotator stats on the core labels undercuts the gap claims.

read the letter

The main takeaway is that this paper builds a benchmark moving point cloud quality assessment past single-number scores. It adds tasks for spotting anomalies, naming defect types, grading usability, and writing explanations, backed by 3,083 samples and over 12,000 QA pairs across real, simulated, and generated clouds. They also introduce SSFRQ-5D to score the free-form descriptions and run it on 14 models.

What works is the scope and the framing. The multi-source data and the split into perception versus cognition tasks line up with practical 3D inspection needs. The reported pattern—that models handle coarse defect sensing better than grounded diagnosis or calibration, and that 2D MLLMs often beat 3D VLMs—gives a concrete starting point for follow-up work. The non-uniform effect of extra views or point inputs is also worth testing further.

The soft spot is the annotation foundation. The paper treats the issue tags, descriptions, and QA pairs as expert-grounded ground truth, yet provides no inter-annotator agreement numbers for those primary labels. Human-AI agreement appears only for the downstream SSFRQ-5D scoring. Without kappa or similar stats on the base annotations, especially on ambiguous defects, the perception-diagnosis gap could partly reflect label noise rather than model differences. That concern from the stress test holds up on the available description.

This is aimed at researchers in 3D vision and quality assessment who want evaluation that matches real workflows. A reader already running MLLMs on point clouds would find the task definitions and baseline numbers useful for comparison.

It deserves a serious referee. The benchmark design and experiment breadth are substantive enough to warrant review, even if the annotation validation section will need strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces PointQ-Bench, a benchmark of 3,083 point clouds (authentic scans, simulated distortions, AI-generated) annotated with MOS, quality levels, issue tags, expert-grounded descriptions, and 12,332 QA pairs. It defines four tasks—anomaly sensing, defect diagnosis, usability grading, and open-ended quality reporting—and proposes the SSFRQ-5D protocol (validated via human-AI agreement) for scoring free-form descriptions. Experiments on 14 VLMs and traditional PCQA baselines report a perception-diagnosis gap, with 2D MLLMs generally outperforming 3D VLMs and non-uniform benefits from multi-view or point-level inputs.

Significance. If the annotations are shown to be reliable, the benchmark would usefully shift PCQA evaluation from scalar prediction toward diagnostic and interpretable assessment. The multi-task design, release of the dataset, and SSFRQ-5D protocol are constructive contributions. The absence of inter-annotator agreement statistics for the core annotations, however, prevents a clear assessment of whether the reported model gaps reflect genuine capability differences or annotation variability.

major comments (2)

[Benchmark construction] Benchmark construction section: no inter-annotator agreement metrics (Cohen’s kappa, Krippendorff’s alpha, or pairwise consistency rates) are reported for the issue tags, expert-grounded descriptions, or QA pairs on the 3,083 samples. Because the central claims about the perception-diagnosis gap and 2D-vs-3D model ordering rest on these annotations serving as stable ground truth, the lack of reliability statistics is load-bearing for the experimental conclusions.
[Experiments] Experimental setup and results sections: data splits, cross-validation procedure, and any statistical significance tests for the reported performance differences across models and tasks are not described. Without these, it is impossible to determine whether the observed gaps are robust or sensitive to particular partitions of the 3,083 samples.

minor comments (1)

[Abstract] The abstract states that SSFRQ-5D is “validated through human-AI agreement analysis,” yet the precise protocol, number of raters, and agreement values are not summarized even at a high level; adding a one-sentence quantitative statement would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The two major comments identify important omissions that affect the transparency and interpretability of our results. We address each point below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Benchmark construction] Benchmark construction section: no inter-annotator agreement metrics (Cohen’s kappa, Krippendorff’s alpha, or pairwise consistency rates) are reported for the issue tags, expert-grounded descriptions, or QA pairs on the 3,083 samples. Because the central claims about the perception-diagnosis gap and 2D-vs-3D model ordering rest on these annotations serving as stable ground truth, the lack of reliability statistics is load-bearing for the experimental conclusions.

Authors: We agree that quantitative inter-annotator agreement statistics are necessary to substantiate the reliability of the annotations used as ground truth. The original manuscript omitted these metrics. In the revised version we will compute and report Krippendorff’s alpha (and pairwise consistency rates where appropriate) for issue tags, expert-grounded descriptions, and QA pairs, using the multiple annotations collected during the benchmark construction process. These statistics will be added to the benchmark construction section to directly support the stability of the reported perception-diagnosis gap and model orderings. revision: yes
Referee: [Experiments] Experimental setup and results sections: data splits, cross-validation procedure, and any statistical significance tests for the reported performance differences across models and tasks are not described. Without these, it is impossible to determine whether the observed gaps are robust or sensitive to particular partitions of the 3,083 samples.

Authors: We concur that explicit documentation of data partitioning and statistical testing is required for assessing robustness. The original manuscript did not include these details. In the revision we will add a dedicated experimental setup subsection that specifies the train/test splits (stratified by source and issue type), any cross-validation procedure employed, and the statistical tests (e.g., paired significance tests with p-values or confidence intervals) used to evaluate performance differences across models and tasks. revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark construction and empirical evaluation are self-contained

full rationale

The paper's core contribution is the creation of PointQ-Bench (3,083 samples with MOS, tags, descriptions, and QA pairs) plus the SSFRQ-5D protocol, followed by empirical evaluation of 14 VLMs. No derivations, equations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The perception-diagnosis gap claim rests on direct model testing against the new annotations rather than any reduction to prior fitted results or self-referential definitions. Human-AI agreement is cited only for protocol validation, not as a load-bearing self-citation. This is a standard benchmark paper with independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The paper introduces a new benchmark and evaluation protocol. No free parameters or mathematical axioms are invoked beyond standard assumptions in computer vision benchmarking.

invented entities (2)

PointQ-Bench no independent evidence
purpose: Benchmark for diagnostic and interpretable PCQA tasks
Newly constructed dataset and protocol introduced by the paper
SSFRQ-5D no independent evidence
purpose: Five-dimensional protocol for evaluating free-form quality descriptions
New evaluation method proposed in the paper

pith-pipeline@v0.9.1-grok · 5855 in / 1122 out tokens · 36651 ms · 2026-06-29T13:57:32.522303+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 12 canonical work pages

[1]

Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration.IEEE Transactions on Multimedia, 2020

Qi Yang, Hao Chen, Zhan Ma, Yiling Xu, Rongjun Tang, and Jun Sun. Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration.IEEE Transactions on Multimedia, 2020

2020
[2]

Perceptual quality assessment of colored 3d point clouds.IEEE Transactions on Visualization and Computer Graphics, 29(8):3642–3655, 2023

Honglei Su, Qi Liu, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. Perceptual quality assessment of colored 3d point clouds.IEEE Transactions on Visualization and Computer Graphics, 29(8):3642–3655, 2023

2023
[3]

Xinju Wu, Yun Zhang, Chunling Fan, Junhui Hou, and Sam Kwong. Subjective quality database and objective study of compressed point clouds with 6DoF head-mounted display.IEEE Transactions on Circuits and Systems for Video Technology, 31(12):4630–4644, 2021

2021
[4]

Basics: Broad quality assessment of static point clouds in compression scenarios.arXiv preprint arXiv:2302.04796, 2023

Ali Ak, Emin Zerman, Maurice Quach, Aladine Chetouani, Aljosa Smolic, Giuseppe Valenzise, and Patrick Le Callet. Basics: Broad quality assessment of static point clouds in compression scenarios.arXiv preprint arXiv:2302.04796, 2023

work page arXiv 2023
[5]

Yipeng Liu, Qi Yang, Yiling Xu, and Le Yang. Point cloud quality assessment: Dataset construction and learning- based no-reference metric.ACM Transactions on Multimedia Computing, Communications, and Applications, 18(3):1–24, 2022

2022
[6]

Deep learning for 3d point clouds: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4338–4364, 2021

Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4338–4364, 2021

2021
[7]

T3Bench: Benchmarking current progress in text-to-3d generation.CoRR, abs/2310.02977, 2023

Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, and Yong-Jin Liu. T3Bench: Benchmarking current progress in text-to-3d generation.CoRR, abs/2310.02977, 2023. 11 PointQ-Bench arXiv preprint 100 15 20 25 30 35 40 45 50 55 60How Score (M-F1) How (M-F1) Across Consensus Strata High-consensus Medium-consensus Boundary 0 P...

work page arXiv 2023
[8]

Benchmarking and learning multi-dimensional quality evaluator for text-to-3d generation

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, and Yiling Xu. Benchmarking and learning multi-dimensional quality evaluator for text-to-3d generation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18563–18574, 2025

2025
[9]

Towards fine-grained text-to-3d quality assessment: A benchmark and a two-stage rank-learning metric.arXiv preprint arXiv:2509.23841, 2025

Bingyang Cui, Yujie Zhang, Qi Yang, Zhu Li, and Yiling Xu. Towards fine-grained text-to-3d quality assessment: A benchmark and a two-stage rank-learning metric.arXiv preprint arXiv:2509.23841, 2025

work page arXiv 2025
[10]

Pcqm: A full-reference quality metric for colored 3d point clouds

Gabriel Meynet, Yana Nehmé, Julie Digne, and Guillaume Lavoué. Pcqm: A full-reference quality metric for colored 3d point clouds. In2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020

2020
[11]

Ms-graphsim: Inferring point cloud quality via multiscale graph similarity

Yujie Zhang, Qi Yang, and Yiling Xu. Ms-graphsim: Inferring point cloud quality via multiscale graph similarity. InProceedings of the 29th ACM International Conference on Multimedia, pages 1230–1238, 2021

2021
[12]

Mped: Quantifying point cloud distortion based on multiscale potential energy discrepancy.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6037–6054, 2022

Qi Yang, Yujie Zhang, Siheng Chen, Yiling Xu, Jun Sun, and Zhan Ma. Mped: Quantifying point cloud distortion based on multiscale potential energy discrepancy.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6037–6054, 2022

2022
[13]

Reduced-reference quality assessment of point clouds via content-oriented saliency projection.IEEE Signal Processing Letters, 30:354–358, 2023

Wei Zhou, Guanghui Yue, Ruizeng Zhang, Yipeng Qin, and Hantao Liu. Reduced-reference quality assessment of point clouds via content-oriented saliency projection.IEEE Signal Processing Letters, 30:354–358, 2023

2023
[14]

No-reference point cloud quality assessment via domain adaptation

Qi Yang, Yipeng Liu, Siheng Chen, Yiling Xu, and Jun Sun. No-reference point cloud quality assessment via domain adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21147–21156, 2022

2022
[15]

Gpa-net: No-reference point cloud quality assessment with multi-task graph convolutional network.IEEE Transactions on Visualization and Computer Graphics, 30(8):4955–4967, 2024

Ziyu Shan, Qi Yang, Rui Ye, Yujie Zhang, Yiling Xu, Xiaozhong Xu, and Shan Liu. Gpa-net: No-reference point cloud quality assessment with multi-task graph convolutional network.IEEE Transactions on Visualization and Computer Graphics, 30(8):4955–4967, 2024

2024
[16]

Perceptual quality assessment for point clouds: A survey.ZTE Communications, (4):3–16, 2023

Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiongkuo Min, and Guangtao Zhai. Perceptual quality assessment for point clouds: A survey.ZTE Communications, (4):3–16, 2023

2023
[17]

Dpcd: A quality assessment database for dynamic point clouds.arXiv preprint arXiv:2505.12431, 2025

Yating Liu, Yujie Zhang, Qi Yang, Yiling Xu, Zhu Li, and Ye-Kui Wang. Dpcd: A quality assessment database for dynamic point clouds.arXiv preprint arXiv:2505.12431, 2025. 12 PointQ-Bench arXiv preprint

work page arXiv 2025
[18]

No-reference point cloud quality assessment via weighted patch quality prediction

Jun Cheng, Honglei Su, and Jari Korhonen. No-reference point cloud quality assessment via weighted patch quality prediction. InProceedings of the 35th International Conference on Software Engineering and Knowledge Engineering (SEKE), pages 298–303, 2023

2023
[19]

No-reference point cloud geometry quality assessment based on pairwise rank learning.CoRR, abs/2211.01205, 2022

Zhiyong Su, Chao Chu, Long Chen, Yong Li, and Weiqing Li. No-reference point cloud geometry quality assessment based on pairwise rank learning.CoRR, abs/2211.01205, 2022

work page arXiv 2022
[20]

Progressive knowledge transfer based on human visual perception mechanism for perceptual quality assessment of point clouds.CoRR, abs/2211.16646, 2022

Qi Liu, Yiyun Liu, Honglei Su, Hui Yuan, and Raouf Hamzaoui. Progressive knowledge transfer based on human visual perception mechanism for perceptual quality assessment of point clouds.CoRR, abs/2211.16646, 2022

work page arXiv 2022
[21]

Contrastive pre-training with multi-view fusion for no-reference point cloud quality assessment

Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, and Shan Liu. Contrastive pre-training with multi-view fusion for no-reference point cloud quality assessment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25942–25951, 2024

2024
[22]

Mm-pcqa: Multi-modal learning for no-reference point cloud quality assessment

Zicheng Zhang, Wei Sun, Xiongkuo Min, Quan Zhou, Jun He, Qiyuan Wang, and Guangtao Zhai. Mm-pcqa: Multi-modal learning for no-reference point cloud quality assessment. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2023

2023
[23]

Once-training-all-fine: No-reference point cloud quality assessment via domain-relevance degradation description.IEEE Transactions on Broadcasting, 71(2):616–630, 2025

Yipeng Liu, Qi Yang, Yujie Zhang, Yiling Xu, Le Yang, Xiaozhong Xu, and Shan-Lin Liu. Once-training-all-fine: No-reference point cloud quality assessment via domain-relevance degradation description.IEEE Transactions on Broadcasting, 71(2):616–630, 2025

2025
[24]

Lmm-pcqa: Assisting point cloud quality assessment with lmm.arXiv preprint arXiv:2404.18203, 2024

Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, and Guangtao Zhai. Lmm-pcqa: Assisting point cloud quality assessment with lmm.arXiv preprint arXiv:2404.18203, 2024

work page arXiv 2024
[25]

Clip-pcqa: Exploring subjective-aligned vision-language modeling for point cloud quality assessment

Yating Liu, Yujie Zhang, Ziyu Shan, and Yiling Xu. Clip-pcqa: Exploring subjective-aligned vision-language modeling for point cloud quality assessment. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

2025
[26]

Shashank Gupta, Gregoire Phillips, and Alan C. Bovik. Pit-qmm: A large multimodal model for no-reference point cloud quality assessment.arXiv preprint arXiv:2510.07636, 2025

work page arXiv 2025
[27]

Towards explainable in-the-wild video quality assessment: A database and a language-prompted approach

Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Towards explainable in-the-wild video quality assessment: A database and a language-prompted approach. InProceedings of the 31st ACM International Conference on Multimedia, pages 1045–1054, 2023

2023
[28]

Q-bench: A benchmark for general-purpose foundation models on low-level vision.arXiv preprint arXiv:2309.14181, 2023a

Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, and Weisi Lin. Q-bench: A benchmark for general-purpose foundation models on low-level vision.arXiv preprint arXiv:2309.14181, 2024

work page arXiv 2024
[29]

Depicting beyond scores: Advancing image quality assessment through multi-modal language models

Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, and Chao Dong. Depicting beyond scores: Advancing image quality assessment through multi-modal language models. InEuropean Conference on Computer Vision (ECCV), 2024

2024
[30]

Enhancing descriptive image quality assessment with a large-scale multi-modal dataset.IEEE Transactions on Image Processing, 34:8201–8215, 2025

Zhiyuan You, Jinjin Gu, Xin Cai, Zheyuan Li, Kaiwen Zhu, Chao Dong, and Tianfan Xue. Enhancing descriptive image quality assessment with a large-scale multi-modal dataset.IEEE Transactions on Image Processing, 34:8201–8215, 2025

2025
[31]

Grounding-iqa: Grounding multimodal language model for image quality assessment

Zheng Chen, Xun Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiongkuo Min, Xiaohong Liu, Xin Yuan, Yong Guo, and Yulun Zhang. Grounding-iqa: Grounding multimodal language model for image quality assessment. CoRR, abs/2411.17237, 2024

work page arXiv 2024
[32]

Q-bench-video: Benchmark the video quality understanding of lmms

Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, and Guangtao Zhai. Q-bench-video: Benchmark the video quality understanding of lmms. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3229–3239, 2025

2025
[33]

Medq-bench: Evaluating and exploring medical image quality assessment abilities in mllms.arXiv preprint arXiv:2510.01691, 2025

Jiyao Liu, Jinjie Wei, Wanying Qu, Chenglong Ma, Junzhi Ning, Yunheng Li, Ying Chen, Xinzhe Luo, Pengcheng Chen, Xin Gao, Ming Hu, Huihui Xu, Xin Wang, Shujian Gao, Dingkang Yang, Zhongying Deng, Jin Ye, Lihao Liu, Junjun He, and Ningsheng Xu. Medq-bench: Evaluating and exploring medical image quality assessment abilities in mllms.arXiv preprint arXiv:251...

work page arXiv 2025
[34]

Lidar-net: A real-scanned 3d point cloud dataset for indoor scenes

Yanwen Guo, Yuanqi Li, Dayong Ren, Xiaohong Zhang, Jiawei Li, Liang Pu, et al. Lidar-net: A real-scanned 3d point cloud dataset for indoor scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21989–21999, 2024

2024
[35]

Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1588–1597, 2019. 13 PointQ-Bench arXiv preprint

2019
[36]

Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10901–10911, 2021

2021
[37]

Morley Mao

Jiachen Sun, Qingzhao Zhang, Bhavya Kailkhura, Zhiding Yu, Chaowei Xiao, and Z. Morley Mao. Benchmarking robustness of 3d point cloud recognition against common corruptions.arXiv preprint arXiv:2201.12296, 2022

work page arXiv 2022
[38]

Recommendation ITU-R BT.500-15: Methodologies for the subjective assessment of the quality of television images

International Telecommunication Union. Recommendation ITU-R BT.500-15: Methodologies for the subjective assessment of the quality of television images. ITU-R Recommendation, 2023. Approved in May 2023

2023
[39]

Recommendation ITU-T P.910: Subjective video quality assessment methods for multimedia applications

International Telecommunication Union. Recommendation ITU-T P.910: Subjective video quality assessment methods for multimedia applications. ITU-T Recommendation, 2023. Approved in October 2023

2023
[40]

Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit

Jacob Cohen. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4):213–220, 1968

1968
[41]

Openai gpt-5 system card, 2025

Aaditya Singh, Adam Fry, Adam Perelman, et al. Openai gpt-5 system card, 2025

2025
[42]

Claude haiku 4.5 system card

Anthropic. Claude haiku 4.5 system card. https://assets.anthropic.com/m/99128ddd009bdcb/ original/Claude-Haiku-4-5-System-Card.pdf, October 2025. System card

2025
[43]

Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

2026
[44]

Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency, 2025

Weiyun Wang, Zhangwei Gao, Lixin Gu, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency, 2025

2025
[45]

Mistralai/mistral-small-3.1-24b-instruct-2503 – hugging face

Team MistralAI. Mistralai/mistral-small-3.1-24b-instruct-2503 – hugging face
[46]

Gemma 3 technical report, 2025

Gemma Team, Aishwarya Kamath, Johan Ferret, et al. Gemma 3 technical report, 2025

2025
[47]

Pointllm: Empowering large language models to understand point clouds

Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, and Dahua Lin. Pointllm: Empowering large language models to understand point clouds. InEuropean Conference on Computer Vision (ECCV), 2024

2024
[48]

Shapellm: Universal 3d object understanding for embodied interaction

Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, and Kaisheng Ma. Shapellm: Universal 3d object understanding for embodied interaction. InEuropean Conference on Computer Vision (ECCV), 2024

2024
[49]

Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors

Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, and Min Chen. Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors. InProceedings of the 32nd ACM International Conference on Multimedia, pages 6617–6626, 2024. 14

2024

[1] [1]

Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration.IEEE Transactions on Multimedia, 2020

Qi Yang, Hao Chen, Zhan Ma, Yiling Xu, Rongjun Tang, and Jun Sun. Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration.IEEE Transactions on Multimedia, 2020

2020

[2] [2]

Perceptual quality assessment of colored 3d point clouds.IEEE Transactions on Visualization and Computer Graphics, 29(8):3642–3655, 2023

Honglei Su, Qi Liu, Zhengfang Duanmu, Wentao Liu, and Zhou Wang. Perceptual quality assessment of colored 3d point clouds.IEEE Transactions on Visualization and Computer Graphics, 29(8):3642–3655, 2023

2023

[3] [3]

Xinju Wu, Yun Zhang, Chunling Fan, Junhui Hou, and Sam Kwong. Subjective quality database and objective study of compressed point clouds with 6DoF head-mounted display.IEEE Transactions on Circuits and Systems for Video Technology, 31(12):4630–4644, 2021

2021

[4] [4]

Basics: Broad quality assessment of static point clouds in compression scenarios.arXiv preprint arXiv:2302.04796, 2023

Ali Ak, Emin Zerman, Maurice Quach, Aladine Chetouani, Aljosa Smolic, Giuseppe Valenzise, and Patrick Le Callet. Basics: Broad quality assessment of static point clouds in compression scenarios.arXiv preprint arXiv:2302.04796, 2023

work page arXiv 2023

[5] [5]

Yipeng Liu, Qi Yang, Yiling Xu, and Le Yang. Point cloud quality assessment: Dataset construction and learning- based no-reference metric.ACM Transactions on Multimedia Computing, Communications, and Applications, 18(3):1–24, 2022

2022

[6] [6]

Deep learning for 3d point clouds: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4338–4364, 2021

Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(12):4338–4364, 2021

2021

[7] [7]

T3Bench: Benchmarking current progress in text-to-3d generation.CoRR, abs/2310.02977, 2023

Yuze He, Yushi Bai, Matthieu Lin, Wang Zhao, Yubin Hu, Jenny Sheng, Ran Yi, Juanzi Li, and Yong-Jin Liu. T3Bench: Benchmarking current progress in text-to-3d generation.CoRR, abs/2310.02977, 2023. 11 PointQ-Bench arXiv preprint 100 15 20 25 30 35 40 45 50 55 60How Score (M-F1) How (M-F1) Across Consensus Strata High-consensus Medium-consensus Boundary 0 P...

work page arXiv 2023

[8] [8]

Benchmarking and learning multi-dimensional quality evaluator for text-to-3d generation

Yujie Zhang, Bingyang Cui, Qi Yang, Zhu Li, and Yiling Xu. Benchmarking and learning multi-dimensional quality evaluator for text-to-3d generation. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 18563–18574, 2025

2025

[9] [9]

Towards fine-grained text-to-3d quality assessment: A benchmark and a two-stage rank-learning metric.arXiv preprint arXiv:2509.23841, 2025

Bingyang Cui, Yujie Zhang, Qi Yang, Zhu Li, and Yiling Xu. Towards fine-grained text-to-3d quality assessment: A benchmark and a two-stage rank-learning metric.arXiv preprint arXiv:2509.23841, 2025

work page arXiv 2025

[10] [10]

Pcqm: A full-reference quality metric for colored 3d point clouds

Gabriel Meynet, Yana Nehmé, Julie Digne, and Guillaume Lavoué. Pcqm: A full-reference quality metric for colored 3d point clouds. In2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020

2020

[11] [11]

Ms-graphsim: Inferring point cloud quality via multiscale graph similarity

Yujie Zhang, Qi Yang, and Yiling Xu. Ms-graphsim: Inferring point cloud quality via multiscale graph similarity. InProceedings of the 29th ACM International Conference on Multimedia, pages 1230–1238, 2021

2021

[12] [12]

Mped: Quantifying point cloud distortion based on multiscale potential energy discrepancy.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6037–6054, 2022

Qi Yang, Yujie Zhang, Siheng Chen, Yiling Xu, Jun Sun, and Zhan Ma. Mped: Quantifying point cloud distortion based on multiscale potential energy discrepancy.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6037–6054, 2022

2022

[13] [13]

Reduced-reference quality assessment of point clouds via content-oriented saliency projection.IEEE Signal Processing Letters, 30:354–358, 2023

Wei Zhou, Guanghui Yue, Ruizeng Zhang, Yipeng Qin, and Hantao Liu. Reduced-reference quality assessment of point clouds via content-oriented saliency projection.IEEE Signal Processing Letters, 30:354–358, 2023

2023

[14] [14]

No-reference point cloud quality assessment via domain adaptation

Qi Yang, Yipeng Liu, Siheng Chen, Yiling Xu, and Jun Sun. No-reference point cloud quality assessment via domain adaptation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21147–21156, 2022

2022

[15] [15]

Gpa-net: No-reference point cloud quality assessment with multi-task graph convolutional network.IEEE Transactions on Visualization and Computer Graphics, 30(8):4955–4967, 2024

Ziyu Shan, Qi Yang, Rui Ye, Yujie Zhang, Yiling Xu, Xiaozhong Xu, and Shan Liu. Gpa-net: No-reference point cloud quality assessment with multi-task graph convolutional network.IEEE Transactions on Visualization and Computer Graphics, 30(8):4955–4967, 2024

2024

[16] [16]

Perceptual quality assessment for point clouds: A survey.ZTE Communications, (4):3–16, 2023

Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiongkuo Min, and Guangtao Zhai. Perceptual quality assessment for point clouds: A survey.ZTE Communications, (4):3–16, 2023

2023

[17] [17]

Dpcd: A quality assessment database for dynamic point clouds.arXiv preprint arXiv:2505.12431, 2025

Yating Liu, Yujie Zhang, Qi Yang, Yiling Xu, Zhu Li, and Ye-Kui Wang. Dpcd: A quality assessment database for dynamic point clouds.arXiv preprint arXiv:2505.12431, 2025. 12 PointQ-Bench arXiv preprint

work page arXiv 2025

[18] [18]

No-reference point cloud quality assessment via weighted patch quality prediction

Jun Cheng, Honglei Su, and Jari Korhonen. No-reference point cloud quality assessment via weighted patch quality prediction. InProceedings of the 35th International Conference on Software Engineering and Knowledge Engineering (SEKE), pages 298–303, 2023

2023

[19] [19]

No-reference point cloud geometry quality assessment based on pairwise rank learning.CoRR, abs/2211.01205, 2022

Zhiyong Su, Chao Chu, Long Chen, Yong Li, and Weiqing Li. No-reference point cloud geometry quality assessment based on pairwise rank learning.CoRR, abs/2211.01205, 2022

work page arXiv 2022

[20] [20]

Progressive knowledge transfer based on human visual perception mechanism for perceptual quality assessment of point clouds.CoRR, abs/2211.16646, 2022

Qi Liu, Yiyun Liu, Honglei Su, Hui Yuan, and Raouf Hamzaoui. Progressive knowledge transfer based on human visual perception mechanism for perceptual quality assessment of point clouds.CoRR, abs/2211.16646, 2022

work page arXiv 2022

[21] [21]

Contrastive pre-training with multi-view fusion for no-reference point cloud quality assessment

Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, and Shan Liu. Contrastive pre-training with multi-view fusion for no-reference point cloud quality assessment. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25942–25951, 2024

2024

[22] [22]

Mm-pcqa: Multi-modal learning for no-reference point cloud quality assessment

Zicheng Zhang, Wei Sun, Xiongkuo Min, Quan Zhou, Jun He, Qiyuan Wang, and Guangtao Zhai. Mm-pcqa: Multi-modal learning for no-reference point cloud quality assessment. InProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2023

2023

[23] [23]

Once-training-all-fine: No-reference point cloud quality assessment via domain-relevance degradation description.IEEE Transactions on Broadcasting, 71(2):616–630, 2025

Yipeng Liu, Qi Yang, Yujie Zhang, Yiling Xu, Le Yang, Xiaozhong Xu, and Shan-Lin Liu. Once-training-all-fine: No-reference point cloud quality assessment via domain-relevance degradation description.IEEE Transactions on Broadcasting, 71(2):616–630, 2025

2025

[24] [24]

Lmm-pcqa: Assisting point cloud quality assessment with lmm.arXiv preprint arXiv:2404.18203, 2024

Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, and Guangtao Zhai. Lmm-pcqa: Assisting point cloud quality assessment with lmm.arXiv preprint arXiv:2404.18203, 2024

work page arXiv 2024

[25] [25]

Clip-pcqa: Exploring subjective-aligned vision-language modeling for point cloud quality assessment

Yating Liu, Yujie Zhang, Ziyu Shan, and Yiling Xu. Clip-pcqa: Exploring subjective-aligned vision-language modeling for point cloud quality assessment. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

2025

[26] [26]

Shashank Gupta, Gregoire Phillips, and Alan C. Bovik. Pit-qmm: A large multimodal model for no-reference point cloud quality assessment.arXiv preprint arXiv:2510.07636, 2025

work page arXiv 2025

[27] [27]

Towards explainable in-the-wild video quality assessment: A database and a language-prompted approach

Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, and Weisi Lin. Towards explainable in-the-wild video quality assessment: A database and a language-prompted approach. InProceedings of the 31st ACM International Conference on Multimedia, pages 1045–1054, 2023

2023

[28] [28]

Q-bench: A benchmark for general-purpose foundation models on low-level vision.arXiv preprint arXiv:2309.14181, 2023a

Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, and Weisi Lin. Q-bench: A benchmark for general-purpose foundation models on low-level vision.arXiv preprint arXiv:2309.14181, 2024

work page arXiv 2024

[29] [29]

Depicting beyond scores: Advancing image quality assessment through multi-modal language models

Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, and Chao Dong. Depicting beyond scores: Advancing image quality assessment through multi-modal language models. InEuropean Conference on Computer Vision (ECCV), 2024

2024

[30] [30]

Enhancing descriptive image quality assessment with a large-scale multi-modal dataset.IEEE Transactions on Image Processing, 34:8201–8215, 2025

Zhiyuan You, Jinjin Gu, Xin Cai, Zheyuan Li, Kaiwen Zhu, Chao Dong, and Tianfan Xue. Enhancing descriptive image quality assessment with a large-scale multi-modal dataset.IEEE Transactions on Image Processing, 34:8201–8215, 2025

2025

[31] [31]

Grounding-iqa: Grounding multimodal language model for image quality assessment

Zheng Chen, Xun Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiongkuo Min, Xiaohong Liu, Xin Yuan, Yong Guo, and Yulun Zhang. Grounding-iqa: Grounding multimodal language model for image quality assessment. CoRR, abs/2411.17237, 2024

work page arXiv 2024

[32] [32]

Q-bench-video: Benchmark the video quality understanding of lmms

Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, and Guangtao Zhai. Q-bench-video: Benchmark the video quality understanding of lmms. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3229–3239, 2025

2025

[33] [33]

Medq-bench: Evaluating and exploring medical image quality assessment abilities in mllms.arXiv preprint arXiv:2510.01691, 2025

Jiyao Liu, Jinjie Wei, Wanying Qu, Chenglong Ma, Junzhi Ning, Yunheng Li, Ying Chen, Xinzhe Luo, Pengcheng Chen, Xin Gao, Ming Hu, Huihui Xu, Xin Wang, Shujian Gao, Dingkang Yang, Zhongying Deng, Jin Ye, Lihao Liu, Junjun He, and Ningsheng Xu. Medq-bench: Evaluating and exploring medical image quality assessment abilities in mllms.arXiv preprint arXiv:251...

work page arXiv 2025

[34] [34]

Lidar-net: A real-scanned 3d point cloud dataset for indoor scenes

Yanwen Guo, Yuanqi Li, Dayong Ren, Xiaohong Zhang, Jiawei Li, Liang Pu, et al. Lidar-net: A real-scanned 3d point cloud dataset for indoor scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21989–21999, 2024

2024

[35] [35]

Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data

Mikaela Angelina Uy, Quang-Hieu Pham, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Revisiting point cloud classification: A new benchmark dataset and classification model on real-world data. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1588–1597, 2019. 13 PointQ-Bench arXiv preprint

2019

[36] [36]

Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction

Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 10901–10911, 2021

2021

[37] [37]

Morley Mao

Jiachen Sun, Qingzhao Zhang, Bhavya Kailkhura, Zhiding Yu, Chaowei Xiao, and Z. Morley Mao. Benchmarking robustness of 3d point cloud recognition against common corruptions.arXiv preprint arXiv:2201.12296, 2022

work page arXiv 2022

[38] [38]

Recommendation ITU-R BT.500-15: Methodologies for the subjective assessment of the quality of television images

International Telecommunication Union. Recommendation ITU-R BT.500-15: Methodologies for the subjective assessment of the quality of television images. ITU-R Recommendation, 2023. Approved in May 2023

2023

[39] [39]

Recommendation ITU-T P.910: Subjective video quality assessment methods for multimedia applications

International Telecommunication Union. Recommendation ITU-T P.910: Subjective video quality assessment methods for multimedia applications. ITU-T Recommendation, 2023. Approved in October 2023

2023

[40] [40]

Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit

Jacob Cohen. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4):213–220, 1968

1968

[41] [41]

Openai gpt-5 system card, 2025

Aaditya Singh, Adam Fry, Adam Perelman, et al. Openai gpt-5 system card, 2025

2025

[42] [42]

Claude haiku 4.5 system card

Anthropic. Claude haiku 4.5 system card. https://assets.anthropic.com/m/99128ddd009bdcb/ original/Claude-Haiku-4-5-System-Card.pdf, October 2025. System card

2025

[43] [43]

Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

2026

[44] [44]

Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency, 2025

Weiyun Wang, Zhangwei Gao, Lixin Gu, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency, 2025

2025

[45] [45]

Mistralai/mistral-small-3.1-24b-instruct-2503 – hugging face

Team MistralAI. Mistralai/mistral-small-3.1-24b-instruct-2503 – hugging face

[46] [46]

Gemma 3 technical report, 2025

Gemma Team, Aishwarya Kamath, Johan Ferret, et al. Gemma 3 technical report, 2025

2025

[47] [47]

Pointllm: Empowering large language models to understand point clouds

Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, and Dahua Lin. Pointllm: Empowering large language models to understand point clouds. InEuropean Conference on Computer Vision (ECCV), 2024

2024

[48] [48]

Shapellm: Universal 3d object understanding for embodied interaction

Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, and Kaisheng Ma. Shapellm: Universal 3d object understanding for embodied interaction. InEuropean Conference on Computer Vision (ECCV), 2024

2024

[49] [49]

Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors

Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, and Min Chen. Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors. InProceedings of the 32nd ACM International Conference on Multimedia, pages 6617–6626, 2024. 14

2024