Multimodal QUD: Inquisitive Questions from Scientific Figures
Pith reviewed 2026-05-08 06:14 UTC · model grok-4.3
The pith
Author-annotated inquisitive questions drawn from scientific figures and their surrounding paper context improve vision-language models' ability to generate content-specific multimodal questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We extend QUD theory to the multimodal case by collecting author-annotated questions that are raised by figures yet resolved only through the accompanying text, release the resulting MQUD dataset, and demonstrate that fine-tuning a VLM on these annotations produces questions that are more visually grounded and that demand higher-level multimodal reasoning than those generated by untuned models.
What carries the argument
The MQUD dataset of author-annotated multimodal questions under discussion, which captures implicit questions raised by a scientific figure and resolved by the paper's textual analysis.
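To make the shape of such an annotation concrete, here is a minimal sketch of what a single entry might look like. The class name, field names, and the `is_well_formed` helper are all hypothetical; the released MQUD schema is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MQUDRecord:
    """One hypothetical entry; field names are illustrative, not the released schema."""
    figure_path: str  # location of the figure image
    caption: str      # the figure's caption
    context: str      # paper text surrounding the figure
    question: str     # implicit question raised by the figure
    answer: str       # resolution of that question, taken from the paper text


def is_well_formed(record: MQUDRecord) -> bool:
    """Return True if the question ends with '?' and the answer is non-empty."""
    return record.question.strip().endswith("?") and bool(record.answer.strip())
```

A record pairs a figure-raised question with a text-derived answer, so a simple filter like `is_well_formed` could be used to drop entries whose answer field is blank before any fine-tuning run.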
If this is right
- Fine-tuned VLMs produce questions that are measurably more specific to the figure's role in the paper rather than generic visual descriptions.
- The same fine-tuning process yields questions that require cross-modal reasoning instead of isolated visual extraction.
- MQUD supplies a concrete benchmark for evaluating how well VLMs track discourse goals in scientific documents.
- The approach offers a scalable way to generate training data for models that aim to simulate human scientific reading.
Where Pith is reading between the lines
- Models trained this way could be applied to generate study questions or summaries that better anticipate what a reader needs to understand next in a paper.
- The dataset construction method could be adapted to other domains such as medical imaging or engineering diagrams where figures carry essential arguments.
- If the improvement generalizes, it suggests that explicit discourse-level annotations can serve as a stronger training signal than purely visual question-answering data.
Load-bearing premise
That the questions marked by the original authors accurately reflect the depth and type of inquisitive questions humans naturally raise when reading scientific figures together with their textual context.
What would settle it
A direct comparison in which human raters judge the visual grounding and reasoning depth of questions generated by the fine-tuned model versus a baseline model on a set of held-out papers; no improvement would falsify the claim.
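One way such a comparison could be scored is a paired preference test: for each held-out figure, a rater marks which model's question is better, and an exact sign test checks whether the fine-tuned model wins more often than chance. The sketch below is an assumption about how the analysis might be set up, not the paper's protocol; the label strings and function names are hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """One-sided exact sign test: P(X >= wins) under a fair coin, ties excluded."""
    n = wins + losses
    if n == 0:
        return 1.0
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


def compare(preferences: list[str]) -> tuple[float, float]:
    """Return (win rate among decided items, one-sided p-value).

    Each entry is "tuned", "baseline", or "tie" for one held-out figure.
    """
    wins = preferences.count("tuned")
    losses = preferences.count("baseline")
    decided = wins + losses
    win_rate = wins / decided if decided else 0.5
    return win_rate, sign_test_p_value(wins, losses)
```

A win rate near 0.5 with a large p-value would correspond to the "no improvement" outcome described above.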
Original abstract
Asking inquisitive questions while reading, and looking for their answers, is an important part of human discourse comprehension, curiosity, and creative ideation, and prior work has investigated this in text-only scenarios. However, in scientific or research papers, many of the critical takeaways are conveyed through both figures and the text that analyzes them. While scientific visualizations have been used to evaluate the capabilities of Vision-Language Models (VLMs), current benchmarks are limited to questions that focus simply on extracting information from them. Such questions only require lower-level reasoning, do not take into account the context in which a figure appears, and do not reflect the communicative goals the authors wish to achieve. We generate inquisitive questions that reach the depth of questions humans generate when engaging with scientific papers, conditioned on both the figure and the paper's context, and require reasoning across both modalities. To do so, we extend the linguistic theory of Questions Under Discussion (QUD) from being text-only to multimodal, where implicit questions are raised and resolved as discourse progresses. We present MQUD, a dataset of research papers in which such questions are made explicit and annotated by the original authors. We show that fine-tuning a VLM on MQUD shifts the model from generating generic low-level visual questions to content-specific grounding that requires a high level of multimodal reasoning, yielding higher-quality, more visually grounded multimodal QUD generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MQUD, a dataset of multimodal Questions Under Discussion (QUD) derived from scientific figures and surrounding paper text, with questions explicitly annotated by the original paper authors. It extends linguistic QUD theory to multimodal discourse and reports that fine-tuning a vision-language model on MQUD shifts output from generic low-level visual questions toward content-specific, high-level questions requiring cross-modal reasoning and better visual grounding.
Significance. If the central empirical claims hold after addressing annotation validity, the work would offer a novel resource and training signal for VLMs on scientific multimodal reasoning, moving beyond standard visual QA toward inquisitive, context-aware question generation with potential uses in research tools and scientific education. The theoretical extension of QUD is a clear strength.
major comments (2)
- [Dataset construction] Dataset construction section: The central claim that MQUD captures 'the depth of questions humans generate when engaging with scientific papers' rests on author annotations. No evidence is provided of inter-annotator agreement with independent readers, controls for author knowledge leakage, or external validation that the questions reflect natural reader inquisitiveness rather than expert intent. This directly undermines the interpretation of fine-tuning gains as genuine multimodal reasoning improvements rather than dataset artifacts.
- [Experiments] Experiments and evaluation section: The reported shift to 'higher-quality, more visually grounded multimodal QUD generation' is not supported by any quantitative metrics, human evaluation protocol, baseline comparisons, or example outputs in the provided description. Without these, the empirical result cannot be assessed for effect size or robustness.
minor comments (1)
- [Abstract] The abstract would be strengthened by including at least one key quantitative result or example to convey the scale of the improvement.
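The agreement check requested in the first major comment could be quantified with Cohen's kappa, for example by having the author and an independent reader each label whether a question is one a reader would naturally raise. The sketch below assumes two equal-length lists of categorical labels; the label values and the function name are hypothetical, and nothing here reflects an analysis the manuscript is known to contain.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects raw agreement for chance, so a value near zero would indicate that author and reader judgments coincide no more often than their label frequencies alone predict.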
Simulated Author's Rebuttal
Thank you for the detailed and constructive review. We appreciate the feedback highlighting the need for stronger validation of the dataset and more rigorous presentation of experimental results. We address each major comment below and have revised the manuscript to incorporate clarifications, additional discussion, and expanded evaluation details.
Point-by-point responses
- Referee: [Dataset construction] Dataset construction section: The central claim that MQUD captures 'the depth of questions humans generate when engaging with scientific papers' rests on author annotations. No evidence is provided of inter-annotator agreement with independent readers, controls for author knowledge leakage, or external validation that the questions reflect natural reader inquisitiveness rather than expert intent. This directly undermines the interpretation of fine-tuning gains as genuine multimodal reasoning improvements rather than dataset artifacts.
  Authors: Author annotations were chosen to directly capture the intended communicative goals and implicit QUDs that the paper authors had when designing the figures and text, which aligns with the core of QUD theory (questions that advance the discourse from the producer's perspective). This approach ensures high fidelity to the scientific context rather than relying on external readers inferring intent. We agree that independent validation would further strengthen claims about reflecting natural reader inquisitiveness. In the revision, we have expanded the dataset section with details on the annotation guidelines, added a limitations paragraph explicitly discussing potential author bias and knowledge leakage, and included a small pilot comparison with questions from independent readers. We do not claim the dataset is the only possible set of questions but argue it provides a valuable, grounded training signal; the fine-tuning results demonstrate improved multimodal reasoning regardless.
  revision: partial
- Referee: [Experiments] Experiments and evaluation section: The reported shift to 'higher-quality, more visually grounded multimodal QUD generation' is not supported by any quantitative metrics, human evaluation protocol, baseline comparisons, or example outputs in the provided description. Without these, the empirical result cannot be assessed for effect size or robustness.
  Authors: We agree that the initial presentation of results required more explicit detail to allow assessment. The manuscript includes human evaluation protocols (rating on specificity, visual grounding, and relevance), baseline comparisons against zero-shot and few-shot VLM prompting, and example outputs. In the revised version, we have restructured the Experiments section to prominently feature quantitative results (e.g., human agreement rates and automated metrics for question complexity), full evaluation protocols, effect size reporting, and additional example generations in the main text and appendix. This makes the empirical claims fully assessable.
  revision: yes
Circularity Check
No circularity: new dataset construction and empirical fine-tuning evaluation are independent of inputs.
Full rationale
The paper introduces MQUD as a novel dataset of author-annotated multimodal questions under discussion from scientific papers, then reports empirical results from fine-tuning VLMs on this dataset to measure shifts in generated question quality and grounding. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The central claims rest on the external validity of the new annotations and the observable differences in model outputs, which do not reduce to the annotation process by construction. This is a standard empirical pipeline with no detectable circular steps.