arxiv: 2409.14634 · v7 · pith:3SZFULCKnew · submitted 2024-09-23 · 💻 cs.HC · cs.AI

Human-LLM Compound System for Scientific Ideation through Facet Recombination and Novelty Evaluation

Pith reviewed 2026-05-23 20:55 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords scientific ideation · human-LLM collaboration · facet recombination · creativity support tools · idea novelty evaluation · analogous paper retrieval · LLM applications in research

The pith

Scideator improves scientific ideation by letting users recombine extracted purposes, mechanisms, and evaluations from papers through a human-LLM system.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Scideator as the first human-LLM compound system designed for facet-based scientific ideation. It extracts purposes, mechanisms, and evaluations from user-provided and related papers, then supports interactive recombination via three modules: a Faceted Idea Generator for analogy-based idea synthesis, an Analogous Paper Facet Finder for distance-controlled retrieval across topics, and an Idea Novelty Checker that uses facets in a retrieve-then-rerank pipeline. In a user study with computer science researchers, the full system delivered significantly more creativity support than a baseline using the same LLM without the facet modules, with gains especially in idea exploration and expressiveness. Ablations confirmed that the facet structure improves both retrieval relevance and novelty classification accuracy over unstructured approaches.

Core claim

Scideator extracts key facets (purposes, mechanisms, and evaluations) from papers, enables human-in-the-loop recombination to synthesize ideas by finding analogies, surfaces papers at varying distances via controlled retrieval, and verifies novelty with a facet-grounded retrieve-then-rerank method, resulting in greater creativity support than a plain LLM baseline in a user study with computer science researchers.
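The retrieve-then-rerank idea in the core claim can be illustrated with a small sketch. This is not the paper's pipeline: the page does not describe the actual retriever or reranker, so token-overlap (Jaccard) similarity stands in for both stages, and `FacetedItem`, `jaccard`, and `retrieve_then_rerank` are hypothetical names introduced only for illustration. Stage 1 scores the concatenated text to collect candidates; stage 2 re-scores those candidates facet by facet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetedItem:
    """Hypothetical record: a title plus one short string per facet."""
    title: str
    purpose: str
    mechanism: str
    evaluation: str


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; a stand-in for a real similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def flat_text(item: FacetedItem) -> str:
    """Concatenate the facets into one unstructured string."""
    return f"{item.purpose} {item.mechanism} {item.evaluation}"


def facet_score(idea: FacetedItem, paper: FacetedItem) -> float:
    """Mean similarity computed facet by facet (purpose vs. purpose, etc.)."""
    return (
        jaccard(idea.purpose, paper.purpose)
        + jaccard(idea.mechanism, paper.mechanism)
        + jaccard(idea.evaluation, paper.evaluation)
    ) / 3


def retrieve_then_rerank(idea, corpus, k=3, top_n=2):
    # Stage 1: recall-oriented retrieval over unstructured text.
    candidates = sorted(
        corpus, key=lambda p: jaccard(flat_text(idea), flat_text(p)), reverse=True
    )[:k]
    # Stage 2: rerank the shortlist using facet-aligned similarity.
    return sorted(candidates, key=lambda p: facet_score(idea, p), reverse=True)[:top_n]
```

In this toy setup, a paper that shares many words with the idea but places them in different facets can rank first in stage 1 and fall to the bottom in stage 2, which is the kind of effect a facet-grounded reranker is meant to produce.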

What carries the argument

Facet recombination, in which users select purposes, mechanisms, and evaluations extracted by the LLM from papers and the system generates ideas by identifying analogies across those facets.
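A minimal sketch of what facet recombination could look like as a data structure follows. Everything here is an assumption made for illustration: the `Facet` tuple, the rule that a purpose is only paired with a mechanism from a different paper, and the prompt wording are not taken from the paper, and no LLM is called.

```python
from itertools import product
from typing import NamedTuple


class Facet(NamedTuple):
    kind: str    # "purpose", "mechanism", or "evaluation"
    text: str    # short description of the facet
    source: str  # identifier of the paper the facet came from


def recombine(facets):
    """Pair each purpose with each mechanism drawn from a different paper."""
    purposes = [f for f in facets if f.kind == "purpose"]
    mechanisms = [f for f in facets if f.kind == "mechanism"]
    return [(p, m) for p, m in product(purposes, mechanisms) if p.source != m.source]


def build_prompt(purpose, mechanism):
    """Hypothetical prompt text that a user-selected pair might be turned into."""
    return (
        f"Purpose (from {purpose.source}): {purpose.text}\n"
        f"Mechanism (from {mechanism.source}): {mechanism.text}\n"
        "Propose a research idea that applies this mechanism to this purpose "
        "by analogy, and state how it could be evaluated."
    )
```

In an interactive tool the user, rather than `recombine`, would choose which pairs to keep; the enumeration only shows the space of cross-paper combinations that such a selection draws from.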

If this is right

  • Users gain a spectrum of ideation directions from same-topic to distant papers through distance-controlled retrieval.
  • Facet-based retrieve-then-rerank surfaces more relevant papers for novelty checking than standard retrieval methods.
  • A facet-grounded novelty classifier outperforms classifiers that reason over unstructured text of ideas and papers.
  • The system particularly boosts idea exploration and expressiveness compared to using the backbone LLM without facet modules.
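The first bullet, on distance-controlled retrieval, can be sketched as bucketing candidate papers by their similarity to the input paper. This is an illustrative assumption rather than the paper's method: the embedding vectors, the band names, and the `near` and `far` thresholds are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_by_distance(query, corpus, near=0.8, far=0.4):
    """Group paper ids into similarity bands relative to the query vector.

    corpus maps a paper id to its embedding; near and far are made-up cutoffs.
    """
    bands = {"same-topic": [], "adjacent": [], "distant": []}
    scored = sorted(((cosine(query, vec), pid) for pid, vec in corpus.items()), reverse=True)
    for sim, pid in scored:
        if sim >= near:
            bands["same-topic"].append(pid)
        elif sim >= far:
            bands["adjacent"].append(pid)
        else:
            bands["distant"].append(pid)
    return bands
```

Exposing the bands separately, instead of returning a single ranked list, is one way a system could let users decide how far from their starting topic they want to explore.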

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This facet approach could extend to domains outside computer science if the extraction accuracy holds across fields.
  • Researchers might spend less time manually scanning literature if facet extraction becomes reliable enough for direct use in recombination.
  • Future tests could check whether the creativity gains persist when users start from fewer seed papers or when the system is applied to hypothesis generation rather than full idea synthesis.

Load-bearing premise

LLM-extracted facets from papers are accurate and complete enough that recombining them produces scientifically coherent and novel ideas that users can productively evaluate.

What would settle it

An experiment in which independent experts rate ideas generated via Scideator as no more novel or coherent than those from the plain-LLM baseline, or in which many recombined ideas are later found to already exist in the literature.

Figures

Figures reproduced from arXiv: 2409.14634 by Daniel S. Weld, Marissa Radensky, Pao Siangliulue, Raymond Fok, Simra Shahid, Tom Hope.

Figure 1: The Scideator workflow. 1) The interaction begins with the user providing an ideation topic and set of input papers as a starting point for ideation. 2) Scideator responds by retrieving analogous papers to the input papers and extracting facets (purpose, mechanism, and evaluation) from the input and analogous papers. (The evaluation facets are omitted in the figure for clarity, as it is not part of the mai…
Figure 2: Scideator’s cold start. Above, the user selects or adds facets to generate ideas. They can also generate more facets to consider, and add custom instructions for the idea generation. Below, the user peruses their ideas and evaluates an idea for novelty by clicking the search icon to its left. The ideation topic here is human-AI collaboration in art. Hope et al. found that this faceted idea framework helps …
Figure 3: The Analogous Paper Facet Finder module. For a …
Figure 4: Scideator’s novelty assessment modal for one idea, which presents the idea (a) as well as its facets (b), related papers (c), adjustable novelty classification (d), and adjustable classification reason (e). When the idea is classified as “not novel,” the system provides a set of three suggestions for more novel ideas (f), each of which replace one of the idea’s original facets. The ideation topic here is h…
Figure 5: The Idea Novelty Checker module follows a retrieve-then-re-rank approach for novelty evaluation. In Step 1, it gathers …
Figure 6: The cold start of the baseline UI for the user study’s idea-generation task. The ideation topic here is human-AI …
Figure 7: (a) The difference between participants’ unweighted CSI scores for …
Figure 8: (a) Participants more often opted to select their own facets rather than let the LLM select for them. (b) Participants …
Figure 9: Participants’ average perceived idea novelty before (a) and after (b) utilizing their assigned tool for idea novelty …
Figure 10: Performance trends of test accuracy across prompts during prompt optimization with TextGRAD. Highlighted text …
Figure 11: contd. TextGrad Prompt Optimisation
Figure 12: contd. TextGrad Prompt Optimisation
Figure 13: Two example ideas used as the basis for comparison in subsequent figures, evaluated by …
Figure 14: Reviews corresponding to idea 1 in Figure 13.
Figure 15: Reviews corresponding to idea 2 in Figure 13.
Original abstract

The scientific ideation process often involves blending facets of existing papers to create new ideas. We contribute Scideator, the first human-LLM system for facet-based scientific ideation. Starting from user-provided papers, Scideator extracts key facets -- purposes, mechanisms, and evaluations -- from these and related papers, allowing users to interactively recombine facets to synthesize ideas. Scideator is driven by three design choices: (1) human-in-the-loop facet recombination, in which users select facets from retrieved papers and the system generates ideas by finding analogies across them via the Faceted Idea Generator module; (2) distance-controlled retrieval via the Analogous Paper Facet Finder module, which surfaces papers ranging from the same topic to entirely different areas to provide a spectrum of directions; and (3) facet-based novelty verification via the Idea Novelty Checker module, a retrieve-then-rerank pipeline that helps users to evaluate idea originality using facets. In a user study with computer science researchers, Scideator provided significantly more creativity support than a baseline using the same backbone LLM without our facet-based modules, particularly in idea exploration and expressiveness. Ablations further show that the facets benefit the novelty checker: facet-based retrieve-then-rerank surfaces more relevant papers than standard retrieval and re-ranking, and a facet-grounded novelty classifier outperforms classifiers that reason over unstructured ideas and papers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Scideator, a human-LLM compound system for scientific ideation. It extracts facets (purposes, mechanisms, evaluations) from user-provided and retrieved papers, supports interactive facet recombination via analogy in the Faceted Idea Generator, uses distance-controlled retrieval in the Analogous Paper Facet Finder to surface papers from similar to distant domains, and includes a retrieve-then-rerank Idea Novelty Checker grounded in facets. A user study with computer science researchers reports that Scideator provides significantly more creativity support than a baseline LLM without the facet modules, especially for idea exploration and expressiveness; ablations indicate that facet-based retrieval and classification improve novelty checking over unstructured baselines.

Significance. If the user-study results hold under fuller reporting, the work offers a structured, facet-driven alternative to unstructured LLM ideation that integrates human control with retrieval and verification modules. The explicit ablations on the novelty checker (facet-based retrieve-then-rerank vs. standard methods, facet-grounded classifier vs. unstructured) and the human-in-the-loop design are concrete strengths that allow partial isolation of the contribution. The distance-controlled retrieval spectrum is a useful design choice for controlling exploration breadth.

major comments (2)
  1. [User Study] User Study section (and abstract): the central claim of 'significantly more creativity support' is reported without participant count (n), statistical tests performed, blinding procedures, exact system prompts, or raw score distributions. These omissions make it impossible to assess effect size, power, or reproducibility of the reported difference, which is load-bearing for the empirical contribution.
  2. [Idea Novelty Checker] § on Idea Novelty Checker: while ablations are presented, the paper does not report inter-annotator agreement or error analysis on the LLM-extracted facets themselves; if facet extraction accuracy is low, the downstream recombination and novelty signals rest on an unquantified foundation.
minor comments (2)
  1. [System Overview] Notation for facets (purposes, mechanisms, evaluations) is introduced without an explicit formal definition or example table showing extraction output for a sample paper.
  2. [Figures] Figure captions for the system pipeline and retrieval spectrum could more explicitly label the three modules (Faceted Idea Generator, Analogous Paper Facet Finder, Idea Novelty Checker) to aid navigation.
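The first major comment asks for the statistical test and effect size behind the creativity-support comparison. As a hedged illustration of the kind of reporting meant, the sketch below runs an exact paired sign-flip permutation test and computes a paired effect size (Cohen's d_z). It assumes a within-subjects design in which each participant scores both tools; the page does not confirm the design or the test the authors used, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def paired_differences(a, b):
    """Per-participant score differences; both lists must align by participant."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least two participants")
    return [x - y for x, y in zip(a, b)]


def paired_sign_flip_p(a, b):
    """Exact two-sided p-value: enumerate every sign assignment of the differences.

    Feasible only for small samples, since it visits 2**n assignments.
    """
    diffs = paired_differences(a, b)
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / total


def cohens_dz(a, b):
    """Paired effect size: mean difference divided by the SD of the differences."""
    diffs = paired_differences(a, b)
    return mean(diffs) / stdev(diffs)
```

Reporting the participant count, the p-value, and an effect size such as d_z together would let readers judge both whether the difference is reliable and how large it is.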

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for recognizing the value of the human-in-the-loop design, distance-controlled retrieval, and the ablations on the novelty checker. We address each major comment below.

  1. Referee: [User Study] User Study section (and abstract): the central claim of 'significantly more creativity support' is reported without participant count (n), statistical tests performed, blinding procedures, exact system prompts, or raw score distributions. These omissions make it impossible to assess effect size, power, or reproducibility of the reported difference, which is load-bearing for the empirical contribution.

    Authors: We agree that these details are required to evaluate the strength and reproducibility of the user-study results. The current manuscript does not report the participant count, the statistical tests performed, blinding procedures, exact system prompts, or raw score distributions. In the revised manuscript we will expand the User Study section (and update the abstract) to include the number of participants, the specific statistical tests and results, blinding procedures, the exact prompts, and raw score distributions or summary visualizations. revision: yes

  2. Referee: [Idea Novelty Checker] § on Idea Novelty Checker: while ablations are presented, the paper does not report inter-annotator agreement or error analysis on the LLM-extracted facets themselves; if facet extraction accuracy is low, the downstream recombination and novelty signals rest on an unquantified foundation.

    Authors: We agree that an explicit quantification of facet-extraction reliability would strengthen the claims about the downstream modules. The manuscript does not currently report inter-annotator agreement or an error analysis on the LLM-extracted facets. In the revision we will add a targeted analysis of facet-extraction quality, for example by reporting agreement or error rates on a sampled set of extractions, to provide a quantified basis for the recombination and novelty-checking components. revision: yes
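The second response mentions reporting agreement on a sampled set of extractions. One common way to quantify that is Cohen's kappa between two annotators who each judge whether an extracted facet is correct. The sketch below is an assumption about how such an analysis could be run, not something the authors describe; the label values and the `cohens_kappa` helper are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length label lists")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Kappa near 1 indicates the annotators agree well beyond chance, while values near 0 suggest that the judgments of facet correctness are themselves unreliable and the error rates built on them should be read with caution.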

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical human-LLM system (Scideator) whose central claims rest on a comparative user study with computer science researchers and ablations of its modules. No equations, fitted parameters, or derivation chain appear in the provided material. The evaluation compares the full facet-based system to a baseline using the same backbone LLM, making the reported superiority an external, falsifiable outcome rather than a quantity that reduces to the system's own inputs by construction. No self-citation load-bearing steps, self-definitional relations, or renamed known results are present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The system depends on the untested premise that current LLMs can extract and analogize facets reliably enough for the recombination and novelty modules to be useful; no free parameters or new entities are introduced.

axioms (1)
  • Domain assumption: Large language models can reliably extract purposes, mechanisms, and evaluations from scientific papers and perform analogy-based recombination across them.
    This extraction and recombination capability is invoked as the foundation for the Faceted Idea Generator, Analogous Paper Facet Finder, and Idea Novelty Checker modules.

pith-pipeline@v0.9.0 · 5797 in / 1377 out tokens · 35471 ms · 2026-05-23T20:55:52.377332+00:00 · methodology

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation

    cs.MA 2026-05 unverdicted novelty 7.0

    EIG represents research ideas as evolving graphs with nodes for claims and edges for relations, using a learned controller for edits and commits to produce higher-quality scientific proposals than text-only multi-agen...

  2. ResearchCube: Multi-Dimensional Trade-off Exploration for Research Ideation

    cs.HC 2026-04 unverdicted novelty 7.0

    ResearchCube provides a 3D spatial interface with bipolar trade-off dimensions and direct-manipulation interactions to support multi-dimensional research ideation, shown helpful in a study with 11 researchers for exte...

  3. LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape

    cs.HC 2026-04 conditional novelty 7.0

    LitPivot introduces literature-initiated pivots where engagement with dynamically retrieved papers prompts revisions to a developing research idea.

  4. The Alien Space of Science: Sampling Coherent but Cognitively Unavailable Research Directions

    cs.AI 2026-03 conditional novelty 7.0

    A framework decomposes LLM papers into idea atoms, trains coherence and availability models over the resulting vocabulary, and samples atom combinations that are coherent yet unlikely under existing author communities.

  5. IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research

    cs.CL 2025-07 unverdicted novelty 7.0

    IDRBench is presented as the first benchmark framework consisting of datasets and three evaluation tasks to measure LLMs' ability to perform interdisciplinary research.

  6. CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation

    cs.CL 2025-05 unverdicted novelty 7.0

    CHIMERA is the first large-scale mined KB of concept recombinations from scientific literature, created via a new IE task and LLM extraction, with demonstrated uses in pattern analysis and hypothesis generation.

  7. When AI reviews science: Can we trust the referee?

    cs.AI 2026-04 unverdicted novelty 6.0

    AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

  8. Attribution Gradients: Incrementally Unfolding Citations for Critical Examination of Attributed AI Answers

    cs.HC 2025-10 unverdicted novelty 6.0

    Attribution gradients consolidate citation evidence and enable incremental unfolding of secondary sources, leading to deeper engagement in a lab study of critical reading tasks for AI answers.

  9. "Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming

    cs.HC 2026-05 conditional novelty 5.0

    LLM assistance shortens idea-generation periods and reduces creative moments during programming tasks while yielding solutions with comparable idea counts and greater functional correctness.

  10. AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

    cs.AI 2026-05 unverdicted novelty 4.0

    A survey organizing AI-powered research automation into five workflow stages, defining AutoResearch and Vibe Research, and proposing five evaluation dimensions while noting domain-conditioned limits on autonomy.

  11. Omakase: proactive assistance with actionable suggestions for evolving scientific research projects

    cs.HC 2026-04 unverdicted novelty 4.0

    Omakase monitors project documents to infer timely queries and distills research reports into actionable suggestions that users rated significantly more useful than raw reports.

  12. Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

    cs.DL 2025-07 unverdicted novelty 4.0

    The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.

Reference graph

Works this paper leans on

135 extracted references · 135 canonical work pages · cited by 12 Pith papers · 4 internal anchors

  1. [1]

    Abdelrahman Abdallah, Bhawna Piryani, Jamshid Mozafari, Mohammed Ali, and Adam Jatowt. 2025. Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation. https://api.semanticscholar. org/CorpusID:276107364

  2. [2]

    Afra Feyza Akyürek, Ekin Akyürek, Leshem Choshen, Derry Wijaya, and Jacob Andreas. 2024. Deductive closure training of language models for coherence, accuracy, and updatability. arXiv preprint arXiv:2401.08574 (2024)

  3. [3]

    Shm Garanganao Almeda, JD Zamfirescu-Pereira, Kyu Won Kim, Pradeep Mani Rathnam, and Bjoern Hartmann. 2024. Prompting for discovery: Flex- ible sense-making for ai art-making with dreamsheets. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems . 1–17

  4. [4]

    Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. 2024. Researchagent: Iterative research idea generation over scientific literature with large language models. arXiv preprint arXiv:2404.07738 (2024)

  5. [5]

    Davide Baldelli, Junfeng Jiang, Akiko Aizawa, and Paolo Torroni. 2024. TWOLAR: A TWO-Step LLM-Augmented Distillation Method for Passage Reranking. ArXiv abs/2403.17759 (2024). https://api.semanticscholar.org/CorpusID:268691914

  6. [6]

    Lutz Bornmann and Rüdiger Mutz. 2015. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the association for information science and technology 66, 11 (2015), 2215–2222

  7. [7]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101

  8. [8]

    Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Shahaf, and Aniket Kittur. 2018. Solvent: A mixed initiative system for finding analogies between research papers. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 1–21

  9. [9]

    Liuqing Chen, Yuan Zhang, Ji Han, Lingyun Sun, Peter Childs, and Boheng Wang. 2024. A foundation model enhanced approach for generative design in combinational creativity. Journal of Engineering Design 35, 11 (2024), 1394–1420

  10. [10]

    Alan Y Cheng, Meng Guo, Melissa Ran, Arpit Ranasaria, Arjun Sharma, Anthony Xie, Khuyen N Le, Bala Vinaithirthan, Shihe Luan, David Thomas Henry Wright, et al. 2024. Scientific and fantastical: Creating immersive, culturally relevant learning experiences with augmented reality and large language models. In Proceedings of the 2024 CHI Conference on Human F...

  11. [11]

    Erin Cherry and Celine Latulipe. 2014. Quantifying the creativity support of digital tools through the creativity support index.ACM Transactions on Computer- Human Interaction (TOCHI) 21, 4 (2014), 1–25

  12. [12]

    DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, and Juho Kim. 2024. CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems . 1–25

  13. [13]

    Seulgi Choi, Hyewon Lee, Yoonjoo Lee, and Juho Kim. 2024. VIVID: Human-AI Collaborative Authoring of Vicarious Dialogues from Lecture Videos. In Pro- ceedings of the 2024 CHI Conference on Human Factors in Computing Systems . 1–26

  14. [14]

    Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, and Daniel S. Weld

  15. [15]

    ArXiv abs/2004.07180 (2020)

    SPECTER: Document-level Representation Learning using Citation- informed Transformers. ArXiv abs/2004.07180 (2020). https://api.semanticscholar. org/CorpusID:215768677

  16. [16]

    Arthur Cropley. 2006. In praise of convergent thinking. Creativity research journal 18, 3 (2006), 391–404

  17. [17]

    Mike D’Arcy, Tom Hope, Larry Birnbaum, and Doug Downey. 2024. Marg: Multi- agent review generation for scientific papers. arXiv preprint arXiv:2401.04259 (2024)

  18. [18]

    Douglas L Dean, Jill Hender, Tom Rodgers, and Eric Santanen. 2006. Identifying good ideas: constructs and scales for idea evaluation. Journal of Association for Information Systems 7, 10 (2006), 646–699

  19. [19]

    Karl Duncker and Lynne S Lees. 1945. On problem-solving. Psychological mono- graphs 58, 5 (1945), i

  20. [20]

    James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, and Tomas Pfister. 2023. Textgenshap: Scalable post-hoc explanations in text genera- tion with long documents. arXiv preprint arXiv:2312.01279 (2023)

  21. [21]

    Jingtong Gao, Bo Chen, Xiangyu Zhao, Weiwen Liu, Xiangyang Li, Yichao Wang, Zijian Zhang, Wanyu Wang, Yuyang Ye, Shanru Lin, Huifeng Guo, and Ruim- ing Tang. 2024. LLM-enhanced Reranking in Recommender Systems. ArXiv abs/2406.12433 (2024). https://api.semanticscholar.org/CorpusID:270562015

  22. [22]

    Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. 2025. Towards an AI co-scientist. arXiv preprint arXiv:2502.18864 (2025)

  23. [23]

    Tianyang Gu, Jingjin Wang, Zhihao Zhang, and HaoHong Li. 2024. LLMs can realize combinatorial creativity: generating creative ideas via LLMs for scientific research. arXiv preprint arXiv:2412.14141 (2024)

  24. [24]

    Yuling Gu, Oyvind Tafjord, and Peter Clark. 2023. Digital socrates: Evaluating llms through explanation critiques. arXiv preprint arXiv:2311.09613 (2023)

  25. [25]

    Hua Guo and David H Laidlaw. 2018. Topic-based exploration and embedded visualizations for research idea generation. IEEE transactions on visualization and computer graphics 26, 3 (2018), 1592–1607

  26. [26]

    Tarun Gupta and Danish Pruthi. 2025. All that glitters is not novel: Plagiarism in ai generated research. arXiv preprint arXiv:2502.16487 (2025)

  27. [27]

    1996.Mental leaps: Analogy in creative thought

    Keith J Holyoak and Paul Thagard. 1996.Mental leaps: Analogy in creative thought. MIT press

  28. [28]

    Tom Hope, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2017. Accelerating innovation through analogy mining. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining . 235–243

  29. [29]

    Tom Hope, Doug Downey, Daniel S Weld, Oren Etzioni, and Eric Horvitz. 2023. A computational inflection for scientific discovery. Commun. ACM 66, 8 (2023), 62–73

  30. [30]

    Tom Hope, Ronen Tamari, Daniel Hershcovich, Hyeonsu B Kang, Joel Chan, Aniket Kittur, and Dafna Shahaf. 2022. Scaling creative inspiration with fine- grained functional aspects of ideas. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems . 1–15. Scideator: Human-LLM Scientific Idea Generation and Novelty Evaluation Grounded in...

  31. [31]

    Peter Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S Weld, and Peter Clark. 2025. CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation. arXiv preprint arXiv:2503.22708 (2025)

  32. [32]

    Arif E Jinha. 2010. Article 50 million: an estimate of the number of scholarly articles in existence. Learned publishing 23, 3 (2010), 258–263

  33. [33]

    Hyeonsu B Kang, David Chuan-En Lin, Nikolas Martelaro, Aniket Kittur, Yan- Ying Chen, and Matthew K Hong. 2024. BioSpark: An End-to-End Generative System for Biological-Analogical Inspirations and Ideation. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems . 1–13

  34. [34]

    Hyeonsu B Kang, Xin Qian, Tom Hope, Dafna Shahaf, Joel Chan, and Aniket Kittur. 2022. Augmenting scientific creativity with an analogical search engine. ACM Transactions on Computer-Human Interaction 29, 6 (2022), 1–36

  35. [35]

    James C Kaufman and Robert J Sternberg. 2010. The Cambridge handbook of creativity. Cambridge University Press

  36. [36]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. 2023. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv preprint arXiv:2310.03714 (2023)

  37. [37]

    Kinney, C

    Rodney Michael Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chan- drasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David W. Graham, F.Q. Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, ...

  38. [38]

    Dan Lahav, Jon Saad Falcon, Bailey Kuehl, Sophie Johnson, Sravanthi Parasa, Noam Shomron, Duen Horng Chau, Diyi Yang, Eric Horvitz, Daniel S Weld, et al

  39. [39]

    In Proceedings of the AAAI Conference on Artificial Intelligence, Vol

    A search engine for discovery of scientific challenges and directions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 11982–11990

  40. [40]

    Pier Luca Lanzi and Daniele Loiacono. 2023. Chatgpt and other large language models as evolutionary engines for online interactive collaborative game design. In Proceedings of the Genetic and Evolutionary Computation Conference . 1383– 1390

  41. [41]

    Joanne Leong, Pat Pataranutaporn, Valdemar Danry, Florian Perteneder, Yaoli Mao, and Pattie Maes. 2024. Putting things into context: Generative AI-enabled context personalization for vocabulary learning improves learning motivation. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems . 1–15

  42. [42]

    Weixin Liang, Yuhui Zhang, Hancheng Cao, Binglu Wang, Daisy Yi Ding, Xinyu Yang, Kailas Vodrahalli, Siyu He, Daniel Scott Smith, Yian Yin, et al. 2024. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI 1, 8 (2024), AIoa2400196

  43. [43]

    Yiren Liu, Si Chen, Haocong Cheng, Mengxia Yu, Xiao Ran, Andrew Mo, Yiliu Tang, and Yun Huang. 2024. How ai processing delays foster creativity: Exploring research question co-creation with an llm-based agent. In Proceedings of the CHI Conference on Human Factors in Computing Systems . 1–25

  44. [44]

    Yiren Liu, Pranav Sharma, Mehul Jitendra Oswal, Haijun Xia, and Yun Huang

  45. [45]

    arXiv preprint arXiv:2409.12538 (2024)

    Personaflow: Boosting research ideation with llm-simulated expert per- sonas. arXiv preprint arXiv:2409.12538 (2024)

  46. [46]

    Yiren Liu, Mengxia Yu, Meng Jiang, and Yun Huang. 2023. Creative Research Question Generation for Human-Computer Interaction Research.. In IUI Work- shops. 58–66

  47. [47]

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha

  48. [49]

    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob N. Foerster, Jeff Clune, and David Ha. 2024. The AI Scientist: Towards Fully Automated Open-Ended Scientific Dis- covery. ArXiv abs/2408.06292 (2024). https://api.semanticscholar.org/CorpusID: 271854887

  49. [50]

    Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, and Maarten de Rijke. 2024. Ranked List Truncation for Large Language Model- based Re-Ranking. ArXiv abs/2404.18185 (2024). https://api.semanticscholar.org/ CorpusID:269449617

  50. [51]

    Louie Meyer, Johanne Engel Aaen, Anitamalina Regitse Tranberg, Peter Kun, Matthias Freiberger, Sebastian Risi, and Anders Sundnes Løvlie. 2024. Algorithmic ways of seeing: Using object detection to facilitate art exploration. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems . 1–18

  51. [52]

    Sheshera Mysore, Arman Cohan, and Tom Hope. 2022. Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity. InProceed- ings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies . 4453–4470

  52. [53]

    Sheshera Mysore, Tim O’Gorman, Andrew McCallum, and Hamed Zamani. [n. d.]. CSFCube–A Test Collection of Computer Science Research Articles for Faceted Query by Example. ([n. d.])

  53. [54]

    Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. 2024. Acceleron: A Tool to Accelerate Research Ideation. arXiv preprint arXiv:2403.04382 (2024)

  54. [55]

    Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. 2024. An Interactive Co-Pilot for Accelerated Research Ideation. In Proceedings of the Third Workshop on Bridging Human–Computer Interaction and Natural Language Processing. 60–73

  55. [56]

    Baharan Nouriinanloo and Maxime Lamothe. 2024. Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models. ArXiv abs/2406.18740 (2024). https://api.semanticscholar.org/CorpusID:270764517

  56. [57]

    Jeongseok Oh, Seungju Kim, and Seungjun Kim. 2024. LumiMood: A Creativity Support Tool for Designing the Mood of a 3D Scene. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–21

  57. [58]

    OpenReview. [n. d.]. OpenReview. https://openreview.net/

  58. [59]

    Jason Portenoy, Marissa Radensky, Jevin D West, Eric Horvitz, Daniel S Weld, and Tom Hope. 2022. Bursting scientific filter bubbles: Boosting innovation via novel author discovery. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–13

  59. [60]

    Kevin Pu, KJ Feng, Tovi Grossman, Tom Hope, Bhavana Dalvi Mishra, Matt Latzke, Jonathan Bragg, Joseph Chee Chang, and Pao Siangliulue. 2024. IdeaSynth: Iterative Research Idea Development Through Evolving and Composing Idea Facets with Literature-Grounded Feedback. arXiv preprint arXiv:2410.04025 (2024)

  60. [61]

    A Terry Purcell and John S Gero. 1996. Design and other types of fixation. Design studies 17, 4 (1996), 363–383

  61. [62]

    Abhilasha Ravichander, Shrusti Ghela, David Wadden, and Yejin Choi. 2025. HALoGEN: Fantastic LLM Hallucinations and Where to Find Them. arXiv preprint arXiv:2501.08292 (2025)

  62. [63]

    Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Conference on Empirical Methods in Natural Language Processing. https://api.semanticscholar.org/CorpusID:201646309

  63. [64]

    Mark A Runco et al. 2010. Divergent thinking, creativity, and ideation. The Cambridge handbook of creativity 413 (2010), 446

  64. [66]

    Chenglei Si, Diyi Yang, and Tatsunori Hashimoto. 2024. Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers. ArXiv abs/2409.04109 (2024). https://api.semanticscholar.org/CorpusID:272463952

  65. [67]

    Dean Keith Simonton. 2021. Scientific Creativity: Discovery and Invention as Combinatorial. Frontiers in Psychology 12 (2021). https://api.semanticscholar.org/CorpusID:237262181

  66. [68]

    Arvind Srinivasan and Joel Chan. 2024. Improving Selection of Analogical Inspirations through Chunking and Recombination. In Proceedings of the 16th Conference on Creativity & Cognition. 374–397

  67. [69]

    Sangho Suh, Meng Chen, Bryan Min, Toby Jia-Jun Li, and Haijun Xia. 2024. Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation. In Proceedings of the CHI Conference on Human Factors in Computing Systems. 1–26

  68. [70]

    Lu Sun, Aaron Chan, Yun Seo Chang, and Steven P Dow. 2024. ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing. In Proceedings of the 29th International Conference on Intelligent User Interfaces. 120–137

  69. [71]

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Pengjie Ren, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent. ArXiv abs/2304.09542 (2023). https://api.semanticscholar.org/CorpusID:258212638

  70. [72]

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 14918–14937

  71. [73]

    P Thagard. 2012. The cognitive science of science: Explanation, discovery, and conceptual change. The MIT Press

  72. [74]

    Jianyou Wang, Kaicheng Wang, Xiaoyue Wang, Prudhviraj Naidu, Leon Bergen, and Ramamohan Paturi. 2023. DORIS-MAE: scientific document retrieval using multi-level aspect-based queries. In Proceedings of the 37th International Conference on Neural Information Processing Systems. 38404–38419

  73. [75]

    Meiyun Wang, Kiyoshi Izumi, and Hiroki Sakaji. 2024. LLMFactor: Extracting profitable factors through prompts for explainable stock movement prediction. arXiv preprint arXiv:2406.10811 (2024)

  74. [76]

    Qingyun Wang, Doug Downey, Heng Ji, and Tom Hope. 2023. Scimon: Scientific inspiration machines optimized for novelty. arXiv preprint arXiv:2305.14259 (2023)

  75. [77]

    Hongji Yang, Delin Jing, and Lu Zhang. 2016. Creative Computing: an approach to knowledge combination for creativity? In 2016 IEEE Symposium on Service-Oriented System Engineering (SOSE). IEEE, 407–414.

  76. [78]

    TextGrad: Automatic "Differentiation" via Text

    Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, and James Zou. 2024. TextGrad: Automatic "Differentiation" via Text. arXiv preprint arXiv:2406.07496 (2024)

  77. [79]

    Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, and Helen Meng. 2024. Self-alignment for factuality: Mitigating hallucinations in LLMs via self-evaluation. arXiv preprint arXiv:2402.09267 (2024).

    A PROMPTS FOR ANALOGOUS PAPER FACET FINDER

    A.1 Prompt to extract facets from a paper title/abstract.

    def promptTextToPur...

  78. [86]

    No referencing the purpose in the evaluation facet.

    Examples of bad vs good purposes:

    - bad (too specific): to generate creative writing activities for third-grade English lessons --> good: to support elementary creative writing
    - bad (too broad): to support healthcare --> good: to provide clinical decision support
    - bad (more than one purpose that ar...

  79. [91]

    "" return prompt

    A.2 Prompt to retrieve facets from papers associated with an analogous query.

    def promptFacetsFromQueryPapers(papers, corpus_ids, query, type=

    Do NOT reuse the words already in the facet.

    Examples of bad vs good definitions:

    - facet: longitudinal study. bad: a study that evaluates the tool Toolio over the course of a year --> good: a study that takes place over a long period of time extending at least multiple days
    - facet: Toolio for creative writing. bad: Toolio implements SLM for genera...

  80. [92]

    Specific enough to be helpful in coming up with research ideas

Showing first 80 references.