pith. sign in

arxiv: 2505.20779 · v5 · submitted 2025-05-27 · 💻 cs.CL

CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation

Pith reviewed 2026-05-19 13:54 UTC · model grok-4.3

classification 💻 cs.CL
keywords recombinationknowledge basescientific innovationinformation extractionhypothesis generationAI subfieldscross-disciplinary research
0
0 comments X

The pith

CHIMERA is a large-scale knowledge base that automatically extracts examples of scientific idea recombinations from research papers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces CHIMERA as the first large-scale knowledge base of recombination instances mined from scientific literature. It defines a new information extraction task to identify when papers integrate elements from existing concepts, curates an expert-annotated dataset, and fine-tunes an LLM to extract these instances from AI papers. The KB supports analyzing how scientists recombine ideas across subfields and training models to generate new research directions. Researchers would care if this reveals mechanisms behind innovation and helps in creating novel hypotheses. The extraction method is shown to work in biology as well.

Core claim

We introduce CHIMERA, the first large-scale knowledge base of recombination examples automatically mined from the scientific literature, which enables empirical analysis of how scientists recombine concepts and draw inspiration from different areas, and training of models that propose cross-disciplinary research directions.

What carries the argument

LLM-based extraction model fine-tuned on an expert-annotated dataset to identify recombination instances in papers

If this is right

  • Recombination patterns across AI subfields can be systematically analyzed using the KB.
  • A hypothesis generation model trained on the KB proposes research directions that researchers rate as inspiring.
  • The extraction approach generalizes to domains beyond AI, such as biology.
  • Empirical studies of inspiration and cross-disciplinary idea integration become possible at scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such knowledge bases could help identify promising but underexplored combinations of ideas for future research.
  • Training data from recombinations might improve AI systems for scientific discovery beyond current methods.
  • Tracking changes in recombination patterns over time could reveal shifts in research focus.

Load-bearing premise

The fine-tuned LLM model correctly identifies genuine recombination instances in papers without substantial false positives, false negatives, or biases.

What would settle it

A large-scale manual review by domain experts of extracted instances showing many are not true recombinations or many real ones are missed would disprove the utility of the KB.

Figures

Figures reproduced from arXiv: 2505.20779 by Noy Sternlicht, Tom Hope.

Figure 1
Figure 1. Figure 1: We propose a new task of extracting recom￾binations: examples of how scientists connect ideas in novel ways (a). This data enables applications in re￾search analysis and automated ideation (b). tion: from classic work (Gentner, 1983; Falken￾hainer et al., 1989), to more modern mixed￾initiative systems in the scientific domain (Chan et al., 2018; Kang et al., 2022; Choi et al., 2024; Radensky et al., 2024).… view at source ↗
Figure 2
Figure 2. Figure 2: CHIMERA KB construction and applications. Construction: (1) We use human-annotated recombination examples to fine-tune an LLM for information extraction; (2) the model extracts recombinations from arXiv abstracts to build a large-scale KB. Applications: CHIMERA supports diverse use cases, including computational ideation, exploration of recombination patterns across scientific domains, and meta-scientific … view at source ↗
Figure 3
Figure 3. Figure 3: Recombinations between areas. cs.*, q-bio.nc and math.oc are arXiv categories. Inspirational connections are often cross-domain (Figure 3a), whereas blends tend to occur within the same domain (Figure 3b). Figure 3c zooms in on a few domains, for example, revealing that robotics often draws inspiration from zoology. cs.CL (Computation & Language) cs.CV (Computer Vision) cs.AI (Artificial Intelligence) cs.L… view at source ↗
Figure 4
Figure 4. Figure 4: Prevalent domains inspired by cs.CL concepts (NLP). Note the decrease in within-domain inspiration. gories (as discussed in Section 3.1), our model generalizes well, achieving 94% accuracy in our automatic LLM-as-a-Judge evaluation setup [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Recombination prediction. Given a context string and a query about recombining a graph node, a model [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Researchers find our recombination sugges [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: E2E extraction prompt. {TEXT} is the placeholder for the input abstract text. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: GoLLIE guidelines. data (a total of 135). As [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: E2E ICL prompt. {TEXT} is a placeholder for the abstract text, and {EXAMPLES} for the ICL examples. [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Entity extraction prompt. {TEXT} is a place [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Span similarity prompt. {ENTITY_TYPE} is either "combination-element", "inspiration-source" or "inspiration-target". {TEXT} is a placeholder for the paper’s abstract. {SPAN1}, {SPAN2} are placeholders for the compared spans. B.5 Extraction error analysis We perform analysis over the test set, revealing different sources of error which may inspire fu￾ture improvements. Our focus is on understanding how dif… view at source ↗
Figure 12
Figure 12. Figure 12: Large-scale evaluation prompt. {AB￾STRACT} is a placeholder for the original abstract text. {EXTRACTED_RELATION}, {ENTITY1}, and {EN￾TITY2} are placeholders for the relation type and enti￾ties extracted by our model. GPT-4.1 was prompted with the same examples using an evaluation template aligned with the as￾sessment criteria (see [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: blend domain analysis prompt. {ELEMENTS} is a placeholder for the recombination entities extracted [PITH_FULL_IMAGE:figures/full_fig_p022_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: inspiration domain analysis prompt. {INSPIRATION_SOURCE} and {INSPIRATION_TARGET} are [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Entity postprocessing prompt. {{ABSTRACT}} is a placeholder for the source abstract. {{RECOMBI [PITH_FULL_IMAGE:figures/full_fig_p026_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Context extraction prompt. {{ABSTRACT}} is a placeholder for the input abstract. {{METHODOL [PITH_FULL_IMAGE:figures/full_fig_p027_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Leak detection prompt. as described in [PITH_FULL_IMAGE:figures/full_fig_p028_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Adjusted RankGPT prompt. (Wang et al., 2022) (335M parameters). These mod￾els’ checkpoints predate 2024, meaning they are unfamiliar with our test set. The model receives a query string composed of a context description, a graph entity, and a relation type and returns a ranked list of answers (other graph nodes). We perform HPO (random grid search of 10 trails) to select the number of training epochs, war… view at source ↗
Figure 19
Figure 19. Figure 19: User study guidelines. We request each to fill out a form ask￾ing in what scientific domains they feel com￾fortable reading papers and a short descrip￾tion of their research area. We then used 31 [PITH_FULL_IMAGE:figures/full_fig_p031_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: User study interface. granite-embedding-125m-english to retrieve semantically similar contexts to this description from the relevant arXiv categories. We manually verify that the retrieved contexts match the descrip￾tion and discard examples with poorly extracted in￾formation (e.g., the context begins with "This study reviews the problem of..." instead of directly de￾scribing the source study problem). In… view at source ↗
Figure 21
Figure 21. Figure 21: Comparison of our designate recombination extraction method to alternative approaches. Figure [PITH_FULL_IMAGE:figures/full_fig_p035_21.png] view at source ↗
read the original abstract

A hallmark of human innovation is recombination -- the creation of novel ideas by integrating elements from existing concepts and mechanisms. In this work, we introduce CHIMERA, the first large-scale Knowledge Base (KB) of recombination examples automatically mined from the scientific literature. CHIMERA enables empirical analysis of how scientists recombine concepts and draw inspiration from different areas, and enables training models that propose cross-disciplinary research directions. To construct this KB, we define a new information extraction task: identifying recombination instances in papers. We curate an expert-annotated dataset and use it to fine-tune an LLM-based extraction model, which we apply to a broad corpus of AI papers. We also demonstrate generalization to a biological domain. We showcase the utility of CHIMERA through two applications. First, we analyze patterns of recombination across AI subfields. Second, we train a scientific hypothesis generation model using the KB, showing that it can propose directions that researchers rate as inspiring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces CHIMERA, the first large-scale knowledge base of scientific idea recombinations automatically mined from the literature. It defines a new information extraction task for identifying recombinations in papers, curates an expert-annotated dataset to fine-tune an LLM-based extractor, applies the model to a broad corpus of AI papers, demonstrates generalization to biology, and showcases utility via (1) analysis of recombination patterns across AI subfields and (2) training a hypothesis generation model whose outputs researchers rate as inspiring.

Significance. If the KB construction is reliable, CHIMERA could enable new quantitative studies of how scientists recombine ideas across domains and support improved AI systems for research ideation. The two applications provide concrete demonstrations of downstream use, including human ratings of generated hypotheses as an external validity check. However, the absence of reported quantitative metrics on annotation quality and extraction accuracy makes it difficult to gauge the robustness of these contributions.

major comments (1)
  1. [Methods / Extraction Model Evaluation] The description of the expert annotation process and the subsequent fine-tuning of the LLM extractor (in the methods and evaluation sections) does not report inter-annotator agreement statistics or held-out precision, recall, or F1 scores for the extraction model. Because the central claims about KB utility rest on the extractor correctly and comprehensively identifying genuine recombinations without substantial false positives, false negatives, or domain biases, these metrics are load-bearing for interpreting both the pattern analysis and the hypothesis-generation results.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including at least one quantitative indicator of corpus scale or extraction performance to give readers an immediate sense of the resource size and reliability.
  2. [Introduction / Task Definition] Notation for recombination instances and the precise definition of the IE task could be clarified with a short example table early in the paper to aid readers unfamiliar with the new task formulation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights an important aspect of our methods reporting. We address the single major comment below and have revised the manuscript to incorporate the requested quantitative metrics.

read point-by-point responses
  1. Referee: [Methods / Extraction Model Evaluation] The description of the expert annotation process and the subsequent fine-tuning of the LLM extractor (in the methods and evaluation sections) does not report inter-annotator agreement statistics or held-out precision, recall, or F1 scores for the extraction model. Because the central claims about KB utility rest on the extractor correctly and comprehensively identifying genuine recombinations without substantial false positives, false negatives, or domain biases, these metrics are load-bearing for interpreting both the pattern analysis and the hypothesis-generation results.

    Authors: We agree that inter-annotator agreement and held-out extraction metrics are necessary to substantiate the reliability of the mined KB. In the revised manuscript we have expanded the Methods section to describe the expert annotation protocol in greater detail and now report inter-annotator agreement statistics (pairwise agreement and Fleiss’ kappa) computed on the double-annotated subset. We have also added a dedicated evaluation subsection that presents precision, recall, and F1 scores on a held-out test set of expert-annotated recombinations, together with error analysis that addresses potential domain biases. These additions directly support the validity of the downstream pattern analysis and hypothesis-generation experiments. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper constructs CHIMERA via LLM-based extraction from external AI and biology literature after fine-tuning on an expert-annotated dataset. Downstream applications consist of empirical pattern analysis across subfields and training a hypothesis-generation model whose outputs are assessed via independent human researcher ratings for inspiration. These steps operate on external corpus data and external human judgments rather than any fitted parameter, self-referential definition, or self-citation chain that reduces the reported results back to the inputs by construction. No equations, uniqueness theorems, or ansatzes are invoked that would create circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical resource-construction paper with no mathematical derivations or theoretical claims. No free parameters, axioms, or invented entities are load-bearing for the central contribution.

pith-pipeline@v0.9.0 · 5691 in / 1218 out tokens · 114538 ms · 2026-05-19T13:54:49.072101+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 4 internal anchors

  1. [1]

    Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Sha- haf, and Aniket Kittur

    The cambridge handbook of creativity. Joel Chan, Joseph Chee Chang, Tom Hope, Dafna Sha- haf, and Aniket Kittur. 2018. Solvent.Proceedings of the ACM on Human-Computer Interaction, 2:1 – 21. Joel Chan, Katherine Fu, Christian Schunn, Jonathan Cagan, Kristin Wood, and Kenneth Kotovsky. 2011. On the benefits and pitfalls of analogies for innovative design: ...

  2. [2]

    10 DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, and Juho Kim

    Visiblends: A flexible workflow for visual blends.Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 10 DaEun Choi, Sumin Hong, Jeongeon Park, John Joon Young Chung, and Juho Kim. 2023. Creative- connect: Supporting reference recombination for graphic design ideation with generative ai.Proceed- ings of the CHI Conference on Huma...

  3. [3]

    LoRA: Low-Rank Adaptation of Large Language Models

    Iter: Iterative transformer-based entity recog- nition and relation extraction. InConference on Em- pirical Methods in Natural Language Processing. Keith J. Holyoak and Paul Thagard. 1994. Mental leaps: Analogy in creative thought. Tom Hope, Joel Chan, Aniket Kittur, and Dafna Sha- haf. 2017. Accelerating innovation through analogy mining. InProceedings o...

  4. [4]

    Constraint relaxation and chunk decomposi- tion in insight problem solving.Journal of Experi- mental Psychology: Learning, Memory, and Cogni- tion, 25(6):1534–1555. 00691. Mario Krenn, Lorenzo Buffoni, Bruno Coutinho, Sagi Eppel, Jacob Gates Foster, Andrew Gritsevskiy, Har- lin Lee, Yichao Lu, João P. Moutinho, Nima Sanjabi, Rishi Sonthalia, Ngoc M. Tran,...

  5. [5]

    11 Céline McKeown

    00117. 11 Céline McKeown. 2014. The cognitive science of sci- ence: explanation, discovery, and conceptual change. Ergonomics, 57:632 – 633. Yan Meng, Liangming Pan, Yixin Cao, and Min-Yen Kan. 2023. Followupqg: Towards information- seeking follow-up question generation. InInterna- tional Joint Conference on Natural Language Pro- cessing. Aakanksha Naik, ...

  6. [6]

    C-Pack: Packed Resources For General Chinese Embeddings

    Scimon: Scientific inspiration machines opti- mized for novelty.ACL. Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, and 1 others. 2022. Chain-of-thought prompting elic- its reasoning in large language models.Advances in Neural Information Processing Systems, 35:24824– 24837. Shitao Xiao, Zheng Liu, Peitian ...

  7. [7]

    Zexuan Zhong and Danqi Chen

    Massw: A new dataset and benchmark tasks for ai-assisted scientific workflows.arXiv preprint arXiv:2406.06357. Zexuan Zhong and Danqi Chen. 2021. A frustratingly easy approach for entity and relation extraction. In North American Chapter of the Association for Com- putational Linguistics. Appendix Contents A Annotator Agreement. . . . . . . . . . . . . . ...

  8. [8]

    reinforce- ment learning which uses traditional time se- 13 Annotators’ Disagreement Examples Abstract:

    Boundary disagreements, where annota- tors selected different spans with overlapping meaning. Here, the expert favored the span that preserved more context (e.g., "reinforce- ment learning which uses traditional time se- 13 Annotators’ Disagreement Examples Abstract: ". . . This research proposed a framework based on Long Short-Term Memory (LSTM) deep lea...

  9. [9]

    These were resolved through further discus- sion and clarification

    Conceptual disagreements, where annota- tors identified fundamentally different entities. These were resolved through further discus- sion and clarification. B Additional Extraction Details B.1 Recombination keywords We use keyword-based filtering to identify works that are more likely to discuss recombination before assigning papers to human annotators. ...

  10. [10]

    inspiration-source: A concept, idea, problem, approach, or domain the authors drew inspiration from

    First, familiarize yourself with the possible entity types for recombinations: <entity_types> combination-element: An idea, method, model, technique, or approach combined in the text with other elements. inspiration-source: A concept, idea, problem, approach, or domain the authors drew inspiration from. inspiration-target: A concept, idea, problem, approa...

  11. [20]

    This paper discusses a recombination since the authors take inspira- tion from [inspiration-source] and implement it in [inspiration-target]

    Now, provide your final output in the specified JSON format. Ensure that the output is a valid JSON string. If the output is empty, return {}. Place your answer within <answer> tags. Remember to carefully analyze the abstract and only identify a recombination if it is clearly present and central to the work described. Figure 7: E2E extraction prompt. {TEX...

  12. [21]

    inspiration-src: A concept, idea, problem, approach, or domain the authors drew inspiration from

    First, familiarize yourself with the possible entity types for recombinations: <entity_types> comb-element: An idea, method, model, technique, or approach combined in the text with other elements. inspiration-src: A concept, idea, problem, approach, or domain the authors drew inspiration from. inspiration-target: A concept, idea, problem, approach, or dom...

  13. [22]

    Review the following examples to understand the expected output format and the process of identifying recombinations: <examples>{EXAMPLES}</examples>

  14. [23]

    Now, carefully read the following scientific abstract: <abstract>{TEXT}</abstract>

  15. [24]

    A recombination can be either: a) Combination: The authors combine two or more ideas, methods, models, techniques, or approaches to obtain a certain goal

    Your task is to extract the most salient recombination from this abstract. A recombination can be either: a) Combination: The authors combine two or more ideas, methods, models, techniques, or approaches to obtain a certain goal. b) Inspiration: The authors draw inspiration or similarities from one concept, idea, problem, approach, or domain and implement...

  16. [25]

    After identifying the recombination, you will format it as a JSON string in the following structure: <recombination>{recombination_type: {entity_type_1: [ent_1, ent_2], entity_type_2: [ent_3],...}}</recombination> If you don't think the text discusses a recombination, or that the recombination is not a central part of the work, return an empty JSON object: {}

  17. [26]

    Before providing your final answer, use the following scratchpad to think through the process: <scratchpad>

  18. [27]

    Identify the main ideas, methods, or approaches discussed in the abstract

  19. [28]

    Determine if there is a clear combination of ideas or if one idea inspired the application in another domain

  20. [29]

    Identify the specific entities involved in the recombination

  21. [30]

    Classify the entities according to the provided entity types

  22. [31]

    </scratchpad>

    Determine the recombination type (combination or inspiration). </scratchpad>

  23. [32]

    Ensure that the output is a valid JSON string

    Now, provide your final output in the specified JSON format. Ensure that the output is a valid JSON string. If the output is empty, return {}. Place your answer within <recombination> tags. Remember to carefully analyze the abstract and only identify a recombination if it is clearly present and central to the work described. Figure 9: E2E ICL prompt. {TEX...

  24. [33]

    comb-element: An idea, method, model, technique, or approach combined in the text with other elements

  25. [34]

    inspiration-src: A concept, idea, problem, approach, or domain the authors drew inspiration from

  26. [35]

    comb-element

    inspiration-target: A concept, idea, problem, approach, or domain in which the authors utilize the inspiration they drew from the inspiration source. Here is the text you need to analyze: <text>{TEXT}</text> Please read the text carefully and identify all entities that belong to the types listed above. Pay close attention to the context and relationships ...

  27. [36]

    To mitigate position bias, we query the model twice per pair with re- versed orderings, accepting a match only if both judgments are positive

    We use it in the extraction evaluation process as discussed in Section 4.1. To mitigate position bias, we query the model twice per pair with re- versed orderings, accepting a match only if both judgments are positive. We prefer GPT-4o-mini over GPT-4o based on a comparison which found only3disagreements across the test set. You are tasked with comparing ...

  28. [37]

    First, read the full text for context: <full_text>{TEXT}</full_text>

  29. [38]

    Now, consider these two spans extracted from the text above: <span1>{SPAN1}</span1> <span2>{SPAN2}</span2>

  30. [39]

    The idea the spans discuss should be exactly the same, up to minor lexical or semantic variations

    Your task is to carefully analyze these two spans and determine if they discuss the same {ENTITY_TYPE}. The idea the spans discuss should be exactly the same, up to minor lexical or semantic variations

  31. [40]

    The main topic or idea presented in each span b

    In your analysis, consider the following: a. The main topic or idea presented in each span b. The context in which these spans appear in the full text c. Any potential contradictions between the spans

  32. [41]

    Explain your reasoning clearly, referencing specific elements from the spans and the full text if necessary

    After your analysis, provide a justification for your determination. Explain your reasoning clearly, referencing specific elements from the spans and the full text if necessary

  33. [42]

    Yes" or

    Based on your analysis and justification, provide a "Yes" or "No" answer to whether the spans discuss the same {ENTITY_TYPE}

  34. [43]

    Yes" or

    Present your response in the following format: <justification>[Your detailed justification here]</justification> <answer>[Your "Yes" or "No" answer here]</answer> Figure 11: Span similarity prompt. {ENTITY_TYPE} is either "combination-element", "inspiration-source" or "inspiration-target". {TEXT} is a placeholder for the paper’s abstract. {SPAN1}, {SPAN2}...

  35. [45]

    insufficient-info

    Identify the scientific branch from the provided branches list. If there's insufficient information, use "insufficient-info". If no branch name in the list describes the source properly, use "other". Return your output in the following format: <output> [{"text": "element1", "arxiv_category": "category1", "scientific_branch": "branch1"}, {"text": "element2...

  36. [46]

    other". If there's insufficient information, use

    Identify the best matching arXiv taxonomy category from the provided list. If it doesn't match any category, use "other". If there's insufficient information, use "insufficient-info"

  37. [47]

    insufficient-info

    Identify the scientific branch from the provided branches list. If there's insufficient information, use "insufficient-info". If no branch name in the list describes the source properly, use "other". Repeat the same process for the inspiration target. Provide your analysis in the following format: <source-branch>[Insert the scientific branch of the inspir...

  38. [48]

    Chain of Thought (CoT)

    Abbreviation Expansion: We standardize en- tities using (a) Regex-based cleaning (e.g., “Chain of Thought (CoT)” → “Chain of Thought”) and (b) Scispacy’s abbreviation ex- pansion mechanism 8, which resolves short- forms to their long-form versions using the document context (e.g., “CoT” → “Chain of Thought”)

  39. [49]

    the snap-through action of a steel hairclip

    Agglomerative Clustering: We group entities by performing agglomerative clustering with cosine similarity on all-mpnet-base-v2 embeddings, us- ing average linkage and a distance threshold of0.05. 8https://github.com/allenai/scispacy# abbreviationdetector C.3 Entity Postprocessing After the initial large-scale extraction, we ap- ply an LLM-based postproces...

  40. [50]

    Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency

  41. [51]

    However, existing methods suffer from slow rendering speed, greatly limiting their practical use

    Reconstructing deformable tissues from endoscopic videos is essential in many downstream surgical applications. However, existing methods suffer from slow rendering speed, greatly limiting their practical use

  42. [52]

    Many industrial tasks-such as sanding, installing fasteners, and wire harnessing-are difficult to automate due to task complexity and variability

  43. [53]

    Now, consider this methodology statement: <methodology_statement>{{METHODOLOGY_STATEMENT}}</methodology_statement> To complete this task, follow these steps:

    Multi-legged robots offer enhanced stability in complex terrains, yet autonomously learning natural and robust motions in such environments remains challenging. Now, consider this methodology statement: <methodology_statement>{{METHODOLOGY_STATEMENT}}</methodology_statement> To complete this task, follow these steps:

  44. [54]

    Analyze the abstract thoroughly, focusing on: - The context or reasons that justify the methodology choice - Any challenges, limitations, or research needs the methodology addresses - Mentions of previous research or knowledge gaps that the methodology aims to target

  45. [55]

    - Use exclusively the information from the abstract

    When formulating your response: - Phrase your response as a general 1-2 sentence description of a challenge, limitation research needs, etc. - Use exclusively the information from the abstract. Do not incorporate external knowledge or assumptions. - Minimize including information from the methodology statement in your answer. - Do not include information ...

  46. [56]

    Combine <source-entity> and <target-entity>

    Format your response as follows: <background> [1-2 background sentences] </background> Remember to base your response strictly on the provided abstract and statement. Do not include additional information or assumptions. Figure 16: Context extraction prompt. {{ABSTRACT}} is a placeholder for the input abstract. {{METHODOL- OGY_STATEMENT}} is a sentence de...

  47. [57]

    Read the following query: <query>{{QUERY}}</query>

  48. [58]

    Now, read the corresponding answer: <answer>{{ANSWER}}</answer>

  49. [59]

    Look for words, phrases, or implications in the query that directly relate or reveal information from the answer

    Analyze the query for any information that might disclose the answer. Look for words, phrases, or implications in the query that directly relate or reveal information from the answer

  50. [60]

    no leakage

    Write your analysis in the following format: <analysis> [If you identified a leakage, briefly explain what information from the answer is included in the query. If you did not identify a leakage, write "no leakage".] </analysis>

  51. [61]

    Based on your analysis, determine if there is a leakage

  52. [62]

    yes" if there is a leakage, or

    Provide your response in the following format: <leakage> [Write "yes" if there is a leakage, or "no" if there is no leakage. Do not include any additional explanation or reasoning.] </leakage> Remember, your task is to identify leakages, not to answer the query or explain your reasoning. Stick strictly to the output format provided. Figure 17: Leak detect...

  53. [63]

    the inherent human attribute of engaging in logical rea- soning to facilitate decision-making

  54. [64]

    principles of rational decision-making

  55. [65]

    Post-reranking (top-20)

    the Level-K framework from game theory and behavioral economics, which extends reasoning from simple reactions to structured strategic depth ... Post-reranking (top-20)

  56. [66]

    the Level-K framework from game theory and behavioral economics, which extends reasoning from simple reactions to structured strategic depth

  57. [67]

    Bayesian inference: conditioning a prior on evidence

  58. [68]

    What would be a good source of inspiration forData-driven storytelling?" Pre-reranking (top-20)

    principles of rational decision-making Query: "...while Large Language Models (LLMs) excel in various NLP tasks, their ability to generate comprehensive data stories remains underexplored... What would be a good source of inspiration forData-driven storytelling?" Pre-reranking (top-20)

  59. [69]

    the human storytelling process

  60. [70]

    Interactive digital stories

  61. [71]

    Post-reranking (top-20)

    narrative structure designs ... Post-reranking (top-20)

  62. [72]

    story analysis and generation systems

  63. [73]

    generative artificial intelligence (Gen-AI)-driven narrative personalization

  64. [74]

    narrative structure designs

  65. [75]

    the human storytelling process ... (ii) Semantically similar variants Query: "Prior methods for aligning large language models face challenges in tuning to maximize non-differentiable and non-binary objectives...This highlights a need for a more flexible approach that can generalize to various user preferences... while maintaining alignment... What could ...

  66. [76]

    aligning Large Language Models with human preferences

  67. [78]

    direct preference optimization

  68. [79]

    Post-reranking (top-20)

    State-of-the-art language model fine-tuning techniques, such as Direct Preference Optimization ... Post-reranking (top-20)

  69. [80]

    Direct Preference Optimization for preference alignment

  70. [81]

    State-of-the-art language model fine-tuning techniques, such as Direct Preference Optimization

  71. [82]

    contrastive learning-based methods like Direct Preference Optimization

  72. [83]

    a Semi-Policy Preference Optimization method

  73. [84]

    This study reviews the problem of

    direct preference optimization ... Table 22: Illustrative examples where the reranker preferred a different answer over the gold one. 32 Figure 20: User study interface. granite-embedding-125m-english to retrieve semantically similar contexts to this description from the relevant arXiv categories. We manually verify that the retrieved contexts match the d...

  74. [85]

    BV-MAPP (Verbal Behavior Milestones Assessment and Placement Program)

    Figure 21a presents how general scientific IE schemas lack relation types to model recombina- tions. The figure presents the results of our spe- cialized extraction method besides a transformer- based extraction model (Hennen et al., 2024) fine- tuned on SciERC (Luan et al., 2018), a general IE schema. While our new data schema easily mod- els the recombi...