pith. machine review for the scientific record.

arxiv: 2510.17795 · v3 · submitted 2025-10-20 · 💻 cs.CL · cs.AI · cs.LG · cs.MA · cs.SE

Recognition: unknown

What Makes AI Research Replicable? Executable Knowledge Graphs as Scientific Knowledge Representations

keywords: knowledge, code, executable, research, approaches, graphs, representations, scientific

Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a pluggable, paper-centric knowledge base that automatically integrates code snippets and technical insights extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG shows substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code is available at https://github.com/zjunlp/xKG.

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer

    cs.SE · 2026-04 · unverdicted · novelty 5.0

    Agentic Consensus replaces code as the main artifact with a typed property graph world model that maintains commitments and evidence through synchronization operators, shifting evaluation to alignment fidelity and con...