Knows uses a YAML sidecar specification to provide structured, agent-consumable representations of research papers, yielding large accuracy gains for small LLMs on comprehension tasks and rapid community adoption via a public hub.
Dolphin: Closed-loop open-ended auto-research through thinking, practice, and feedback
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
citing papers explorer
-
Knows: Agent-Native Structured Research Representations
Knows uses a YAML sidecar specification to provide structured, agent-consumable representations of research papers, yielding large accuracy gains for small LLMs on comprehension tasks and rapid community adoption via a public hub.
-
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.
-
TusoAI: Agentic Optimization for Scientific Methods
TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while reporting new genetic associations.
-
HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.
-
Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.