TALM: Tool Augmented Language Models

Aaron Parisi, Yao Zhao, Noah Fiedel · 2025 · arXiv 2205.12255

19 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

Model-adaptive tool necessity shows 26-54% mismatch with actual tool calls across LLMs, driven by nearly orthogonal hidden-state signals for cognition versus action.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Single-agent systems with tools provide the optimal performance-efficiency trade-off for small language models, outperforming base models and multi-agent setups.

Efficient numeracy in language models through single-token number embeddings

cs.LG · 2025-10-08 · unverdicted · novelty 7.0

BitTokens represent numbers as single tokens via IEEE 754 binary format, allowing small language models to learn basic arithmetic algorithms nearly perfectly.

ViperGPT: Visual Inference via Python Execution for Reasoning

cs.CV · 2023-03-14 · unverdicted · novelty 7.0

ViperGPT generates executable Python code to compose pre-trained vision-and-language modules into programs that answer visual queries, reaching state-of-the-art results with no additional training.

RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis

hep-ph · 2026-05-17 · unverdicted · novelty 6.0

RooAgent provides an LLM agent interface that translates natural-language prompts into calls to PyROOT analysis functions for high energy physics tasks, with support for multiple AI backends and tested on ZH simulations and ATLAS open data.

Trace-Level Analysis of Information Contamination in Multi-Agent Systems

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Agent workflows can diverge substantially from contaminated inputs yet recover correct answers, or stay similar while failing, as measured by trace divergence on GAIA tasks.

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts

cs.CL · 2026-04-09 · conditional · novelty 6.0

Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-parameter model on the full dataset.

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

cs.AI · 2026-01-19 · unverdicted · novelty 6.0

CURE-MED pairs a new 13-language medical reasoning benchmark with curriculum RL to raise logical correctness to 70% and language consistency to 95% at 32B scale while outperforming baselines.

NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration

cs.AI · 2025-06-24 · unverdicted · novelty 6.0

NaviAgent decouples task planning from tool execution via a Tool World Navigation Model graph to improve scalability and success rates in LLM agents handling large tool ecosystems.

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

cs.CR · 2024-04-19 · unverdicted · novelty 6.0

Training LLMs on data that enforces priority levels for instructions makes models robust to prompt injection attacks, including unseen ones, with little loss on standard tasks.

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

cs.CL · 2023-09-29 · conditional · novelty 6.0

ToRA trains language models on interactive tool-use trajectories with imitation learning and output shaping to integrate reasoning and external tools, yielding 13-19% gains on math datasets and new highs like 44.6% on MATH for a 7B model.

Cognitive Architectures for Language Agents

cs.AI · 2023-09-05 · accept · novelty 6.0

CoALA is a modular cognitive architecture for language agents that organizes memory components, action spaces for internal and external interaction, and a generalized decision-making loop to support more systematic development of capable agents.

ART: Automatic multi-step reasoning and tool-use for large language models

cs.CL · 2023-03-16 · unverdicted · novelty 6.0

ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.

Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception

cs.CL · 2025-10-27 · unverdicted · novelty 5.0

LLM agents exhibit temporal blindness, achieving no better than 65% normalized alignment with human preferences on tool-use decisions across time-sensitive scenarios in the new TicToc dataset.

Tracing the ongoing emergence of human-like reasoning in Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 4.0

LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

Bridging Language Models and Financial Analysis

q-fin.ST · 2025-03-14 · unverdicted · novelty 2.0

A survey synthesizing recent LLM research and assessing its applicability to financial data analysis.

A Comprehensive Overview of Large Language Models

cs.CL · 2023-07-12 · unverdicted · novelty 2.0

A survey paper providing an overview of Large Language Models, their background, and recent advances in the field.

citing papers explorer

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
cs.AI · 2026-05-13 · unverdicted · none · ref 22 · 2 links

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
cs.CL · 2026-05-08 · unverdicted · none · ref 36

Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms
cs.CL · 2026-04-21 · unverdicted · none · ref 18

Efficient numeracy in language models through single-token number embeddings
cs.LG · 2025-10-08 · unverdicted · none · ref 16

ViperGPT: Visual Inference via Python Execution for Reasoning
cs.CV · 2023-03-14 · unverdicted · none · ref 41

RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis
hep-ph · 2026-05-17 · unverdicted · none · ref 5

Trace-Level Analysis of Information Contamination in Multi-Agent Systems
cs.AI · 2026-04-30 · unverdicted · none · ref 25

Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
cs.CL · 2026-04-09 · conditional · none · ref 67

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
cs.AI · 2026-01-19 · unverdicted · none · ref 36

NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration
cs.AI · 2025-06-24 · unverdicted · none · ref 1

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
cs.CR · 2024-04-19 · unverdicted · none · ref 9

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
cs.CL · 2023-09-29 · conditional · none · ref 31

Cognitive Architectures for Language Agents
cs.AI · 2023-09-05 · accept · none · ref 60

ART: Automatic multi-step reasoning and tool-use for large language models
cs.CL · 2023-03-16 · unverdicted · none · ref 166

Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception
cs.CL · 2025-10-27 · unverdicted · none · ref 25

Tracing the ongoing emergence of human-like reasoning in Large Language Models
cs.CL · 2026-05-20 · unverdicted · none · ref 49

A Survey on Multimodal Large Language Models
cs.CV · 2023-06-23 · accept · none · ref 193

Bridging Language Models and Financial Analysis
q-fin.ST · 2025-03-14 · unverdicted · none · ref 74

A Comprehensive Overview of Large Language Models
cs.CL · 2023-07-12 · unverdicted · none · ref 27
