(2020), ‘The next decade in AI: Four steps towards robust artificial intelligence’
13 Pith papers cite this work.
Citation roles: background (3)
Citing papers
- PAL: Program-aided Language Models
PAL improves few-shot reasoning accuracy by having LLMs generate executable programs rather than text-based chains of thought, outperforming much larger models on math and logic benchmarks.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
- The Evaluation Trap: Benchmark Design as Theoretical Commitment
AI benchmarks trap progress by operationalizing assumptions that redefine capabilities around the benchmarks themselves, and Epistematics provides an audit procedure to detect when evaluations cannot discriminate claimed capabilities from proxy behaviors.
- Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations
Grounding LLMs via node-wise anchors in a traffic scenario taxonomy improves law-scenario matching by 29.1% and derived requirement accuracy by 36.9-38.2% on Chinese laws and 5,897 scenarios, enabling a compliance layer and real-time monitor for AVs.
- Bounded by Risk, Not Capability: Quantifying AI Occupational Substitution Rates via a Tech-Risk Dual-Factor Model
AI job substitution rates are limited by business risks such as liability and compliance rather than technical capability alone, resulting in high exposure for cognitive roles like data scientists and resilience for physical trades.
- ActivationReasoning: Logical Reasoning in Latent Activation Spaces
ActivationReasoning grounds logical reasoning in LLM latent activations via SAEs to enable structured inference, concept composition, and behavior steering on multi-hop, abstraction, and safety tasks.
- How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence
AI's compositional reasoning failures originate in psychological learning paradigms that shaped its architectures, and the ReSynth trimodular framework is proposed to embed systematicity structurally.
- Enhancing Causal Reasoning in Large Language Models: A Causal Attribution Model for Precision Fine-Tuning
A causal attribution model is proposed that applies do-operators to quantify component contributions in LLMs' causal reasoning, motivating a fine-tuned model for pairwise causal discovery that combines knowledge and numerical data.
- An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face
Hugging Face discussions show that access barriers, output quality, and setup complexity are the main user concerns for both general and multimodal LLMs.
- Agent AI: Surveying the Horizons of Multimodal Interaction
The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.
- Beyond Context: Large Language Models' Failure to Grasp Users' Intent
LLMs fail to detect hidden harmful intent, allowing systematic bypass of safety mechanisms through framing techniques, with reasoning modes often worsening the issue.
- To Use AI as Dice of Possibilities with Timing Computation
- Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions