Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality
11 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 11
verdicts
UNVERDICTED: 11
roles
background: 2
citing papers explorer
- AI co-mathematician: Accelerating mathematicians with agentic AI
  An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.
- A Technical Typology of AI Systems in Public Administration
  The paper defines five AI system categories for public administration and reports that 55% of 91 recent papers leave the system type underspecified while 31% study one type but motivate with another.
- When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs
  LLMs suppress causal caution in practical advisory contexts (rates drop from 91.7-100% to 6.7-18.3%) but recover it with a self-correction prompt (to 71.4-100%).
- BlueFin: Benchmarking LLM Agents on Financial Spreadsheets
  BlueFin is a new benchmark for LLM agents on financial spreadsheets; frontier models score below 50% on it and show weaknesses in dynamic correctness.
- Queue & AI: When Faster Tasks Slow Down the Workflow
  A queueing model of AI task processing identifies a 'variance wedge' where mean task time falls but system delay rises due to rework and reduced oversight under congestion.
- The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime
  AI deployment in high-stakes areas requires domain-scoped calibrated verification with monitoring and revocation, using a proposed six-component Verification Coverage standard instead of mechanistic interpretability.
- Human Capital, AI, and Labor Commoditization
  A difference-in-differences analysis around the ChatGPT release shows commoditization of labor in AI-exposed job categories on Upwork, with declining human capital importance and rising price importance.
- Position: AI as Part of Self -- Extending the Mind Requires Cognitive Co-Regulation
  The paper claims that alignment requires treating AI as part of the self through cognitive co-regulation, identifying risks like deskilling and automation bias while drawing on System 0 cognition theory.
- Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis
  AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.
- From Exposure to Adoption: Generative AI in European Workplaces
  Generative AI adoption in Europe ranges from under 3% to 25%, is steeper for skilled workers in abstract-task jobs and in digitally advanced countries with training, shows a gender gap in exposed roles, and has produced no detectable shift in reported task content so far.
- Hallucinations in Organization-backed AI advisors: Evidence about Skepticism, Verification, and Reliance in Goal-Directed Use
  A literature review synthesizing evidence on user skepticism, verification, and reliance with hallucinating AI advisors, noting that output-related cues like warnings show weak effects and that content category has not been experimentally varied.