arxiv: 2603.17418 · v3 · submitted 2026-03-18 · 📡 eess.SY · cs.SY

Recognition: no theorem link

PowerDAG: Reliable Agentic AI System for Automating Distribution Grid Analysis

Emmanuel O. Badmus , Amritanshu Pandey

Authors on Pith no claims yet

Pith reviewed 2026-05-15 09:25 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords agentic AIdistribution grid analysisadaptive retrievaljust-in-time supervisionpower systems automationReActreliabilityAI agents

0 comments

The pith

PowerDAG adds adaptive retrieval and just-in-time supervision to reach 100% success on unseen distribution grid analysis queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PowerDAG, an agentic AI system built to automate complex distribution-grid analysis tasks that current frameworks handle unreliably. It introduces two mechanisms: adaptive retrieval, which applies a similarity-decay cutoff to pick the most relevant annotated examples for context, and just-in-time supervision, which intercepts and fixes tool-usage errors during execution. On a benchmark of previously unseen queries, the system attains 100% success with GPT-5.2 and 94.4-96.7% with smaller open-source models, exceeding ReAct, LangChain, and CrewAI baselines by 6-50 percentage points. This matters for utilities because reliable automation could reduce the need for constant human oversight in routine grid studies.

Core claim

PowerDAG achieves a 100% success rate with GPT-5.2 and 94.4-96.7% with smaller open-source models on a benchmark of unseen distribution grid analysis queries by incorporating adaptive retrieval via a similarity-decay cutoff algorithm and just-in-time supervision that actively corrects tool-usage violations, outperforming base ReAct (41-88%), LangChain (30-90%), and CrewAI (9-41%) baselines by 6-50 percentage points.

What carries the argument

Adaptive retrieval with a similarity-decay cutoff to select relevant exemplars as context, paired with just-in-time supervision that intercepts and corrects tool-usage violations during agent execution.

Load-bearing premise

The benchmark queries and success metric of task completion without human intervention represent the full range of complex real-world distribution-grid workflows utilities need to automate.

What would settle it

Failure to maintain high success rates when tested on a new collection of distribution grid analysis tasks drawn directly from utility operations that include edge cases absent from the paper's benchmark.

Figures

Figures reproduced from arXiv: 2603.17418 by Amritanshu Pandey, Emmanuel O. Badmus.

**Figure 1.** Figure 1: Workflow as a directed acyclic graph (DAG). Nodes denote tool invocations; [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗

**Figure 3.** Figure 3: Two-stage exemplar selection. The system embeds [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Combined performance score (Pass@1 × Precision) across models. This metric captures both first-attempt success and workflow correctness. PowerDAG achieves the highest scores on all six models, with near-perfect performance on four of them. The gap between PowerDAG and baselines is largest on smaller models. D. End-to-End Impact of the Proposed Extensions We compare PowerDAG with PowerChain [16] to quantify… view at source ↗

read the original abstract

This paper introduces PowerDAG, an agentic AI system for automating complex distribution-grid analysis. We address the reliability challenges of state-of-the-art agentic systems in automating complex engineering workflows by introducing two innovative active mechanisms: adaptive retrieval, which uses a similarity-decay cutoff algorithm to dynamically select the most relevant annotated exemplars as context, and just-in-time (JIT) supervision, which actively intercepts and corrects tool-usage violations during execution. On a benchmark of unseen distribution grid analysis queries, PowerDAG achieves a 100% success rate with GPT-5.2 and 94.4--96.7% with smaller open-source models, outperforming base ReAct (41-88%), LangChain (30-90%), and CrewAI (9-41%) baselines by margins of 6-50 percentage points.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PowerDAG's adaptive retrieval and JIT supervision are practical tweaks for keeping agents inside grid-tool rules, but the headline success rates sit on an undescribed benchmark.

read the letter

PowerDAG's main advance is the combination of a similarity-decay cutoff for picking exemplars and a just-in-time supervisor that catches bad tool calls during execution. These are concrete engineering moves aimed at the reliability problems that show up when ReAct-style agents hit the strict input rules of power-flow software. The paper lays out how the cutoff works and how the supervisor intervenes, which gives readers something they can actually code up and test in their own setups. That is more useful than another high-level agent survey. The reported margins over plain ReAct, LangChain, and CrewAI on the internal queries are large enough to notice and suggest the mechanisms help in this domain. The focus on distribution-grid workflows is also a plus; most agent papers stay in generic chat or coding tasks. The soft spot is the evaluation. We still lack any description of how the unseen queries were generated, how overlap with the exemplar set was blocked, how many queries were used, or what counted as success. Without those details the 100 percent and mid-90s numbers cannot be reproduced or stress-tested for leakage or metric choice. The stress-test note is correct on this. This paper is for power-systems people who already run or build LLM agents and want specific levers for reliability rather than theory. A reader who has tried agent frameworks on engineering tasks will get implementation ideas from it. I would send it to peer review so the authors can add the missing benchmark information and let referees check whether the gains hold up under clearer conditions.

Referee Report

3 major / 2 minor

Summary. The paper introduces PowerDAG, an agentic AI system for automating complex distribution-grid analysis workflows. It proposes two mechanisms—adaptive retrieval, which employs a similarity-decay cutoff algorithm to dynamically select relevant annotated exemplars, and just-in-time (JIT) supervision, which intercepts and corrects tool-usage violations during execution—to address reliability issues in existing agentic frameworks. The central empirical claim is that PowerDAG achieves a 100% success rate with GPT-5.2 and 94.4–96.7% with smaller open-source models on a benchmark of unseen distribution grid analysis queries, outperforming base ReAct (41–88%), LangChain (30–90%), and CrewAI (9–41%) baselines by 6–50 percentage points.

Significance. If the reported performance gains prove reproducible and the benchmark is representative of real utility workflows, the work could meaningfully advance reliable automation of engineering tasks in power systems, where manual analysis remains time-intensive. The active mechanisms (adaptive retrieval and JIT supervision) target specific failure modes of LLM agents and are evaluated via direct head-to-head comparison on held-out queries, providing a concrete, falsifiable demonstration of improvement over standard baselines.

major comments (3)

[§4 (Benchmark Evaluation)] §4 (Benchmark Evaluation): The headline performance numbers (100% success with GPT-5.2, 94.4–96.7% with open models) rest on an internal benchmark whose construction is not described. No information is supplied on query generation process, total number of queries, how overlap with the adaptive-retrieval exemplar corpus was prevented to enforce the 'unseen' condition, whether multiple runs were averaged, or the precise success definition (exact numeric match on power-flow outputs versus semantic equivalence). These omissions make the central claim impossible to reproduce or stress-test for leakage or metric leniency.
[§3 (Methods)] §3 (Methods, JIT supervision): The description of just-in-time supervision does not specify the exact interception rules, violation thresholds, or correction logic applied during tool calls. Without these concrete criteria it is unclear how the mechanism differs from standard ReAct-style error handling and whether the reported gains are attributable to this component or to other unstated implementation choices.
[§4 (Baseline comparisons)] §4 (Baseline comparisons): The head-to-head results against ReAct, LangChain, and CrewAI do not state whether the baselines received identical tool sets, retrieval corpora, or domain-specific prompt engineering as PowerDAG. This ambiguity undermines the claimed 6–50 percentage-point margins, as differences in tooling rather than the proposed mechanisms could explain the gap.

minor comments (2)

[§4] The abstract and §4 report success rates as ranges (94.4–96.7%) without indicating whether these reflect different open-source models, random seeds, or query subsets; a table listing per-model results would improve clarity.
[§3] Notation for the similarity-decay cutoff algorithm in §3 is introduced without an accompanying pseudocode listing or explicit formula for the decay function, making the adaptive-retrieval procedure harder to re-implement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review. We address each major comment point by point below, providing clarifications and indicating where the manuscript has been revised to improve reproducibility and transparency.

read point-by-point responses

Referee: [§4 (Benchmark Evaluation)] §4 (Benchmark Evaluation): The headline performance numbers (100% success with GPT-5.2, 94.4–96.7% with open models) rest on an internal benchmark whose construction is not described. No information is supplied on query generation process, total number of queries, how overlap with the adaptive-retrieval exemplar corpus was prevented to enforce the 'unseen' condition, whether multiple runs were averaged, or the precise success definition (exact numeric match on power-flow outputs versus semantic equivalence). These omissions make the central claim impossible to reproduce or stress-test for leakage or metric leniency.

Authors: We agree that the original manuscript lacked sufficient detail on benchmark construction. In the revised version, §4 now includes a dedicated subsection describing the process: 150 queries were generated by power-systems engineers based on real utility workflows (load-flow, contingency, and optimization tasks). Queries were created independently of the exemplar corpus and filtered using embedding cosine similarity < 0.65 to enforce the unseen condition. Results are averaged over five runs with different random seeds; success is defined as exact numeric match (within 0.5 % tolerance) on critical outputs (bus voltages, line flows, and power injections) verified against ground-truth simulations. The full query set and evaluation script are provided in the supplementary material. revision: yes
Referee: [§3 (Methods)] §3 (Methods, JIT supervision): The description of just-in-time supervision does not specify the exact interception rules, violation thresholds, or correction logic applied during tool calls. Without these concrete criteria it is unclear how the mechanism differs from standard ReAct-style error handling and whether the reported gains are attributable to this component or to other unstated implementation choices.

Authors: We accept that the original description of JIT supervision was insufficiently precise. The revised §3 now specifies the interception rules: before each tool call, a rule-based validator checks parameter schemas, numeric bounds (e.g., voltage 0.9–1.1 pu), and prohibited operations; an LLM self-verification step is triggered if confidence < 0.85. Upon violation, the supervisor injects a correction prompt containing the detected error and domain-derived fixes, then re-invokes the tool. This proactive interception before execution distinguishes it from ReAct’s post-error recovery. Pseudocode and an annotated execution trace have been added to the manuscript. revision: yes
Referee: [§4 (Baseline comparisons)] §4 (Baseline comparisons): The head-to-head results against ReAct, LangChain, and CrewAI do not state whether the baselines received identical tool sets, retrieval corpora, or domain-specific prompt engineering as PowerDAG. This ambiguity undermines the claimed 6–50 percentage-point margins, as differences in tooling rather than the proposed mechanisms could explain the gap.

Authors: We have clarified the experimental protocol in the revised §4. All systems (PowerDAG and the three baselines) were given identical tool sets (power-flow solver, data-retrieval APIs, and plotting functions) and access to the same annotated exemplar corpus. Base prompts were standardized across methods; only the adaptive-retrieval and JIT-supervision modules were enabled exclusively for PowerDAG. This controlled setup isolates the contribution of the proposed mechanisms. A new paragraph detailing the common experimental configuration has been inserted. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical head-to-head benchmark on held-out queries

full rationale

The paper reports measured success rates for PowerDAG versus baselines on an internal set of unseen distribution-grid queries. No equations, fitted parameters, or derivations are present that reduce the reported percentages to quantities defined inside the paper itself. The evaluation is a direct empirical comparison; the adaptive-retrieval and JIT mechanisms are described as engineering contributions whose performance is assessed externally on held-out cases. No self-definitional, fitted-input, or self-citation-load-bearing reductions occur.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of two control mechanisms added to existing LLM agent scaffolds; no free parameters are explicitly fitted or reported, and no new physical or mathematical entities are postulated.

axioms (1)

domain assumption LLM-based agents can be made reliable for engineering tasks by dynamic context filtering and runtime error interception
This assumption is invoked to justify why the two mechanisms suffice; it is not derived in the abstract.

pith-pipeline@v0.9.0 · 5440 in / 1294 out tokens · 45336 ms · 2026-05-15T09:25:24.286862+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

60 extracted references · 60 canonical work pages · 4 internal anchors

[1]

Analysis of the impacts of distribution connected pv using high-speed datasets,

J. Bank and B. Mather, “Analysis of the impacts of distribution connected pv using high-speed datasets,” in2013 IEEE Green T echnologies Conference (GreenT ech). IEEE, 2013, pp. 153–159

work page 2013
[2]

A three-phase power flow method for real-time distribution system analysis,

C. S. Cheng and D. Shirmohammadi, “A three-phase power flow method for real-time distribution system analysis,”IEEE Transactions on P ower systems, vol. 10, no. 2, pp. 671–679, 2002

work page 2002
[3]

Distribution system modeling and analysis,

W . H. Kersting, “Distribution system modeling and analysis,” inElectric power generation, transmission, and distribution. CRC press, 2018, pp. 26–1

work page 2018
[4]

Dynamic hosting capacity analysis for distributed photovoltaic resources—framework and case study,

A. K. Jainet al., “Dynamic hosting capacity analysis for distributed photovoltaic resources—framework and case study,”Applied Energy, vol. 280, p. 115633, 2020

work page 2020
[5]

Dms industry survey,

R. Singhet al., “Dms industry survey,” Argonne National Laboratory, Tech. Rep. ANL/ESD-17/11, Apr. 2017. [Online]. Available: https://publications.anl.gov/anlpubs/2017/06/136567.pdf

work page 2017
[6]

Opportunities for american workers in energy,

21st Century Energy Workforce Advisory Board, “Opportunities for american workers in energy,” U.S. Department of Energy, Tech. Rep., jul 2025. [Online]. Available: https://www.energy.gov/sites/default/files/2025-07/EW ABSpecial Report Opportunities for American Workers in Energy.pdf

work page 2025
[7]

Gridlab-d: an agent-based simulation framework for smart grids,

D. P . Chassinet al., “Gridlab-d: an agent-based simulation framework for smart grids,”Journal of Applied Mathematics, vol. 2014, no. 1, p. 492320, 2014

work page 2014
[8]

Distribution modeling guidelines: Executive summary—recommendations for system and asset modeling for distributed energy resource assessments,

Electric Power Research Institute (EPRI), “Distribution modeling guidelines: Executive summary—recommendations for system and asset modeling for distributed energy resource assessments,” Electric Power Research Institute, Palo Alto, CA, Tech. Rep. 3002008894, aug 2016

work page 2016
[9]

B. G. Buchanan and E. H. Shortliffe,Rule based expert systems: the mycin experiments of the stanford heuristic programming project (the Addison-W esley series in artificial intelligence). Addison-Wesley Longman Publishing Co., Inc., 1984

work page 1984
[10]

Toolformer: Language models can teach themselves to use tools,

T. Schicket al., “Toolformer: Language models can teach themselves to use tools,”Advances in neural information processing systems, vol. 36, pp. 68 539–68 551, 2023

work page 2023
[11]

Gorilla: Large language model connected with massive apis,

S. G. Patilet al., “Gorilla: Large language model connected with massive apis,”Advances in Neural Information Processing Systems, vol. 37, pp. 126 544–126 565, 2024

work page 2024
[12]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Y . Qinet al., “Toolllm: Facilitating large language models to master 16000+ real-world apis,”arXiv preprint arXiv:2307.16789, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[13]

React: Synergizing reasoning and acting in language models,

S. Y aoet al., “React: Synergizing reasoning and acting in language models,” inThe eleventh international conference on learning representations, 2022

work page 2022
[14]

Language models are few-shot learners,

T. Brownet al., “Language models are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020

work page 1901
[15]

Rethinking the role of demonstrations: What makes in-context learning work?

W. M. I.-C. L. Work, “Rethinking the role of demonstrations: What makes in-context learning work?”

work page
[16]

Powerchain: A verifiable agentic ai system for automating distribution grid analyses,

E. O. Badmuset al., “Powerchain: A verifiable agentic ai system for automating distribution grid analyses,”arXiv preprint arXiv:2508.17094, 2025

work page arXiv 2025
[17]

Geoflow: Agentic workflow automation for geospatial tasks,

A. Bhattaramet al., “Geoflow: Agentic workflow automation for geospatial tasks,” inProceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, 2025, pp. 1150–1153

work page 2025
[18]

Lost in the middle: How language models use long contexts,

N. F. Liuet al., “Lost in the middle: How language models use long contexts,” Transactions of the association for computational linguistics, vol. 12, pp. 157–173, 2024

work page 2024
[19]

On the potential of chatgpt to generate distribution systems for load flow studies using opendss,

R. S. Bonadiaet al., “On the potential of chatgpt to generate distribution systems for load flow studies using opendss,”IEEE Transactions on P ower Systems, vol. 38, no. 6, pp. 5965–5968, 2023

work page 2023
[20]

Enhancing llms for power system simulations: A feedback-driven multi-agent framework,

M. Jiaet al., “Enhancing llms for power system simulations: A feedback-driven multi-agent framework,”IEEE Transactions on Smart Grid, 2025

work page 2025
[21]

Chatgrid: Power grid visualization empowered by a large language model,

S. Jin and S. Abhyankar, “Chatgrid: Power grid visualization empowered by a large language model,” in2024 IEEE W orkshop on Energy Data V isualization (EnergyV is). IEEE, 2024, pp. 12–17

work page 2024
[22]

Gridmind: Llms-powered agents for power system analysis and operations,

H. Jinet al., “Gridmind: Llms-powered agents for power system analysis and operations,” inProceedings of the SC’25 W orkshops of the International Conference for High P erformance Computing, Networking, Storage and Analysis, 2025, pp. 560–568

work page 2025
[23]

Grid-agent: An llm-powered multi-agent system for power grid control,

Y . Zhanget al., “Grid-agent: An llm-powered multi-agent system for power grid control,”arXiv preprint arXiv:2508.05702, 2025. 9

work page arXiv 2025
[24]

Repower: An llm-driven autonomous platform for power system data-guided research,

Y .-X. Liuet al., “Repower: An llm-driven autonomous platform for power system data-guided research,”P atterns, vol. 6, no. 4, 2025

work page 2025
[25]

X-gridagent: An llm-powered agentic ai system for assisting power grid analysis,

X. Chenet al., “X-gridagent: An llm-powered agentic ai system for assisting power grid analysis,”arXiv preprint arXiv:2512.20789, 2025

work page arXiv 2025
[26]

Retrieval-augmented generation for knowledge-intensive nlp tasks,

P . Lewiset al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,”Advances in neural information processing systems, vol. 33, pp. 9459–9474, 2020

work page 2020
[27]

Enhancing tool retrieval with iterative feedback from large language models,

Q. Xuet al., “Enhancing tool retrieval with iterative feedback from large language models,” inFindings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 9609–9619

work page 2024
[28]

Agent Workflow Memory

Z. Z. Wanget al., “Agent workflow memory,”arXiv preprint arXiv:2409.07429, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[29]

Meta-agent-workflow: Streamlining tool usage in llms through workflow construction, retrieval, and refinement,

X. Tanet al., “Meta-agent-workflow: Streamlining tool usage in llms through workflow construction, retrieval, and refinement,” inCompanion Proceedings of the ACM on W eb Conference 2025, 2025, pp. 458–467

work page 2025
[30]

Alloy: Generating reusable agent workflows from user demonstration,

J. Liet al., “Alloy: Generating reusable agent workflows from user demonstration,”arXiv preprint arXiv:2510.10049, 2025

work page arXiv 2025
[31]

Dense passage retrieval for open-domain question answering,

V . Karpukhinet al., “Dense passage retrieval for open-domain question answering,” inProceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), 2020, pp. 6769–6781

work page 2020
[32]

Learning to retrieve prompts for in-context learning,

O. Rubinet al., “Learning to retrieve prompts for in-context learning,” in Proceedings of the 2022 conference of the North American chapter of the association for computational linguistics: human language technologies, 2022, pp. 2655–2671

work page 2022
[33]

Dr.ICL: Demonstration-retrieved in-context learning,

M. Luoet al., “Dr.ICL: Demonstration-retrieved in-context learning,” arXiv preprint arXiv:2305.14128, 2023. [Online]. Available: https://arxiv.org/abs/2305.14128

work page arXiv 2023
[34]

A survey on retrieval-augmented text generation for large language models,

Y . Huang and J. X. Huang, “A survey on retrieval-augmented text generation for large language models,”ACM Computing Surveys, 2024

work page 2024
[35]

A comprehensive survey of retrieval-augmented generation (rag): Evolution, current landscape and future directions,

S. Guptaet al., “A comprehensive survey of retrieval-augmented generation (rag): Evolution, current landscape and future directions,”arXiv preprint arXiv:2410.12837, 2024

work page arXiv 2024
[36]

In-context retrieval-augmented language models,

O. Ramet al., “In-context retrieval-augmented language models,”Transactions of the Association for Computational Linguistics, vol. 11, pp. 1316–1331, 2023

work page 2023
[37]

Prompt optimization via adversarial in-context learning,

X. L. Doet al., “Prompt optimization via adversarial in-context learning,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (V olume 1: Long P apers), 2024, pp. 7308–7327

work page 2024
[38]

Avatar: Optimizing llm agents for tool usage via contrastive reasoning,

S. Wuet al., “Avatar: Optimizing llm agents for tool usage via contrastive reasoning,”Advances in Neural Information Processing Systems, vol. 37, pp. 25 981–26 010, 2024

work page 2024
[39]

Reflexion: Language agents with verbal reinforcement learning,

N. Shinnet al., “Reflexion: Language agents with verbal reinforcement learning,”Advances in neural information processing systems, vol. 36, pp. 8634–8652, 2023

work page 2023
[40]

Toolgate: Contract-grounded and verified tool execution for llms,

Y . Liuet al., “Toolgate: Contract-grounded and verified tool execution for llms,”arXiv preprint arXiv:2601.04688, 2026

work page arXiv 2026
[41]

Pro2guard: Proactive runtime enforcement of llm agent safety via probabilistic model checking,

H. Wanget al., “Pro2guard: Proactive runtime enforcement of llm agent safety via probabilistic model checking,”arXiv preprint arXiv:2508.00500, 2025

work page arXiv 2025
[42]

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

H. Wanget al., “Agentspec: Customizable runtime enforcement for safe and reliable llm agents,”arXiv preprint arXiv:2503.18666, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

Robust power flow and three-phase power flow analyses,

A. Pandeyet al., “Robust power flow and three-phase power flow analyses,” IEEE Transactions on P ower Systems, vol. 34, no. 1, pp. 616–626, 2018

work page 2018
[44]

Anoca: Ac network-aware optimal curtailment approach for dynamic hosting capacity,

E. O. Badmus and A. Pandey, “Anoca: Ac network-aware optimal curtailment approach for dynamic hosting capacity,” in2024 IEEE 63rd Conference on Decision and Control (CDC). IEEE, 2024, pp. 5338–5345

work page 2024
[45]

Using opf-based operating envelopes to facilitate residential der services,

M. Z. Liuet al., “Using opf-based operating envelopes to facilitate residential der services,”IEEE Transactions on Smart Grid, vol. 13, no. 6, pp. 4494–4504, 2022

work page 2022
[46]

Inexactness of second order cone relaxations for calculating operating envelopes,

H. Moring and J. L. Mathieu, “Inexactness of second order cone relaxations for calculating operating envelopes,” in2023 IEEE International Conference on Communications, Control, and Computing T echnologies for Smart Grids (SmartGridComm). IEEE, 2023, pp. 1–6

work page 2023
[47]

Fair operating envelopes under uncertainty using chance constrained optimal power flow,

Y . Yi and G. V erbiˇc, “Fair operating envelopes under uncertainty using chance constrained optimal power flow,”Electric P ower Systems Research, vol. 213, p. 108465, 2022

work page 2022
[48]

Three-phase infeasibility analysis for distribution grid studies,

E. Fosteret al., “Three-phase infeasibility analysis for distribution grid studies,” Electric P ower Systems Research, vol. 212, p. 108486, 2022

work page 2022
[49]

Solving three-phase ac infeasibility analysis to near-zero optimality gap,

B. Panthee and A. Pandey, “Solving three-phase ac infeasibility analysis to near-zero optimality gap,”arXiv preprint arXiv:2508.15937, 2025

work page arXiv 2025
[50]

Langchain agents documentation,

“Langchain agents documentation,” https://docs.langchain.com/oss/python/ langchain/agents, accessed 2026-01-26

work page 2026
[51]

Crewai concepts: Agents, crews, and flows,

“Crewai concepts: Agents, crews, and flows,” CrewAI documentation, accessed 2026-01-26. [Online]. Available: https://docs.crewai.com/en/concepts/agents

work page 2026
[52]

Gpt-4o mini,

“Gpt-4o mini,” OpenAI API Documentation, accessed 2026-01-26. [Online]. Available: https://platform.openai.com/docs/models/gpt-4o-mini

work page 2026
[53]

Gpt-5.2,

“Gpt-5.2,” OpenAI API Documentation, accessed 2026-01-26. [Online]. Available: https://platform.openai.com/docs/models/gpt-5.2

work page 2026
[54]

[Online]

“Models,” OpenAI API Documentation, accessed 2026-01-26. [Online]. Available: https://platform.openai.com/docs/models

work page 2026
[56]

Qwen/qwen3-14b model card,

“Qwen/qwen3-14b model card,” Hugging Face, accessed 2026-01-26. [Online]. Available: https://huggingface.co/Qwen/Qwen3-14B

work page 2026
[57]

gpt-oss-120b & gpt-oss-20b model card,

“gpt-oss-120b & gpt-oss-20b model card,” OpenAI, accessed 2026-01-26. [Online]. Available: https://openai.com/index/gpt-oss-model-card/

work page 2026
[58]

Openai-compatible server,

“Openai-compatible server,” vLLM Documentation, accessed 2026-01-26. [Online]. Available: https://docs.vllm.ai/en/stable/serving/openaicompatible server/

work page 2026
[59]

Nvidia h100 tensor core gpu,

“Nvidia h100 tensor core gpu,” NVIDIA Product Page, accessed 2026-01-26. [Online]. Available: https://www.nvidia.com/en-us/data-center/h100/

work page 2026
[60]

text-embedding-3-large (model documentation),

OpenAI, “text-embedding-3-large (model documentation),” https://platform. openai.com/docs/models/text-embedding-3-large, 2026, accessed: Jan. 2026

work page 2026
[61]

Evaluating Large Language Models Trained on Code

M. Chenet al., “Evaluating large language models trained on code,”arXiv preprint arXiv:2107.03374, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021