Margin Play: A Multi-Agent System For Public Policy Analysis In The Brazilian Equatorial Margin

Allan Kardec Duailibe Barros Filho; Antonio de Sousa Leitão Filho; Dennys Correia da Silva; Fabrício Saul Lima; Luís Jorge Mesquita de Jesus; Rejani Bandeira Vieira Sousa; Selby Mykael Lima dos Santos

arxiv: 2606.02614 · v1 · pith:RFFOQXPDnew · submitted 2026-05-26 · 💻 cs.CE · cs.AI

Pith reviewed 2026-07-01 15:51 UTC · model grok-4.3

keywords: multi-agent reinforcement learning · public policy · oil exploration · Brazilian Equatorial Margin · welfare analysis · MARL · policy simulation

The pith

The central policy question for Brazilian Equatorial Margin oil exploration is resolved by choosing the right public policy regime rather than balancing production against welfare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a multi-agent reinforcement learning system that models the interactions between the federal government, the state of Maranhão, oil operators, regulators, and local communities over oil exploration in the Brazilian Equatorial Margin. It shows that whether exploration generates net positive externalities for the state depends on the institutional regime under which it takes place. A reader would care because it suggests that policy design can achieve better welfare outcomes without increasing production volumes. The results indicate a marginal welfare gain under the reference baseline, but a 17.5% welfare increase with lower environmental liability under the alternative MA-Prospero configuration.

Core claim

The paper claims that the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime linked to exploration. Using a six-agent MARL model calibrated to Brazilian data, the reference baseline yields marginal welfare gain while the MA-Prospero configuration produces a 17.5% increase in welfare and 21.3% in community revenue with lower environmental liability.

What carries the argument

Margin Play, the multi-agent reinforcement learning system with six agents under the centralized training with decentralized execution paradigm trained using BRO-MARL.

If this is right

Under the reference baseline, the welfare gain from exploration is marginal (Waval ≈ 1.68).
The MA-Prospero policy configuration increases welfare by 17.5% and community revenue by 21.3%.
This configuration also reduces environmental liability from 0.076 to 0.048.
The outcomes depend on how royalties are earmarked and how agent incentives are aligned.
Exploration can generate net positive externalities for the state under the right regime.
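
The percentages above can be checked with a few lines of arithmetic. The sketch below computes the relative change in environmental liability from the two reported levels, and the welfare level implied by the +17.5% gain. It assumes, which the summary does not state explicitly, that the gain is measured against the baseline value of about 1.68; the function and variable names are illustrative.

```python
def relative_change(new: float, old: float) -> float:
    """Relative change of `new` with respect to `old`, as a fraction."""
    return (new - old) / old


# Values reported in the summary above.
E_AMB_BASELINE = 0.076     # environmental liability, reference baseline
E_AMB_MA_PROSPERO = 0.048  # environmental liability, MA-Prospero
W_BASELINE = 1.68          # approximate baseline welfare (Waval)
DELTA_W = 0.175            # reported +17.5% welfare gain

liability_change = relative_change(E_AMB_MA_PROSPERO, E_AMB_BASELINE)

# ASSUMPTION: the +17.5% is relative to the baseline welfare level.
implied_w = W_BASELINE * (1.0 + DELTA_W)

print(f"Environmental liability change: {liability_change:+.1%}")
print(f"Implied MA-Prospero welfare:    {implied_w:.3f}")
```

Under that assumption, environmental liability falls by roughly 37% while the implied welfare level rises to roughly 1.97.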

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar multi-agent models could be applied to other resource extraction frontiers to test policy regimes.
Empirical data from actual exploration starting in 2026 could validate or refute the simulation results.
The approach highlights the value of modeling conflicting mandates between agencies like ANP and IBAMA.
Extending the model to include more dynamic economic variables might reveal additional policy levers.

Load-bearing premise

The six-agent MARL model under CTDE with BRO-MARL training, calibrated to Brazilian empirical data, produces outputs that reflect real agent incentives and outcomes.
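
For readers unfamiliar with CTDE, the sketch below illustrates the general pattern only: each agent's policy acts on its own local observation, while a critic used during training sees the joint observations and actions of all six agents. It is not the paper's BRO-MARL implementation; the agent identifiers, the linear actor and critic, and all dimensions are assumptions chosen for illustration.

```python
import random

# Hypothetical identifiers; the source names the institutions, not these labels.
AGENTS = ["federal_gov", "state_gov", "operator", "anp", "ibama", "community"]
OBS_DIM = 3  # assumed size of each agent's local observation


class LinearActor:
    """Decentralized policy: maps one agent's local observation to a scalar action."""

    def __init__(self, rng: random.Random):
        self.weights = [rng.uniform(-1, 1) for _ in range(OBS_DIM)]

    def act(self, local_obs: list[float]) -> float:
        raw = sum(w * o for w, o in zip(self.weights, local_obs))
        return max(-1.0, min(1.0, raw))  # clip to a bounded action range


class CentralCritic:
    """Centralized critic: scores the joint observation-action vector (training only)."""

    def __init__(self, rng: random.Random):
        joint_dim = len(AGENTS) * (OBS_DIM + 1)
        self.weights = [rng.uniform(-1, 1) for _ in range(joint_dim)]

    def value(self, joint_obs: dict, joint_act: dict) -> float:
        features = []
        for name in AGENTS:
            features.extend(joint_obs[name])
            features.append(joint_act[name])
        return sum(w * f for w, f in zip(self.weights, features))


rng = random.Random(0)
actors = {name: LinearActor(rng) for name in AGENTS}
critic = CentralCritic(rng)

observations = {name: [rng.random() for _ in range(OBS_DIM)] for name in AGENTS}
# Execution: each actor uses only its own observation.
actions = {name: actors[name].act(observations[name]) for name in AGENTS}
# Training: the critic evaluates all observations and actions together.
q_value = critic.value(observations, actions)
```

The design point is the asymmetry of information: the critic's input grows with the number of agents, whereas each actor's input does not, so the trained policies can be run without access to other agents' observations.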

What would settle it

Collecting data on actual welfare changes, royalty distributions, and environmental impacts in Maranhão after exploration begins in 2026 and comparing them to the model's predictions for different policy scenarios.

Figures

Figures reproduced from arXiv: 2606.02614 by Allan Kardec Duailibe Barros Filho, Antonio de Sousa Leitão Filho, Dennys Correia da Silva, Fabrício Saul Lima, Luís Jorge Mesquita de Jesus, Rejani Bandeira Vieira Sousa, Selby Mykael Lima dos Santos.

**Figure 2.** System architecture of the Margin Play framework under the CTDE paradigm. Six institutional agents interact …

**Figure 3.** Operational sequence of a Margin Play episode. Input stage (steps 1–2), 15-step biennial time loop (steps …

**Figure 4.** Empirical response to the central question stated in Section 1.1. The left panel shows aggregate welfare Waval by scenario; the right panel shows the converged return of the Community agent, a direct metric of the zonal welfare of Amazonian populations.

**Figure 5.** Learning curves by agent (200-episode moving average) for the six scenarios. The six panels correspond to …

**Figure 6.** Convergence diagnostics: rolling σ (window 500) on a logarithmic scale, by agent and by scenario. No collapse or divergence is observed. Scenarios with lower economic activity (pessimistic, MA-Próspero) achieve deeper convergence.

**Figure 7.** Macroeconomic trajectories: W (welfare), Eamb (environmental liability) and R (reserves). The red dashed line at Eamb = 0.20 marks the Frade-Chevron regulatory threshold. The MA-Próspero regime combines the highest W with the lowest Eamb, refuting the hypothesis of a structural trade-off between production and welfare.

**Figure 8.** Amazonian communities welfare. Left: convergence of Community agent return during training. Right: empirical distribution over the last 1,000 episodes. The MA-Próspero regime yields ∆Rcom = +21.3% relative to the reference baseline.

**Figure 9.** IBAMA agent behaviour and environmental liability.

**Figure 10.** Empirical return distributions (last 1,000 episodes), decomposed by agent and scenario.

**Figure 11.** Mean Q̄ target by scenario and agent (last 1,000 episodes). Rows: six scenarios; columns: six agents. MA-Próspero redirects oil revenue from federal collection to regional social capital, expressing itself in high Q-values in the State Gov. and Community agents.

**Figure 12.** Synthesis of the MA-Próspero regime: six structural levers combined. The interpretation is multiplicative: …

**Figure 13.** Counterfactual effect of each scenario relative to the reference ( …
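
The captions for Figures 5 and 6 mention two standard smoothing diagnostics: a 200-episode moving average of returns and a rolling standard deviation over a 500-episode window. The following sketch shows one way such curves could be computed from a list of per-episode returns. The synthetic return series and the helper names are assumptions for illustration, not the authors' code.

```python
import random
import statistics


def moving_average(values: list[float], window: int) -> list[float]:
    """Mean over each full trailing window of length `window`."""
    if len(values) < window:
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        out.append(running / window)
    return out


def rolling_std(values: list[float], window: int) -> list[float]:
    """Population standard deviation over each full trailing window."""
    return [
        statistics.pstdev(values[i - window:i])
        for i in range(window, len(values) + 1)
    ]


rng = random.Random(42)
# Synthetic returns: the noise shrinks over time to mimic a converging agent.
returns = [1.0 + rng.gauss(0.0, 1.0 / (1 + ep / 100)) for ep in range(2000)]

smoothed = moving_average(returns, window=200)  # window used for the learning curves
sigma = rolling_std(returns, window=500)        # window used for the convergence diagnostic
```

In a series like this one, the rolling σ declines as training proceeds, which is the qualitative pattern the Figure 6 caption describes as convergence without collapse or divergence.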

Original abstract

The Brazilian Equatorial Margin (BEM) is Brazil's next offshore oil frontier, with operations expected to begin in 2026 in the Foz do Amazonas basin. Its assets are fiscally and territorially linked primarily to Maranhão, the state with the lowest HDI in the Federation (0.676, IBGE 2022). This raises the central policy question: under what conditions does BEM exploration generate net positive externalities for Maranhão? The problem is intrinsically multi-agent: the Federal Government seeks revenue and energy security; the state seeks regional welfare under constitutional royalty earmarking; the operator maximizes profit under risk; ANP and IBAMA hold conflicting mandates; and Amazonian communities prioritize territorial and environmental vectors over monetary income. We present Margin Play, a Multi-Agent Reinforcement Learning (MARL) system simulating these tensions under Brazilian empirical calibration and classical economic literature. It implements six agents under the CTDE paradigm, trained with BRO-MARL. Results from 60,000 episodes across six scenarios indicate the answer is conditional on the institutional regime: under the reference baseline, the welfare gain is marginal (Waval ≈ 1.68), whereas the MA-Prospero configuration yields ΔW = +17.5% and ΔRcom = +21.3%, with a lower environmental liability (Eamb = 0.048 vs. 0.076). The fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime linked to exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MARL model for BEM oil policy applies established methods to a new setting but rests on unvalidated reward functions and calibration.

The paper uses a six-agent CTDE setup with BRO-MARL to simulate policy regimes for oil exploration in Brazil's Equatorial Margin, linked to Maranhão. The main result is that one regime (MA-Prospero) produces a 17.5% welfare gain and lower environmental liability compared with baseline.

What is new is the concrete mapping of agents—federal revenue, state welfare under earmarks, operator profit, ANP/IBAMA mandates, and community priorities—onto this specific low-HDI frontier. The calibration draws on Brazilian data and standard economic references, and the 60,000-episode runs across six scenarios give a structured way to compare institutional choices.

The modeling choice to treat the problem as regime selection rather than a simple production-welfare trade-off follows directly from the agent incentives they define. That framing is internally consistent with the setup.

The soft spot is validation. The abstract reports numeric deltas but supplies no out-of-sample checks, historical matching, sensitivity analysis on reward weights, or error analysis. If the state welfare term or the environmental liability term is misspecified, the reported superiority may be an artifact of the simulation rather than evidence about real incentives.

This is for readers working on computational policy tools or MARL applications to resource governance. Someone already using multi-agent methods for public decisions could extract the agent design and scenario structure.

It deserves a serious referee to check the calibration and reward specification. I would send it for review.

Referee Report

3 major / 2 minor

Summary. The paper introduces Margin Play, a six-agent MARL system under the CTDE paradigm trained with BRO-MARL, to simulate policy tensions around oil exploration in Brazil's Equatorial Margin (Foz do Amazonas basin). Agents represent the Federal Government, Maranhão state, operators, ANP/IBAMA, and Amazonian communities, with rewards calibrated to Brazilian data and economic literature. Across 60,000 episodes and six scenarios, the central claim is that net positive welfare externalities for Maranhão (lowest-HDI state) are conditional on institutional regime rather than an inherent production-welfare trade-off; the MA-Prospero regime yields Delta W = +17.5%, Delta Rcom = +21.3%, and Eamb = 0.048 (vs. baseline Waval ~1.68 and Eamb 0.076).

Significance. If the simulation outputs can be shown to reflect observable incentives and produce falsifiable predictions, the work would offer a novel computational framework for multi-stakeholder resource policy analysis, extending classical economic models of royalty earmarking and environmental liability into a dynamic, game-theoretic setting with potential applicability to other frontier basins.

major comments (3)

[Abstract] Abstract: the reported quantitative outcomes (Waval ~1.68, Delta W +17.5%, Eamb 0.048 vs. 0.076) rest on unvalidated simulation outputs; no error bars, sensitivity analysis on reward weights, out-of-sample historical matching, or calibration diagnostics are supplied, so it is impossible to determine whether the MA-Prospero superiority is an artifact of reward-function specification rather than evidence against a production-welfare trade-off.
[Abstract] Abstract (and implied § on agent design): the claim that the six-agent model is 'calibrated to Brazilian empirical data and classical economic literature' is load-bearing for the regime-choice conclusion, yet no explicit functional forms, parameter values, or validation steps for the Federal revenue, state welfare (earmarking), operator profit, ANP/IBAMA mandate, or community territorial reward functions are provided; without these, the equilibrium behaviors cannot be assessed for realism.
[Abstract] Abstract: the assertion that 'the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime' is directly inferred from the scenario deltas, but the manuscript supplies no robustness checks (e.g., alternative reward weightings or stochastic shock tests) that would falsify this interpretation if the environmental-liability term or state welfare weight were misspecified.

minor comments (2)

[Abstract] Abstract: numeric values are given without units or precise definitions (e.g., what exactly is 'Waval', 'Eamb', or the baseline against which +17.5% is measured).
The manuscript should include a table or appendix listing the exact reward-function equations and the empirical sources used for each agent's parameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive critique emphasizing validation, explicit calibration details, and robustness. These points are essential for establishing the simulation's credibility in policy analysis. We address each major comment below and will revise the manuscript to incorporate the requested elements.

Referee: [Abstract] Abstract: the reported quantitative outcomes (Waval ~1.68, Delta W +17.5%, Eamb 0.048 vs. 0.076) rest on unvalidated simulation outputs; no error bars, sensitivity analysis on reward weights, out-of-sample historical matching, or calibration diagnostics are supplied, so it is impossible to determine whether the MA-Prospero superiority is an artifact of reward-function specification rather than evidence against a production-welfare trade-off.

Authors: We agree that the abstract and results presentation would be strengthened by explicit validation diagnostics. The methods section describes calibration against IBGE, ANP, and IBAMA data sources plus literature on fiscal federalism, but we did not report error bars across seeds or sensitivity to reward weights. In revision we will add a new robustness subsection with: (i) standard errors from 10 independent training runs, (ii) sensitivity sweeps on the environmental-liability and state-welfare weights (±20%), and (iii) an out-of-sample comparison of simulated baseline royalty flows against observed distributions in the Campos and Santos basins. These additions will allow readers to evaluate whether regime superiority is robust.
revision: yes
Referee: [Abstract] Abstract (and implied § on agent design): the claim that the six-agent model is 'calibrated to Brazilian empirical data and classical economic literature' is load-bearing for the regime-choice conclusion, yet no explicit functional forms, parameter values, or validation steps for the Federal revenue, state welfare (earmarking), operator profit, ANP/IBAMA mandate, or community territorial reward functions are provided; without these, the equilibrium behaviors cannot be assessed for realism.

Authors: The full manuscript (Section 3.2 and Appendix A) supplies the reward equations and parameter sources, but we acknowledge they are not presented in a single consolidated table. We will revise by expanding Appendix A with: (i) the exact functional forms for each of the six agents (Federal revenue = royalty share + corporate tax; state welfare = earmarked royalty fraction × HDI-weighted multiplier; operator profit = revenue – costs – liability; ANP/IBAMA dual mandate; community territorial utility), (ii) a table of all numerical parameters with citations, and (iii) a short validation paragraph comparing baseline equilibrium statistics to historical royalty earmarking outcomes reported by the National Treasury.
revision: yes
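
The three functional forms spelled out in the response above can be transcribed directly. The sketch below does only that, for federal revenue, state welfare, and operator profit. The ANP/IBAMA dual mandate and the community territorial utility are omitted because no formula is given for them. All field names and example values are placeholders invented for illustration; the summary supplies no calibrated parameters and no definition of the HDI-weighted multiplier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepInputs:
    """Hypothetical per-step quantities in arbitrary units (not calibrated)."""
    royalty_share: float               # royalties accruing to the federal level
    corporate_tax: float
    earmarked_royalty_fraction: float  # earmarked royalty term for the state
    hdi_multiplier: float              # HDI-weighted multiplier (definition not given)
    revenue: float
    costs: float
    liability: float


def federal_revenue(x: StepInputs) -> float:
    # Federal revenue = royalty share + corporate tax
    return x.royalty_share + x.corporate_tax


def state_welfare(x: StepInputs) -> float:
    # State welfare = earmarked royalty fraction x HDI-weighted multiplier
    return x.earmarked_royalty_fraction * x.hdi_multiplier


def operator_profit(x: StepInputs) -> float:
    # Operator profit = revenue - costs - liability
    return x.revenue - x.costs - x.liability


# Placeholder values chosen only to exercise the functions.
example = StepInputs(
    royalty_share=10.0, corporate_tax=5.0,
    earmarked_royalty_fraction=0.5, hdi_multiplier=1.5,
    revenue=100.0, costs=60.0, liability=8.0,
)
rewards = {
    "federal": federal_revenue(example),
    "state": state_welfare(example),
    "operator": operator_profit(example),
}
```

Writing the rewards as separate pure functions would make it straightforward to tabulate each agent's parameters alongside its formula, which is what the referee's minor comment asks for.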
Referee: [Abstract] Abstract: the assertion that 'the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime' is directly inferred from the scenario deltas, but the manuscript supplies no robustness checks (e.g., alternative reward weightings or stochastic shock tests) that would falsify this interpretation if the environmental-liability term or state welfare weight were misspecified.

Authors: We concur that the interpretive claim requires explicit falsification tests. The six scenarios vary institutional rules while holding reward weights fixed. In the revised version we will add two new experiment sets: (i) re-training under ±15% and ±30% perturbations to the environmental-liability and state-welfare coefficients, and (ii) stochastic oil-price shocks drawn from historical volatility. We will report whether the MA-Prospero regime retains its welfare and environmental advantages under these perturbations; if the ranking reverses under plausible misspecifications we will qualify the conclusion accordingly.
revision: yes
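
The proposed falsification test amounts to re-evaluating the regime ranking under perturbed coefficients. The sketch below shows only the bookkeeping of such a sweep over ±15% and ±30% perturbations. The `toy_welfare` function is a stand-in invented for illustration (a real check would retrain the agents for each setting), and the regime names, coefficient names, and numbers are hypothetical rather than taken from the paper.

```python
from itertools import product

PERTURBATIONS = (-0.30, -0.15, 0.0, 0.15, 0.30)
NOMINAL_W_ENV = 1.0    # hypothetical environmental-liability coefficient
NOMINAL_W_STATE = 1.0  # hypothetical state-welfare coefficient

# Invented (social_return, liability) pairs; NOT values from the paper.
REGIMES = {"reference": (1.00, 0.08), "alternative": (0.98, 0.05)}


def toy_welfare(regime: str, w_env: float, w_state: float) -> float:
    """Stand-in for a full training run: a linear welfare score."""
    social_return, liability = REGIMES[regime]
    return w_state * social_return - w_env * liability


def sweep() -> dict[tuple[float, float], str]:
    """Return the winning regime for each pair of coefficient perturbations."""
    winners = {}
    for d_env, d_state in product(PERTURBATIONS, repeat=2):
        w_env = NOMINAL_W_ENV * (1 + d_env)
        w_state = NOMINAL_W_STATE * (1 + d_state)
        winners[(d_env, d_state)] = max(
            REGIMES, key=lambda r: toy_welfare(r, w_env, w_state)
        )
    return winners


winners = sweep()
nominal_winner = winners[(0.0, 0.0)]
reversals = [k for k, v in winners.items() if v != nominal_winner]
print(f"{len(reversals)} of {len(winners)} settings change the ranking")
```

Reporting the full grid of winners, rather than a single headline number, lets a reader see which coefficient combinations, if any, overturn the ranking.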

Circularity Check

0 steps flagged

No significant circularity; simulation outputs provide independent content

The paper derives its central claim—that the problem reduces to institutional regime choice rather than a production-welfare trade-off—from the numerical outputs of a six-agent CTDE/BRO-MARL simulation run across six explicitly defined scenarios. The abstract states that the model is calibrated to Brazilian empirical data and classical literature, then reports specific deltas (Waval ≈1.68, Delta W +17.5%, Eamb 0.048) as evidence. No equations, reward-function definitions, or self-citations are supplied that would reduce these outputs to the inputs by construction. The derivation chain therefore remains self-contained: the simulation constitutes an independent computational experiment whose results can be checked against external data or alternative calibrations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities listed. Model calibration to 'Brazilian empirical data' and 'classical economic literature' is invoked but not detailed.

Reference graph

Works this paper leans on

56 extracted references · 17 canonical work pages

[1]

11ª rodada de licitações e 5º ciclo de ofertas permanentes — documentos editais e histórico de arrematação MEB

Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP). 11ª rodada de licitações e 5º ciclo de ofertas permanentes — documentos editais e histórico de arrematação MEB. Technical report, ANP, Brasília, Brazil, 2025

2025
[2]

Parecer FZA-M-59 — recusa de licença de perfuração na bacia da foz do amazonas

IBAMA — Diretoria de Licenciamento Ambiental (DILIC). Parecer FZA-M-59 — recusa de licença de perfuração na bacia da foz do amazonas. Technical Report Documento Técnico nº 02/2023, IBAMA, Brasília, Brazil, 2023

2023
[3]

NDC atualizada — acordo de Paris

Ministério do Meio Ambiente e Mudança do Clima (MMA). NDC atualizada — acordo de Paris. meta −59% emissões até 2035. Technical report, MMA, Brasília, Brazil, 2024

2035
[4]

Constituição da república federativa do brasil, 1988

Brasil. Constituição da república federativa do brasil, 1988. Promulgada em 5 de outubro de 1988

1988
[5]

Resolução ANP nº 882, de 25 de maio de 2022 — programa de segurança operacional (PSO)

Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP). Resolução ANP nº 882, de 25 de maio de 2022 — programa de segurança operacional (PSO). Technical report, ANP, Brasília, Brazil, 2022

2022
[6]

Manav, R

Frederick van der Ploeg. Why do many resource-rich countries have negative genuine saving? Anticipation of better times or rapacious rent seeking.Resource and Energy Economics, 32(1):28–44, 2010. doi: 10.1016/j. reseneeco.2009.07.001

work page doi:10.1016/j 2010
[7]

Max Corden and J

W. Max Corden and J. Peter Neary. Booming sector and de-industrialisation in a small open economy.Economic Journal, 92(368):825–848, 1982. doi: 10.2307/2232670

work page doi:10.2307/2232670 1982
[8]

Conflitos no campo Brasil 2024

Comissão Pastoral da Terra (CPT). Conflitos no campo Brasil 2024. Technical report, CPT Nacional, Goiânia, Brazil, 2025

2024
[9]

Violência contra os povos indígenas no Brasil — dados de 2024

Conselho Indigenista Missionário (CIMI). Violência contra os povos indígenas no Brasil — dados de 2024. Technical report, CIMI, Brasília, Brazil, 2025

2024
[10]

Roland Hodler, Michael Lechner, and Paul A. Raschky. Institutions and the resource curse: New insights from causal machine learning.PLOS ONE, 18(6):e0284968, 2023. doi: 10.1371/journal.pone.0284968

work page doi:10.1371/journal.pone.0284968 2023
[11]

Robert E. Lucas. Econometric policy evaluation: A critique. In Karl Brunner and Allan H. Meltzer, editors,The Phillips Curve and Labor Markets, volume 1 ofCarnegie-Rochester Conference Series on Public Policy, pages 19–46. North-Holland, Amsterdam, 1976

1976
[12]

Is public expenditure productive?Journal of Monetary Economics, 23(2):177–200, 1989

David Alan Aschauer. Is public expenditure productive?Journal of Monetary Economics, 23(2):177–200, 1989. doi: 10.1016/0304-3932(89)90047-0

work page doi:10.1016/0304-3932(89)90047-0 1989
[13]

Form 20-F filing with the U.S

Petrobras. Form 20-F filing with the U.S. securities and exchange commission. Technical report, Petrobras, Rio de Janeiro, Brazil, 2024

2024
[14]

Robinson.Why Nations Fail: The Origins of Power, Prosperity, and Poverty

Daron Acemoglu and James A. Robinson.Why Nations Fail: The Origins of Power, Prosperity, and Poverty. Crown Business, New York, 2012

2012
[15]

Lei nº 12.858, de 9 de setembro de 2013 — vinculação de royalties (75% educação, 25% saúde), 2013

Brasil. Lei nº 12.858, de 9 de setembro de 2013 — vinculação de royalties (75% educação, 25% saúde), 2013. 19 Margin Play: MARL for Public Policy Analysis in the BEMA PREPRINT

2013
[16]

Bigger, regularized, optimistic: Scaling for compute-efficient continuous control

Michał Nauman, Maciej Ostaszewski, Krzysztof Jankowski, Piotr Miło´s, and Mateusz Cygan. Bigger, regularized, optimistic: Scaling for compute-efficient continuous control. InAdvances in Neural Information Processing Systems (NeurIPS), 2024

2024
[17]

Multi-agent actor-critic for mixed cooperative-competitive environments

Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. InAdvances in Neural Information Processing Systems (NIPS), 2017

2017
[18]

Controlling overestimation bias with truncated mixture of continuous distributional quantile critics

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, and Dmitry Vetrov. Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. InInternational Conference on Machine Learning (ICML), 2020

2020
[19]

Cambridge University Press, Cambridge, 2009

Yoav Shoham and Kevin Leyton-Brown.Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, 2009. doi: 10.1017/CBO9780511811654

work page doi:10.1017/cbo9780511811654 2009
[20]

Parkes, and Richard Socher

Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, and Richard Socher. The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning.Science Advances, 8(18):eabk2607,
[21]

doi: 10.1126/sciadv.abk2607

work page doi:10.1126/sciadv.abk2607
[22]

Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning, 2022

Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, and Mark Workman. Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning, 2022

2022
[23]

PolicySpace2: Modeling markets and endogenous public policies.Journal of Artificial Societies and Social Simulation, 25(1):8, 2022

Bernardo Alves Furtado. PolicySpace2: Modeling markets and endogenous public policies.Journal of Artificial Societies and Social Simulation, 25(1):8, 2022. doi: 10.18564/jasss.4742

work page doi:10.18564/jasss.4742 2022
[24]

Lei nº 9.478, de 6 de agosto de 1997 — lei do petróleo, 1997

Brasil. Lei nº 9.478, de 6 de agosto de 1997 — lei do petróleo, 1997

1997
[25]

Ações diretas de inconstitucionalidade nº 4.916, 4.917, 4.918 e 4.920 — distribuição de royalties da lei 12.734/2012

Supremo Tribunal Federal (STF). Ações diretas de inconstitucionalidade nº 4.916, 4.917, 4.918 e 4.920 — distribuição de royalties da lei 12.734/2012. medida cautelar deferida em 18 de março de 2013, rel. min. cármen lúcia, suspendendo a eficácia da lei 12.734/2012. julgamento de mérito iniciado em 6–7 de maio de 2026; voto da relatora pela inconstituciona...

2012
[26]

Duke University Press, Durham, NC, 2008

Arturo Escobar.Territories of Difference: Place, Movements, Life, Redes. Duke University Press, Durham, NC, 2008

2008
[27]

NAEA/UFPA, Belém, Brazil, 2008

Alfredo Wagner Berno de Almeida.Terras Tradicionalmente Ocupadas: Processos de Territorialização e Movimentos Sociais. NAEA/UFPA, Belém, Brazil, 2008

2008
[28]

King Hubbert

M. King Hubbert. Nuclear energy and the fossil fuels. InDrilling and Production Practice. American Petroleum Institute, 1956

1956
[29]

Adam R. Brandt. Review of mathematical models of future oil supply.Energy, 35(9):3958–3974, 2010. doi: 10.1016/j.energy.2010.04.011

work page doi:10.1016/j.energy.2010.04.011 2010
[30]

Acórdão 2.936/2021 — plenário — auditoria operacional ANP, gargalos de aprovação de planos de desenvolvimento

Tribunal de Contas da União (TCU). Acórdão 2.936/2021 — plenário — auditoria operacional ANP, gargalos de aprovação de planos de desenvolvimento. Technical report, TCU, Brasília, Brazil, 2021

2021
[31]

Alicia H. Munnell. Why has productivity growth declined? Productivity and public investment.New England Economic Review, pages 3–22, 1990. January/February

1990
[32]

Robert E. Lucas. On the mechanics of economic development.Journal of Monetary Economics, 22(1):3–42, 1988. doi: 10.1016/0304-3932(88)90168-7

work page doi:10.1016/0304-3932(88)90168-7 1988
[33]

On the concept of health capital and the demand for health.Journal of Political Economy, 80 (2):223–255, 1972

Michael Grossman. On the concept of health capital and the demand for health.Journal of Political Economy, 80 (2):223–255, 1972. doi: 10.1086/259880

work page doi:10.1086/259880 1972
[34]

Cobb and Paul H

Charles W. Cobb and Paul H. Douglas. A theory of production.American Economic Review, 18(1):139–165, 1928

1928
[35]

Ravikumar

Gerhard Glomm and B. Ravikumar. Public versus private investment in human capital: Endogenous growth and income inequality.Journal of Political Economy, 100(4):818–834, 1992. doi: 10.1086/261841

work page doi:10.1086/261841 1992
[36]

North.Institutions, Institutional Change and Economic Performance

Douglass C. North.Institutions, Institutional Change and Economic Performance. Cambridge University Press, Cambridge, 1990. doi: 10.1017/CBO9780511808678

work page doi:10.1017/cbo9780511808678 1990
[37]

Elsevier- Campus, Rio de Janeiro, Brazil, 2014

Marcos Mendes.Por Que o Brasil Cresce Pouco? Desigualdade, Democracia e Baixo Crescimento. Elsevier- Campus, Rio de Janeiro, Brazil, 2014

2014
[38]

Sistema FINBRAS — receitas estaduais detalhadas

Secretaria do Tesouro Nacional (STN). Sistema FINBRAS — receitas estaduais detalhadas. Technical report, STN, Brasília, Brazil, 2024. 20 Margin Play: MARL for Public Policy Analysis in the BEMA PREPRINT

2024
[39]

IPEAdata — séries de PIB, FBKF estadual, deflatores

Instituto de Pesquisa Econômica Aplicada (IPEA). IPEAdata — séries de PIB, FBKF estadual, deflatores. Technical report, IPEA, Brasília, Brazil, 2024

2024
[40]

Lei complementar nº 91, de 22 de dezembro de 1997 — coeficientes do FPE, 1997

Brasil. Lei complementar nº 91, de 22 de dezembro de 1997 — coeficientes do FPE, 1997

1997
[41]

Emenda constitucional nº 108, de 26 de agosto de 2020 — FUNDEB permanente, 2020

Brasil. Emenda constitucional nº 108, de 26 de agosto de 2020 — FUNDEB permanente, 2020

2020
[42]

John W. Pratt. Risk aversion in the small and in the large.Econometrica, 32(1–2):122–136, 1964. doi: 10.2307/1913738

work page doi:10.2307/1913738 1964
[43]

Atkinson

Anthony B. Atkinson. On the measurement of inequality.Journal of Economic Theory, 2(3):244–263, 1970. doi: 10.1016/0022-0531(70)90039-6

work page doi:10.1016/0022-0531(70)90039-6 1970
[44]

Electoral manipulation via voter-friendly spending.Journal of Development Economics, 92(1):39–52, 2010

Allan Drazen and Marcela Eslava. Electoral manipulation via voter-friendly spending.Journal of Development Economics, 92(1):39–52, 2010. doi: 10.1016/j.jdeveco.2009.01.010

work page doi:10.1016/j.jdeveco.2009.01.010 2010
[45]

Equilibrium political budget cycles.American Economic Review, 80(1):21–36, 1990

Kenneth Rogoff. Equilibrium political budget cycles.American Economic Review, 80(1):21–36, 1990

1990
[46]

MIT Press, Cambridge, MA, 2000

Torsten Persson and Guido Tabellini.Political Economics: Explaining Economic Policy. MIT Press, Cambridge, MA, 2000

2000
[47]

Vicki M. Bier. Implications of the research on expert overconfidence and dependence.Reliability Engineering & System Safety, 85(1–3):321–329, 2004. doi: 10.1016/j.ress.2004.03.020

work page doi:10.1016/j.ress.2004.03.020 2004
[48]

Ação civil pública — caso frade/chevron 2011/RJ, 2011

Procuradoria-Geral da República. Ação civil pública — caso frade/chevron 2011/RJ, 2011

2011
[49]

Deep water: The gulf oil disaster and the future of offshore drilling

National Commission on the BP Deepwater Horizon Oil Spill and Offshore Drilling. Deep water: The gulf oil disaster and the future of offshore drilling. Technical report, U.S. Government Printing Office, Washington, DC, 2011

2011
[50]

Lei nº 6.938, de 31 de agosto de 1981 — política nacional do meio ambiente (PNMA), 1981

Brasil. Lei nº 6.938, de 31 de agosto de 1981 — política nacional do meio ambiente (PNMA), 1981

1981
[51]

SIPRA — sistema de informação de projetos de reforma agrária

Instituto Nacional de Colonização e Reforma Agrária (INCRA). SIPRA — sistema de informação de projetos de reforma agrária. Technical report, INCRA, Brasília, Brazil, 2024

2024
[52]

No atual ritmo, Brasil levará 2.188 anos para titular todos os territórios quilombolas com processos no INCRA, 2024

Terra de Direitos. No atual ritmo, Brasil levará 2.188 anos para titular todos os territórios quilombolas com processos no INCRA, 2024

2024
[53] Marc G. Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, ICML, pages 449–458. PMLR, 2017
[54] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In International Conference on Learning Representations (ICLR), 2016
[55] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning (ICML), 2018
[56] Frederick van der Ploeg and Anthony J. Venables. Harnessing windfall revenues: Optimal policies for resource-rich developing economies. Economic Journal, 121(551):1–30, 2011. doi: 10.1111/j.1468-0297.2010.02411.x

[1] Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP). 11ª rodada de licitações e 5º ciclo de ofertas permanentes — documentos editais e histórico de arrematação MEB. Technical report, ANP, Brasília, Brazil, 2025

[2] IBAMA — Diretoria de Licenciamento Ambiental (DILIC). Parecer FZA-M-59 — recusa de licença de perfuração na bacia da foz do amazonas. Technical Report Documento Técnico nº 02/2023, IBAMA, Brasília, Brazil, 2023

[3] Ministério do Meio Ambiente e Mudança do Clima (MMA). NDC atualizada — acordo de Paris. meta −59% emissões até 2035. Technical report, MMA, Brasília, Brazil, 2024

[4] Brasil. Constituição da república federativa do brasil, 1988. Promulgada em 5 de outubro de 1988

[5] Agência Nacional do Petróleo, Gás Natural e Biocombustíveis (ANP). Resolução ANP nº 882, de 25 de maio de 2022 — programa de segurança operacional (PSO). Technical report, ANP, Brasília, Brazil, 2022

[6] Frederick van der Ploeg. Why do many resource-rich countries have negative genuine saving? Anticipation of better times or rapacious rent seeking. Resource and Energy Economics, 32(1):28–44, 2010. doi: 10.1016/j.reseneeco.2009.07.001

[7] W. Max Corden and J. Peter Neary. Booming sector and de-industrialisation in a small open economy. Economic Journal, 92(368):825–848, 1982. doi: 10.2307/2232670

[8] Comissão Pastoral da Terra (CPT). Conflitos no campo Brasil 2024. Technical report, CPT Nacional, Goiânia, Brazil, 2025

[9] Conselho Indigenista Missionário (CIMI). Violência contra os povos indígenas no Brasil — dados de 2024. Technical report, CIMI, Brasília, Brazil, 2025

[10] Roland Hodler, Michael Lechner, and Paul A. Raschky. Institutions and the resource curse: New insights from causal machine learning. PLOS ONE, 18(6):e0284968, 2023. doi: 10.1371/journal.pone.0284968

[11] Robert E. Lucas. Econometric policy evaluation: A critique. In Karl Brunner and Allan H. Meltzer, editors, The Phillips Curve and Labor Markets, volume 1 of Carnegie-Rochester Conference Series on Public Policy, pages 19–46. North-Holland, Amsterdam, 1976

[12] David Alan Aschauer. Is public expenditure productive? Journal of Monetary Economics, 23(2):177–200, 1989. doi: 10.1016/0304-3932(89)90047-0

[13] Petrobras. Form 20-F filing with the U.S. securities and exchange commission. Technical report, Petrobras, Rio de Janeiro, Brazil, 2024

[14] Daron Acemoglu and James A. Robinson. Why Nations Fail: The Origins of Power, Prosperity, and Poverty. Crown Business, New York, 2012

[15] Brasil. Lei nº 12.858, de 9 de setembro de 2013 — vinculação de royalties (75% educação, 25% saúde), 2013

[16] Michał Nauman, Maciej Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, and Mateusz Cygan. Bigger, regularized, optimistic: Scaling for compute-efficient continuous control. In Advances in Neural Information Processing Systems (NeurIPS), 2024

[17] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems (NIPS), 2017

[18] Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, and Dmitry Vetrov. Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In International Conference on Machine Learning (ICML), 2020

[19] Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, Cambridge, 2009. doi: 10.1017/CBO9780511811654

[20] Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, and Richard Socher. The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning. Science Advances, 8(18):eabk2607, 2022. doi: 10.1126/sciadv.abk2607

[22] Dylan Radovic, Lucas Kruitwagen, Christian Schroeder de Witt, Ben Caldecott, Shane Tomlinson, and Mark Workman. Revealing robust oil and gas company macro-strategies using deep multi-agent reinforcement learning, 2022

[23] Bernardo Alves Furtado. PolicySpace2: Modeling markets and endogenous public policies. Journal of Artificial Societies and Social Simulation, 25(1):8, 2022. doi: 10.18564/jasss.4742

[24] Brasil. Lei nº 9.478, de 6 de agosto de 1997 — lei do petróleo, 1997

[25] Supremo Tribunal Federal (STF). Ações diretas de inconstitucionalidade nº 4.916, 4.917, 4.918 e 4.920 — distribuição de royalties da lei 12.734/2012. medida cautelar deferida em 18 de março de 2013, rel. min. Cármen Lúcia, suspendendo a eficácia da lei 12.734/2012. julgamento de mérito iniciado em 6–7 de maio de 2026; voto da relatora pela inconstituciona...

[26] Arturo Escobar. Territories of Difference: Place, Movements, Life, Redes. Duke University Press, Durham, NC, 2008

[27] Alfredo Wagner Berno de Almeida. Terras Tradicionalmente Ocupadas: Processos de Territorialização e Movimentos Sociais. NAEA/UFPA, Belém, Brazil, 2008

[28] M. King Hubbert. Nuclear energy and the fossil fuels. In Drilling and Production Practice. American Petroleum Institute, 1956

[29] Adam R. Brandt. Review of mathematical models of future oil supply. Energy, 35(9):3958–3974, 2010. doi: 10.1016/j.energy.2010.04.011

[30] Tribunal de Contas da União (TCU). Acórdão 2.936/2021 — plenário — auditoria operacional ANP, gargalos de aprovação de planos de desenvolvimento. Technical report, TCU, Brasília, Brazil, 2021

[31] Alicia H. Munnell. Why has productivity growth declined? Productivity and public investment. New England Economic Review, pages 3–22, 1990. January/February

[32] Robert E. Lucas. On the mechanics of economic development. Journal of Monetary Economics, 22(1):3–42, 1988. doi: 10.1016/0304-3932(88)90168-7

[33] Michael Grossman. On the concept of health capital and the demand for health. Journal of Political Economy, 80(2):223–255, 1972. doi: 10.1086/259880

[34] Charles W. Cobb and Paul H. Douglas. A theory of production. American Economic Review, 18(1):139–165, 1928

[35] Gerhard Glomm and B. Ravikumar. Public versus private investment in human capital: Endogenous growth and income inequality. Journal of Political Economy, 100(4):818–834, 1992. doi: 10.1086/261841

[36] Douglass C. North. Institutions, Institutional Change and Economic Performance. Cambridge University Press, Cambridge, 1990. doi: 10.1017/CBO9780511808678

[37] Marcos Mendes. Por Que o Brasil Cresce Pouco? Desigualdade, Democracia e Baixo Crescimento. Elsevier-Campus, Rio de Janeiro, Brazil, 2014

[38] Secretaria do Tesouro Nacional (STN). Sistema FINBRAS — receitas estaduais detalhadas. Technical report, STN, Brasília, Brazil, 2024

[39] Instituto de Pesquisa Econômica Aplicada (IPEA). IPEAdata — séries de PIB, FBKF estadual, deflatores. Technical report, IPEA, Brasília, Brazil, 2024

[40] Brasil. Lei complementar nº 91, de 22 de dezembro de 1997 — coeficientes do FPE, 1997

[41] Brasil. Emenda constitucional nº 108, de 26 de agosto de 2020 — FUNDEB permanente, 2020

[42] John W. Pratt. Risk aversion in the small and in the large. Econometrica, 32(1–2):122–136, 1964. doi: 10.2307/1913738

[43] Anthony B. Atkinson. On the measurement of inequality. Journal of Economic Theory, 2(3):244–263, 1970. doi: 10.1016/0022-0531(70)90039-6

[44] Allan Drazen and Marcela Eslava. Electoral manipulation via voter-friendly spending. Journal of Development Economics, 92(1):39–52, 2010. doi: 10.1016/j.jdeveco.2009.01.010
