arxiv: 2606.02614 · v1 · pith:RFFOQXPDnew · submitted 2026-05-26 · 💻 cs.CE · cs.AI

Margin Play: A Multi-Agent System For Public Policy Analysis In The Brazilian Equatorial Margin

Pith reviewed 2026-07-01 15:51 UTC · model grok-4.3

classification 💻 cs.CE cs.AI
keywords multi-agent reinforcement learning · public policy · oil exploration · Brazilian Equatorial Margin · welfare analysis · MARL · policy simulation

The pith

The central policy question for Brazilian Equatorial Margin oil exploration is resolved by choosing the right public policy regime rather than balancing production against welfare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a multi-agent reinforcement learning system that models the interactions between the federal government, the state of Maranhão, oil operators, regulators, and local communities over oil exploration in the Brazilian Equatorial Margin. Its simulations suggest that whether exploration generates net positive externalities for the state depends on the institutional regime chosen. A reader would care because this implies that policy design can improve welfare outcomes without increasing production volumes. The results indicate a marginal welfare gain under the reference baseline but a 17.5% welfare increase under the alternative MA-Prospero configuration, with lower environmental liability.

Core claim

The paper claims that the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime linked to exploration. Using a six-agent MARL model calibrated to Brazilian data, the reference baseline yields marginal welfare gain while the MA-Prospero configuration produces a 17.5% increase in welfare and 21.3% in community revenue with lower environmental liability.

What carries the argument

Margin Play, a six-agent multi-agent reinforcement learning system built on the centralized training with decentralized execution (CTDE) paradigm and trained with BRO-MARL.
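
To make the CTDE idea concrete, here is a minimal structural sketch, not the paper's implementation and not BRO-MARL. It only illustrates the split the paradigm implies: each agent's policy sees its own observation, while a critic used during training sees every agent's observation and action. The six-agent count comes from the summary above; the observation and action sizes, the linear layers, and the single shared critic are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 6   # from the paper summary
OBS_DIM = 4    # hypothetical observation size per agent
ACT_DIM = 2    # hypothetical action size per agent


class Actor:
    """Decentralized policy: maps one agent's own observation to its action."""

    def __init__(self):
        self.w = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))

    def act(self, obs):
        return np.tanh(obs @ self.w)  # actions bounded in [-1, 1]


class CentralCritic:
    """Centralized critic: scores the joint observation and joint action.

    Only needed during training; execution uses the actors alone.
    """

    def __init__(self):
        self.w = rng.normal(scale=0.1, size=N_AGENTS * (OBS_DIM + ACT_DIM))

    def q_value(self, joint_obs, joint_act):
        x = np.concatenate([joint_obs.ravel(), joint_act.ravel()])
        return float(x @ self.w)


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()  # ASSUMPTION: one shared critic, for brevity

joint_obs = rng.normal(size=(N_AGENTS, OBS_DIM))
# Execution step: every agent acts on its local observation only.
joint_act = np.stack([a.act(o) for a, o in zip(actors, joint_obs)])
# Training-time signal: the critic sees everything.
q = critic.q_value(joint_obs, joint_act)
```

Whether the paper uses one critic per agent or a shared one is not stated in the material available here, so the shared critic should be read as a simplification.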

If this is right

  • Under baseline conditions, the welfare gain from exploration is marginal (Waval ≈ 1.68).
  • The MA-Prospero policy configuration increases welfare by 17.5% and community revenue by 21.3%.
  • This configuration also reduces environmental liability from 0.076 to 0.048.
  • The outcomes depend on how royalties are earmarked and how agent incentives are aligned.
  • Exploration can generate net positive externalities for the state under the right regime.
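
The figures in the list above can be put on a common footing with a short calculation. The sketch below computes the relative change in environmental liability from the two reported values, and the welfare level that a +17.5% gain would imply. The second number rests on an assumption: that the +17.5% is measured against the baseline Waval of roughly 1.68, which the abstract does not state explicitly.

```python
def relative_change(old: float, new: float) -> float:
    """Return (new - old) / old as a fraction."""
    return (new - old) / old


# Values quoted in the summary above.
E_AMB_BASELINE = 0.076
E_AMB_PROSPERO = 0.048
W_BASELINE = 1.68   # approximate, per the abstract
DELTA_W = 0.175     # +17.5%

env_change = relative_change(E_AMB_BASELINE, E_AMB_PROSPERO)

# ASSUMPTION: the +17.5% gain is relative to the ~1.68 baseline.
implied_w = W_BASELINE * (1 + DELTA_W)

print(f"Relative change in environmental liability: {env_change:.1%}")
print(f"Implied MA-Prospero welfare level: {implied_w:.3f}")
```

Under these inputs the liability falls by roughly 37% in relative terms, and the implied welfare level is about 1.97 if the baseline assumption holds.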

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar multi-agent models could be applied to other resource extraction frontiers to test policy regimes.
  • Empirical data from actual exploration starting in 2026 could validate or refute the simulation results.
  • The approach highlights the value of modeling conflicting mandates between agencies like ANP and IBAMA.
  • Extending the model to include more dynamic economic variables might reveal additional policy levers.

Load-bearing premise

The six-agent MARL model under CTDE with BRO-MARL training, calibrated to Brazilian empirical data, produces outputs that reflect real agent incentives and outcomes.

What would settle it

Collecting data on actual welfare changes, royalty distributions, and environmental impacts in Maranhão after exploration begins in 2026 and comparing them to the model's predictions for different policy scenarios.

Figures

Figures reproduced from arXiv: 2606.02614 by Allan Kardec Duailibe Barros Filho, Antonio de Sousa Leitão Filho, Dennys Correia da Silva, Fabrício Saul Lima, Luís Jorge Mesquita de Jesus, Rejani Bandeira Vieira Sousa, Selby Mykael Lima dos Santos.

Figure 1. Map of the Brazilian Equatorial Margin (BEM) with the five offshore sedimentary basins (Foz do Amazonas, …
Figure 2. System architecture of the Margin Play framework under the CTDE paradigm. Six institutional agents interact …
Figure 3. Operational sequence of a Margin Play episode. Input stage (steps 1–2), 15-step biennial time loop (steps …
Figure 4. Presents the empirical response to the central question stated in Section 1.1. The left panel shows aggregate welfare Waval by scenario; the right panel shows the converged return of the Community agent, a direct metric of the zonal welfare of Amazonian populations.
Figure 5. Learning curves by agent (200-episode moving average) for the six scenarios. The six panels correspond to …
Figure 6. Convergence diagnostics: rolling σ (window 500) on a logarithmic scale, by agent and by scenario. No collapse or divergence is observed. Scenarios with lower economic activity (pessimistic, MA-Próspero) achieve deeper convergence.
Figure 7. Macroeconomic trajectories: W (welfare), Eamb (environmental liability) and R (reserves). The red dashed line at Eamb = 0.20 marks the Frade-Chevron regulatory threshold. The MA-Próspero regime combines the highest W with the lowest Eamb, refuting the hypothesis of a structural trade-off between production and welfare.
Figure 8. Amazonian communities welfare. Left: convergence of Community agent return during training. Right: empirical distribution over the last 1,000 episodes. The MA-Próspero regime yields ∆Rcom = +21.3% relative to the reference baseline.
Figure 9. IBAMA agent behaviour and environmental liability.
Figure 10. Empirical return distributions (last 1,000 episodes), decomposed by agent and scenario.
Figure 11. Mean Q̄ target by scenario and agent (last 1,000 episodes). Rows: six scenarios; columns: six agents. MA-Próspero redirects oil revenue from federal collection to regional social capital, expressing itself in high Q-values in the State Gov. and Community agents.
Figure 12. Synthesis of the MA-Próspero regime: six structural levers combined. The interpretation is multiplicative: …
Figure 13. Counterfactual effect of each scenario relative to the reference (…
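
The captions for Figures 5 and 6 mention a 200-episode moving average and a rolling σ over a 500-episode window. The sketch below shows one common way to compute both from a per-episode return series. It is an illustration under assumptions: the return series is synthetic, and reading σ as the standard deviation of returns within each window is a guess, since the captions do not define it.

```python
import numpy as np


def moving_average(x, window):
    """Mean over each full window of `window` consecutive values."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")


def rolling_std(x, window):
    """Population standard deviation over each full window."""
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, window).std(axis=1)


# SYNTHETIC returns for illustration only: a rising trend with shrinking noise.
rng = np.random.default_rng(0)
t = np.arange(3000)
noise = rng.normal(scale=0.2, size=t.size) * np.exp(-t / 1000)
returns = 1.0 - np.exp(-t / 500) + noise

smooth = moving_average(returns, window=200)  # learning-curve style smoothing
sigma = rolling_std(returns, window=500)      # convergence diagnostic
log_sigma = np.log10(sigma)                   # values for a logarithmic axis
```

Both helpers return only full windows, so each output is shorter than the input by `window - 1` entries.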
Original abstract

The Brazilian Equatorial Margin (BEM) is Brazil's next offshore oil frontier, with operations expected to begin in 2026 in the Foz do Amazonas basin. Its assets are fiscally and territorially linked primarily to Maranhão, the state with the lowest HDI in the Federation (0.676, IBGE 2022). This raises the central policy question: under what conditions does BEM exploration generate net positive externalities for Maranhão? The problem is intrinsically multi-agent: the Federal Government seeks revenue and energy security; the state seeks regional welfare under constitutional royalty earmarking; the operator maximizes profit under risk; ANP and IBAMA hold conflicting mandates; and Amazonian communities prioritize territorial and environmental vectors over monetary income. We present Margin Play, a Multi-Agent Reinforcement Learning (MARL) system simulating these tensions under Brazilian empirical calibration and classical economic literature. It implements six agents under the CTDE paradigm, trained with BRO-MARL. Results from 60,000 episodes across six scenarios indicate the answer is conditional on the institutional regime: under the reference baseline, the welfare gain is marginal (Waval approx. 1.68), whereas the MA-Prospero configuration yields Delta W = +17.5% and Delta Rcom = +21.3%, with a lower environmental liability (Eamb = 0.048 vs. 0.076). The fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime linked to exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Margin Play, a six-agent MARL system under the CTDE paradigm trained with BRO-MARL, to simulate policy tensions around oil exploration in Brazil's Equatorial Margin (Foz do Amazonas basin). Agents represent the Federal Government, Maranhão state, operators, ANP/IBAMA, and Amazonian communities, with rewards calibrated to Brazilian data and economic literature. Across 60,000 episodes and six scenarios, the central claim is that net positive welfare externalities for Maranhão (lowest-HDI state) are conditional on institutional regime rather than an inherent production-welfare trade-off; the MA-Prospero regime yields Delta W = +17.5%, Delta Rcom = +21.3%, and Eamb = 0.048 (vs. baseline Waval ~1.68 and Eamb 0.076).

Significance. If the simulation outputs can be shown to reflect observable incentives and produce falsifiable predictions, the work would offer a novel computational framework for multi-stakeholder resource policy analysis, extending classical economic models of royalty earmarking and environmental liability into a dynamic, game-theoretic setting with potential applicability to other frontier basins.

major comments (3)
  1. [Abstract] Abstract: the reported quantitative outcomes (Waval ~1.68, Delta W +17.5%, Eamb 0.048 vs. 0.076) rest on unvalidated simulation outputs; no error bars, sensitivity analysis on reward weights, out-of-sample historical matching, or calibration diagnostics are supplied, so it is impossible to determine whether the MA-Prospero superiority is an artifact of reward-function specification rather than evidence against a production-welfare trade-off.
  2. [Abstract] Abstract (and implied § on agent design): the claim that the six-agent model is 'calibrated to Brazilian empirical data and classical economic literature' is load-bearing for the regime-choice conclusion, yet no explicit functional forms, parameter values, or validation steps for the Federal revenue, state welfare (earmarking), operator profit, ANP/IBAMA mandate, or community territorial reward functions are provided; without these, the equilibrium behaviors cannot be assessed for realism.
  3. [Abstract] Abstract: the assertion that 'the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime' is directly inferred from the scenario deltas, but the manuscript supplies no robustness checks (e.g., alternative reward weightings or stochastic shock tests) that would falsify this interpretation if the environmental-liability term or state welfare weight were misspecified.
minor comments (2)
  1. [Abstract] Abstract: numeric values are given without units or precise definitions (e.g., what exactly is 'Waval', 'Eamb', or the baseline against which +17.5% is measured).
  2. The manuscript should include a table or appendix listing the exact reward-function equations and the empirical sources used for each agent's parameters.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive critique emphasizing validation, explicit calibration details, and robustness. These points are essential for establishing the simulation's credibility in policy analysis. We address each major comment below and will revise the manuscript to incorporate the requested elements.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the reported quantitative outcomes (Waval ~1.68, Delta W +17.5%, Eamb 0.048 vs. 0.076) rest on unvalidated simulation outputs; no error bars, sensitivity analysis on reward weights, out-of-sample historical matching, or calibration diagnostics are supplied, so it is impossible to determine whether the MA-Prospero superiority is an artifact of reward-function specification rather than evidence against a production-welfare trade-off.

    Authors: We agree that the abstract and results presentation would be strengthened by explicit validation diagnostics. The methods section describes calibration against IBGE, ANP, and IBAMA data sources plus literature on fiscal federalism, but we did not report error bars across seeds or sensitivity to reward weights. In revision we will add a new robustness subsection with: (i) standard errors from 10 independent training runs, (ii) sensitivity sweeps on the environmental-liability and state-welfare weights (±20%), and (iii) an out-of-sample comparison of simulated baseline royalty flows against observed distributions in the Campos and Santos basins. These additions will allow readers to evaluate whether regime superiority is robust.
    Revision: yes

  2. Referee: [Abstract] Abstract (and implied § on agent design): the claim that the six-agent model is 'calibrated to Brazilian empirical data and classical economic literature' is load-bearing for the regime-choice conclusion, yet no explicit functional forms, parameter values, or validation steps for the Federal revenue, state welfare (earmarking), operator profit, ANP/IBAMA mandate, or community territorial reward functions are provided; without these, the equilibrium behaviors cannot be assessed for realism.

    Authors: The full manuscript (Section 3.2 and Appendix A) supplies the reward equations and parameter sources, but we acknowledge they are not presented in a single consolidated table. We will revise by expanding Appendix A with: (i) the exact functional forms for each of the six agents (Federal revenue = royalty share + corporate tax; state welfare = earmarked royalty fraction × HDI-weighted multiplier; operator profit = revenue – costs – liability; ANP/IBAMA dual mandate; community territorial utility), (ii) a table of all numerical parameters with citations, and (iii) a short validation paragraph comparing baseline equilibrium statistics to historical royalty earmarking outcomes reported by the National Treasury.
    Revision: yes

  3. Referee: [Abstract] Abstract: the assertion that 'the fundamental problem is not a trade-off between production and welfare, but the choice of public policy regime' is directly inferred from the scenario deltas, but the manuscript supplies no robustness checks (e.g., alternative reward weightings or stochastic shock tests) that would falsify this interpretation if the environmental-liability term or state welfare weight were misspecified.

    Authors: We concur that the interpretive claim requires explicit falsification tests. The six scenarios vary institutional rules while holding reward weights fixed. In the revised version we will add two new experiment sets: (i) re-training under ±15% and ±30% perturbations to the environmental-liability and state-welfare coefficients, and (ii) stochastic oil-price shocks drawn from historical volatility. We will report whether the MA-Prospero regime retains its welfare and environmental advantages under these perturbations; if the ranking reverses under plausible misspecifications we will qualify the conclusion accordingly.
    Revision: yes
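
The robustness checks promised in the responses above (standard errors across independent runs, and re-evaluation under perturbed reward weights) could be organized along the following lines. This is a hedged sketch of a test harness only. The `toy_evaluate` function is a hypothetical stand-in for a full re-training run, its numbers are placeholders that do not come from the paper, and the weight names are invented for illustration.

```python
import math
import statistics


def standard_error(values):
    """Sample standard deviation divided by sqrt(n), e.g. across training seeds."""
    return statistics.stdev(values) / math.sqrt(len(values))


def ranking_failures(evaluate, base_weights, perturbations):
    """List the (weight, perturbation) pairs where MA-Prospero does not beat baseline."""
    failures = []
    for name in base_weights:
        for p in perturbations:
            weights = dict(base_weights)
            weights[name] = base_weights[name] * (1 + p)
            if evaluate("ma_prospero", weights) <= evaluate("baseline", weights):
                failures.append((name, p))
    return failures


def toy_evaluate(scenario, weights):
    """HYPOTHETICAL stand-in for re-training; placeholder numbers, not paper data."""
    benefit = {"baseline": 1.0, "ma_prospero": 1.1}[scenario]
    liability = {"baseline": 0.3, "ma_prospero": 0.2}[scenario]
    return weights["state_welfare"] * benefit - weights["env_liability"] * liability


base_weights = {"state_welfare": 1.0, "env_liability": 1.0}  # hypothetical names
perturbations = (-0.30, -0.15, 0.15, 0.30)                   # from response 3

failures = ranking_failures(toy_evaluate, base_weights, perturbations)
se = standard_error([1.0, 2.0, 3.0, 4.0, 5.0])  # placeholder per-seed values
```

An empty `failures` list here says nothing about the paper's claim, because the toy evaluator favors MA-Prospero by construction. The point is the shape of the check: the conclusion would need qualifying if any plausible perturbation produced a non-empty list once `toy_evaluate` is replaced by actual re-training.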

Circularity Check

0 steps flagged

No significant circularity; simulation outputs provide independent content

full rationale

The paper derives its central claim—that the problem reduces to institutional regime choice rather than a production-welfare trade-off—from the numerical outputs of a six-agent CTDE/BRO-MARL simulation run across six explicitly defined scenarios. The abstract states that the model is calibrated to Brazilian empirical data and classical literature, then reports specific deltas (Waval ≈1.68, Delta W +17.5%, Eamb 0.048) as evidence. No equations, reward-function definitions, or self-citations are supplied that would reduce these outputs to the inputs by construction. The derivation chain therefore remains self-contained: the simulation constitutes an independent computational experiment whose results can be checked against external data or alternative calibrations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities listed. Model calibration to 'Brazilian empirical data' and 'classical economic literature' is invoked but not detailed.

