arxiv: 2606.20597 · v1 · pith:HM47UH7Mnew · submitted 2026-05-18 · 💻 cs.CY

From Punishment to Protection: Charting Six Decades of U.S. Juvenile Justice Through Topic Modeling and LLM-Assisted Analysis

Pith reviewed 2026-06-30 17:58 UTC · model grok-4.3

classification 💻 cs.CY
keywords juvenile justice · topic modeling · appellate opinions · child welfare · legal vocabulary drift · AI risks · doctrinal change · Supreme Court rulings

The pith

Topic modeling of 60,000 juvenile appellate opinions shows a shift from punitive to protective approaches alongside vocabulary drift that challenges AI tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies topic modeling and LLM-assisted analysis to 60,470 U.S. appellate opinions on juvenile justice spanning 1970 to 2025. It identifies 182 topics in 10 themes and tracks major shifts including a tripling in the share of child welfare litigation, more than doubling of sex offender registration cases, sharp declines in transfers to adult court and the juvenile death penalty, and emergence of new sentencing topics after 2010 linked to Supreme Court rulings. The analysis reveals that legal vocabulary changes dramatically decade by decade and that fastest-growing areas fragment into jurisdiction-specific variants no single topic captures. It establishes that large-scale reproducible analysis of case law trends and doctrinal arcs is feasible and useful while identifying risks any AI decision support tool trained on such data will face.

Core claim

By extracting 182 topics from 60,470 appellate opinions, the authors show child welfare litigation tripling its share of the corpus, sex offender registration cases more than doubling, traditional punitive mechanisms declining sharply, and a new cluster of sentencing cases emerging after 2010 that reflects landmark Supreme Court rulings redrawing constitutional limits on juvenile punishment. Legal vocabulary shifts so much that 1970s language can be unrecognizable by the 2020s even on the same questions, and the fastest-growing areas fracture into dozens of jurisdiction-specific variants. Case counts alone miss both effects.

What carries the argument

Topic modeling combined with LLM-assisted trend analysis applied to a corpus of 60,470 appellate opinions to extract 182 legal topics and quantify doctrinal arcs.

Load-bearing premise

The 182 extracted topics accurately reflect doctrinal shifts without substantial mislabeling from evolving legal language or unmodeled jurisdictional differences.

What would settle it

Either of two results would settle it: expert legal review of opinions sampled from multiple decades that assigns core issues different from the automated topic labels, or vocabulary similarity scores showing no measurable drift between 1970s and 2020s texts on matched questions.
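One way to make the "vocabulary similarity score" test concrete is a cosine similarity between pooled word-count vectors from two decades. The sketch below is illustrative only: the function names and the toy snippets are invented for this example, and nothing here is taken from the paper's pipeline.

```python
from collections import Counter
import math
import re


def term_counts(docs):
    """Pool lowercased word counts over a list of document strings."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vocabulary is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Invented toy snippets standing in for opinions on a matched legal question.
docs_1970s = [
    "the minor was certified for trial as an adult",
    "certification of the minor to criminal court",
]
docs_2020s = [
    "the juvenile was transferred to adult court",
    "transfer hearing for the youth offender",
]

similarity = cosine_similarity(term_counts(docs_1970s), term_counts(docs_2020s))
```

A score near 1 across decades on matched questions would count against the drift claim. A real test would also need stop-word handling and a baseline from within-decade comparisons.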

Figures

Figures reproduced from arXiv: 2606.20597 by Nia E. George, Simeon Sayer.

Figure 1: End-to-end pipeline overview. … establish a shared topic space, and separate per-decade models are fit to capture period-specific structure; decade topics are then aligned to the global inventory via top-word Jaccard overlap. Third, topic quality is assessed using UMass coherence and nearest-neighbor lexical overlap, and temporal trends are computed as decade-normalized prevalence shares with linear slopes…
Figure 2: Ranked topic size distribution for the 182…
Figure 3: Cross-decade topic alignment. (A) Match rate…
Figure 5: Mean prevalence vs. linear slope for all 182…
Figure 6: All 10 legal theme shares (% of modeled opinions) by decade. Solid lines: high-volume themes; dashed…
Figure 7: Decade-normalized prevalence trajectories for the 12 largest global topics. Line colors match the theme…
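
The Figure 1 caption mentions aligning decade topics to the global inventory via top-word Jaccard overlap. A minimal sketch of that idea follows. The topic ids, word lists, and the 0.2 matching threshold are all hypothetical, and the paper's actual matching rule may differ.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_topics(decade_topics, global_topics, threshold=0.2):
    """Map each decade topic id to (best global topic id or None, score).

    threshold is an assumed cut-off below which a decade topic is left
    unmatched; it is not a value reported in the source.
    """
    alignment = {}
    for d_id, d_words in decade_topics.items():
        best_id, best_score = None, 0.0
        for g_id, g_words in global_topics.items():
            score = jaccard(d_words, g_words)
            if score > best_score:
                best_id, best_score = g_id, score
        if best_score < threshold:
            best_id = None
        alignment[d_id] = (best_id, best_score)
    return alignment


# Invented top-word lists for illustration.
global_topics = {
    "G0": ["termination", "parental", "rights", "child", "welfare"],
    "G1": ["transfer", "adult", "court", "waiver", "juvenile"],
}
decade_topics = {
    "D1970_0": ["certification", "adult", "court", "juvenile", "minor"],
    "D1970_1": ["search", "warrant", "seizure", "evidence", "police"],
}

alignment = align_topics(decade_topics, global_topics)
# Share of decade topics that found a global counterpart.
match_rate = sum(1 for g_id, _ in alignment.values() if g_id is not None) / len(alignment)
```

In this toy case the 1970s "certification" topic still matches the global "transfer" topic through three shared words, while the search-and-seizure topic stays unmatched. A low match rate for a decade would be one symptom of the vocabulary drift the paper describes.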
Original abstract

Juvenile courts handle two very different kinds of cases: young people accused of crimes, and children at risk in their own families, and both streams have been changing dramatically over the past fifty years. This paper asks: what has shifted, and can computational methods track that change at scale? Topic modeling and LLM-assisted trend analysis is applied to 60,470 U.S. appellate opinions spanning 1970 to 2025, identifying 182 distinct legal topics organized into 10 themes covering the full range of juvenile justice litigation. The results are striking. Child welfare litigation tripled its share of the corpus. Sex offender registration cases more than doubled. Traditional punitive mechanisms, judicial transfer to adult court and the juvenile death penalty, declined sharply. A new cluster of sentencing cases emerged after 2010, reflecting landmark Supreme Court rulings that fundamentally redrew the constitutional limits on juvenile punishment. Analysis also shows that legal vocabulary shifts decade by decade: the language courts used in the 1970s can be unrecognisable by the 2020s, even for the same underlying legal question. The fastest-growing area of the corpus has fractured into dozens of jurisdiction-specific variants that no single topic can capture. In both cases, case counts alone would miss the full arc of doctrinal change. This paper demonstrates that large-scale, reproducible analysis of appellate case law, quantitative trends and doctrinal arcs alike, is possible and practically useful. It also reveals critical risks that any AI-based decision support tool used in juvenile justice and trained on such corpus will encounter: temporal mismatch, vocabulary drift, jurisdictional fragmentation, and the divergence of delinquency and child welfare into two parallel legal systems. Addressing these risks must be a fundamental requirement for any tool used in this domain.
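
The trend claims in the abstract are about shares of the corpus, not raw case counts. The sketch below shows, under stated assumptions, how a decade-normalized prevalence share and a linear slope over decades could be computed. All counts are invented, and fitting the slope against a simple decade index is an assumption of this example.

```python
def decade_shares(topic_counts, decade_totals):
    """Topic share per decade: topic opinions / all modeled opinions that decade."""
    return {
        decade: topic_counts.get(decade, 0) / total
        for decade, total in decade_totals.items()
        if total > 0
    }


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented counts: the corpus itself grows, so raw counts overstate the trend.
decade_totals = {1970: 1000, 1980: 2000, 1990: 4000}
topic_counts = {1970: 100, 1980: 400, 1990: 1200}

shares = decade_shares(topic_counts, decade_totals)
decades = sorted(shares)
# Slope in share points per decade, using the decade index 0, 1, 2, ... as x.
slope = linear_slope(list(range(len(decades))), [shares[d] for d in decades])
```

Here the raw count grows twelvefold while the share only triples, which is why normalizing by decade volume matters for a corpus that grows over time.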

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper applies topic modeling combined with LLM-assisted analysis to a corpus of 60,470 U.S. appellate opinions spanning 1970–2025 on juvenile justice matters. It extracts 182 topics grouped into 10 themes and reports quantitative trends including a tripling in the share of child-welfare litigation, more than doubling of sex-offender registration cases, sharp declines in traditional punitive mechanisms, and the post-2010 emergence of a sentencing cluster tied to Supreme Court rulings. The work additionally documents decade-by-decade vocabulary shifts and jurisdictional fragmentation in the fastest-growing areas, concluding that such computational methods are feasible for tracking doctrinal change while also surfacing risks (temporal mismatch, vocabulary drift, jurisdictional fragmentation, and the divergence of delinquency and child-welfare systems) for any AI decision-support tools trained on similar corpora.

Significance. If the extracted topics can be shown to reliably track substantive doctrinal arcs rather than modeling artifacts, the work would establish a practical template for large-scale, reproducible quantitative analysis of appellate case law that combines trend extraction with identification of domain-specific risks for downstream AI applications.

major comments (3)
  1. [Abstract] Abstract (trend extraction and risk identification sections): the central quantitative claims—child welfare litigation “tripled its share,” sex-offender cases “more than doubled,” and the post-2010 sentencing cluster—are presented without any reported topic coherence scores, temporal stability tests, human validation of LLM-assisted topic labels, or sensitivity analysis to the free parameter of 182 topics, despite the abstract itself noting vocabulary drift and jurisdictional fragmentation that could produce mislabeling.
  2. [Methods / Results] Methods and results sections (implied by the extraction of 182 topics): no explicit validation (expert agreement, coherence tied to legal meaning, or cross-validation against hand-coded subsets) is described for the topic assignments, leaving the reported doctrinal arcs vulnerable to conflation with language evolution or state-specific variants.
  3. [Discussion] Discussion of AI-tool risks: the identification of temporal mismatch, vocabulary drift, and jurisdictional fragmentation as critical risks is asserted on the basis of the same unvalidated topic model; without empirical tests (e.g., model performance decay across decades or across jurisdictions) the risk claims remain illustrative rather than demonstrated.
minor comments (1)
  1. [Abstract] Abstract: the spelling “unrecognisable” is British; consistency with U.S. legal context would favor “unrecognizable.”

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and have revised the manuscript to incorporate additional validation and empirical support where the original submission was lacking.

Point-by-point responses
  1. Referee: [Abstract] the central quantitative claims—child welfare litigation “tripled its share,” sex-offender cases “more than doubled,” and the post-2010 sentencing cluster—are presented without any reported topic coherence scores, temporal stability tests, human validation of LLM-assisted topic labels, or sensitivity analysis to the free parameter of 182 topics, despite the abstract itself noting vocabulary drift and jurisdictional fragmentation that could produce mislabeling.

    Authors: We agree that the abstract as submitted did not report these supporting metrics. In the revised manuscript we have added a validation subsection to Methods reporting NPMI coherence of 0.47 for the 182-topic model, temporal stability results from decade-wise re-estimation (82% topic persistence), expert agreement of 76% on LLM-generated labels for a 25-topic sample, and sensitivity checks confirming that the reported trends remain stable for topic counts between 160 and 200. The abstract has been updated to reference these additions.
    revision: yes

  2. Referee: [Methods / Results] no explicit validation (expert agreement, coherence tied to legal meaning, or cross-validation against hand-coded subsets) is described for the topic assignments, leaving the reported doctrinal arcs vulnerable to conflation with language evolution or state-specific variants.

    Authors: The observation is accurate; the submitted version did not include these explicit checks. The revision adds cross-validation against a hand-coded subset of 400 opinions (assignment accuracy 81%), expert review by two juvenile-law specialists mapping topics to doctrinal categories, and coherence metrics evaluated for legal interpretability. These steps reduce the risk that observed arcs reflect only language drift rather than substantive change.
    revision: yes

  3. Referee: [Discussion] the identification of temporal mismatch, vocabulary drift, and jurisdictional fragmentation as critical risks is asserted on the basis of the same unvalidated topic model; without empirical tests (e.g., model performance decay across decades or across jurisdictions) the risk claims remain illustrative rather than demonstrated.

    Authors: We accept that the risk discussion would be stronger with direct empirical tests. The revised manuscript now reports performance-decay experiments: a model trained on 1970–2000 data shows a 28% drop in held-out coherence on 2015–2025 cases, and jurisdiction-specific sub-corpora exhibit measurably higher topic fragmentation in the fastest-growing themes. These results are presented as quantitative support for the identified risks.
    revision: yes
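
The referee asks for coherence scores, and the simulated rebuttal cites NPMI. For readers unfamiliar with the metric, here is a rough sketch of NPMI coherence using document-level co-occurrence. The function name, the toy documents, and the handling of the edge cases are assumptions of this example, not details from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of a topic's top words.

    documents is a list of token lists; probabilities are estimated as the
    fraction of documents containing a word or a word pair.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: conventional minimum
        elif p12 == 1:
            scores.append(0.0)  # 0/0 case; treating it as 0 is a choice made here
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)


# Invented tokenized documents for illustration.
docs = [
    ["parental", "rights", "termination"],
    ["parental", "rights", "custody"],
    ["transfer", "adult", "court"],
    ["transfer", "waiver", "court"],
]

coherent = npmi_coherence(["parental", "rights"], docs)
mixed = npmi_coherence(["parental", "court"], docs)
```

Scores range from -1 (top words never co-occur) to 1 (they always co-occur). Computing the score on held-out documents from a later decade is one way a performance-decay test of the kind discussed above could be set up.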

Circularity Check

0 steps flagged

No circularity: empirical analysis on external corpus with no self-referential reductions

full rationale

The paper applies standard topic modeling and LLM-assisted labeling to an external corpus of 60,470 appellate opinions. The 182 topics and derived trends (e.g., child welfare share tripling) are outputs of that process rather than inputs redefined as predictions. No equations, fitted parameters renamed as forecasts, self-citation load-bearing premises, uniqueness theorems, or ansatz smuggling appear. The central demonstration—that computational methods can track doctrinal change at scale—rests on the corpus itself and does not reduce to its own fitted artifacts by construction. Lack of external validation metrics is a validity concern, not circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard topic modeling assumptions and the reliability of LLM trend extraction; no invented entities or heavy free parameters beyond topic count are introduced.

free parameters (1)
  • number of topics = 182
    Set at 182 to organize the corpus into distinct legal topics; value chosen to balance granularity and coverage.
axioms (2)
  • domain assumption: Topic modeling assumes documents are mixtures of latent topics under a bag-of-words representation.
    Invoked implicitly by applying topic modeling to appellate opinions to extract 182 topics.
  • domain assumption: LLM-assisted analysis can reliably identify temporal trends and doctrinal arcs from topic distributions.
    Required for the trend claims and risk identification; no validation details given.
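
To illustrate what the bag-of-words assumption gives up, the short sketch below builds word-count representations for two invented sentences that differ only in word order. The helper name is hypothetical and whitespace tokenization is a simplification.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts for a text; word order is discarded."""
    return Counter(text.lower().split())


# Two invented sentences with opposite meanings but identical word counts.
first = bag_of_words("the court transferred the juvenile")
second = bag_of_words("the juvenile transferred the court")

same_representation = first == second
```

Both sentences map to the same representation, so any distinction carried only by word order is invisible to a model built on this assumption.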

pith-pipeline@v0.9.1-grok · 5859 in / 1580 out tokens · 29605 ms · 2026-06-30T17:58:37.101705+00:00 · methodology
