{"paper":{"title":"The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Probabilistic circuits match transformer separation rank only on data partitions aligned with their fixed vtree structure and degrade on heterogeneous dependency topologies.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Anji Liu, Muhan Zhang, Xuejie Liu, Zhiyu Zhao","submitted_at":"2026-05-13T03:22:10Z","abstract_excerpt":"Probabilistic Circuits (PCs) are deep generative models that support exact and efficient probabilistic inference. Yet in autoregressive language modeling, PCs still lag behind Transformer-based large language models (LLMs), suggesting an important expressivity gap. In this work, we compare PCs and LLMs under a unified autoregressive formulation. First, an output bottleneck: PCs parameterize predictions as convex combinations in probability space, which struggles to represent the sharp distributions typical of language; adopting a logit-space parameterization substantially narrows this gap. Sec"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We prove that structured-decomposable PCs can match Transformer separation rank on vtree-aligned partitions, but show, both theoretically and empirically, that this capacity is limited to partitions aligned with the fixed routing structure, leading to severe degradation when the data exhibits heterogeneous dependency topologies. 
We further prove that decomposable PCs are strictly more expressive than structured-decomposable ones.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that language data exhibits heterogeneous dependency topologies that systematically misalign with any fixed vtree structure, and that the separation rank comparison fully captures the practical expressivity gap in autoregressive modeling.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than transformers for language data with heterogeneous dependencies, though decomposable PCs are strictly more capable","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Probabilistic circuits match transformer separation rank only on data partitions aligned with their fixed vtree structure and degrade on heterogeneous dependency topologies.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"dd7ad3d58e30120649e7efbf0ab270cb9ec7072fd939f55db8517cbd951dd61d"},"source":{"id":"2605.12940","kind":"arxiv","version":1},"verdict":{"id":"4853b739-465e-40c0-a2c1-59a586dc57ac","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T20:44:11.017758Z","strongest_claim":"We prove that structured-decomposable PCs can match Transformer separation rank on vtree-aligned partitions, but show, both theoretically and empirically, that this capacity is limited to partitions aligned with the fixed routing structure, leading to severe degradation when the data exhibits heterogeneous dependency 
topologies. We further prove that decomposable PCs are strictly more expressive than structured-decomposable ones.","one_line_summary":"Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than transformers for language data with heterogeneous dependencies, though decomposable PCs are strictly more capable","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that language data exhibits heterogeneous dependency topologies that systematically misalign with any fixed vtree structure, and that the separation rank comparison fully captures the practical expressivity gap in autoregressive modeling.","pith_extraction_headline":"Probabilistic circuits match transformer separation rank only on data partitions aligned with their fixed vtree structure and degrade on heterogeneous dependency topologies."},"references":{"count":47,"sample":[{"doi":"","year":2015,"title":"Learning the structure of sum-product networks via an SVD-based algorithm","work_id":"f9a4797a-09ac-4cdb-bfb0-9e492a6c61b9","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"On the sample complexity of learning sum-product networks","work_id":"c4117ff5-fdb7-4c06-b498-7e001c8717f2","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"The softmax bottleneck does not limit the probabilities of the most likely tokens","work_id":"b1f4cc6e-1799-47d2-999d-566ab38031ec","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020","work_id":"0778578f-d352-4001-8bc9-e7a0d244210b","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Probabilistic circuits: A unifying framework for tractable probabilistic 
models. UCLA","work_id":"1b6ce673-8036-481d-98f3-03f17300d510","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":47,"snapshot_sha256":"8426b554fe557d7c483c71ec1b5b7da3fe8f41e9008c20c65da8f72eab5d6331","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"565fc375eef366b9893cc8cec43a6e2401036b9c25e569f62fc0ab17262fdb12"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}