Database Querying under Missing Values Governed by Missingness Mechanisms

Farouk Toumani; Leopoldo Bertossi; Maxime Buron

arxiv: 2604.06520 · v1 · submitted 2026-04-07 · 💻 cs.DB · cs.AI· cs.LO

Database Querying under Missing Values Governed by Missingness Mechanisms

Leopoldo Bertossi , Farouk Toumani , Maxime Buron This is my paper

Pith reviewed 2026-05-10 17:48 UTC · model grok-4.3

classification 💻 cs.DB cs.AIcs.LO

keywords database queryingmissing valuesmissingness mechanismsBayesian networksprobabilistic databasesquery answeringmissingness graph

0 comments

The pith

Missing values governed by a Bayesian network missingness graph allow construction of a block-independent probabilistic database for query answering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses query answering on relational databases with missing values whose absence is explained by a missingness mechanism modeled as a Bayesian network, or Missingness Graph, over the attributes. This model combined with the observed data enables building a block-independent probabilistic database. Two query answering techniques are introduced that incorporate both the probabilistic nature of the uncertainty and the statistical plausibility of imputing the missing values. Complexity results are provided to determine when these techniques are computationally feasible. This framework offers a semantics for incomplete databases that goes beyond simple NULL handling by explicitly using the causes of missingness.

Core claim

By representing the missingness mechanism as a Missingness Graph (MG), which is a Bayesian network, the observed database can be used to construct a block-independent probabilistic database. This construction supports two specific query answering techniques that jointly account for probabilistic uncertainty and the statistical plausibility of implicit imputations of missing values, along with complexity characterizations of their computational feasibility.

What carries the argument

The Missingness Graph (MG), a Bayesian network that models dependencies among database attributes regarding why values are missing, which enables transforming the observed database into a block-independent probabilistic database suitable for the proposed query answering methods.

If this is right

The MG and observed DB allow building a block-independent probabilistic DB.
Two QA techniques capture both probabilistic uncertainty and statistical plausibility of MV imputation.
Complexity results characterize the feasibility of the QA approaches.
The approach departs from traditional NULL-based treatments of missing data in RDBs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Integrating domain-specific knowledge about missingness causes could enhance query accuracy in specialized databases.
This method might be extended by learning the Missingness Graph automatically from data patterns.
Block-independence could allow efficient query optimization techniques in larger systems.

Load-bearing premise

The missingness mechanism can be accurately modeled as a Bayesian network involving the database attributes, and the observed data together with this model is sufficient to build a block-independent probabilistic database for query answering.

What would settle it

Finding a dataset where the missingness follows a known mechanism but queries answered via the constructed probabilistic DB give different results from the true complete data distribution, or where the Bayesian network fails to represent the actual dependencies in missingness.

Figures

Figures reproduced from arXiv: 2604.06520 by Farouk Toumani, Leopoldo Bertossi, Maxime Buron.

**Figure 1.** Figure 1: Missingness Graph. Example 2. (Ex. 1 cont.) For simplicity, assume now that attribute A is fully observed and denoted with Ao . The MG in [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Three Missingness Graphs. The MG in [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: (a) Missingness graph. (b) Resulting BID. For this MG, the probabilities are calculated as in Example 6, obtaining, for example, for the tuples in the first block of Rp : p(τ 1 1 ) = PM(Mm 1 = 1|A = a, B = b) × PM(A = a, B = b), p(τ 1 2 ) = PM(Mm 1 = 0|A = a, B = b) × PM(A = a, B = b). Accordingly, we define the probabilities in the MG in such a way that p(τ 1 1 ) = p1, p(τ 1 2 ) = (1 − p1), etc. Now, con… view at source ↗

**Figure 4.** Figure 4: A arbitrary BID B To construct the proof, we start with a BID D over a schema S = {T} and, in polynomial time, build an incomplete database D∗ , an MG M, and a join distribution P M. From this construction, we derive a BID instance Dp ≡ D. The database D∗ is defined over the schema {T ∗ , A∗ 1 , . . . , A∗ n}. The attribute T has domain dom(T) = {t1, . . . , tn}, while each attribute Ai (for i ∈ {1, . . . … view at source ↗

**Figure 7.** Figure 7: The incomplete database D∗ tuple_id Tm Am 1 . . . Am n A∗ 1 I A1 . . . A∗ n I An T∗ I T Prob J 1 B1 t1 in(t1, B1) . . . in(t1, Bn) 1 0 . . . na 1 na 1 p 1 1 n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J n B1 tn in(tn, B1) . . . in(tn, Bn) 1 0 . . . na 1 na 1 p n 1 n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J n Bn tn in(tn, Bn) . . . i… view at source ↗

**Figure 8.** Figure 8: The MG M The join distribution PM, illustrated in [PITH_FULL_IMAGE:figures/full_fig_p022_8.png] view at source ↗

read the original abstract

We address the problems of giving a semantics to- and doing query answering (QA) on a relational database (RDB) that has missing values (MVs). The causes for the latter are governed by a Missingness Mechanism that is modelled as a Bayesian Network, which represents a Missingness Graph (MG) and involves the DB attributes. Our approach considerable departs from the treatment of RDBs with NULL (values). The MG together with the observed DB allow to build a block-independent probabilistic DB, on which basis we propose two QA techniques that jointly capture probabilistic uncertainty and statistical plausibility of the implicit imputation of MVs. We obtain complexity results that characterize the computational feasibility of those approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper models missingness in databases via Bayesian networks called Missingness Graphs, builds a block-independent probabilistic DB from the observed data plus the graph, and reduces query answering to two techniques with PTIME/#P complexity splits.

read the letter

The main takeaway is that this work treats the causes of missing values as a first-class modeling choice using a Bayesian network over attributes, then turns the observed database plus that network into a block-independent probabilistic database for querying. From there it defines two query methods that mix probabilistic uncertainty with statistical plausibility of imputations and gives complexity results that track known bounds for block-independent PDBs.

Referee Report

0 major / 2 minor

Summary. The paper claims that missing values in relational databases can be handled by modeling their missingness mechanism as a Bayesian Network (the Missingness Graph or MG) over database attributes. Combined with the observed database, this MG yields a block-independent probabilistic database via factorization of the joint distribution over possible worlds. Two query-answering techniques are then defined on this structure that jointly capture probabilistic uncertainty and the statistical plausibility of implicit imputations; complexity results (PTIME vs. #P-hard) follow from known results on block-independent PDBs.

Significance. If the central construction holds, the work supplies a principled, mechanism-aware semantics for missing data that departs from NULL-based treatments and directly reduces query answering to standard probabilistic evaluation on block-independent PDBs. The resulting complexity characterizations are immediately useful for assessing feasibility, and the explicit use of the missingness graph as a generative model offers a clear route to incorporating domain knowledge about why values are missing.

minor comments (2)

[Abstract] Abstract: the phrase 'Our approach considerable departs' is grammatically incorrect and should read 'Our approach considerably departs'.
[Section 3 (construction)] The factorization of the joint distribution over possible worlds (used to establish block independence) is central; a short illustrative example in the main text would help readers verify that the MG induces the claimed independence structure without additional assumptions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report accurately captures the core contribution of modeling missingness mechanisms via a Bayesian network (Missingness Graph) to obtain a block-independent probabilistic database for query answering. No specific major comments or requested changes were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper models the missingness mechanism explicitly as a Bayesian Network (Missingness Graph) over DB attributes, then defines the block-independent probabilistic DB via the factorization of the joint distribution over possible worlds consistent with the observed data and the MG. The two QA techniques are reductions to standard probabilistic query evaluation on this structure, and the PTIME/#P-hard complexity results follow directly from known results on block-independent PDBs once the construction is granted. No equation or claim reduces by construction to a fitted parameter, self-definition, or load-bearing self-citation; the central steps are formally defined and self-contained against external probabilistic database benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the domain assumption that missingness mechanisms are accurately representable as Bayesian networks and that this representation plus observed data yields a usable probabilistic database. No free parameters are mentioned. Two new modeling entities are introduced without independent evidence outside the paper.

axioms (1)

domain assumption Missing values in the database are governed by a missingness mechanism that can be modeled as a Bayesian Network represented by a Missingness Graph involving the database attributes.
This is the foundational modeling choice stated in the abstract as the basis for building the probabilistic DB.

invented entities (2)

Missingness Graph (MG) no independent evidence
purpose: To represent the missingness mechanism as a Bayesian network for the database attributes.
New construct introduced to model causes of missing values.
block-independent probabilistic DB no independent evidence
purpose: To serve as the basis for query answering that captures uncertainty and plausibility of imputations.
Constructed from the MG and observed database for the QA techniques.

pith-pipeline@v0.9.0 · 5414 in / 1527 out tokens · 53007 ms · 2026-05-10T17:48:45.807341+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages

[1]

K., Magnanti, T

Ahuja, R. K., Magnanti, T. L. and Orlin, J.B. \ Network Flows: Theory, Algorithms, and Applications . Prentice Hall, 1993

work page 1993
[2]

Ali, S. M. and Silvey, S. D. A General Class of Coefficients of Divergence of one Distribution from Another. Journal of the Royal Statistical Society, Series B , 1996, 28(1):131--142

work page 1996
[3]

Allison, P. D. \ Maximum Likelihood Estimation . \ \ Quantitative Applications in the Social Sciences, 96. Sage Publications, 1993

work page 1993
[4]

Allison, P. D. \ Missing Data . \ Quantitative Applications in the Social Sciences, 136. Sage Publications, 2002

work page 2002
[5]

and Li, L

Bertossi, L. and Li, L. \ Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates. \ IEEE Transactions on Knowledge and Data Engineering , 2013, 25(5):987-1000

work page 2013
[6]

and Bravo, L

Bertossi, L. and Bravo, L. \ Consistency and Trust in Peer Data Exchange Systems. Theory and Practice of Logic Programming , 2017, 17(2):148-204

work page 2017
[7]

Bishop, C. M. \ Pattern Recognition and Machine Learning . \ Springer, 2006

work page 2006
[8]

\ A Fast Algorithm for Enumerating Bipartite Perfect Matchings., ISAAC 2001 (LNCS vol 2223), 2001

Takeaki, U. \ A Fast Algorithm for Enumerating Bipartite Perfect Matchings., ISAAC 2001 (LNCS vol 2223), 2001

work page 2001
[9]

P., Peng, R., Gutenberg, M

Chen, L., Kyng, R., Liu, Y. P., Peng, R., Gutenberg, M. P., & Sachdeva, S. (2022). Minimum cost flows, MDPs, and \( _1\)-regression in nearly linear time for separable convex instances. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2022)

work page 2022
[10]

and Peterfreund, L

Console, M, Libkin, L. and Peterfreund, L. \ Querying Incomplete Numerical Data: Between Certain and Possible Answers. 2022, CoRR abs/2210.15395

work page arXiv 2022
[11]

and Suciu, D

Dalvi, N. and Suciu, D. \ Efficient Query Evaluation on Probabilistic Databases. In Proc VLDB 2004

work page 2004
[12]

and Widom, J

Das Sarma, A., Benjelloun, O., Halevy, A. and Widom, J. Working Models for Uncertain Data. Proc ICDE 2006

work page 2006
[13]

L., Berk, K

Devore, J. L., Berk, K. N. and Carlton, M. A. \ Modern Mathematical Statistics with Applications . 3rd Ed., Springer, 2021

work page 2021
[14]

\ Bayesian Networks

Darwiche, A. \ Bayesian Networks. Communications of the ACM , 2010, 53(12):80-90

work page 2010
[15]

\ Reasoning with Probabilistic and Deterministic Graphical Models

Dechter, R. \ Reasoning with Probabilistic and Deterministic Graphical Models . \ 2nd Ed., Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Pubs., 2019

work page 2019
[16]

and Rekatsinas, T

De Sa, C., Ilyas, I., Kimelfeld, B., Re, C. and Rekatsinas, T. \ A Formal Framework for Probabilistic Unclean Databases. Proc. ICDT 2019, pp. 6:1-6:18

work page 2019
[17]

Dinic, E. A. Algorithm for Solution of a Problem of Maximum Flow in a Network with Power Estimation. Soviet Mathematics Doklady, 1970, 11:1277--1280

work page 1970
[18]

aGrUM/pyAgrum: a toolbox to build models and algorithms for Probabilistic Graphical Models in Python

Ducamp, G., Gonzales, C., Wuillemin, P.-H. aGrUM/pyAgrum: a toolbox to build models and algorithms for Probabilistic Graphical Models in Python. Proc. International Conference on Probabilistic Graphical Models, 2020, PMLR

work page 2020
[19]

\ Aggregation in Probabilistic Databases via Knowledge Compilation

Fink , R., Han, L., Olteanu, D. \ Aggregation in Probabilistic Databases via Knowledge Compilation. Proc. VLDB Endow. 5(5): 490-501 (2012)

work page 2012
[20]

and Hill, J

Gelman, A. and Hill, J. \ Data Analysis Using Regression and Multilevel/Hierarchical Models . Cambridge Univ. Press, 2007

work page 2007
[21]

and Sirangelo, C

Gheerbrant, A., Libkin, L., Rogova, A. and Sirangelo, C. \ Querying Incomplete Data: Complexity and Tractability via Datalog and First-Order Rewritings. \ Theory and Practice of Logic Programming. \ Published online on 28 November 2023

work page 2023
[22]

and Kimelfeld, B

Gilad, A., Imber, A. and Kimelfeld, B. \ The Consistency of Probabilistic Databases with Independent Cells. \ Proc. ICDT, 2023

work page 2023
[23]

and Spezzano, F

Greco, S., Molinaro, C. and Spezzano, F. \ Incomplete Data and Data Dependencies in Relational Databases. Synthesis Lectures in Data Management, Morgan & Claypool Pubs., 2012

work page 2012
[24]

and Suciu, D

Gribkoff, E., Van den Broeck, G. and Suciu, D. \ The Most Probable Database Problem. \ Proc. BUDA 2014, collocated with SIGMOD, 2014

work page 2014
[25]

, Matsui, T

Fukuda, K. , Matsui, T. \ Finding all the perfect matchings in bipartite graphs. Applied Mathematics Letters . 1994

work page 1994
[26]

and Friedman, J

Hastie, T., Tibshirani, R. and Friedman, J. \ The Elements of Statistical Learning . 2nd Ed., Springer, 2017

work page 2017
[27]

and Lipski Jr., W

Imielinski, T. and Lipski Jr., W. \ Incomplete Information in Relational Databases. \ Journal of the ACM , 1984, 31(4):761-791

work page 1984
[28]

and Van den Broeck, G

Khosravi, P., Liang, Y., Choi, Y. and Van den Broeck, G. \ What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features. \ Proc. IJCAI, 2019, pp. 2716-2724

work page 2019
[29]

Kuhn, H. W. \ The Hungarian Method for the Assignment Problem. \ Naval Research Logistics Quarterly , 1955, 2(1-2):83-97

work page 1955
[30]

and Dan Suciu, D

Kumar Jha, A. and Dan Suciu, D. \ Probabilistic Databases with MarkoViews. Proc. VLDB 5(11):1160-1171, 2012

work page 2012
[31]

Little, R. J. and Rubin, D. B. \ Statistical Analysis with Missing Data . \ John Wiley & Sons, 3rd Ed., 2019

work page 2019
[32]

Littman, M. L. and Goldsmith, J. and Mundhenk, M. \ The Computational Complexity of Probabilistic Planning , JAIR, 1998

work page 1998
[33]

MacKay, D. J. \ Information Theory, Inference, and Learning Algorithms . Cambridge University Press, 2003

work page 2003
[34]

Solving Integer Minimum Cost Flows with Separable Convex Cost Objective Polynomially

Minoux, M.. Solving Integer Minimum Cost Flows with Separable Convex Cost Objective Polynomially. In: Gallo, G., Sandi, C. (eds) `Netflow at Pisa'. Mathematical Programming Studies, Springer 1986, vol 26

work page 1986
[35]

and Tian, J

Mohan, K., Pearl, J. and Tian, J. \ Graphical Models for Inference with Missing Data. \ Proc NIPS , 2013, pp. 1277-1285

work page 2013
[36]

On the Testability of Models with Missing Data

Mohan, K and Pearl, J. On the Testability of Models with Missing Data. \ Proc. AISTAT, 2014

work page 2014
[37]

\ Graphical Models for Inference with Missing Data

Mohan, K. \ Graphical Models for Inference with Missing Data. \ PhD Thesis, UCLA, 2017. https://escholarship.org/uc/item/6mk2b174

work page 2017
[38]

and Pearl, J

Mohan, K. and Pearl, J. \ Graphical Models for Processing Missing Data. \ Journal of the American Statistical Association , 2021, 116(534):1023–1037

work page 2021
[39]

and Z\' a vodn\' y , J

Olteanu, D. and Z\' a vodn\' y , J. \ Size Bounds for Factorised Representations of Query Results. \ ACM TODS , 2015, 40(1)

work page 2015
[40]

\ Computational Complexity , Addison-Wesley, Reading, MA, 1994

Papadimitriou, C. \ Computational Complexity , Addison-Wesley, Reading, MA, 1994

work page 1994
[41]

\ Causality: Models, Reasoning and Inference

Pearl, J. \ Causality: Models, Reasoning and Inference . \ Cambridge Univ. Press, 2nd edition, 2009

work page 2009
[42]

\ The Foundations of Causal Inference

Pearl, J. \ The Foundations of Causal Inference. \ Sociological Methodology , 2010, 40:75–149. http://www.jstor.org/stable/41336883

work page arXiv 2010
[43]

and Suciu, D

R\' e , C., Dalvi, N. and Suciu, D. \ Efficient Top-k Query Evaluation on Probabilistic Data. \ Proc. ICDE, 2007, pp. 886-895

work page 2007
[44]

and Naughton, J

R\'e, C., Dalvi, N. and Naughton, J. F. \ Trio: A System for Integrated Management of Data, Accuracy, and Lineage. Proc. CIDR, 2005, pp. 262–276

work page 2005
[45]

Reiter, R.\ A Sound and Sometimes Complete Query Evaluation Algorithm for Relational Databases with Null Values. \ J. of the ACM , 1986, 33(2):349-370

work page 1986
[46]

\ Factorised representations of query results: size bounds and readability.\ Proc ICDT 2012

Olteanu, Dan and Z\' a vodn\' y , Jakub. \ Factorised representations of query results: size bounds and readability.\ Proc ICDT 2012

work page 2012
[47]

\ The Trichotomy of HAVING Queries on a Probabilistic Database

R\' e , C., Suciu, D. \ The Trichotomy of HAVING Queries on a Probabilistic Database. \ VLDB J. , 2009, 18(5):1091-1116

work page 2009
[48]

Royden, H. L. Real Analysis , 3rd ed. Macmillan, New York, 1988

work page 1988
[49]

and Suciu, D

Salimi, B., Rodriguez, L., Howe, B. and Suciu, D. \ Interventional Fairness: Causal Database Repair for Algorithmic Fairness. \ Proc. SIGMOD, 2019, pp. 793–810

work page 2019
[50]

and Moslemi, M

Salimi, B., Milani, M., Pirhadi, A., Cloninger, A. and Moslemi, M. H. \ OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport. \ Proc. SIGMOD, 2024

work page 2024
[51]

and Ramusat, Y

Senellart, P., Jachiet, L., Maniu, S. and Ramusat, Y. \ ProvSQL: Provenance and Probability Management in PostgreSQL. \ Proc VLDB, 2018

work page 2018
[52]

\ Top-k Query Processing in Uncertain Databases

Soliman, M., Ilyas, I and Chang, K. \ Top-k Query Processing in Uncertain Databases. In Proc. ICDE 2007

work page 2007
[53]

and Chang, K

Soliman, M., Ilyas, I. and Chang, K. \ Probabilistic Top-k and Ranking-Aggregate Queries. ACM TODS , 2008, 33(3)

work page 2008
[54]

Enumeration complexity: Incremental time, delay and space (habilitation thesis).arXiv preprint arXiv:2309.17042,

Strozecki, Y. \ Enumeration Complexity: Incremental Time, Delay and Space. arXiv, 2023., https://arxiv.org/abs/2309.17042

work page arXiv 2023
[55]

and Koch, C

Suciu, D., Olteanu, D., R\'e, C. and Koch, C. \ Probabilistic Databases . \ Synthesis Lectures on Data Management, Morgan & Claypool Pubs., 2011

work page 2011
[56]

Complexity Classes Defined by Counting Quantifiers

Tor\' a n, J. Complexity Classes Defined by Counting Quantifiers. J. ACM. , 1991

work page 1991
[57]

and Sequeda, J

Toussaint, E., Guagliardo, P., Libkin, L. and Sequeda, J. \ Troubles with Nulls, Views from the Users. Proc. VLDB Endow., 2022, 15(11):2613-2625

work page 2022
[58]

and Pearl, J

Van den Broeck, G., Mohan, K., and Choi, A., Darwiche, A. and Pearl, J. Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data. Proc. UAI, 2013

work page 2013
[59]

and Suciu, D

Van den Broeck, G. and Suciu, D. \ Query Processing on Probabilistic Data: A Survey. \ Foundations and Trends in Databases , 2015, 7(3-4):197–341. \ NOW Publishers

work page 2015
[60]

Valiant, L. G. \ The Complexity of Computing the Permanent. Theoretical Computer Science , 8(2):189--201, 1979

work page 1979
[61]

\ All of Statistics

Wasserman, L. \ All of Statistics . \ Springer, 2010

work page 2010
[62]

Végh, L. A. (2016). A strongly polynomial algorithm for a class of minimum-cost flow problems with separable convex objectives. SIAM Journal on Computing, 45(5), 1729--1761

work page 2016
[63]

Wong, S. K. M. and Wang, Z. W. \ On Axiomatization of Probabilistic Conditional Independence. \ Proc. Conf. Uncertainty in Artificial Intelligence, 1994, pp. 591–597

work page 1994
[64]

Wong, S. K. M., Butz, C. J. and Wu, D. \ On the Implication Problem for Probabilistic Conditional Independency. \ IEEE Transactions on Systems, Man, and Cybernetics, Part A , 2000, 30(6):785-805

work page 2000
[65]

https://stats.oarc.ucla.edu/stata/seminars/mi_in_stata_pt1_new

work page

[1] [1]

K., Magnanti, T

Ahuja, R. K., Magnanti, T. L. and Orlin, J.B. \ Network Flows: Theory, Algorithms, and Applications . Prentice Hall, 1993

work page 1993

[2] [2]

Ali, S. M. and Silvey, S. D. A General Class of Coefficients of Divergence of one Distribution from Another. Journal of the Royal Statistical Society, Series B , 1996, 28(1):131--142

work page 1996

[3] [3]

Allison, P. D. \ Maximum Likelihood Estimation . \ \ Quantitative Applications in the Social Sciences, 96. Sage Publications, 1993

work page 1993

[4] [4]

Allison, P. D. \ Missing Data . \ Quantitative Applications in the Social Sciences, 136. Sage Publications, 2002

work page 2002

[5] [5]

and Li, L

Bertossi, L. and Li, L. \ Achieving Data Privacy through Secrecy Views and Null-Based Virtual Updates. \ IEEE Transactions on Knowledge and Data Engineering , 2013, 25(5):987-1000

work page 2013

[6] [6]

and Bravo, L

Bertossi, L. and Bravo, L. \ Consistency and Trust in Peer Data Exchange Systems. Theory and Practice of Logic Programming , 2017, 17(2):148-204

work page 2017

[7] [7]

Bishop, C. M. \ Pattern Recognition and Machine Learning . \ Springer, 2006

work page 2006

[8] [8]

\ A Fast Algorithm for Enumerating Bipartite Perfect Matchings., ISAAC 2001 (LNCS vol 2223), 2001

Takeaki, U. \ A Fast Algorithm for Enumerating Bipartite Perfect Matchings., ISAAC 2001 (LNCS vol 2223), 2001

work page 2001

[9] [9]

P., Peng, R., Gutenberg, M

Chen, L., Kyng, R., Liu, Y. P., Peng, R., Gutenberg, M. P., & Sachdeva, S. (2022). Minimum cost flows, MDPs, and \( _1\)-regression in nearly linear time for separable convex instances. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2022)

work page 2022

[10] [10]

and Peterfreund, L

Console, M, Libkin, L. and Peterfreund, L. \ Querying Incomplete Numerical Data: Between Certain and Possible Answers. 2022, CoRR abs/2210.15395

work page arXiv 2022

[11] [11]

and Suciu, D

Dalvi, N. and Suciu, D. \ Efficient Query Evaluation on Probabilistic Databases. In Proc VLDB 2004

work page 2004

[12] [12]

and Widom, J

Das Sarma, A., Benjelloun, O., Halevy, A. and Widom, J. Working Models for Uncertain Data. Proc ICDE 2006

work page 2006

[13] [13]

L., Berk, K

Devore, J. L., Berk, K. N. and Carlton, M. A. \ Modern Mathematical Statistics with Applications . 3rd Ed., Springer, 2021

work page 2021

[14] [14]

\ Bayesian Networks

Darwiche, A. \ Bayesian Networks. Communications of the ACM , 2010, 53(12):80-90

work page 2010

[15] [15]

\ Reasoning with Probabilistic and Deterministic Graphical Models

Dechter, R. \ Reasoning with Probabilistic and Deterministic Graphical Models . \ 2nd Ed., Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Pubs., 2019

work page 2019

[16] [16]

and Rekatsinas, T

De Sa, C., Ilyas, I., Kimelfeld, B., Re, C. and Rekatsinas, T. \ A Formal Framework for Probabilistic Unclean Databases. Proc. ICDT 2019, pp. 6:1-6:18

work page 2019

[17] [17]

Dinic, E. A. Algorithm for Solution of a Problem of Maximum Flow in a Network with Power Estimation. Soviet Mathematics Doklady, 1970, 11:1277--1280

work page 1970

[18] [18]

aGrUM/pyAgrum: a toolbox to build models and algorithms for Probabilistic Graphical Models in Python

Ducamp, G., Gonzales, C., Wuillemin, P.-H. aGrUM/pyAgrum: a toolbox to build models and algorithms for Probabilistic Graphical Models in Python. Proc. International Conference on Probabilistic Graphical Models, 2020, PMLR

work page 2020

[19] [19]

\ Aggregation in Probabilistic Databases via Knowledge Compilation

Fink , R., Han, L., Olteanu, D. \ Aggregation in Probabilistic Databases via Knowledge Compilation. Proc. VLDB Endow. 5(5): 490-501 (2012)

work page 2012

[20] [20]

and Hill, J

Gelman, A. and Hill, J. \ Data Analysis Using Regression and Multilevel/Hierarchical Models . Cambridge Univ. Press, 2007

work page 2007

[21] [21]

and Sirangelo, C

Gheerbrant, A., Libkin, L., Rogova, A. and Sirangelo, C. \ Querying Incomplete Data: Complexity and Tractability via Datalog and First-Order Rewritings. \ Theory and Practice of Logic Programming. \ Published online on 28 November 2023

work page 2023

[22] [22]

and Kimelfeld, B

Gilad, A., Imber, A. and Kimelfeld, B. \ The Consistency of Probabilistic Databases with Independent Cells. \ Proc. ICDT, 2023

work page 2023

[23] [23]

and Spezzano, F

Greco, S., Molinaro, C. and Spezzano, F. \ Incomplete Data and Data Dependencies in Relational Databases. Synthesis Lectures in Data Management, Morgan & Claypool Pubs., 2012

work page 2012

[24] [24]

and Suciu, D

Gribkoff, E., Van den Broeck, G. and Suciu, D. \ The Most Probable Database Problem. \ Proc. BUDA 2014, collocated with SIGMOD, 2014

work page 2014

[25] [25]

, Matsui, T

Fukuda, K. , Matsui, T. \ Finding all the perfect matchings in bipartite graphs. Applied Mathematics Letters . 1994

work page 1994

[26] [26]

and Friedman, J

Hastie, T., Tibshirani, R. and Friedman, J. \ The Elements of Statistical Learning . 2nd Ed., Springer, 2017

work page 2017

[27] [27]

and Lipski Jr., W

Imielinski, T. and Lipski Jr., W. \ Incomplete Information in Relational Databases. \ Journal of the ACM , 1984, 31(4):761-791

work page 1984

[28] [28]

and Van den Broeck, G

Khosravi, P., Liang, Y., Choi, Y. and Van den Broeck, G. \ What to Expect of Classifiers? Reasoning about Logistic Regression with Missing Features. \ Proc. IJCAI, 2019, pp. 2716-2724

work page 2019

[29] [29]

Kuhn, H. W. \ The Hungarian Method for the Assignment Problem. \ Naval Research Logistics Quarterly , 1955, 2(1-2):83-97

work page 1955

[30] [30]

and Dan Suciu, D

Kumar Jha, A. and Dan Suciu, D. \ Probabilistic Databases with MarkoViews. Proc. VLDB 5(11):1160-1171, 2012

work page 2012

[31] [31]

Little, R. J. and Rubin, D. B. \ Statistical Analysis with Missing Data . \ John Wiley & Sons, 3rd Ed., 2019

work page 2019

[32] [32]

Littman, M. L. and Goldsmith, J. and Mundhenk, M. \ The Computational Complexity of Probabilistic Planning , JAIR, 1998

work page 1998

[33] [33]

MacKay, D. J. \ Information Theory, Inference, and Learning Algorithms . Cambridge University Press, 2003

work page 2003

[34] [34]

Solving Integer Minimum Cost Flows with Separable Convex Cost Objective Polynomially

Minoux, M.. Solving Integer Minimum Cost Flows with Separable Convex Cost Objective Polynomially. In: Gallo, G., Sandi, C. (eds) `Netflow at Pisa'. Mathematical Programming Studies, Springer 1986, vol 26

work page 1986

[35] [35]

and Tian, J

Mohan, K., Pearl, J. and Tian, J. \ Graphical Models for Inference with Missing Data. \ Proc NIPS , 2013, pp. 1277-1285

work page 2013

[36] [36]

On the Testability of Models with Missing Data

Mohan, K and Pearl, J. On the Testability of Models with Missing Data. \ Proc. AISTAT, 2014

work page 2014

[37] [37]

\ Graphical Models for Inference with Missing Data

Mohan, K. \ Graphical Models for Inference with Missing Data. \ PhD Thesis, UCLA, 2017. https://escholarship.org/uc/item/6mk2b174

work page 2017

[38] [38]

and Pearl, J

Mohan, K. and Pearl, J. \ Graphical Models for Processing Missing Data. \ Journal of the American Statistical Association , 2021, 116(534):1023–1037

work page 2021

[39] [39]

and Z\' a vodn\' y , J

Olteanu, D. and Z\' a vodn\' y , J. \ Size Bounds for Factorised Representations of Query Results. \ ACM TODS , 2015, 40(1)

work page 2015

[40] [40]

\ Computational Complexity , Addison-Wesley, Reading, MA, 1994

Papadimitriou, C. \ Computational Complexity , Addison-Wesley, Reading, MA, 1994

work page 1994

[41] [41]

\ Causality: Models, Reasoning and Inference

Pearl, J. \ Causality: Models, Reasoning and Inference . \ Cambridge Univ. Press, 2nd edition, 2009

work page 2009

[42] [42]

\ The Foundations of Causal Inference

Pearl, J. \ The Foundations of Causal Inference. \ Sociological Methodology , 2010, 40:75–149. http://www.jstor.org/stable/41336883

work page arXiv 2010

[43] [43]

and Suciu, D

R\' e , C., Dalvi, N. and Suciu, D. \ Efficient Top-k Query Evaluation on Probabilistic Data. \ Proc. ICDE, 2007, pp. 886-895

work page 2007

[44] [44]

and Naughton, J

R\'e, C., Dalvi, N. and Naughton, J. F. \ Trio: A System for Integrated Management of Data, Accuracy, and Lineage. Proc. CIDR, 2005, pp. 262–276

work page 2005

[45] [45]

Reiter, R.\ A Sound and Sometimes Complete Query Evaluation Algorithm for Relational Databases with Null Values. \ J. of the ACM , 1986, 33(2):349-370

work page 1986

[46] [46]

\ Factorised representations of query results: size bounds and readability.\ Proc ICDT 2012

Olteanu, Dan and Z\' a vodn\' y , Jakub. \ Factorised representations of query results: size bounds and readability.\ Proc ICDT 2012

work page 2012

[47] [47]

\ The Trichotomy of HAVING Queries on a Probabilistic Database

R\' e , C., Suciu, D. \ The Trichotomy of HAVING Queries on a Probabilistic Database. \ VLDB J. , 2009, 18(5):1091-1116

work page 2009

[48] [48]

Royden, H. L. Real Analysis , 3rd ed. Macmillan, New York, 1988

work page 1988

[49] [49]

and Suciu, D

Salimi, B., Rodriguez, L., Howe, B. and Suciu, D. \ Interventional Fairness: Causal Database Repair for Algorithmic Fairness. \ Proc. SIGMOD, 2019, pp. 793–810

work page 2019

[50] [50]

and Moslemi, M

Salimi, B., Milani, M., Pirhadi, A., Cloninger, A. and Moslemi, M. H. \ OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport. \ Proc. SIGMOD, 2024

work page 2024

[51] [51]

and Ramusat, Y

Senellart, P., Jachiet, L., Maniu, S. and Ramusat, Y. \ ProvSQL: Provenance and Probability Management in PostgreSQL. \ Proc VLDB, 2018

work page 2018

[52] [52]

\ Top-k Query Processing in Uncertain Databases

Soliman, M., Ilyas, I and Chang, K. \ Top-k Query Processing in Uncertain Databases. In Proc. ICDE 2007

work page 2007

[53] [53]

and Chang, K

Soliman, M., Ilyas, I. and Chang, K. \ Probabilistic Top-k and Ranking-Aggregate Queries. ACM TODS , 2008, 33(3)

work page 2008

[54] [54]

Enumeration complexity: Incremental time, delay and space (habilitation thesis).arXiv preprint arXiv:2309.17042,

Strozecki, Y. \ Enumeration Complexity: Incremental Time, Delay and Space. arXiv, 2023., https://arxiv.org/abs/2309.17042

work page arXiv 2023

[55] [55]

and Koch, C

Suciu, D., Olteanu, D., R\'e, C. and Koch, C. \ Probabilistic Databases . \ Synthesis Lectures on Data Management, Morgan & Claypool Pubs., 2011

work page 2011

[56] [56]

Complexity Classes Defined by Counting Quantifiers

Tor\' a n, J. Complexity Classes Defined by Counting Quantifiers. J. ACM. , 1991

work page 1991

[57] [57]

and Sequeda, J

Toussaint, E., Guagliardo, P., Libkin, L. and Sequeda, J. \ Troubles with Nulls, Views from the Users. Proc. VLDB Endow., 2022, 15(11):2613-2625

work page 2022

[58] [58]

and Pearl, J

Van den Broeck, G., Mohan, K., and Choi, A., Darwiche, A. and Pearl, J. Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data. Proc. UAI, 2013

work page 2013

[59] [59]

and Suciu, D

Van den Broeck, G. and Suciu, D. \ Query Processing on Probabilistic Data: A Survey. \ Foundations and Trends in Databases , 2015, 7(3-4):197–341. \ NOW Publishers

work page 2015

[60] [60]

Valiant, L. G. \ The Complexity of Computing the Permanent. Theoretical Computer Science , 8(2):189--201, 1979

work page 1979

[61] [61]

\ All of Statistics

Wasserman, L. \ All of Statistics . \ Springer, 2010

work page 2010

[62] [62]

Végh, L. A. (2016). A strongly polynomial algorithm for a class of minimum-cost flow problems with separable convex objectives. SIAM Journal on Computing, 45(5), 1729--1761

work page 2016

[63] [63]

Wong, S. K. M. and Wang, Z. W. \ On Axiomatization of Probabilistic Conditional Independence. \ Proc. Conf. Uncertainty in Artificial Intelligence, 1994, pp. 591–597

work page 1994

[64] [64]

Wong, S. K. M., Butz, C. J. and Wu, D. \ On the Implication Problem for Probabilistic Conditional Independency. \ IEEE Transactions on Systems, Man, and Cybernetics, Part A , 2000, 30(6):785-805

work page 2000

[65] [65]

https://stats.oarc.ucla.edu/stata/seminars/mi_in_stata_pt1_new

work page