pith. machine review for the scientific record. sign in

arxiv: 2604.21537 · v1 · submitted 2026-04-23 · 💻 cs.AI · cond-mat.stat-mech· cs.GT· cs.SI· physics.data-an

Recognition: unknown

The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks

Authors on Pith no claims yet

Pith reviewed 2026-05-09 21:20 UTC · model grok-4.3

classification 💻 cs.AI cond-mat.stat-mechcs.GTcs.SIphysics.data-an
keywords bipartite graphscritical nodescentrality measuresShapley valueNP-hard problemsdependency networksset function maximizationnetwork mining
0
0 comments X

The pith

MinCov identifies near-optimal sets of critical contributors in bipartite dependency networks in linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the CriticalSet problem as selecting a set of k contributors to remove from a bipartite graph so that the largest number of items become isolated. It proves the problem is NP-hard because the coverage objective is a supermodular set function, which means standard greedy selection offers no approximation guarantees. By recasting the problem as a coalitional game, the authors obtain a closed-form centrality score, ShapleyCov, that equals the expected number of items a contributor would isolate if removed at random. They then introduce MinCov, an iterative peeling procedure that repeatedly drops the contributor with the highest count of uniquely supported items after removing redundant connections. Experiments on synthetic graphs and a Wikipedia network with hundreds of millions of edges show MinCov stays within 0.02 AUC of a stochastic hill-climbing metaheuristic while running orders of magnitude faster.

Core claim

The CriticalSet problem asks for the k contributors whose removal isolates the largest number of items in a bipartite dependency graph. The objective function is supermodular, so the problem is NP-hard and standard forward-greedy algorithms provide no guarantees. Modeling the selection as a coalitional game yields the ShapleyCov centrality, defined as the expected number of isolated items attributable to a single contributor. MinCov is a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy by always selecting the contributor with the most unique item supports at each step, producing solutions within 0.02 AUC of a stochastic hill-climbing metaheuristic on

What carries the argument

MinCov, a linear-time iterative peeling algorithm that selects contributors by counting their unique item supports after removing redundant connections.

If this is right

  • MinCov runs in linear time and scales to graphs with over 250 million edges.
  • MinCov stays within 0.02 AUC of a stochastic hill-climbing metaheuristic while remaining several orders of magnitude faster.
  • Both MinCov and the derived ShapleyCov centrality outperform traditional node-centrality baselines on synthetic and real bipartite dependency networks.
  • The supermodular nature of the objective explains why forward-greedy methods fail to give approximation guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The coalitional-game reduction may supply similar closed-form centralities for other supermodular coverage problems where direct greedy selection lacks guarantees.
  • MinCov's explicit handling of redundant connections suggests it could be adapted to identify single points of failure in supply-chain or infrastructure networks that admit a bipartite dependency representation.
  • The small remaining optimality gap invites lightweight local-search post-processing that could close the gap further without sacrificing the linear-time scaling.

Load-bearing premise

That treating the isolation count as the value function in a coalitional game produces a Shapley value whose ranking reliably guides selection toward high-performing sets on the original CriticalSet objective.

What would settle it

On a collection of bipartite graphs, run both MinCov and the stochastic hill-climbing metaheuristic to completion and check whether the AUC gap between the two exceeds 0.05 or whether MinCov loses its orders-of-magnitude speed advantage.

Figures

Figures reproduced from arXiv: 2604.21537 by Andrea Tagarelli, Sebastiano A. Piccolo.

Figure 1
Figure 1. Figure 1 [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Coverage curves on real datasets (selection) [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Zoomed-in comparison between MinCov, ShapleyCov, Forward Greedy and SHC. The letter indicates the synthetic graph configuration (refer to [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Running time comparison between MinCov, ShapleyCov, FG, and SHC. remains highly competitive (within 0.05 of SHC), which is remarkable given it is a single-pass centrality measure. In contrast, all other baselines exhibit significantly larger approximation gaps. These results also highlight the inadequacy of standard centralities for CriticalSet: Forward Greedy (FG) is consistently outperformed, and Between… view at source ↗
read the original abstract

Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet, methods addressing an all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes where edges represent dependency relationships across the two groups only, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of k contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game, deriving a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The manuscript formalizes the CriticalSet problem of selecting a set of k contributors in a bipartite dependency graph to maximize the number of isolated items. It proves NP-hardness and supermodularity of the objective function. The authors model the problem as a coalitional game and derive the closed-form ShapleyCov centrality measure (expected isolation count). They propose MinCov, a linear-time iterative peeling algorithm that prioritizes contributors with unique support. Experiments on synthetic graphs and real-world datasets (including a Wikipedia graph with >250M edges) show MinCov and ShapleyCov outperforming baselines, with MinCov within 0.02 AUC of Stochastic Hill Climbing while being orders of magnitude faster.

Significance. If the central claims hold, the work supplies a practical, scalable method for critical contributor identification in bipartite networks with all-or-nothing coverage, grounded in hardness results and cooperative game theory. Strengths include the NP-hardness and supermodularity proofs, the closed-form ShapleyCov derivation, and experiments demonstrating scalability to graphs with hundreds of millions of edges. This could inform applications in dependency analysis, knowledge graphs, and network resilience. The significance is reduced by the absence of exact optimality verification for the key empirical claim.

major comments (1)
  1. [§5] §5 (Experimental Evaluation) and abstract: the central claim that MinCov achieves near-optimal performance (within 0.02 AUC of Stochastic Hill Climbing) rests on comparison to a single metaheuristic without approximation guarantees. No results are reported comparing either method to exact optima (via ILP or enumeration) on small synthetic instances where such computation is feasible. Because CriticalSet is NP-hard and the objective is supermodular maximization, distance to SHC does not establish closeness to the true optimum.
minor comments (3)
  1. [§3.2] §3.2: the value function v(S) of the coalitional game is defined implicitly; an explicit equation would improve readability when deriving the ShapleyCov formula.
  2. [§4.1] §4.1: the description of MinCov's peeling step could clarify how redundancy is quantified (e.g., via explicit pseudocode or a small worked example).
  3. [Table 2] Table 2 and §5.3: the AUC metric is used for ranking quality but its precise definition (e.g., how isolated-item curves are constructed) is not stated, complicating reproduction.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. The primary concern regarding the experimental evaluation of near-optimality is addressed point-by-point below, along with planned revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [§5] §5 (Experimental Evaluation) and abstract: the central claim that MinCov achieves near-optimal performance (within 0.02 AUC of Stochastic Hill Climbing) rests on comparison to a single metaheuristic without approximation guarantees. No results are reported comparing either method to exact optima (via ILP or enumeration) on small synthetic instances where such computation is feasible. Because CriticalSet is NP-hard and the objective is supermodular maximization, distance to SHC does not establish closeness to the true optimum.

    Authors: We agree that proximity to Stochastic Hill Climbing (SHC) alone does not rigorously establish closeness to the global optimum for this NP-hard supermodular maximization problem. To directly address this, we will revise §5 to include a new set of experiments comparing MinCov and SHC against exact optima. Specifically, we will solve small synthetic instances (with |C| ≤ 20 contributors) using an ILP formulation of CriticalSet via a standard solver such as Gurobi, reporting optimality gaps for both methods across varying graph densities and k values. This addition will provide quantitative evidence of how close the reported solutions are to true optima on instances where exact computation is tractable. For larger instances, we retain SHC as a strong practical baseline while noting its lack of guarantees. revision: yes

Circularity Check

0 steps flagged

No circularity; ShapleyCov applies standard game-theoretic formula to defined game and MinCov is an independent heuristic

full rationale

The paper defines CriticalSet as an NP-hard supermodular maximization problem (external proof), then models it as a coalitional game and computes the Shapley value in closed form to obtain ShapleyCov as expected isolation count. This is a direct, non-circular application of the standard Shapley formula from cooperative game theory to the newly defined characteristic function; the result is not equivalent to the input by construction. MinCov is introduced separately as a linear-time peeling heuristic that accounts for redundancy, with performance claims resting on empirical comparisons to baselines and Stochastic Hill Climbing rather than any fitted parameter or self-citation chain. No load-bearing step reduces to a prior result by the same authors or renames a known pattern as a derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the central claims rest on standard graph-theoretic definitions of bipartite dependency and the standard definition of the Shapley value; no additional free parameters, ad-hoc axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5547 in / 1201 out tokens · 45378 ms · 2026-05-09T21:20:52.524278+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 4 canonical work pages

  1. [1]

    Albert,R.,Jeong,H.,Barabási,A.L.:Errorandattacktoleranceofcomplexnetworks.nature406(6794),378–382 (2000)

  2. [2]

    Nature Reviews Physics6(2), 114–131 (2024)

    Artime, O., Grassia, M., De Domenico, M., Gleeson, J.P., Makse, H.A., Mangioni, G., Perc, M., Radicchi, F.: Robustness and resilience of complex networks. Nature Reviews Physics6(2), 114–131 (2024)

  3. [3]

    In: Proc

    Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A nucleus for a web of open data. In: Proc. Int. Semant. Web Conf. pp. 722–735 (2008)

  4. [4]

    4 Guilherme Avelino, Leonardo Passos, Andre Hora, and Marco Tulio Valente

    Avelino, G., Passos, L., Hora, A., Valente, M.T.: A novel approach for estimating truck factors. In: 2016 IEEE 24th International Conference on Program Comprehension (ICPC). pp. 1–10 (2016). https://doi.org/10.1109/ICPC.2016.7503718

  5. [5]

    Banerjee,S.,Jenamani,M.,Pratihar,D.K.:Asurveyoninfluencemaximizationinasocialnetwork.Knowledge and Information Systems62(9), 3417–3455 (2020)

  6. [6]

    In: International workshop on web and internet economics

    Bharathi, S., Kempe, D., Salek, M.: Competitive influence maximization in social networks. In: International workshop on web and internet economics. pp. 306–311. Springer (2007)

  7. [7]

    In: Proceedings of the forty-second ACM symposium on Theory of computing

    Bhaskara, A., Charikar, M., Chlamtac, E., Feige, U., Vijayaraghavan, A.: Detecting high log-densities: an o (n 1/4) approximation for densest k-subgraph. In: Proceedings of the forty-second ACM symposium on Theory of computing. pp. 201–210 (2010)

  8. [8]

    Social networks19(3), 243–269 (1997)

    Borgatti, S.P., Everett, M.G.: Network analysis of 2-mode data. Social networks19(3), 243–269 (1997)

  9. [9]

    In: Proceedings of Computational Com- plexity

    Crescenzi, P.: A short guide to approximation preserving reductions. In: Proceedings of Computational Com- plexity. Twelfth Annual IEEE Conference. pp. 262–273. IEEE (1997)

  10. [10]

    Feige, U., Seltser, M.: On the densest k-subgraph problems (1997)

  11. [11]

    Algorithmica29(3), 410–421 (2001)

    Feige, U., Peleg, D., Kortsarz, G.: The dense k-subgraph problem. Algorithmica29(3), 410–421 (2001)

  12. [12]

    Goyal,A.,Bonchi,F.,Lakshmanan,L.V.:Adata-basedapproachtosocialinfluencemaximization.Proceedings of the VLDB Endowment5(1) (2011)

  13. [13]

    Acm transactions on interactive intelligent systems (tiis)5(4), 1–19 (2015)

    Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis)5(4), 1–19 (2015)

  14. [14]

    Hogg, T., Lerman, K.: Social dynamics of Digg. Eur. Phys. J. Data Sci.1(5) (2012)

  15. [15]

    Jabrayilzade, E., Evtikhiev, M., Tüzün, E., Kovalenko, V.: Bus factor in practice. In: Proceedings of the 44th InternationalConferenceonSoftwareEngineering:SoftwareEngineeringinPractice.p.97–106.ICSE-SEIP’22, AssociationforComputingMachinery,NewYork,NY,USA(2022).https://doi.org/10.1145/3510457.3513082, https://doi.org/10.1145/3510457.3513082 14 S.A. Picco...

  16. [16]

    In: Proc

    Jindal, N., Liu, B.: Opinion spam and analysis. In: Proc. Int. Conf. on Web Search and Web Data Min. pp. 219–230 (2008)

  17. [17]

    In: Pro- ceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

    Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Pro- ceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 137–146 (2003)

  18. [18]

    SIAM Journal on Computing36(4), 1025–1071 (2006)

    Khot, S.: Ruling out ptas for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM Journal on Computing36(4), 1025–1071 (2006)

  19. [19]

    In: Proceedings of the 22nd international conference on world wide web

    Kunegis, J.: Konect: the koblenz network collection. In: Proceedings of the 22nd international conference on world wide web. pp. 1343–1350 (2013)

  20. [20]

    ACM Computing Surveys56(8), 1–40 (2024)

    Lanciano,T.,Miyauchi,A.,Fazzone,A.,Bonchi,F.:Asurveyonthedensestsubgraphproblemanditsvariants. ACM Computing Surveys56(8), 1–40 (2024)

  21. [21]

    Lichman, M.: UCI Machine Learning Repository (2013), http://archive.ics.uci.edu/ml

  22. [22]

    Liu,J.,Xiong,Q.,Shi,W.,Shi,X.,Wang,K.:Evaluatingtheimportanceofnodesincomplexnetworks.Physica A: Statistical Mechanics and its Applications452, 209–219 (2016)

  23. [23]

    Knowledge-Based Systems84, 56–66 (2015)

    Liu,Z.,Jiang,C.,Wang,J.,Yu,H.:Thenodeimportanceinactualcomplexnetworksbasedonamulti-attribute ranking method. Knowledge-Based Systems84, 56–66 (2015)

  24. [24]

    Manurangsi,P.:Almost-polynomialratioeth-hardnessofapproximatingdensestk-subgraph.In:Proceedingsof the 49th Annual ACM SIGACT Symposium on Theory of Computing. pp. 954–961 (2017)

  25. [25]

    In: Proc

    Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: Proc. Internet Measurement Conf. (2007)

  26. [26]

    Proceedings of the national academy of sciences101, 5200–5205 (2004)

    Newman, M.E.: Coauthorship networks and patterns of scientific collaboration. Proceedings of the national academy of sciences101, 5200–5205 (2004)

  27. [27]

    LNCS, Springer (2024)

    Piccolo, S.A., De Meo, P., Terracina, G.: Evaluating and improving projects’ bus-factor: a network analytical framework.In:16thInternationalConference,ASONAM2024,Rende,Italy,September2–5,2024,Proceedings, Part I. LNCS, Springer (2024)

  28. [28]

    Design Science4, e1 (2018)

    Piccolo,S.A.,Lehmann,S.,Maier,A.:Designprocessrobustness:abipartitenetworkanalysisrevealsthecentral importance of people. Design Science4, e1 (2018)

  29. [29]

    In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing

    Raghavendra, P., Steurer, D.: Graph expansion and the unique games conjecture. In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing. p. 755–764. STOC ’10, Associa- tion for Computing Machinery, New York, NY, USA (2010). https://doi.org/10.1145/1806689.1806792, https://doi.org/10.1145/1806689.1806792

  30. [30]

    In: Proceedings of the AAAI conference on artificial intelligence (2015)

    Rossi, R., Ahmed, N.: The network data repository with interactive graph analytics and visualization. In: Proceedings of the AAAI conference on artificial intelligence (2015)

  31. [31]

    In: Proc

    Schelter, S., Kunegis, J.: Tracking the trackers: A large-scale analysis of embedded web trackers. In: Proc. Int. Conf. on Web and Soc. Media (2016)

  32. [32]

    In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

    Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 990–998 (2008)

  33. [33]

    arXiv preprint arXiv:1801.00218 (2017)

    Tarkowski, M.K., Michalak, T.P., Rahwan, T., Wooldridge, M.: Game-theoretic network centrality: A review. arXiv preprint arXiv:1801.00218 (2017)