arxiv: 2604.21537 · v1 · submitted 2026-04-23 · 💻 cs.AI · cond-mat.stat-mech· cs.GT· cs.SI· physics.data-an

Recognition: unknown

The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks

Sebastiano A. Piccolo , Andrea Tagarelli

Authors on Pith no claims yet

Pith reviewed 2026-05-09 21:20 UTC · model grok-4.3

classification 💻 cs.AI cond-mat.stat-mechcs.GTcs.SIphysics.data-an

keywords bipartite graphscritical nodescentrality measuresShapley valueNP-hard problemsdependency networksset function maximizationnetwork mining

0 comments

The pith

MinCov identifies near-optimal sets of critical contributors in bipartite dependency networks in linear time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the CriticalSet problem as selecting a set of k contributors to remove from a bipartite graph so that the largest number of items become isolated. It proves the problem is NP-hard because the coverage objective is a supermodular set function, which means standard greedy selection offers no approximation guarantees. By recasting the problem as a coalitional game, the authors obtain a closed-form centrality score, ShapleyCov, that equals the expected number of items a contributor would isolate if removed at random. They then introduce MinCov, an iterative peeling procedure that repeatedly drops the contributor with the highest count of uniquely supported items after removing redundant connections. Experiments on synthetic graphs and a Wikipedia network with hundreds of millions of edges show MinCov stays within 0.02 AUC of a stochastic hill-climbing metaheuristic while running orders of magnitude faster.

Core claim

The CriticalSet problem asks for the k contributors whose removal isolates the largest number of items in a bipartite dependency graph. The objective function is supermodular, so the problem is NP-hard and standard forward-greedy algorithms provide no guarantees. Modeling the selection as a coalitional game yields the ShapleyCov centrality, defined as the expected number of isolated items attributable to a single contributor. MinCov is a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy by always selecting the contributor with the most unique item supports at each step, producing solutions within 0.02 AUC of a stochastic hill-climbing metaheuristic on

What carries the argument

MinCov, a linear-time iterative peeling algorithm that selects contributors by counting their unique item supports after removing redundant connections.

If this is right

MinCov runs in linear time and scales to graphs with over 250 million edges.
MinCov stays within 0.02 AUC of a stochastic hill-climbing metaheuristic while remaining several orders of magnitude faster.
Both MinCov and the derived ShapleyCov centrality outperform traditional node-centrality baselines on synthetic and real bipartite dependency networks.
The supermodular nature of the objective explains why forward-greedy methods fail to give approximation guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The coalitional-game reduction may supply similar closed-form centralities for other supermodular coverage problems where direct greedy selection lacks guarantees.
MinCov's explicit handling of redundant connections suggests it could be adapted to identify single points of failure in supply-chain or infrastructure networks that admit a bipartite dependency representation.
The small remaining optimality gap invites lightweight local-search post-processing that could close the gap further without sacrificing the linear-time scaling.

Load-bearing premise

That treating the isolation count as the value function in a coalitional game produces a Shapley value whose ranking reliably guides selection toward high-performing sets on the original CriticalSet objective.

What would settle it

On a collection of bipartite graphs, run both MinCov and the stochastic hill-climbing metaheuristic to completion and check whether the AUC gap between the two exceeds 0.05 or whether MinCov loses its orders-of-magnitude speed advantage.

Figures

Figures reproduced from arXiv: 2604.21537 by Andrea Tagarelli, Sebastiano A. Piccolo.

**Figure 2.** Figure 2: Coverage curves on real datasets (selection) [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Zoomed-in comparison between MinCov, ShapleyCov, Forward Greedy and SHC. The letter indicates the synthetic graph configuration (refer to [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Running time comparison between MinCov, ShapleyCov, FG, and SHC. remains highly competitive (within 0.05 of SHC), which is remarkable given it is a single-pass centrality measure. In contrast, all other baselines exhibit significantly larger approximation gaps. These results also highlight the inadequacy of standard centralities for CriticalSet: Forward Greedy (FG) is consistently outperformed, and Between… view at source ↗

read the original abstract

Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet, methods addressing an all-or-nothing coverage mechanics in a bipartite dependency network, a graph with two types of nodes where edges represent dependency relationships across the two groups only, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of k contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and requires maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game, deriving a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces CriticalSet for bipartite dependency graphs and a fast linear-time MinCov heuristic that scales to 250M-edge instances, but its near-optimality claim only compares against another heuristic.

read the letter

The paper's core offering is a clean formalization of the CriticalSet problem—selecting k contributors to remove in order to isolate the maximum number of items in a bipartite dependency graph—plus a linear-time peeling algorithm called MinCov that explicitly handles redundant connections. They prove NP-hardness and supermodularity of the objective, then derive ShapleyCov as the Shapley value of the corresponding coalitional game, which has a closed-form interpretation as expected isolated items per contributor. MinCov is a simple iterative procedure that prioritizes unique supporters, and the experiments show it beating standard centrality baselines on both synthetic data and a real Wikipedia graph with over 250 million edges while staying within 0.02 AUC of a stochastic hill-climbing metaheuristic and running orders of magnitude faster. That scale and the practical runtime are the strongest parts of the work. The main limitation is that the near-optimality statement rests entirely on distance to the hill-climbing heuristic; the paper does not report exact optima via ILP or enumeration on any small instances where that would be feasible. Without that check, the 0.02 gap only tells us MinCov is close to one other heuristic, not that it is close to the true optimum. The Shapley construction itself is a direct application of standard cooperative game theory, so the novelty sits more in the problem definition and the scalable algorithm than in new theory. This is useful reading for people working on network mining or dependency analysis in supply chains, software, or knowledge bases. A reader who needs a fast heuristic for this style of all-or-nothing coverage on large bipartite graphs will get concrete value from the algorithm and the large-scale results. I would send it for peer review; the formal problem statement, the explicit hardness and supermodularity arguments, and the real-world scale of the experiments are enough to justify referee time, even if the empirical validation against exact optima needs strengthening on modest instances.

Referee Report

1 major / 3 minor

Summary. The manuscript formalizes the CriticalSet problem of selecting a set of k contributors in a bipartite dependency graph to maximize the number of isolated items. It proves NP-hardness and supermodularity of the objective function. The authors model the problem as a coalitional game and derive the closed-form ShapleyCov centrality measure (expected isolation count). They propose MinCov, a linear-time iterative peeling algorithm that prioritizes contributors with unique support. Experiments on synthetic graphs and real-world datasets (including a Wikipedia graph with >250M edges) show MinCov and ShapleyCov outperforming baselines, with MinCov within 0.02 AUC of Stochastic Hill Climbing while being orders of magnitude faster.

Significance. If the central claims hold, the work supplies a practical, scalable method for critical contributor identification in bipartite networks with all-or-nothing coverage, grounded in hardness results and cooperative game theory. Strengths include the NP-hardness and supermodularity proofs, the closed-form ShapleyCov derivation, and experiments demonstrating scalability to graphs with hundreds of millions of edges. This could inform applications in dependency analysis, knowledge graphs, and network resilience. The significance is reduced by the absence of exact optimality verification for the key empirical claim.

major comments (1)

[§5] §5 (Experimental Evaluation) and abstract: the central claim that MinCov achieves near-optimal performance (within 0.02 AUC of Stochastic Hill Climbing) rests on comparison to a single metaheuristic without approximation guarantees. No results are reported comparing either method to exact optima (via ILP or enumeration) on small synthetic instances where such computation is feasible. Because CriticalSet is NP-hard and the objective is supermodular maximization, distance to SHC does not establish closeness to the true optimum.

minor comments (3)

[§3.2] §3.2: the value function v(S) of the coalitional game is defined implicitly; an explicit equation would improve readability when deriving the ShapleyCov formula.
[§4.1] §4.1: the description of MinCov's peeling step could clarify how redundancy is quantified (e.g., via explicit pseudocode or a small worked example).
[Table 2] Table 2 and §5.3: the AUC metric is used for ranking quality but its precise definition (e.g., how isolated-item curves are constructed) is not stated, complicating reproduction.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their detailed review and constructive feedback. The primary concern regarding the experimental evaluation of near-optimality is addressed point-by-point below, along with planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [§5] §5 (Experimental Evaluation) and abstract: the central claim that MinCov achieves near-optimal performance (within 0.02 AUC of Stochastic Hill Climbing) rests on comparison to a single metaheuristic without approximation guarantees. No results are reported comparing either method to exact optima (via ILP or enumeration) on small synthetic instances where such computation is feasible. Because CriticalSet is NP-hard and the objective is supermodular maximization, distance to SHC does not establish closeness to the true optimum.

Authors: We agree that proximity to Stochastic Hill Climbing (SHC) alone does not rigorously establish closeness to the global optimum for this NP-hard supermodular maximization problem. To directly address this, we will revise §5 to include a new set of experiments comparing MinCov and SHC against exact optima. Specifically, we will solve small synthetic instances (with |C| ≤ 20 contributors) using an ILP formulation of CriticalSet via a standard solver such as Gurobi, reporting optimality gaps for both methods across varying graph densities and k values. This addition will provide quantitative evidence of how close the reported solutions are to true optima on instances where exact computation is tractable. For larger instances, we retain SHC as a strong practical baseline while noting its lack of guarantees. revision: yes

Circularity Check

0 steps flagged

No circularity; ShapleyCov applies standard game-theoretic formula to defined game and MinCov is an independent heuristic

full rationale

The paper defines CriticalSet as an NP-hard supermodular maximization problem (external proof), then models it as a coalitional game and computes the Shapley value in closed form to obtain ShapleyCov as expected isolation count. This is a direct, non-circular application of the standard Shapley formula from cooperative game theory to the newly defined characteristic function; the result is not equivalent to the input by construction. MinCov is introduced separately as a linear-time peeling heuristic that accounts for redundancy, with performance claims resting on empirical comparisons to baselines and Stochastic Hill Climbing rather than any fitted parameter or self-citation chain. No load-bearing step reduces to a prior result by the same authors or renames a known pattern as a derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the central claims rest on standard graph-theoretic definitions of bipartite dependency and the standard definition of the Shapley value; no additional free parameters, ad-hoc axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5547 in / 1201 out tokens · 45378 ms · 2026-05-09T21:20:52.524278+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 4 canonical work pages

[1]

Albert,R.,Jeong,H.,Barabási,A.L.:Errorandattacktoleranceofcomplexnetworks.nature406(6794),378–382 (2000)

2000
[2]

Nature Reviews Physics6(2), 114–131 (2024)

Artime, O., Grassia, M., De Domenico, M., Gleeson, J.P., Makse, H.A., Mangioni, G., Perc, M., Radicchi, F.: Robustness and resilience of complex networks. Nature Reviews Physics6(2), 114–131 (2024)

2024
[3]

In: Proc

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A nucleus for a web of open data. In: Proc. Int. Semant. Web Conf. pp. 722–735 (2008)

2008
[4]

4 Guilherme Avelino, Leonardo Passos, Andre Hora, and Marco Tulio Valente

Avelino, G., Passos, L., Hora, A., Valente, M.T.: A novel approach for estimating truck factors. In: 2016 IEEE 24th International Conference on Program Comprehension (ICPC). pp. 1–10 (2016). https://doi.org/10.1109/ICPC.2016.7503718

work page doi:10.1109/icpc.2016.7503718 2016
[5]

Banerjee,S.,Jenamani,M.,Pratihar,D.K.:Asurveyoninfluencemaximizationinasocialnetwork.Knowledge and Information Systems62(9), 3417–3455 (2020)

2020
[6]

In: International workshop on web and internet economics

Bharathi, S., Kempe, D., Salek, M.: Competitive influence maximization in social networks. In: International workshop on web and internet economics. pp. 306–311. Springer (2007)

2007
[7]

In: Proceedings of the forty-second ACM symposium on Theory of computing

Bhaskara, A., Charikar, M., Chlamtac, E., Feige, U., Vijayaraghavan, A.: Detecting high log-densities: an o (n 1/4) approximation for densest k-subgraph. In: Proceedings of the forty-second ACM symposium on Theory of computing. pp. 201–210 (2010)

2010
[8]

Social networks19(3), 243–269 (1997)

Borgatti, S.P., Everett, M.G.: Network analysis of 2-mode data. Social networks19(3), 243–269 (1997)

1997
[9]

In: Proceedings of Computational Com- plexity

Crescenzi, P.: A short guide to approximation preserving reductions. In: Proceedings of Computational Com- plexity. Twelfth Annual IEEE Conference. pp. 262–273. IEEE (1997)

1997
[10]

Feige, U., Seltser, M.: On the densest k-subgraph problems (1997)

1997
[11]

Algorithmica29(3), 410–421 (2001)

Feige, U., Peleg, D., Kortsarz, G.: The dense k-subgraph problem. Algorithmica29(3), 410–421 (2001)

2001
[12]

Goyal,A.,Bonchi,F.,Lakshmanan,L.V.:Adata-basedapproachtosocialinfluencemaximization.Proceedings of the VLDB Endowment5(1) (2011)

2011
[13]

Acm transactions on interactive intelligent systems (tiis)5(4), 1–19 (2015)

Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis)5(4), 1–19 (2015)

2015
[14]

Hogg, T., Lerman, K.: Social dynamics of Digg. Eur. Phys. J. Data Sci.1(5) (2012)

2012
[15]

Jabrayilzade, E., Evtikhiev, M., Tüzün, E., Kovalenko, V.: Bus factor in practice. In: Proceedings of the 44th InternationalConferenceonSoftwareEngineering:SoftwareEngineeringinPractice.p.97–106.ICSE-SEIP’22, AssociationforComputingMachinery,NewYork,NY,USA(2022).https://doi.org/10.1145/3510457.3513082, https://doi.org/10.1145/3510457.3513082 14 S.A. Picco...

work page doi:10.1145/3510457.3513082 2022
[16]

In: Proc

Jindal, N., Liu, B.: Opinion spam and analysis. In: Proc. Int. Conf. on Web Search and Web Data Min. pp. 219–230 (2008)

2008
[17]

In: Pro- ceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Pro- ceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 137–146 (2003)

2003
[18]

SIAM Journal on Computing36(4), 1025–1071 (2006)

Khot, S.: Ruling out ptas for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM Journal on Computing36(4), 1025–1071 (2006)

2006
[19]

In: Proceedings of the 22nd international conference on world wide web

Kunegis, J.: Konect: the koblenz network collection. In: Proceedings of the 22nd international conference on world wide web. pp. 1343–1350 (2013)

2013
[20]

ACM Computing Surveys56(8), 1–40 (2024)

Lanciano,T.,Miyauchi,A.,Fazzone,A.,Bonchi,F.:Asurveyonthedensestsubgraphproblemanditsvariants. ACM Computing Surveys56(8), 1–40 (2024)

2024
[21]

Lichman, M.: UCI Machine Learning Repository (2013), http://archive.ics.uci.edu/ml

2013
[22]

Liu,J.,Xiong,Q.,Shi,W.,Shi,X.,Wang,K.:Evaluatingtheimportanceofnodesincomplexnetworks.Physica A: Statistical Mechanics and its Applications452, 209–219 (2016)

2016
[23]

Knowledge-Based Systems84, 56–66 (2015)

Liu,Z.,Jiang,C.,Wang,J.,Yu,H.:Thenodeimportanceinactualcomplexnetworksbasedonamulti-attribute ranking method. Knowledge-Based Systems84, 56–66 (2015)

2015
[24]

Manurangsi,P.:Almost-polynomialratioeth-hardnessofapproximatingdensestk-subgraph.In:Proceedingsof the 49th Annual ACM SIGACT Symposium on Theory of Computing. pp. 954–961 (2017)

2017
[25]

In: Proc

Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: Proc. Internet Measurement Conf. (2007)

2007
[26]

Proceedings of the national academy of sciences101, 5200–5205 (2004)

Newman, M.E.: Coauthorship networks and patterns of scientific collaboration. Proceedings of the national academy of sciences101, 5200–5205 (2004)

2004
[27]

LNCS, Springer (2024)

Piccolo, S.A., De Meo, P., Terracina, G.: Evaluating and improving projects’ bus-factor: a network analytical framework.In:16thInternationalConference,ASONAM2024,Rende,Italy,September2–5,2024,Proceedings, Part I. LNCS, Springer (2024)

2024
[28]

Design Science4, e1 (2018)

Piccolo,S.A.,Lehmann,S.,Maier,A.:Designprocessrobustness:abipartitenetworkanalysisrevealsthecentral importance of people. Design Science4, e1 (2018)

2018
[29]

In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing

Raghavendra, P., Steurer, D.: Graph expansion and the unique games conjecture. In: Proceedings of the Forty-Second ACM Symposium on Theory of Computing. p. 755–764. STOC ’10, Associa- tion for Computing Machinery, New York, NY, USA (2010). https://doi.org/10.1145/1806689.1806792, https://doi.org/10.1145/1806689.1806792

work page doi:10.1145/1806689.1806792 2010
[30]

In: Proceedings of the AAAI conference on artificial intelligence (2015)

Rossi, R., Ahmed, N.: The network data repository with interactive graph analytics and visualization. In: Proceedings of the AAAI conference on artificial intelligence (2015)

2015
[31]

In: Proc

Schelter, S., Kunegis, J.: Tracking the trackers: A large-scale analysis of embedded web trackers. In: Proc. Int. Conf. on Web and Soc. Media (2016)

2016
[32]

In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 990–998 (2008)

2008
[33]

arXiv preprint arXiv:1801.00218 (2017)

Tarkowski, M.K., Michalak, T.P., Rahwan, T., Wooldridge, M.: Game-theoretic network centrality: A review. arXiv preprint arXiv:1801.00218 (2017)

work page arXiv 2017