pith. sign in

arxiv: 1906.08541 · v1 · pith:UAZQ5XVLnew · submitted 2019-06-20 · 💻 cs.LG · stat.ML

Regional based query in graph active learning

Pith reviewed 2026-05-25 19:50 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords active learninggraph neural networksnode classificationuncertainty samplingpage rankregional queries
0
0 comments X

The pith

Neighbor-based uncertainty criteria outperform single-node uncertainty for active learning on graphs once labeled nodes reach one per average degree.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops two selection criteria for choosing which nodes to query in graph node classification with graph convolution networks. Because node classes tend to correlate with neighbors, it replaces the usual per-node uncertainty with measures that look at uncertainty in neighboring nodes. One criterion extends standard uncertainty sampling regionally; the other adapts the page-rank algorithm to the same goal. The page-rank version is shown to be best when very few nodes are labeled, while the regional uncertainty version surpasses all prior methods once the labeled fraction reaches roughly one over the average degree. The approach is presented as potentially applicable to any classification setting that supplies a distance between samples.

Core claim

We propose two regional query criteria for active learning on graphs. The page-rank extension is optimal at low fractions of tagged nodes. When the tagged fraction grows to one over the average degree, the regional uncertainty performs better than all existing methods.

What carries the argument

Regional uncertainty, which selects nodes by the uncertainty in the class of their neighbors rather than by their own class uncertainty.

If this is right

  • The page-rank extension gives optimal performance when the fraction of tagged nodes is low.
  • Regional uncertainty becomes the best choice once the labeled fraction reaches one over the average degree.
  • The same regional criteria can be applied to any classification problem supplied with a distance metric between samples.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Active learning on graphs may gain more from local structure than current per-sample uncertainty methods capture.
  • A switching rule that uses page-rank at very low label counts and regional uncertainty at moderate counts could combine the strengths of both.
  • Defining suitable distances on non-graph data might let the same neighbor-based selection improve active learning outside networks.

Load-bearing premise

The class of each node is often related to the class of its neighbors.

What would settle it

An experiment on a graph whose node labels are assigned independently of neighbor labels, showing that neither proposed method beats standard uncertainty sampling, would falsify the advantage.

Figures

Figures reproduced from arXiv: 1906.08541 by Roy Abel, Yoram Louzoun.

Figure 1
Figure 1. Figure 1: Schematic figures: Lower plot - Regional uncertainty: Given the classes’ probability of each node marked as P(y = class1)|P(y = class2) (resulting from the classification or from a prior), local uncertainty techniques would give the highest score to nodes 3 and 10 (marked in red). However, knowing the label of these nodes would have a minimal effect on their region. We wish to reveal the label of the node … view at source ↗
Figure 2
Figure 2. Figure 2: Accuracy, Micro and Macro F1 as a function of training set fraction in 3 datasets for different learning methods. We tested for three reported datasets multiple precision estimates as a function of the training set fraction. We have tested four algorithms: GXBoost, FFN, GCN and RF. For each algorithm, we tested three types of input, the neighbors tag, topological features of the node and the combination of… view at source ↗
Figure 3
Figure 3. Figure 3: Upper plot. Accuracy as a function of sampling fraction for different datasets and different local AL methods. results lower than the random sampling show AL algorithms that are worse than random sampling. The two subplots surrounded by a box are the results with the BOW. Lower plot difference between random sampling and each of the AL methods at the last time point sampled (15 % when no input is used and … view at source ↗
Figure 4
Figure 4. Figure 4: Upper plot and lower plot follow figure 3. The only difference is that the upper plot in this figure is the difference with random, so positive values represent higher accuracy than the one obtained in random sampling, while negative accuracy are worse than random sampling. adaptive page rank. Those two measures can be combined with existing active learning techniques to achieve even better performance. Wh… view at source ↗
Figure 5
Figure 5. Figure 5: Distance to randomly sampled nodes as a function of the fraction of sampled fraction. Within a sampling fraction of 5 %, the average distance to a sampled node reaches a fixed distance. [2] D. Berberidis and G. B. Giannakis, “Data-adaptive active sampling for efficient graph-cognizant classification,” IEEE Transactions on Signal Processing, vol. 66, no. 19, pp. 5167–5179, 2018. [3] X. Zhu, Z. Ghahramani, a… view at source ↗
read the original abstract

Graph convolution networks (GCN) have emerged as the leading method to classify node classes in networks, and have reached the highest accuracy in multiple node classification tasks. In the absence of available tagged samples, active learning methods have been developed to obtain the highest accuracy using the minimal number of queries to an oracle. The current best active learning methods use the sample class uncertainty as selection criteria. However, in graph based classification, the class of each node is often related to the class of its neighbors. As such, the uncertainty in the class of a node's neighbor may be a more appropriate selection criterion. We here propose two such criteria, one extending the classical uncertainty measure, and the other extending the page-rank algorithm. We show that the latter is optimal when the fraction of tagged nodes is low, and when this fraction grows to one over the average degree, the regional uncertainty performs better than all existing methods. While we have tested this methods on graphs, such methods can be extended to any classification problem, where a distance metrics can be defined between the input samples. All the code used can be accessed at : https://github.com/louzounlab/graph-al All the datasets used can be accessed at : https://github.com/louzounlab/DataSets

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes two neighborhood-aware active learning query strategies for GCN-based node classification on graphs. One extends classical uncertainty sampling to a regional measure over a node's neighbors; the other extends the PageRank algorithm. The central empirical claim is that the PageRank variant is optimal at low labeled-node fractions, while the regional-uncertainty variant outperforms existing methods once the labeled fraction reaches approximately one over the average degree. The approach rests on the standard homophily assumption and is accompanied by public code and datasets.

Significance. If the reported crossover in performance holds under proper controls, the work supplies a practical, easily implemented improvement to active learning on graphs that exploits local structure rather than single-node uncertainty. The public release of code and datasets is a clear strength, directly supporting reproducibility of the claimed transition between the two proposed criteria.

major comments (2)
  1. [Experiments] Experiments section (and associated figures/tables): the abstract asserts that the PageRank extension is 'optimal' at low label fractions and that regional uncertainty 'performs better than all existing methods' at the 1/degree threshold, yet no table or figure is referenced that reports accuracy-vs-label-fraction curves, lists the exact baselines (random, uncertainty, core-set, etc.), or supplies error bars or statistical tests across repeated trials; without these the load-bearing performance claims cannot be evaluated.
  2. [Section 3] Section 3 (method definitions): the regional uncertainty and PageRank-extension criteria are described only at a high level; an explicit equation or pseudocode for each selection score is required so that the claimed optimality statements can be verified independently of the released code.
minor comments (3)
  1. [Abstract] Abstract: 'this methods' should be 'these methods'.
  2. [Abstract] The final sentence claims the methods 'can be extended to any classification problem where a distance metric can be defined'; this extension is asserted but never demonstrated or even sketched, so the sentence should be qualified or removed.
  3. [Introduction] The manuscript should add a short related-work paragraph citing recent graph active-learning baselines (e.g., AGE, ANRMAB, or core-set methods) to situate the two new criteria.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and commit to revisions that improve clarity and verifiability while preserving the core contributions.

read point-by-point responses
  1. Referee: [Experiments] Experiments section (and associated figures/tables): the abstract asserts that the PageRank extension is 'optimal' at low label fractions and that regional uncertainty 'performs better than all existing methods' at the 1/degree threshold, yet no table or figure is referenced that reports accuracy-vs-label-fraction curves, lists the exact baselines (random, uncertainty, core-set, etc.), or supplies error bars or statistical tests across repeated trials; without these the load-bearing performance claims cannot be evaluated.

    Authors: We agree that the experimental claims require clearer support in the text. The manuscript contains figures and tables comparing accuracy versus labeled fraction for the proposed methods against baselines including random sampling and uncertainty sampling, with results averaged over repeated trials. We will revise the Experiments section to explicitly reference these figures/tables, list all baselines, report error bars, and note the number of trials. The public code repository already enables independent reproduction of the curves. revision: yes

  2. Referee: [Section 3] Section 3 (method definitions): the regional uncertainty and PageRank-extension criteria are described only at a high level; an explicit equation or pseudocode for each selection score is required so that the claimed optimality statements can be verified independently of the released code.

    Authors: We acknowledge that the current description in Section 3 is high-level. We will add explicit mathematical definitions for the regional uncertainty score (extending per-node uncertainty over neighbors) and the PageRank-extension score, together with pseudocode for the selection procedure. These additions will allow direct verification of the optimality claims without sole reliance on the released code. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's central claims rest on the standard homophily assumption in graphs (node labels correlated with neighbors) combined with empirical comparisons of selection criteria at varying label fractions. No equations, derivations, or first-principles results are shown that reduce the claimed optimality of the page-rank extension or regional uncertainty to fitted parameters, self-definitions, or self-citation chains. The methods are presented as extensions of existing uncertainty and PageRank ideas, with results supported by publicly available code and datasets for direct reproduction. This is a standard empirical active-learning study without load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the domain assumption that neighbor class information improves query selection; no free parameters or invented entities are mentioned.

axioms (1)
  • domain assumption The class of each node is often related to the class of its neighbors.
    Explicitly invoked in the abstract as the motivation for shifting from single-node to regional or neighbor-based criteria.

pith-pipeline@v0.9.0 · 5748 in / 1111 out tokens · 22051 ms · 2026-05-25T19:50:24.843713+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

  1. [1]

    A variance minimization criterion to active learning on graphs,

    M. Ji and J. Han, “A variance minimization criterion to active learning on graphs,” in Artificial Intelligence and Statistics , pp. 556–564, 2012. Table II RESULTS algorithm Cora CiteSeer accuracy macro-F1 micro-F1 accuracy macro-F1 micro-F1 our random 0.803 0.779 0.803 0.697 0.627 0.697 Kipf (random) [11] 0.801 - - 0.679 - - HNE [30] 0.59* - - - - - TV/MS...

  2. [2]

    Data-adaptive active sampling for efficient graph-cognizant classification,

    D. Berberidis and G. B. Giannakis, “Data-adaptive active sampling for efficient graph-cognizant classification,” IEEE Transactions on Signal Processing, vol. 66, no. 19, pp. 5167–5179, 2018

  3. [3]

    Semi-supervised learning using gaussian fields and harmonic functions,

    X. Zhu, Z. Ghahramani, and J. D. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” inProceedings of the 20th International conference on Machine learning (ICML-03) , pp. 912–919, 2003

  4. [4]

    Combining active learning and semi-supervised learning using gaussian fields and harmonic functions,

    X. Zhu, J. Lafferty, and Z. Ghahramani, “Combining active learning and semi-supervised learning using gaussian fields and harmonic functions,” in ICML 2003 workshop on the continuum from labeled to unlabeled data in machine learning and data mining , vol. 3, 2003

  5. [5]

    Beyond the point cloud: from transductive to semi-supervised learning,

    V . Sindhwani, P. Niyogi, and M. Belkin, “Beyond the point cloud: from transductive to semi-supervised learning,” in Proceedings of the 22nd international conference on Machine learning, pp. 824–831, ACM, 2005

  6. [6]

    Semi-supervised learning on riemannian manifolds,

    M. Belkin and P. Niyogi, “Semi-supervised learning on riemannian manifolds,” Machine learning, vol. 56, no. 1-3, pp. 209–239, 2004

  7. [7]

    Normalized cuts and image segmentation,

    J. Shi and J. Malik, “Normalized cuts and image segmentation,” De- partmental Papers (CIS), p. 107, 2000

  8. [8]

    Community detection in net- works with node attributes,

    J. Yang, J. McAuley, and J. Leskovec, “Community detection in net- works with node attributes,” in2013 IEEE 13th International Conference on Data Mining , pp. 1151–1156, IEEE, 2013

  9. [9]

    Topological similarity as a proxy to content similarity,

    Y . Rosen and Y . Louzoun, “Topological similarity as a proxy to content similarity,” Journal of Complex Networks, vol. 4, no. 1, pp. 38–60, 2015

  10. [10]

    Edge sign prediction based on a combination of network structural topology and sign propagation,

    R. Naaman, K. Cohen, and Y . Louzoun, “Edge sign prediction based on a combination of network structural topology and sign propagation,” Journal of Complex Networks , vol. 7, no. 1, pp. 54–66, 2018

  11. [11]

    Semi-Supervised Classification with Graph Convolutional Networks

    T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907 , 2016

  12. [12]

    Modeling relational data with graph convolutional net- works,

    M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling, “Modeling relational data with graph convolutional net- works,” in European Semantic Web Conference, pp. 593–607, Springer, 2018

  13. [13]

    Topological based classification of paper domains using graph convolutional networks

    I. Benami, K. Cohen, O. Nagar, and Y . Louzoun, “Topological based classification of paper domains using graph convolutional networks,” arXiv preprint arXiv:1904.07787 , 2019

  14. [14]

    Active learning literature survey,

    B. Settles, “Active learning literature survey,” tech. rep., University of Wisconsin-Madison Department of Computer Sciences, 2009

  15. [15]

    Heterogeneous uncertainty sampling for supervised learning,

    D. D. Lewis and J. Catlett, “Heterogeneous uncertainty sampling for supervised learning,” in Machine learning proceedings 1994 , pp. 148– 156, Elsevier, 1994

  16. [16]

    Reducing labeling effort for structured prediction tasks,

    A. Culotta and A. McCallum, “Reducing labeling effort for structured prediction tasks,” in AAAI, vol. 5, pp. 746–751, 2005

  17. [17]

    An analysis of active learning strategies for sequence labeling tasks,

    B. Settles and M. Craven, “An analysis of active learning strategies for sequence labeling tasks,” in Proceedings of the conference on empirical methods in natural language processing , pp. 1070–1079, Association for Computational Linguistics, 2008

  18. [18]

    Graph Convolutional Matrix Completion

    R. v. d. Berg, T. N. Kipf, and M. Welling, “Graph convolutional matrix completion,” arXiv preprint arXiv:1706.02263 , 2017

  19. [19]

    node2vec: Scalable feature learning for networks,

    A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining , pp. 855–864, ACM, 2016

  20. [20]

    Active hidden markov models for information extraction,

    T. Scheffer, C. Decomain, and S. Wrobel, “Active hidden markov models for information extraction,” in International Symposium on Intelligent Data Analysis, pp. 309–318, Springer, 2001

  21. [21]

    A mathematical theory of communication,

    C. E. Shannon, “A mathematical theory of communication,” Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948

  22. [22]

    Selective sampling for example-based word sense disambiguation,

    A. Fujii, T. Tokunaga, K. Inui, and H. Tanaka, “Selective sampling for example-based word sense disambiguation,” Computational Linguistics, vol. 24, no. 4, pp. 573–597, 1998

  23. [23]

    Active learning using pre-clustering,

    H. T. Nguyen and A. Smeulders, “Active learning using pre-clustering,” in Proceedings of the twenty-first international conference on Machine learning, p. 79, ACM, 2004

  24. [24]

    Active learning with sampling by uncertainty and density for data annotations,

    J. Zhu, H. Wang, B. K. Tsou, and M. Ma, “Active learning with sampling by uncertainty and density for data annotations,” IEEE Transactions on audio, speech, and language processing , vol. 18, no. 6, pp. 1323–1331, 2009

  25. [25]

    Using graph-based metrics with empirical risk min- imization to speed up active learning on networked data,

    S. A. Macskassy, “Using graph-based metrics with empirical risk min- imization to speed up active learning on networked data,” in Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining , pp. 597–606, ACM, 2009

  26. [26]

    Batch mode active learning for node classification in assortative and disassortative networks,

    S. Ping, D. Liu, B. Yang, Y . Zhu, H. Chen, and Z. Wang, “Batch mode active learning for node classification in assortative and disassortative networks,” IEEE Access, vol. 6, pp. 4750–4758, 2017

  27. [27]

    Active Learning for Graph Embedding

    H. Cai, V . W. Zheng, and K. C.-C. Chang, “Active learning for graph embedding,” arXiv preprint arXiv:1705.05085 , 2017

  28. [28]

    σ-optimality for active learning on gaussian random fields,

    Y . Ma, R. Garnett, and J. Schneider, “σ-optimality for active learning on gaussian random fields,” in Advances in Neural Information Processing Systems, pp. 2751–2759, 2013

  29. [29]

    Active learning of gaussian processes with manifold-preserving graph reduction,

    J. Zhou and S. Sun, “Active learning of gaussian processes with manifold-preserving graph reduction,” Neural Computing and Applic- ations, vol. 25, no. 7-8, pp. 1615–1625, 2014

  30. [30]

    ActiveHNE: Active Heterogeneous Network Embedding

    X. Chen, G. Yu, J. Wang, C. Domeniconi, Z. Li, and X. Zhang, “Activehne: Active heterogeneous network embedding,” arXiv preprint arXiv:1905.05659, 2019

  31. [31]

    Self-emergence of knowledge trees: Extraction of the wikipedia hierarchies,

    L. Muchnik, R. Itzhack, S. Solomon, and Y . Louzoun, “Self-emergence of knowledge trees: Extraction of the wikipedia hierarchies,” Physical Review E, vol. 76, no. 1, p. 016106, 2007

  32. [32]

    Locating influ- ential nodes in complex networks,

    F. D. Malliaros, M.-E. G. Rossi, and M. Vazirgiannis, “Locating influ- ential nodes in complex networks,” Scientific reports, vol. 6, p. 19307, 2016

  33. [33]

    The pagerank citation ranking: Bringing order to the web.,

    L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web.,” tech. rep., Stanford InfoLab, 1999

  34. [34]

    Automating the construction of internet portals with machine learning,

    A. K. McCallum, K. Nigam, J. Rennie, and K. Seymore, “Automating the construction of internet portals with machine learning,” Information Retrieval, vol. 3, no. 2, pp. 127–163, 2000

  35. [35]

    Collective classification in network data,

    P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi- Rad, “Collective classification in network data,” AI magazine , vol. 29, no. 3, pp. 93–93, 2008

  36. [36]

    Citeseer: An automatic citation indexing system.,

    C. L. Giles, K. D. Bollacker, and S. Lawrence, “Citeseer: An automatic citation indexing system.,” in ACM DL, pp. 89–98, 1998

  37. [37]

    Graph evolution: Dens- ification and shrinking diameters,

    J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graph evolution: Dens- ification and shrinking diameters,” ACM Transactions on Knowledge Discovery from Data (TKDD) , vol. 1, no. 1, p. 2, 2007

  38. [38]

    SNAP Datasets: Stanford large network dataset collection

    J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network dataset collection.” http://snap.stanford.edu/data, June 2014

  39. [39]

    Local higher- order graph clustering,

    H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich, “Local higher- order graph clustering,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp. 555–564, ACM, 2017

  40. [40]

    Cora citation network dataset – KONECT,

    “Cora citation network dataset – KONECT,” Apr. 2017

  41. [41]

    Human wayfinding in information networks,

    R. West and J. Leskovec, “Human wayfinding in information networks,” in Proceedings of the 21st international conference on World Wide Web, pp. 619–628, ACM, 2012

  42. [42]

    Wikispeedia: An online game for inferring semantic distances between concepts,

    R. West, J. Pineau, and D. Precup, “Wikispeedia: An online game for inferring semantic distances between concepts,” in Twenty-First International Joint Conference on Artificial Intelligence , 2009

  43. [43]

    Relational active learning for link-based classific- ation,

    L. K. McDowell, “Relational active learning for link-based classific- ation,” in 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) , pp. 1–10, IEEE, 2015

  44. [44]

    Active learning for networked data,

    M. Bilgic, L. Mihalkova, and L. Getoor, “Active learning for networked data,” in Proceedings of the 27th international conference on machine learning (ICML-10), pp. 79–86, 2010