pith. sign in

arxiv: 1906.11727 · v1 · pith:FNN65BRAnew · submitted 2019-06-27 · 💻 cs.SI · physics.soc-ph

Link weights recovery in heterogeneous information networks

Pith reviewed 2026-05-25 13:45 UTC · model grok-4.3

classification 💻 cs.SI physics.soc-ph
keywords heterogeneous information networkslink weight recoverymeta-pathspath-constrained random walksnetwork reconstructionTwitter databibliographic networks
0
0 comments X

The pith

Linear combinations of meta-path random-walk probabilities recover the weights of links between nodes in heterogeneous information networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to recover the weight of an unobserved link between any two nodes by drawing on the structure of other networks that share the same set of entities. Path-constrained random walks are run along meta-paths that encode particular sequences of node and edge types, each producing a probability distribution over possible connections. These distributions are then added together with learned coefficients to produce an estimate of the target link weight. A sympathetic reader would care because many real systems contain multiple overlapping networks whose co-dependencies can be exploited to fill in missing interaction data without new measurements. The approach is shown to be general across node types and is demonstrated on Twitter and bibliographic examples.

Core claim

Given a pair of nodes in a heterogeneous information network, the exact weight of the incident link can be recovered by linearly combining the probability distributions that result from path-constrained random walks along meta-paths capable of capturing specific semantics.

What carries the argument

Linear combination of probability distributions produced by path-constrained random walks on meta-paths, used to approximate the target link weight from the rest of the network structure.

If this is right

  • The same procedure recovers link weights between any pair of node types present in the network.
  • Information from co-dependent networks can be transferred to complete missing parts of a target network.
  • The method produces an approximation to the exact observed weight when some links are already known.
  • Experiments on Twitter interaction data and bibliographic citation networks confirm that the linear combination yields usable estimates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique could be used to estimate unobserved interactions in any domain where multiple entity types are recorded together, such as user-item-rating systems.
  • If the linear model proves too restrictive, replacing it with a nonlinear aggregator of the same meta-path distributions would be a direct next test.
  • Applying the recovery procedure to temporal snapshots of a network would allow checking whether the same meta-paths continue to predict weight changes.

Load-bearing premise

The true link weight between the chosen nodes is accurately approximated by a linear combination of the probability distributions obtained from the selected meta-path walks.

What would settle it

Hold out a set of known link weights, compute the linear combination from the remaining network using the chosen meta-paths, and measure whether the approximation error stays low; persistently high error would falsify the recovery claim.

Figures

Figures reproduced from arXiv: 1906.11727 by Hong-Lan Botterman, Robin Lamarche-Perrin.

Figure 1
Figure 1. Figure 1: (a) Example of HIN composed of multiple node types, represented by diverse shapes, an multiple link types. Nodes are already grouped by shapes. (b) Its associated network schema composed by five nodes and twenty links. Each node corresponds to a set of nodes in the corresponding HIN. In the same way, each link is a set of links in the corresponding HIN. See for instance the paths and meta-path of length tw… view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of the construction of graphs based on Twitter interactions where four users interact with each other [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: b shows r 2 scores when we combine variables of dif￾ferent lengths related to the same link type in the model. Actu￾ally, the r 2 associated to n number of variables is related to the model whose predictors are all meta-paths of length smaller or equals to n + 1 and whose the steps except the last are in the same type of links. For instance, for 3 variables, the pre￾dictors are RT-UH, RT-RT-UH and RT-RT-RT… view at source ↗
Figure 4
Figure 4. Figure 4: Typical example of reply case focused on user [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Density plot of observed versus estimated values for [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: illustrates an example of such networks where one can distinguish four types of nodes that is authors, papers, venues and topics; and four types of links (eight when we differentiate a type from its inverse) that is write, publish, cite and belong to. The HIN we analyse in this article is constructed from DBLP publications [1]. The data set contains 95,855 authors with 1,537,407 co-author relationships and… view at source ↗
Figure 8
Figure 8. Figure 8: Coefficient values of the final models of the different [PITH_FULL_IMAGE:figures/full_fig_p010_8.png] view at source ↗
read the original abstract

Socio-technical systems usually consists of many intertwined networks, each connecting different types of objects (or actors) through a variety of means. As these networks are co-dependent, one can take advantage of this entangled structure to study interaction patterns in a particular network from the information provided by other related networks. A method is hence proposed and tested to recover the weights of missing or unobserved links in heterogeneous information networks (HIN) - abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims at recovering the exact weight of the incident link to these two nodes, knowing some other links present in the HIN. To do so, probability distributions resulting from path-constrained random walks i.e., random walks where the walker is forced to follow only a specific sequence of node types and edge types, capable to capture specific semantics and commonly called a meta-path, are combined in a linearly fashion in order to approximate the desired result. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter and bibliographic data show the applicability of the method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that in heterogeneous information networks (HINs), the exact weights of incident links between pairs of nodes can be recovered by linearly combining probability distributions resulting from path-constrained random walks on meta-paths, with the approach being general across node types and demonstrated via experiments on Twitter and bibliographic data.

Significance. If the linear combination can be placed on a sound footing with explicit coefficient derivation and quantitative validation, the method would extend meta-path techniques from HIN mining to weighted link recovery, offering a way to exploit co-dependent network structure for imputation tasks in socio-technical systems.

major comments (2)
  1. [Abstract / Proposed Method] Abstract and Proposed Method: the central claim that meta-path probability distributions 'are combined in a linearly fashion in order to approximate the desired result' supplies neither a derivation for the linear model nor any statement of how the coefficients are obtained (first-principles, external benchmarks, or post-hoc fitting to the target network).
  2. [Experiments] Experiments: the abstract states that experiments 'show the applicability of the method' yet provides no quantitative metrics, error bars, baseline comparisons, or description of coefficient estimation, rendering the empirical support impossible to evaluate.
minor comments (1)
  1. [Abstract] Abstract: 'Socio-technical systems usually consists' should read 'consist'; 'combined in a linearly fashion' should read 'combined in a linear fashion'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract / Proposed Method] Abstract and Proposed Method: the central claim that meta-path probability distributions 'are combined in a linearly fashion in order to approximate the desired result' supplies neither a derivation for the linear model nor any statement of how the coefficients are obtained (first-principles, external benchmarks, or post-hoc fitting to the target network).

    Authors: The referee correctly identifies that neither the abstract nor the method description supplies a formal derivation of the linear combination or the procedure for determining its coefficients. The manuscript presents the linear aggregation as a direct way to fuse semantic signals carried by different meta-paths, but does not derive the model from first principles or specify whether coefficients are obtained by optimization on observed links, by external calibration, or by another route. We will revise the Proposed Method section to include an explicit justification for the linear form together with a clear statement of the coefficient estimation procedure. revision: yes

  2. Referee: [Experiments] Experiments: the abstract states that experiments 'show the applicability of the method' yet provides no quantitative metrics, error bars, baseline comparisons, or description of coefficient estimation, rendering the empirical support impossible to evaluate.

    Authors: We agree that the abstract (and, by extension, the current presentation of the results) does not report quantitative performance measures, uncertainty estimates, baseline comparisons, or details of coefficient fitting. Although the manuscript describes experiments on Twitter and bibliographic networks, the lack of these elements prevents proper assessment. We will expand the Experiments section to report concrete metrics, error bars or confidence intervals, comparisons against suitable baselines, and the precise method used to obtain the linear coefficients. revision: yes

Circularity Check

0 steps flagged

Empirical linear approximation method with no derivation chain reducing to inputs

full rationale

The paper presents an approximation method that linearly combines meta-path random-walk distributions to recover link weights, validated through experiments on Twitter and bibliographic networks. No equations or procedures are described that define the target weights in terms of the fitted coefficients or that rename a fit as a first-principles prediction. The central claim is explicitly an empirical approximation whose success is measured externally on held-out links, with no self-citation load-bearing steps or ansatz smuggled via prior work. The derivation is therefore self-contained as a data-driven heuristic rather than a closed tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract; the ledger is therefore incomplete and provisional. The central claim rests on the unstated details of how linear coefficients are chosen and on the domain assumption that meta-paths encode transferable semantics across network types.

free parameters (1)
  • linear combination coefficients
    The weights used to combine the different meta-path probability distributions are not specified as fixed constants or derived; they function as free parameters whose values determine the approximation.
axioms (2)
  • domain assumption Meta-paths capture specific semantics that allow information transfer between related networks in a HIN
    Invoked when the method selects meta-paths to produce usable probability distributions for link recovery.
  • domain assumption The networks within the HIN are sufficiently co-dependent that partial observations in one part inform missing links elsewhere
    Stated in the opening motivation for using other networks to study a target network.

pith-pipeline@v0.9.0 · 5728 in / 1394 out tokens · 41320 ms · 2026-05-25T13:45:03.091172+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. [1]

    https://aminer.org/citation

  2. [2]

    Link prediction using supervised learning

    Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki. Link prediction using supervised learning. In SDM06: workshop on link analysis, counter- terrorism and security, 2006

  3. [3]

    A survey of cross- validation procedures for model selection

    Sylvain Arlot, Alain Celisse, et al. A survey of cross- validation procedures for model selection. Statistics sur- veys, 4:40–79, 2010

  4. [4]

    Semantic proximity search on graphs with metagraph-based learn- ing

    Yuan Fang, Wenqing Lin, Vincent W Zheng, Min Wu, Kevin Chen-Chuan Chang, and Xiao-Li Li. Semantic proximity search on graphs with metagraph-based learn- ing. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pages 277–288. IEEE, 2016

  5. [5]

    Dprel: a meta-path based relevance measure for mining heterogeneous networks

    Mukul Gupta, Pradeep Kumar, and Bharat Bhasker. Dprel: a meta-path based relevance measure for mining heterogeneous networks. Information Systems Frontiers, pages 1–17, 2017

  6. [6]

    Exploiting transitive similarity and temporal dynamics for similarity search in heterogeneous information networks

    Jiazhen He, James Bailey, and Rui Zhang. Exploiting transitive similarity and temporal dynamics for similarity search in heterogeneous information networks. In Inter- national Conference on Database Systems for Advanced Applications, pages 141–155. Springer, 2014

  7. [7]

    Meta structure: Computing relevance in large heterogeneous information networks

    Zhipeng Huang, Yudian Zheng, Reynold Cheng, Yizhou Sun, Nikos Mamoulis, and Xiang Li. Meta structure: Computing relevance in large heterogeneous information networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1595–1604. ACM, 2016

  8. [8]

    Relational retrieval using a combination of path-constrained random walks

    Ni Lao and William W Cohen. Relational retrieval using a combination of path-constrained random walks. Ma- chine learning, 81(1):53–67, 2010

  9. [9]

    Stream graphs and link streams for the modeling of interactions over time

    Matthieu Latapy, Tiphaine Viard, and Clémence Mag- nien. Stream graphs and link streams for the modeling of interactions over time. Social Network Analysis and Mining, 8(1):61, 2018

  10. [10]

    Cold start link prediction

    Vincent Leroy, B Barla Cambazoglu, and Francesco Bonchi. Cold start link prediction. In Proceedings of the 16th ACM SIGKDD international conference on Knowl- edge discovery and data mining , pages 393–402. ACM, 2010

  11. [11]

    Link prediction in complex networks: A survey

    Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: statistical mechanics and its applications, 390(6):1150–1170, 2011

  12. [12]

    On the study of social interactions in twitter

    Sofus A Macskassy. On the study of social interactions in twitter. In Sixth International AAAI Conference on We- blogs and Social Media, 2012

  13. [13]

    A survey of link prediction in complex networks

    Víctor Martínez, Fernando Berzal, and Juan-Carlos Cubero. A survey of link prediction in complex networks. ACM Computing Surveys (CSUR), 49(4):69, 2017

  14. [14]

    Random walks and diffusion on networks

    Naoki Masuda, Mason A Porter, and Renaud Lambiotte. Random walks and diffusion on networks. Physics re- ports, 716:1–58, 2017

  15. [15]

    Relevance measure in large-scale heterogeneous networks

    Xiaofeng Meng, Chuan Shi, Yitong Li, Lei Zhang, and Bin Wu. Relevance measure in large-scale heterogeneous networks. In Asia-Pacific Web Conference, pages 636–

  16. [16]

    Link predic- tion via matrix factorization

    Aditya Krishna Menon and Charles Elkan. Link predic- tion via matrix factorization. In Joint european confer- ence on machine learning and knowledge discovery in databases, pages 437–452. Springer, 2011

  17. [17]

    Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs

    Rudy Raymond and Hisashi Kashima. Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs. In Joint european conference on machine learning and knowledge discovery in databases, pages 131–147. Springer, 2010

  18. [18]

    Hetesim: A general framework for relevance measure in heterogeneous networks

    Chuan Shi, Xiangnan Kong, Yue Huang, S Yu Philip, and Bin Wu. Hetesim: A general framework for relevance measure in heterogeneous networks. IEEE Transac- tions on Knowledge and Data Engineering, 26(10):2479– 2492, 2014

  19. [19]

    A survey of heterogeneous information net- work analysis

    Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and S Yu Philip. A survey of heterogeneous information net- work analysis. IEEE Transactions on Knowledge and Data Engineering, 29(1):17–37, 2016

  20. [20]

    Co-author relationship prediction in heterogeneous bibliographic networks

    Yizhou Sun, Rick Barber, Manish Gupta, Charu C Aggar- wal, and Jiawei Han. Co-author relationship prediction in heterogeneous bibliographic networks. In 2011 Interna- tional Conference on Advances in Social Networks Anal- ysis and Mining, pages 121–128. IEEE, 2011

  21. [21]

    Pathsim: Meta path-based top-k similarity search in heterogeneous information networks

    Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and Tianyi Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceed- ings of the VLDB Endowment, 4(11):992–1003, 2011

  22. [22]

    Local probabilistic models for link prediction

    Chao Wang, Venu Satuluri, and Srinivasan Parthasarathy. Local probabilistic models for link prediction. InSeventh IEEE international conference on data mining (ICDM 2007), pages 322–331. IEEE, 2007. 13

  23. [23]

    Monte carlo cross val- idation

    Qing-Song Xu and Yi-Zeng Liang. Monte carlo cross val- idation. Chemometrics and Intelligent Laboratory Sys- tems, 56(1):1–11, 2001

  24. [24]

    Pathsimext: revisiting pathsim in heterogeneous information networks

    Kun Yao, Hoi Fong Mak, et al. Pathsimext: revisiting pathsim in heterogeneous information networks. InInter- national Conference on Web-Age Information Manage- ment, pages 38–42. Springer, 2014

  25. [25]

    Recurrent meta-structure for robust similarity measure in heterogeneous information networks

    Yu Zhou, Jianbin Huang, Heli Sun, and Yizhou Sun. Recurrent meta-structure for robust similarity measure in heterogeneous information networks. arXiv preprint, 2017