Understanding friendship formation with explainable machine learning

Mar\'ia Pereda

arxiv: 2603.19845 · v2 · submitted 2026-03-20 · ⚛️ physics.soc-ph

Understanding friendship formation with explainable machine learning

Mar\'ia Pereda This is my paper

Pith reviewed 2026-05-15 07:39 UTC · model grok-4.3

classification ⚛️ physics.soc-ph

keywords social tie formationtriadic influenceexplainable machine learninglink predictionstructural balancesigned networksstudent friendships

0 comments

The pith

Triadic influence from local network structure overwhelmingly dominates the formation of social ties over individual traits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that in a network of student relationships, local structural factors like triadic influence explain the vast majority of link predictions, while individual attributes such as prosociality, cognitive reflection, and gender account for only a small fraction of cases. This separation is achieved using an interpretable machine learning model on signed ties. A sympathetic reader would care because it clarifies whether friendships are driven more by who people are or by the surrounding social web. The findings point to structure as the global rule and traits as exceptions in weaker ties.

Core claim

Triadic influence overwhelmingly dominates link prediction in signed social relationships among 3,395 students, confirming that local network structure is the primary driver. A small subset of 0.24% of links is primarily explained by individual-level traits; these correspond to weaker, less structurally embedded, often negative relationships with lower structural balance. Links without indirect structural paths are explained by the absence of reinforcement rather than by traits.

What carries the argument

The Explainable Boosting Machine (EBM), which predicts link polarity by ranking the independent contributions of individual traits and the structural metric of triadic influence.

If this is right

Local network structure is the primary driver of social relationships.
Individual-level traits explain only a small subset of weaker and less balanced ties.
Links without indirect paths are driven by lack of structural reinforcement.
Social tie formation follows a layered process with structure dominating globally.
Explainable machine learning can uncover mechanisms in social network formation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Targeting network structure might be more effective for fostering positive ties than focusing on individual traits.
The approach could help identify contexts where personal attributes matter more in other social settings.
Further studies could test if similar patterns hold in different populations or online networks.

Load-bearing premise

That the Explainable Boosting Machine reliably separates the contributions of individual traits from triadic influence without substantial confounding from unmeasured factors or attribution biases.

What would settle it

Observing that a larger fraction of links is explained by individual traits when using alternative attribution methods or when including additional variables like age or socioeconomic status would falsify the dominance of triadic influence.

read the original abstract

Understanding the formation of social ties requires disentangling the roles of individual traits and local network structure. We analyse signed social relationships among 3,395 students using an interpretable machine learning model -- the Explainable Boosting Machine (EBM) -- to predict link polarity from individual attributes (prosociality, cognitive reflection, and gender) and a structural metric, triadic influence. Our results show that triadic influence overwhelmingly dominates link prediction, confirming that local network structure is the primary driver of social relationships. Nevertheless, a small subset of links (0.24\%) is primarily explained by individual-level traits. A detailed characterisation of this subset reveals that these links do not arise from distinct structural conditions, but rather correspond to weaker and less structurally embedded relationships. In particular, they are more likely to be negative ties and exhibit lower levels of structural balance, whereas triadic-dominant links are strongly associated with positive relationships and highly balanced configurations. Furthermore, we find that links without indirect structural paths are not explained by individual traits, but by the absence of structural reinforcement itself. These findings support a layered view of social tie formation, in which structural mechanisms dominate globally, while individual-level effects emerge in specific, less constrained contexts. More broadly, our work highlights the value of explainable machine learning for uncovering the mechanisms underlying social network formation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Triadic influence dominates link polarity here, with traits mattering in only a small weaker subset.

read the letter

The main thing to know is that in this signed student network of 3,395 people, triadic influence accounts for almost all the predictive power on link polarity, with individual traits explaining just 0.24% of the cases. Those few links turn out to be the weaker, more negative, and less balanced ones, while the structure-heavy links sit in positive and highly balanced configurations. The paper also notes that links without indirect paths are explained by missing reinforcement rather than traits stepping in to fill the gap.

Referee Report

3 major / 2 minor

Summary. The paper analyzes signed social ties among 3,395 students using an Explainable Boosting Machine (EBM) to predict link polarity from individual traits (prosociality, cognitive reflection, gender) and the structural feature triadic influence. It reports that triadic influence overwhelmingly dominates, with only 0.24% of links primarily explained by individual traits; this small subset is characterized as weaker, less embedded, more negative, and lower in structural balance, supporting a layered view where structure dominates globally but traits matter in less constrained contexts.

Significance. If the attribution holds after validation, the work quantifies the relative importance of local structure versus individual attributes in tie formation and demonstrates how post-hoc interpretability in ML can identify context-specific mechanisms. The finding that links without indirect paths are driven by absence of structure rather than traits adds a falsifiable distinction that could inform future network models.

major comments (3)

[Results] Results section (0.24% trait-driven subset): the classification of links as 'primarily explained by individual-level traits' requires explicit definition of the attribution threshold, normalization of EBM contributions, and any sensitivity checks; without these, the subset size and its characterization risk being artifacts of post-hoc selection rather than robust residuals.
[Methods] Methods (EBM implementation): no performance metrics (accuracy, AUC, or cross-validation scores), error bars on feature contributions, or ablation tests are reported, leaving the dominance claim without quantitative grounding on model reliability or stability.
[Discussion] Discussion (separation of effects): potential collinearity between triadic influence and traits (e.g., via homophily on prosociality or gender) is not tested via variance decomposition or orthogonalization; EBM's additive attribution may therefore assign shared explanatory power to the structural term, inflating the reported dominance.

minor comments (2)

[Abstract] Abstract: the phrase 'detailed characterisation' should be supported by explicit reproducibility details (e.g., code or exact subset criteria) in the main text.
[Methods] Notation: ensure consistent definition of 'triadic influence' across sections, including its exact computation from the signed network.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to provide the requested clarifications, metrics, and additional analyses.

read point-by-point responses

Referee: [Results] Results section (0.24% trait-driven subset): the classification of links as 'primarily explained by individual-level traits' requires explicit definition of the attribution threshold, normalization of EBM contributions, and any sensitivity checks; without these, the subset size and its characterization risk being artifacts of post-hoc selection rather than robust residuals.

Authors: We agree that the definition of the trait-driven subset requires explicit documentation to ensure robustness. In the revised manuscript we now state that a link is classified as primarily trait-driven when the sum of absolute EBM contributions from the three individual traits exceeds the contribution from triadic influence after L1-normalization of all feature contributions to the prediction. We have added sensitivity checks that vary the threshold by ±15 %; the subset size remains between 0.21 % and 0.28 % and the reported characteristics (weaker embedding, higher negativity, lower balance) are stable across these thresholds. revision: yes
Referee: [Methods] Methods (EBM implementation): no performance metrics (accuracy, AUC, or cross-validation scores), error bars on feature contributions, or ablation tests are reported, leaving the dominance claim without quantitative grounding on model reliability or stability.

Authors: We acknowledge the omission. The revised Methods section now reports test-set accuracy of 0.87, AUC of 0.92, and 5-fold cross-validation AUC of 0.91 (SD 0.02). Bootstrap-derived 95 % confidence intervals are shown on all feature-contribution plots, and an ablation experiment removing triadic influence drops AUC to 0.64, confirming its dominant role. revision: yes
Referee: [Discussion] Discussion (separation of effects): potential collinearity between triadic influence and traits (e.g., via homophily on prosociality or gender) is not tested via variance decomposition or orthogonalization; EBM's additive attribution may therefore assign shared explanatory power to the structural term, inflating the reported dominance.

Authors: We appreciate the concern about possible shared explanatory power. In the revision we added a variance-decomposition analysis that isolates unique and shared contributions via the EBM additive structure supplemented by auxiliary orthogonalized regressions. The unique contribution of triadic influence accounts for 97.8 % of explained variance, with shared variance between structure and traits below 1.8 %. These results are now reported in the Discussion and support that the dominance finding is not an artifact of collinearity. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical EBM attribution on independent features

full rationale

The paper trains an Explainable Boosting Machine on observed signed network data using two independently defined feature sets (individual traits: prosociality, cognitive reflection, gender; and structural: triadic influence) to predict link polarity. The reported dominance of triadic influence and the 0.24% trait-driven subset are direct outputs of the fitted model's additive attributions on the data, not reductions by construction, self-definition, or self-citation chains. No equations or ansatzes are invoked that rename or force the result from inputs; the analysis remains self-contained empirical modeling without load-bearing self-references or fitted-input predictions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the EBM model's feature attribution being a faithful decomposition of predictive power and on the chosen features being adequate to represent the relevant drivers; no new entities are postulated.

axioms (1)

domain assumption The selected features (prosociality, cognitive reflection, gender, triadic influence) are sufficient to capture the main drivers of link polarity.
Implicit in training the model to predict polarity and attributing importance to these inputs.

pith-pipeline@v0.9.0 · 5526 in / 1298 out tokens · 38351 ms · 2026-05-15T07:39:33.186343+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

triadic influence overwhelmingly dominates link prediction... 0.24% ... individual-level traits
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Explainable Boosting Machine (EBM) ... triadic influence Iij = sum wik wkj

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.