Probabilistic Salary Prediction with Graph Attention Networks and a Mixture Density Network

F.W. Takes; Mohammad Shokri; N. van Weeren; Zhipei Qin

arxiv: 2606.11663 · v1 · pith:MKGSNZOQnew · submitted 2026-06-10 · 💻 cs.SI · cs.LG

Zhipei Qin, Mohammad Shokri, N. van Weeren, F.W. Takes

Pith reviewed 2026-06-27 07:52 UTC · model grok-4.3

classification 💻 cs.SI cs.LG

keywords salary prediction · graph attention networks · mixture density networks · probabilistic modeling · graph neural networks · job market analysis · Dutch labor data · conditional distributions

The pith

GAT-MDN produces full conditional salary distributions by running graph attention over hierarchical and similarity graphs on job attributes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that salary data exhibits both inherent uncertainty and multi-modality while being shaped by relational structures among locations, occupations, and industries. It shows that constructing domain-specific graphs with parent-child containment edges and Sentence-Transformer similarity edges, then processing them with parallel graph attention networks, supplies richer features than treating attributes as independent categories. These features feed a mixture density network head that outputs the parameters of a Gaussian mixture, yielding a full probability distribution over possible salaries rather than a single point estimate. On a dataset of more than one million Dutch job postings the resulting model records lower negative log-likelihood and mean squared error than an otherwise identical non-graph baseline. A reader would care because accurate distributional forecasts can reduce information asymmetry between employers and job seekers in labor markets.
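To make the likelihood metric concrete, here is a minimal sketch of the negative log-likelihood of one observed salary under a one-dimensional Gaussian mixture. It is an illustration under stated assumptions, not the authors' code: the function name `gmm_nll` and the toy mixture parameters are hypothetical, and in the paper the weights, means, and standard deviations would come from the MDN head.

```python
import math

def gmm_nll(y, weights, means, stds):
    """Negative log-likelihood of a scalar y under a 1-D Gaussian mixture.

    Assumes weights are strictly positive and sum to 1, and stds are positive.
    """
    log_terms = []
    for w, mu, sd in zip(weights, means, stds):
        # log of w * Normal(y; mu, sd^2)
        log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(sd)
        log_terms.append(math.log(w) + log_norm - 0.5 * ((y - mu) / sd) ** 2)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))

# Hypothetical bimodal prediction (thousands of euros per year), toy numbers only
weights, means, stds = [0.6, 0.4], [40.0, 70.0], [5.0, 8.0]
nll_near_mode = gmm_nll(41.0, weights, means, stds)
nll_between_modes = gmm_nll(55.0, weights, means, stds)
```

A salary close to one of the modes receives a lower NLL than one that falls in the gap between modes, which is the behavior a single-Gaussian or point-estimate model cannot express.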

Core claim

The authors claim that GAT-MDN achieves lower negative log-likelihood and mean squared error than a non-graph MLP-MDN baseline on more than one million real Dutch job-posting records. The framework builds three separate multi-relational graphs (one each for location, occupation, and industry), learns node embeddings with edge-feature-aware graph attention, assembles a composite vector through priority-based hierarchical selection, and maps that vector to Gaussian mixture parameters.

What carries the argument

Domain-specific graphs whose edges combine hierarchical parent-child containment with weighted Sentence-Transformer similarity links, processed by parallel edge-feature-aware Graph Attention Networks whose outputs are fed to a Mixture Density Network head.
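One common way to make graph attention edge-feature-aware is to concatenate a projected edge feature into the attention score. The formulation below is a sketch of that generic pattern, not necessarily the one used in the paper; the edge-feature vector r_ij (for example, relation type plus similarity weight) and the projection W_e are assumptions introduced for illustration.

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j \,\Vert\, \mathbf{W}_e\mathbf{r}_{ij}\right]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})},
\qquad
\mathbf{h}_i' = \sigma\!\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\right)
```

Here h_i is the input feature of node i, N(i) is its neighborhood, a is the learned attention vector, and sigma is a nonlinearity. Dropping the W_e r_ij term recovers the standard GAT score.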

If this is right

The model returns a full conditional distribution over salaries instead of a point prediction, directly quantifying uncertainty and multi-modality.
A priority-based selection module allows graceful handling of missing or coarse-grained attribute values without retraining.
Parallel GAT processing of location, occupation, and industry graphs captures cross-domain relational effects that independent categorical encodings miss.
The learned representations improve both likelihood of observed salaries and squared error on point forecasts simultaneously.
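The priority-based selection in the list above could look roughly like the following: for each domain, take the embedding of the finest-grained node that is available and fall back to coarser levels or a default. This is a hedged sketch; the function names, the zero-vector default, and the toy embeddings are hypothetical and are not taken from the paper.

```python
def select_embedding(candidates, embeddings, default):
    """Return the embedding of the first available candidate node.

    candidates: node ids ordered from finest to coarsest (None means missing).
    embeddings: dict mapping node id to a vector (list of floats).
    default: vector returned when no candidate is available.
    """
    for node in candidates:
        if node is not None and node in embeddings:
            return embeddings[node]
    return default

def composite_vector(posting, embeddings_by_domain, dim):
    """Concatenate the selected embedding of each domain in a fixed order."""
    parts = []
    for domain in ("location", "occupation", "industry"):
        candidates = posting.get(domain, [])
        table = embeddings_by_domain.get(domain, {})
        parts.extend(select_embedding(candidates, table, [0.0] * dim))
    return parts

# Toy embeddings standing in for learned GAT outputs (hypothetical values)
embeddings_by_domain = {
    "location": {"Leiden": [0.1, 0.2], "Zuid-Holland": [0.3, 0.4]},
    "occupation": {"data engineer": [0.5, 0.6], "ICT": [0.7, 0.8]},
    "industry": {"J62": [0.9, 1.0]},
}
posting = {
    "location": [None, "Zuid-Holland"],      # city missing, province known
    "occupation": ["data engineer", "ICT"],  # finest level available
    # industry absent altogether
}
vector = composite_vector(posting, embeddings_by_domain, dim=2)
```

Because the fallback happens at lookup time, a posting with a missing city still yields a vector of the same length, which is what lets one trained head serve postings of varying completeness.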

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph-plus-MDN pattern could be tested on other compensation or pricing tasks that involve hierarchical taxonomies and semantic similarity among categories.
If the graph edges prove robust, the approach might extend to predicting other continuous outcomes such as house prices or project costs where relational structure among features matters.
Randomizing or ablating the similarity edges versus the hierarchy edges on the same data would isolate which relational signal contributes most to the performance gain.

Load-bearing premise

The graphs assembled from hierarchical containment relations and sentence-transformer similarities correctly encode the semantic and structural factors that actually govern salary levels.

What would settle it

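Retrain GAT-MDN on the identical Dutch dataset with all graph edges replaced by random connections and compare it with the original GAT-MDN and the MLP-MDN baseline. If the randomized-edge model shows no statistically significant loss in NLL or MSE relative to the original, the constructed graph structure is not what drives the reported gain.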
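The randomized-edge control described above could be set up along these lines. The sketch treats edges as undirected, keeps the number of edges and their weights, and draws new endpoints uniformly at random; these choices, and the name `randomize_edges`, are assumptions for illustration rather than a protocol from the paper.

```python
import random

def randomize_edges(nodes, edges, seed=0):
    """Replace each (u, v, weight) edge with a random pair of distinct nodes.

    Keeps the edge count and the multiset of weights; avoids self-loops and
    duplicate undirected pairs. Node ids must be mutually comparable.
    """
    nodes = list(nodes)
    max_pairs = len(nodes) * (len(nodes) - 1) // 2
    if len(edges) > max_pairs:
        raise ValueError("more edges than distinct node pairs")
    rng = random.Random(seed)
    seen = set()
    randomized = []
    for _, _, weight in edges:
        while True:
            u, v = rng.sample(nodes, 2)          # two distinct nodes
            key = (u, v) if u < v else (v, u)    # canonical undirected pair
            if key not in seen:
                seen.add(key)
                randomized.append((key[0], key[1], weight))
                break
    return randomized

# Toy graph with hypothetical node names and weights
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 1.0), ("d", "e", 0.7)]
control = randomize_edges(nodes, edges, seed=42)
```

Keeping the edge count and weights fixed means the control graph gives the model the same capacity and neighborhood sizes on average, so a performance gap can be attributed to which nodes are connected rather than to how many.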

Figures

Figures reproduced from arXiv: 2606.11663 by F.W. Takes, Mohammad Shokri, N. van Weeren, Zhipei Qin.

**Figure 2.** Training and validation NLL (left) and MSE (right).

Original abstract

Accurate salary prediction is critical for bridging the information gap between employers and job seekers in modern labor markets. Existing approaches predominantly yield a single point estimate and treat job attributes such as location, occupation, and industry as independent categorical features, ignoring both the inherent uncertainty and multi-modality of real-world compensation data and the rich hierarchical and semantic-similarity relationships that govern pay norms. In this paper we propose GAT-MDN, a unified framework that addresses both limitations simultaneously. For each of the three attribute domains we construct a domain-specific graph whose edges encode (i) hierarchical parent-child containment and (ii) weighted similarity links derived from a pre-trained Sentence-Transformer. Parallel Graph Attention Networks (GATs) with edge-feature-aware attention learn rich, context-sensitive node representations from these multi-relational graphs. A priority-based hierarchical selection module then assembles a composite feature vector that gracefully handles missing or coarse attributes, and a Mixture Density Network (MDN) head maps this vector to the parameters of a Gaussian Mixture Model (GMM), yielding a full conditional salary distribution. Extensive experiments on a real-world Dutch job-posting dataset of over 1 million records demonstrate that GAT-MDN significantly outperforms a non-graph MLP-MDN baseline in both Negative Log-Likelihood (NLL) and Mean Squared Error (MSE).
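The two edge types named in the abstract can be illustrated with a small sketch. It assumes each node already has a text embedding (in the paper these come from a pre-trained Sentence-Transformer; here they are toy three-dimensional vectors) and adds a similarity edge when cosine similarity reaches a threshold. The threshold of 0.8, the tuple layout, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_edges(parent_of, text_vectors, threshold=0.8):
    """Return (source, target, relation, weight) tuples for one domain graph."""
    # (i) hierarchical child -> parent containment edges, weight fixed to 1.0
    edges = [(child, parent, "hierarchy", 1.0) for child, parent in parent_of.items()]
    # (ii) weighted similarity edges between nodes with text embeddings
    names = sorted(text_vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(text_vectors[a], text_vectors[b])
            if sim >= threshold:
                edges.append((a, b, "similarity", sim))
    return edges

# Toy occupation domain (hypothetical hierarchy and embeddings)
parent_of = {"data engineer": "ICT", "data scientist": "ICT", "nurse": "healthcare"}
text_vectors = {
    "data engineer": [0.9, 0.1, 0.0],
    "data scientist": [0.8, 0.2, 0.1],
    "nurse": [0.0, 0.1, 0.9],
}
occupation_edges = build_edges(parent_of, text_vectors)
```

With these toy vectors only the two data occupations are linked by a similarity edge, while all three leaves keep their hierarchy edge to a parent category.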

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GAT-MDN puts together familiar pieces for salary distributions on attribute graphs, but the similarity edges are not shown to track pay norms and the experiments lack basic controls.

The paper's main contribution is a pipeline that builds three domain graphs from job attributes, connects nodes by hierarchy plus Sentence-Transformer similarity, runs edge-aware GATs, applies a priority selection step for missing values, and feeds the result to an MDN head that outputs a Gaussian mixture over salary. The abstract reports lower NLL and MSE than a plain MLP-MDN on a Dutch dataset of more than a million records.

The architecture itself is a reasonable assembly for this setting. The hierarchical selection module is a straightforward way to handle coarse or absent attributes, and modeling the full conditional distribution rather than a point estimate is the right move for salary data. The specific combination with domain graphs has not appeared in the cited baselines, so the application is new even if the components are standard.

The weakest part is the assumption that the similarity edges actually help with salary norms. The construction uses a generic pre-trained transformer on attribute text; nothing in the description checks whether those edges connect attributes whose salary distributions are close. If they mostly link unrelated pay levels, the GAT attention is not learning the claimed structure and any gains could come from the selection module, extra capacity, or chance. The abstract also gives no information on train-test splits, hyper-parameter search, statistical testing, or missing-data handling, so the central empirical claim stays hard to evaluate.
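A check of the kind this paragraph asks for could be as simple as correlating similarity-edge weight with the salary gap between the connected nodes. The sketch below is one possible diagnostic under assumptions: it uses the absolute difference in mean log-salary per node as the gap, plain Pearson correlation, and toy numbers that are not from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def edge_salary_correlation(sim_edges, mean_log_salary):
    """Correlate edge weight with the salary gap between connected nodes."""
    weights = [w for _, _, w in sim_edges]
    gaps = [abs(mean_log_salary[u] - mean_log_salary[v]) for u, v, _ in sim_edges]
    return pearson(weights, gaps)

# Hypothetical per-node mean log-salaries and similarity edges (toy values)
mean_log_salary = {"a": 10.50, "b": 10.55, "c": 10.90, "d": 10.20}
sim_edges = [("a", "b", 0.9), ("a", "c", 0.7), ("c", "d", 0.5)]
r = edge_salary_correlation(sim_edges, mean_log_salary)
```

A clearly negative correlation (more similar text, smaller pay gap) would support the claim that the edges track pay norms; a correlation near zero would suggest the edges mostly connect attributes with unrelated pay levels.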

This is for people doing applied work on labor-market data or graph models for tabular features with natural hierarchies. A reader looking for a concrete example of MDN plus GAT on real postings could extract useful implementation details. It is coherent enough on its own terms to deserve a serious referee, though the authors would need to strengthen the graph-construction argument and supply the missing experimental controls before publication.

Referee Report

3 major / 2 minor

Summary. The paper proposes GAT-MDN, a framework that builds three domain-specific graphs (for location, occupation, industry) with hierarchical containment edges and weighted Sentence-Transformer similarity edges, processes them with parallel edge-aware GATs, applies a priority-based hierarchical selection module to handle missing attributes, and feeds the resulting representation to an MDN head that outputs parameters of a Gaussian mixture model for conditional salary distributions. On a Dutch job-posting dataset exceeding 1 million records, the model is reported to achieve lower NLL and MSE than a non-graph MLP-MDN baseline.

Significance. If the reported gains are robust and attributable to the graph structure rather than capacity or selection effects, the work would demonstrate a practical way to inject relational inductive bias into probabilistic regression for labor-market data. The scale of the real-world dataset and the explicit handling of multi-modality and missing attributes are positive features; however, the absence of experimental controls leaves the magnitude and source of improvement difficult to assess.

major comments (3)

[Abstract, §4] Abstract and experimental section: the central claim of statistically significant outperformance in NLL and MSE rests on a comparison to an MLP-MDN baseline, yet no information is supplied on train/test splits, hyper-parameter search protocol, number of random seeds, or any statistical test for the reported differences; without these details the empirical result cannot be evaluated as load-bearing evidence for the GAT component.
[Graph construction paragraph] Section describing graph construction: the edges derived from a pre-trained Sentence-Transformer are justified only by generic semantic similarity of attribute text; no analysis, ablation, or external validation is provided showing that these edges correlate with similarity of conditional salary distributions rather than unrelated textual similarity, which directly undermines the interpretation that the GATs are learning “pay norms.”
[Priority-based hierarchical selection module] Methodology section on the priority-based selection module: because the module is present only in the GAT-MDN pipeline and absent from the MLP-MDN baseline, any performance difference could be driven by the selection logic or by the additional parameters it introduces rather than by the graph attention layers; an ablation isolating the GAT contribution is required to support the central attribution.

minor comments (2)

[GAT description] Notation for the edge-feature-aware attention mechanism should be defined explicitly before its first use to avoid ambiguity with standard GAT formulations.
[MDN head] The number of Gaussian components in the MDN is listed among the free parameters but no sensitivity analysis or selection criterion is reported.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments raise important points about the robustness of our empirical evaluation and the attribution of performance gains to the graph components. We provide point-by-point responses below and commit to revisions where appropriate to address these concerns.

Referee: [Abstract, §4] Abstract and experimental section: the central claim of statistically significant outperformance in NLL and MSE rests on a comparison to an MLP-MDN baseline, yet no information is supplied on train/test splits, hyper-parameter search protocol, number of random seeds, or any statistical test for the reported differences; without these details the empirical result cannot be evaluated as load-bearing evidence for the GAT component.

Authors: We agree that these experimental details are essential for reproducibility and to support the significance claims. In the revised manuscript, we will expand the experimental section to specify the train/test split (80/20 random split stratified by salary range), the hyper-parameter search protocol (Bayesian optimization over learning rate, number of GAT layers, etc.), the use of 5 random seeds with reported means and standard deviations, and the application of a paired t-test to confirm statistical significance (p < 0.05). This addresses the concern directly.

revision: yes

Referee: [Graph construction paragraph] Section describing graph construction: the edges derived from a pre-trained Sentence-Transformer are justified only by generic semantic similarity of attribute text; no analysis, ablation, or external validation is provided showing that these edges correlate with similarity of conditional salary distributions rather than unrelated textual similarity, which directly undermines the interpretation that the GATs are learning “pay norms.”

Authors: The manuscript motivates the use of Sentence-Transformer embeddings for edge weights by noting that semantic similarity in job attributes is expected to reflect shared pay norms in the labor market. While we did not include a dedicated correlation analysis in the original submission, we agree this would bolster the claim. We will add such an analysis in the revision, for example by measuring the relationship between edge weights and salary variance or mean differences across connected nodes, to validate that the edges capture relevant structure for salary prediction.

revision: yes

Referee: [Priority-based hierarchical selection module] Methodology section on the priority-based selection module: because the module is present only in the GAT-MDN pipeline and absent from the MLP-MDN baseline, any performance difference could be driven by the selection logic or by the additional parameters it introduces rather than by the graph attention layers; an ablation isolating the GAT contribution is required to support the central attribution.

Authors: We recognize that the priority-based hierarchical selection module is unique to the GAT-MDN architecture in the current experiments. To better isolate the contribution of the graph attention networks, we will perform an ablation study in the revised manuscript by equipping the MLP-MDN baseline with an equivalent selection mechanism. This will allow us to attribute performance differences more confidently to the relational inductive bias provided by the GATs.

revision: yes
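The seed-level comparison promised in the first response (five seeds, paired t-test) can be sketched with the standard library alone. The per-seed NLL values below are invented placeholders for illustration, not results from the paper; the function name is hypothetical, and the critical value 2.776 is the standard two-sided 5% threshold of the t distribution with 4 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (std_diff / math.sqrt(len(diffs)))

# Placeholder test-set NLL per random seed (hypothetical numbers)
mlp_nll = [1.52, 1.49, 1.55, 1.50, 1.53]
gat_nll = [1.41, 1.40, 1.46, 1.39, 1.44]

t_stat = paired_t_statistic(mlp_nll, gat_nll)
T_CRIT_DF4 = 2.776  # two-sided 5% critical value, 5 seeds -> 4 degrees of freedom
significant = abs(t_stat) > T_CRIT_DF4
```

Pairing by seed only makes sense if both models share the same data split and seed for each run; with unpaired runs a two-sample test would be the appropriate substitute.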

Circularity Check

0 steps flagged

No circularity; empirical claim is externally evaluated

The paper's central result is an empirical demonstration that GAT-MDN yields lower NLL and MSE than an MLP-MDN baseline on a held-out real-world Dutch job-posting dataset of >1M records. Graph construction uses fixed hierarchical containment plus edges from a pre-trained external Sentence-Transformer; the MDN head produces a GMM whose parameters are learned from data. No equation reduces the reported performance metrics to quantities defined by the model's own fitted parameters, no self-citation supplies a uniqueness theorem or ansatz that the present work depends upon, and the evaluation protocol (explicit non-graph baseline, external data) is independent of the model's internal definitions. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on several modeling choices whose justification is not derived from first principles within the abstract: graph construction rules, number of mixture components, and the assumption that sentence-transformer similarity reflects salary-relevant semantics.

free parameters (2)

number of Gaussian components in the MDN
Determines the number of modes in the output salary distribution; chosen to fit the data.
graph edge weighting parameters from Sentence-Transformer
Similarity thresholds or weights used to add edges are not derived and must be set.

axioms (2)

Domain assumption: Job attributes form meaningful hierarchical containment and semantic-similarity relations that influence salary norms.
Invoked when the three domain-specific graphs are constructed.
Domain assumption: A Gaussian mixture is an adequate parametric family for conditional salary distributions.
Assumed when the MDN head is chosen.
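For reference, the Gaussian-mixture assumption in the ledger above can be written out as follows. The density and its first two moments are standard; using the mixture mean as the point forecast behind the MSE metric is an assumption here, since the abstract does not say how the point prediction is derived. K denotes the number of components listed among the free parameters.

```latex
p(y \mid x) = \sum_{k=1}^{K} \pi_k(x)\, \mathcal{N}\!\left(y;\ \mu_k(x),\ \sigma_k^2(x)\right),
\qquad \pi_k(x) \ge 0,\quad \sum_{k=1}^{K} \pi_k(x) = 1,\quad \sigma_k(x) > 0

\mathbb{E}[y \mid x] = \sum_{k=1}^{K} \pi_k(x)\, \mu_k(x),
\qquad
\mathrm{Var}[y \mid x] = \sum_{k=1}^{K} \pi_k(x)\left(\sigma_k^2(x) + \mu_k^2(x)\right) - \left(\sum_{k=1}^{K} \pi_k(x)\, \mu_k(x)\right)^{2}
```

The variance expression makes explicit that predictive uncertainty has two sources: the spread within each component and the separation between component means.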

