arxiv: 2509.23175 · v2 · submitted 2025-09-27 · 💻 cs.IR · cs.AI

WARBERT: A Hierarchical BERT-based Model for Web API Recommendation

Pith reviewed 2026-05-18 12:55 UTC · model grok-4.3

keywords: Web API recommendation · BERT model · Hierarchical architecture · Semantic matching · Candidate filtering · Mashup recommendation

The pith

WARBERT uses a hierarchical BERT structure with dual recommendation and matching components to recommend Web APIs more accurately while avoiding exhaustive searches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces WARBERT to handle three problems in Web API recommendation: unclear meanings when comparing API and mashup text, missing steps that refine a mashup need down to single API details, and the high cost of checking every API in a big collection. It builds a BERT model with two linked parts, one that quickly narrows candidates using recommendation-style labels and another that performs careful similarity checks on the short list. The two parts are fused with attention to produce a final score, and an extra task that guesses mashup categories is added to strengthen the first part. Tests on the ProgrammableWeb collection show higher accuracy and lower computation time than earlier methods.

Core claim

WARBERT is a hierarchical BERT-based model that applies dual-component feature fusion and attention mechanisms to build precise semantic representations for Web API recommendation. It separates the task into WARBERT(R) for fast candidate filtering via recommendation methods and WARBERT(M) for detailed similarity matching, then combines their outputs into a final pairing likelihood while using an auxiliary mashup-category prediction task to improve the filtering stage.
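The filter-then-match control flow described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function `two_stage_recommend`, the weight `alpha`, the weighted-sum fusion, and the toy scores are all assumptions made here, since the summary only says the two components' predictions are combined. Plain dictionaries stand in for the outputs of WARBERT(R) and WARBERT(M).

```python
from typing import Callable, Dict, List, Tuple


def two_stage_recommend(
    rec_scores: Dict[str, float],
    match_fn: Callable[[str], float],
    k: int,
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Filter to the top-k APIs by stage-R score, then re-score only those."""
    # Stage R: keep the k APIs with the highest recommendation-style score.
    shortlist = sorted(rec_scores, key=rec_scores.get, reverse=True)[:k]
    # Stage M: the expensive matcher runs on the shortlist only.
    # The weighted sum is an assumed fusion rule, chosen for simplicity.
    fused = [
        (api, alpha * rec_scores[api] + (1.0 - alpha) * match_fn(api))
        for api in shortlist
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


# Toy scores, invented for illustration only.
rec_scores = {"maps": 0.9, "sms": 0.2, "geocode": 0.7, "payments": 0.1}
match_table = {"maps": 0.6, "geocode": 0.9, "sms": 0.8, "payments": 0.3}
calls: List[str] = []


def match_fn(api: str) -> float:
    calls.append(api)  # record how often the costly matcher is invoked
    return match_table[api]


ranking = two_stage_recommend(rec_scores, match_fn, k=2)
print(ranking)  # APIs on the shortlist, best fused score first
print(calls)    # only the shortlisted APIs were passed to the matcher
```

With four APIs and `k=2`, the matcher is called twice rather than four times. The same pattern is what lets the cost of the matching stage scale with the shortlist size instead of the repository size.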

What carries the argument

The hierarchical BERT architecture with dual-component feature fusion and attention mechanisms that progressively refines semantic representations from mashup requirements to individual API descriptions.

If this is right

  • Semantic ambiguities between API and mashup descriptions are reduced through fused feature representations.
  • Progressive refinement from broad mashup requirements to specific API descriptions becomes possible inside one model.
  • Large-scale repositories can be searched without comparing every mashup to every API.
  • An auxiliary category-prediction task further strengthens the initial filtering stage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-stage filtering-plus-matching pattern could apply to recommending code libraries or cloud services that share similar description-matching problems.
  • Efficiency gains open the door to real-time API suggestions inside developer tools or marketplaces.
  • Attention-based fusion may help when API descriptions appear in multiple languages or technical domains.

Load-bearing premise

Dual-component feature fusion and attention mechanisms in the hierarchical BERT architecture create accurate semantic representations that resolve ambiguities and enable efficient candidate filtering without exhaustive comparisons.

What would settle it

Evaluating WARBERT against prior methods on the ProgrammableWeb dataset and finding no clear gains in accuracy or speed would show the dual-component hierarchical design does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2509.23175 by Dezhong Yao, Yuhong Gu, Zishuo Xu.

Figure 1: The Web API recommendation example. … vast number of available Web APIs. Consequently, Web API recommendations are crucial for effective data integration and have gained significant attention [6]–[8]. The goal of Web API recommendation is to help developers efficiently choose suitable Web APIs for their mashup applications [9], [10]. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2: The Web API recommendation workflow. … of APIs $A^{m_i}$ to participate in the mashup, a set of meta-elements $X$ to prepare for composition [12], [36]. Definition 5 (Web API Recommendation Model). Web API recommendation Model is designed to handle the Web API recommendation task. It is defined as a parameterized function which outputs match score vector $R^{m_i}$: $f(\theta, T^{m_i}, A) \to R^{m_i} = [r^{m_i}_1, \ldots, r^{m_i}_j, \ldots$ …
Figure 3: The architecture of WARBERT. Based on the above characteristics, WARBERT uses BERT to generate contextual embedding $V_b$ of dimension $F$. For the word sequence of the mashup $m_i$ description, we use a special format for input to the model: $V_{cr} = \mathrm{BERT}([\mathrm{CLS}], D^{m_i}, [\mathrm{SEP}])$ (2), where [SEP] is a special segmentation token and [CLS] is the classification token. Through BERT, the model learns to obtain the focus in t…
Figure 4: Performance of WARBERT with different candidate number [PITH_FULL_IMAGE:figures/full_fig_p010_4.png]
Figure 5: Time consumption for WARBERT with different candidate number [PITH_FULL_IMAGE:figures/full_fig_p010_5.png]
Original abstract

With the rise of Web 2.0 and microservices, the increasing availability of Web APIs has intensified the need for effective recommendation systems. Existing approaches are generally categorized into two methods: recommendation-type methods, which classify APIs using labels, and match-type methods, which retrieve APIs through matching with mashups. However, three significant challenges remain: 1) semantic ambiguities in comparing API and mashup descriptions, 2) a lack of progressive semantic refinement between mashup requirements and individual API descriptions, and 3) computational inefficiency of exhaustive mashup-API comparisons in large-scale repositories. To tackle these challenges, we propose WARBERT, a hierarchical model based on BERT for Web API recommendation. WARBERT utilizes dual-component feature fusion and attention mechanisms to create accurate semantic representations. It consists of WARBERT(R) for initial candidate filtering using recommendation methods, and WARBERT(M), which focuses on refined similarity matching. The final likelihood of an API-mashup pairing combines predictions from both components, with WARBERT(R) further enhanced by an auxiliary task of predicting mashup categories. Experiments conducted on the ProgrammableWeb dataset demonstrate WARBERT outperforms existing baselines, achieving notable improvements in both accuracy and efficiency.
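To make the auxiliary-task idea concrete, the sketch below shows one common way a main multi-label objective and an auxiliary classification objective are trained jointly. The abstract does not specify the loss functions or how they are weighted, so everything here is an assumption: binary cross-entropy for the API labels, cross-entropy for the mashup category, and the hypothetical weight `lam`.

```python
import math
from typing import List

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(probs: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy over per-API probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(p + EPS) + (1 - y) * math.log(1.0 - p + EPS)
    return -total / len(probs)


def cross_entropy(probs: List[float], target_index: int) -> float:
    """Cross-entropy for a single-label category prediction."""
    return -math.log(probs[target_index] + EPS)


def joint_loss(
    api_probs: List[float],
    api_labels: List[int],
    category_probs: List[float],
    category_index: int,
    lam: float = 0.1,
) -> float:
    """Main recommendation loss plus a weighted auxiliary category loss."""
    main = binary_cross_entropy(api_probs, api_labels)
    auxiliary = cross_entropy(category_probs, category_index)
    return main + lam * auxiliary


# Toy probabilities, invented for illustration only.
loss = joint_loss([0.5, 0.5], [1, 0], [0.5, 0.5], 0, lam=1.0)
print(loss)
```

Setting `lam` to zero recovers the main objective alone, which gives a simple ablation for checking whether the auxiliary task contributes anything.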

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes WARBERT, a hierarchical BERT-based architecture for Web API recommendation that addresses semantic ambiguity, progressive refinement, and scalability. It splits the task into WARBERT(R), which performs recommendation-type candidate filtering augmented by an auxiliary mashup-category prediction task, and WARBERT(M), which performs refined match-type similarity computation via dual-component feature fusion and attention; the final API-mashup score is a combination of the two components. Experiments on the ProgrammableWeb dataset are reported to show gains in both accuracy and efficiency relative to existing baselines.

Significance. If the empirical claims hold, the work provides a concrete demonstration that a two-stage BERT pipeline can reduce exhaustive pairwise comparisons while preserving accuracy, bridging recommendation-type and match-type paradigms in service recommendation. This could inform scalable retrieval systems in large API repositories and service-oriented computing.

major comments (2)
  1. §3.2 (WARBERT(R) filtering): the top-K candidate selection is presented as preserving relevant APIs for the subsequent matching stage, yet no recall@K curves or tables are supplied for the chosen K; without quantitative evidence that recall remains high (e.g., >0.95), any accuracy improvement in the combined model cannot be unambiguously attributed to the hierarchical design rather than to the filtering stage discarding difficult negatives.
  2. §4.3 (experimental results): the reported accuracy and efficiency gains are stated without error bars, standard deviations across runs, or statistical significance tests against the strongest baselines; this leaves open the possibility that observed differences fall within experimental variance.
minor comments (2)
  1. Abstract: the claim of 'notable improvements' is not accompanied by any numeric deltas, baseline names, or dataset size; adding these would make the summary self-contained.
  2. §3.1: the notation for the dual-component fusion weights is introduced without an explicit equation; a short formula would clarify how the two BERT outputs are combined before attention.
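The recall@K check requested in the first major comment is straightforward to compute once the filtering stage's rankings are available. The sketch below assumes each mashup has a ranked list of API identifiers from the filtering stage and a set of ground-truth APIs; the function names and the toy data are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant APIs that appear in the top-k of `ranked`."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant APIs")
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Average recall@k over all mashups in `rankings`."""
    values = [
        recall_at_k(ranked, ground_truth[mashup], k)
        for mashup, ranked in rankings.items()
    ]
    return sum(values) / len(values)


# Toy rankings and labels, invented for illustration only.
rankings = {
    "mashup_1": ["a", "b", "c", "d"],
    "mashup_2": ["d", "a", "b", "c"],
}
ground_truth = {"mashup_1": {"a", "c"}, "mashup_2": {"b"}}

for k in (2, 3):
    print(k, mean_recall_at_k(rankings, ground_truth, k))
```

Sweeping `k` over a range and plotting the result gives the recall@K curve the referee asks for. Any relevant API missing from the shortlist can never be recovered by the matching stage, so this curve bounds the recall of the full pipeline.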

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate the planned revisions to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: §3.2 (WARBERT(R) filtering): the top-K candidate selection is presented as preserving relevant APIs for the subsequent matching stage, yet no recall@K curves or tables are supplied for the chosen K; without quantitative evidence that recall remains high (e.g., >0.95), any accuracy improvement in the combined model cannot be unambiguously attributed to the hierarchical design rather than to the filtering stage discarding difficult negatives.

    Authors: We agree that explicit recall@K evidence for the WARBERT(R) filtering stage is necessary to substantiate the benefits of the hierarchical design. In the revised manuscript we will add recall@K tables and curves for a range of K values, including the K used in our experiments, demonstrating that recall remains above 0.95. This addition will clarify that the observed accuracy gains arise from the progressive refinement performed by WARBERT(M) rather than from the removal of difficult negatives during filtering.
    Revision: yes

  2. Referee: §4.3 (experimental results): the reported accuracy and efficiency gains are stated without error bars, standard deviations across runs, or statistical significance tests against the strongest baselines; this leaves open the possibility that observed differences fall within experimental variance.

    Authors: We acknowledge that the current experimental reporting lacks measures of variability and statistical testing. In the revised version we will report standard deviations computed over multiple independent runs with different random seeds and will include paired t-test p-values comparing WARBERT against the strongest baselines. These statistics will be added to the accuracy and efficiency tables together with a brief description of the experimental protocol.
    Revision: yes
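The reporting promised in the second response can be sketched with the standard library alone. The example below computes the mean and sample standard deviation over seeds and the paired t statistic between two systems evaluated on the same seeds. The per-seed scores are invented placeholders, not results from the paper, and `paired_t_statistic` is a hypothetical helper. Turning the statistic into a p-value needs the Student t distribution, for which a library routine such as `scipy.stats.ttest_rel` could be used instead.

```python
import math
import statistics
from typing import List, Tuple


def paired_t_statistic(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Return the paired t statistic for a vs. b and its degrees of freedom."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder per-seed scores, invented for illustration only.
model_runs = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline_runs = [0.68, 0.69, 0.70, 0.70, 0.68]

print(statistics.mean(model_runs), statistics.stdev(model_runs))
print(statistics.mean(baseline_runs), statistics.stdev(baseline_runs))

t_stat, dof = paired_t_statistic(model_runs, baseline_runs)
print(t_stat, dof)
```

Pairing by seed matters: both systems must be run on the same splits and seeds, otherwise an unpaired test is the appropriate comparison.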

Circularity Check

0 steps flagged

No derivation chain present; empirical model proposal is self-contained

The paper describes a hierarchical BERT architecture (WARBERT(R) for recommendation-based filtering with auxiliary category prediction, WARBERT(M) for similarity matching) whose final likelihood combines the two components. All performance claims rest on experimental results against baselines on the ProgrammableWeb dataset rather than any mathematical derivation, first-principles equations, or parameter-fitting steps that reduce outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core design choices; the architecture is presented as an engineering response to stated challenges (semantic ambiguity, progressive refinement, efficiency) and is evaluated directly.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; the model implicitly relies on BERT's pre-trained semantic capabilities and the effectiveness of attention for description matching.

free parameters (1)
  • BERT fine-tuning hyperparameters and fusion weights
    Standard in such models but not specified; likely tuned on the dataset to achieve reported gains.
axioms (1)
  • domain assumption: BERT embeddings plus attention can resolve semantic ambiguities between API and mashup descriptions
    Invoked by the claim that dual-component fusion creates accurate representations.

pith-pipeline@v0.9.0 · 5741 in / 1207 out tokens · 36152 ms · 2026-05-18T12:55:25.761869+00:00 · methodology

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  [1] H. Zhao, S. Deng, Z. Liu, J. Yin, and S. Dustdar, “Distributed redundant placement for microservice-based applications at the edge,” IEEE Transactions on Services Computing, vol. 15, no. 3, pp. 1732–1745, 2020.
  [2] W. Gong, X. Zhang, Y. Chen, Q. He, A. Beheshti, X. Xu, C. Yan, and L. Qi, “DAWAR: Diversity-aware web APIs recommendation for mashup creation based on correlation graph,” in Proc. of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Madrid, Spain, July 11-15, 2022, pp. 395–404.
  [3] J. Yu, B. Benatallah, F. Casati, and F. Daniel, “Understanding mashup development,” IEEE Internet Computing, vol. 12, no. 5, pp. 44–52, 2008.
  [4] C. Wei, Y. Fan, Z. Jia, and J. Zhang, “Cross-view graph alignment for mashup recommendation,” IEEE Transactions on Services Computing, vol. 17, no. 5, pp. 2151–2164, 2024.
  [5] L. Ding, J. Liu, G. Kang, Y. Xiao, and B. Cao, “Joint QoS prediction for web services based on deep fusion of features,” IEEE Transactions on Network and Service Management, pp. 1–1, 2023.
  [6] X. Wang, M. Xi, and J. Yin, “Functional and structural fusion based web API recommendations in heterogeneous networks,” in Proc. of the IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, July 2-8, 2023, pp. 91–96.
  [7] X. Wang, M. Xi, Y. Li, X. Pan, Y. Wu, S. Deng, and J. Yin, “SEHGN: semantic-enhanced heterogeneous graph network for web API recommendation,” IEEE Transactions on Services Computing, vol. 17, no. 5, pp. 2836–2849, 2024.
  [8] G. Kang, B. Liang, J. Liu, Y. Wen, Y. Xiao, and H. Nie, “Web API recommendation via leveraging content and network semantics,” IEEE Transactions on Network and Service Management, pp. 1–1, 2024.
  [9] H. Alhosaini, S. Alharbi, X. Wang, and G. Xu, “API recommendation for mashup creation: A comprehensive survey,” The Computer Journal, vol. 67, no. 5, pp. 1920–1940, 2023.
  [10] Y. Ma, X. Geng, J. Wang, K. He, and D. Athanasopoulos, “Deep learning framework for multi-round service bundle recommendation in iterative mashup development,” CAAI Transactions on Intelligence Technology, vol. 8, no. 3, pp. 914–930, 2023.
  [11] G. Kang, Y. Wang, H. Ren, B. Cao, J. Liu, and Y. Wen, “KS-GNN: keyword search via graph neural network for web API recommendation,” IEEE Transactions on Network and Service Management, vol. 21, no. 5, pp. 5464–5474, 2024.
  [12] C. Zhang, S. Qin, H. Wu, and L. Zhang, “Cooperative mashup embedding leveraging knowledge graph for web API recommendation,” IEEE Access, vol. 12, pp. 49708–49719, 2024.
  [13] P. Bhargava, A. Drozd, and A. Rogers, “Generalization in NLI: ways (not) to go beyond simple heuristics,” CoRR, vol. abs/2110.01518, 2021.
  [14] B. Cao, M. Peng, L. Zhang, Y. Qing, B. Tang, G. Kang, and J. Liu, “Web service recommendation via integrating heterogeneous graph attention network representation and FiBiNET score prediction,” IEEE Transactions on Services Computing, vol. 16, no. 5, pp. 3837–3850, 2023.
  [15] X. Wang, P. Zhou, Y. Wang, X. Liu, J. Liu, and H. Wu, “ServiceBERT: A pre-trained model for web service tagging and recommendation,” in Proc. of International Conference on Service-Oriented Computing (ICSOC), vol. 13121, 2021, pp. 464–478.
  [16] F. Xie, L. Chen, D. Lin, Z. Zheng, and X. Lin, “Personalized service recommendation with mashup group preference in heterogeneous information network,” IEEE Access, vol. 7, pp. 16155–16167, 2019.
  [17] X. Wang, H. Wu, and C.-H. Hsu, “Mashup-oriented API recommendation via random walk on knowledge graph,” IEEE Access, vol. 7, pp. 7651–7662, 2019.
  [18] Y. Zhong, Y. Fan, W. Tan, and J. Zhang, “Web service recommendation with reconstructed profile from mashup descriptions,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 468–478, 2018.
  [19] S. Xu and B. Raahemi, “A semantic-based service discovery framework for collaborative environments,” International Journal of Simulation Modelling, vol. 15, no. 1, pp. 83–96, 2016.
  [20] Q. Bao, J. Zhang, X. Duan, R. Ramachandran, T. J. Lee, Y. Zhang, Y. Xu, S. Lee, L. Pan, P. Gatlin, and M. Maskey, “A fine-grained API link prediction approach supporting mashup recommendation,” in Proc. of the IEEE International Conference on Web Services (ICWS), Honolulu, HI, USA, June 25-30, 2017, pp. 220–228.
  [21] H. Chen, H. Wu, J. Li, X. Wang, and L. Zhang, “Keyword-driven service recommendation via deep reinforced Steiner tree search,” IEEE Transactions on Industrial Informatics, vol. 19, no. 3, pp. 2930–2941, 2023.
  [22] M. Shi, Y. Tang, and J. Liu, “Functional and contextual attention-based LSTM for service recommendation in mashup creation,” IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 5, pp. 1077–1090, 2019.
  [23] F. McCarey, M. O. Cinneide, and N. Kushmerick, “A recommender agent for software libraries: An evaluation of memory-based and model-based collaborative filtering,” in Proc. of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Hong Kong, China, 18-22 December 2006, pp. 154–162.
  [24] M. Tang, Z. Zheng, G. Kang, J. Liu, Y. Yang, and T. Zhang, “Collaborative web service quality prediction via exploiting matrix factorization and network map,” IEEE Transactions on Network and Service Management, vol. 13, no. 1, pp. 126–137, 2016.
  [25] C. Buqing, M. Tang, and X. Huang, “CSCF: A mashup service recommendation approach based on content similarity and collaborative filtering,” International Journal of Grid and Distributed Computing, vol. 7, no. 2, 2014.
  [26] X. Chen, Z. Zheng, and M. R. Lyu, “QoS-Aware web service recommendation via collaborative filtering,” Web Services Foundations, pp. 563–588, 2014.
  [27] K. A. Botangen, J. Yu, Q. Z. Sheng, Y. Han, and S. Yongchareon, “Geographic-aware collaborative filtering for web service recommendation,” Expert Systems with Applications, vol. 151, p. 113347, 2020.
  [28] L. Yao, X. Wang, Q. Z. Sheng, B. Benatallah, and C. Huang, “Mashup recommendation by regularizing matrix factorization with API co-invocations,” IEEE Transactions on Services Computing, vol. 14, no. 2, pp. 502–515, 2021.
  [29] T. Chen, J. Liu, B. Cao, Z. Peng, Y. Wen, and R. Li, “Web service recommendation based on word embedding and topic model,” in Proc. of the IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Commun...
  [30] F. Xie, L. Chen, Y. Ye, Z. Zheng, and X. Lin, “Factorization machine based service recommendation on heterogeneous information networks,” in Proc. of the IEEE International Conference on Web Services (ICWS), San Francisco, CA, USA, July 2-7, 2018, pp. 115–122.
  [31] V. W. Anelli, T. D. Noia, E. D. Sciascio, A. Ragone, and J. Trotta, “How to make latent factors interpretable by feeding factorization machines with knowledge graphs,” in Proc. of the 18th International Semantic Web Conference (ISWC), vol. 11778, 2019, pp. 38–56.
  [32] G. Kang, J. Liu, B. Cao, and M. Cao, “NAFM: neural and attentional factorization machine for web API recommendation,” in Proc. of the IEEE International Conference on Web Services (ICWS), 2020, pp. 330–337.
  [33] M. Nguyen, J. Yu, T. Nguyen, and Y. Han, “Attentional matrix factorization with context and co-invocation for service recommendation,” Expert Systems with Applications, vol. 186, p. 115698, 2021.
  [34] B. Xia, Y. Fan, W. Tan, K. Huang, J. Zhang, and C. Wu, “Category-aware API clustering and distributed recommendation for automatic mashup creation,” IEEE Transactions on Services Computing, vol. 8, no. 5, pp. 674–687, 2015.
  [35] H. Wu, Y. Duan, K. Yue, and L. Zhang, “Mashup-oriented web API recommendation via multi-model fusion and multi-task learning,” IEEE Transactions on Services Computing, vol. 15, no. 6, pp. 3330–3343, 2021.
  [36] M. Imran, S. Soi, F. Kling, F. Daniel, F. Casati, and M. Marchese, “On the systematic development of domain-specific mashup tools for end users,” in Proc. of the 12th International Conference on Web Engineering (ICWE), Berlin, Germany, July 23-27, 2012, pp. 291–298.
  [37] C. Papadopoulos, Y. Panagakis, M. Koubarakis, and M. Nicolaou, “Efficient learning of multiple NLP tasks via collective weight factorization on BERT,” in Findings of the Association for Computational Linguistics: NAACL, Seattle, WA, United States, July 10-15, 2022, pp. 882–890.
  [38] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
  [39] Q. Zhang, D. Ram, C. Hawkins, S. Zha, and T. Zhao, “Efficient long-range transformers: You need to attend more, but not necessarily at every layer,” in Findings of the Association for Computational Linguistics: EMNLP, Singapore, Dec. 2023, pp. 2775–2786.
  [40] N. Reimers and I. Gurevych, “Sentence-BERT: Sentence embeddings using Siamese BERT-networks,” in Proc. of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, November 3-7, 2019, pp. 3980–3990.