A Novel Method for News Article Event-Based Embedding

Itzhak Ben-David; Koren Ishlach; Lior Rokach; Michael Fire

arXiv: 2405.13071 · v2 · submitted 2024-05-20 · 💻 cs.CL · cs.AI · cs.SI

Pith reviewed 2026-05-24 00:33 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.SI

keywords news embeddings · event detection · entity extraction · theme extraction · GloVe · Siamese networks · SIF · time-separated embeddings

The pith

A three-stage method using events, entities, and time-separated embeddings produces better news article vectors than full-text approaches.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that news article embeddings work better when they extract events, entities, and themes rather than processing entire articles, then build separate embeddings for different time periods and combine two different vector methods. This matters because standard embeddings miss the historical and event-driven context that drives many news applications. The authors test the idea on a large collection of articles and events and report gains on shared event detection. If the claim holds, downstream tasks that rely on article similarity would see measurable lifts without heavier models.

Core claim

The authors claim that processing articles to extract events, entities, and themes, training time-separated GloVe models on periodic slices of data, and then concatenating SIF article vectors with Siamese network outputs yields embeddings that capture latent event context more effectively than full-text baselines and improve or exceed state-of-the-art results on shared event detection.

What carries the argument

Concatenation of SIF and Siamese network outputs applied to time-separated GloVe embeddings of extracted events, entities, and themes.
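As a rough illustration of the SIF step named above, the sketch below follows the usual two-part recipe from Arora et al.: a frequency-weighted average of token vectors, then removal of the projection onto the first singular direction. It is a minimal sketch, not the authors' implementation. The function name `sif_embeddings`, the smoothing constant `a=1e-3`, the power-iteration shortcut, and estimating token frequencies from the input documents themselves are all assumptions made for illustration. In the paper the tokens would be extracted entities and themes with GloVe vectors.

```python
import math
from collections import Counter


def sif_embeddings(docs, word_vecs, a=1e-3, iters=50):
    """docs: list of token lists; word_vecs: dict mapping token -> list of floats.

    Hypothetical helper: returns one SIF-style vector per document.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    dim = len(next(iter(word_vecs.values())))

    # Step 1: weighted average, each token weighted by a / (a + p(token)).
    vecs = []
    for doc in docs:
        known = [tok for tok in doc if tok in word_vecs]
        acc = [0.0] * dim
        for tok in known:
            weight = a / (a + counts[tok] / total)
            for i, x in enumerate(word_vecs[tok]):
                acc[i] += weight * x
        n = max(len(known), 1)
        vecs.append([x / n for x in acc])

    # Step 2: estimate the first right singular vector u of the document
    # matrix X with power iteration on X^T X (assumed shortcut for an SVD).
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        proj = [sum(v[i] * u[i] for i in range(dim)) for v in vecs]
        nxt = [sum(p * v[i] for p, v in zip(proj, vecs)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        u = [x / norm for x in nxt]

    # Step 3: subtract each vector's projection onto the common direction u.
    out = []
    for v in vecs:
        dot = sum(v[i] * u[i] for i in range(dim))
        out.append([v[i] - dot * u[i] for i in range(dim)])
    return out
```

A production version would normally use a numerical library for the singular vector; the pure-Python loop is only meant to keep the sketch dependency-free.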

If this is right

The embeddings improve performance on shared event detection compared with prior full-text methods.
The approach remains lightweight enough to scale to hundreds of thousands of articles.
Time-separated training lets the vectors reflect how entities and themes connect to events across different periods.
The resulting vectors support applications that need event-aware similarity, such as media bias detection or news recommendation.
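One way to picture the time-separated training mentioned in the list above is to bucket articles by period and build a separate co-occurrence table of entities and themes for each bucket, which is the kind of input a GloVe-style model consumes. The sketch below assumes monthly buckets keyed by the `YYYY-MM` prefix of an ISO date, and the name `cooccurrence_by_period` is hypothetical. The abstract says the models are trained on current and historical data, so the real slices may be cumulative rather than disjoint as they are here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_by_period(articles):
    """articles: iterable of (iso_date, tokens) pairs.

    Returns {"YYYY-MM": Counter mapping (token_a, token_b) -> count}, where a
    pair is counted once per article in which both tokens appear.
    """
    slices = defaultdict(Counter)
    for iso_date, tokens in articles:
        period = iso_date[:7]  # monthly granularity is an assumption
        # Sort so each unordered pair always has the same key.
        for pair in combinations(sorted(set(tokens)), 2):
            slices[period][pair] += 1
    return dict(slices)
```

Each per-period table could then be handed to a separate embedding model, so the same entity gets a different vector in each period.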

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same extraction-plus-time-separation pattern could be tested on non-news text where temporal context matters, such as legal or scientific documents.
If the time slices prove critical, future work could replace fixed periods with event-driven windows.
The dual-encoder concatenation suggests a general template for blending coarse and fine-grained signals in other embedding tasks.

Load-bearing premise

Extracting events, entities, and themes and combining time-separated embeddings with SIF and Siamese outputs actually captures the latent event context better than full article text.

What would settle it

Run the same shared event detection benchmarks on a held-out collection of news articles. If the new embeddings show no accuracy gain over full-text baselines there, the claim does not hold.
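A held-out check of this kind could score article pairs by embedding similarity and compare ROC AUC between methods; the figure captions indicate the paper reports ROC and Precision-Recall AUC. The sketch below is a minimal pairwise AUC, assuming label 1 means the two articles share an event and 0 means they do not. The function name and the pair format are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """Probability that a random positive pair outscores a random negative pair.

    labels: 1 if the article pair shares an event, else 0.
    scores: similarity of the two article embeddings (higher = more similar).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Computing `roc_auc(labels, new_scores) - roc_auc(labels, baseline_scores)` on the held-out pairs gives the gain in question; a difference near zero would count against the claim.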

Figures

Figures reproduced from arXiv: 2405.13071 by Itzhak Ben-David, Koren Ishlach, Lior Rokach, Michael Fire.

**Figure 1.** This figure presents the method’s entire pipeline of news embedding generation.

Adjacent text: … embeddings, we deploy a Smooth Inverse Frequency (SIF) as a methodology to construct document-level vectors (see Section 2). Then, these embeddings traverse Siamese Neural Networks trained to minimize the distance between articles that share common events. To test and evaluate our study, we deployed our algorithm on the GDELT p…

**Figure 2.** This figure contained a Pareto chart of the Top 35 persons occurrences in the full collected dataset from GDELT.

Adjacent text: 3.1.3 Cleaning the Data. After processing the entities and themes, the next step is to clean up redundant data. We identify two types of redundant articles: duplicates and very short articles. Article duplication can occur for several reasons, such as the same article being parsed multiple times …

**Figure 3.** The architecture and training process of each Triplet Siamese Network.

**Figure 4.** This figure maps each monitored media source to the articles published by it in the preprocessed dataset. Conservative and Liberal media sources are labeled in blue and red, respectively.

Adjacent text: Table 1. Yearly aggregated datasets statistics used in this work for performance analysis.

| Year | #Articles | #Events | #Mentions | %Monthly PR | %Daily PR |
|---|---|---|---|---|---|
| 2015 | 78877 | 96783 | 361148 | 0.0547 | 0.5012 |
| 2016 | 166754 | 196716 | 702328 | 0.0423 | 0.3630 |
| … | | | | | |

**Figure 5.** Siamese-Network: Train Triplet Loss. The X-axis is the monitored training steps; for every 4 steps, the average loss was calculated. The labels represent each Siamese model that was trained for a given month.

Adjacent text: 1. We deployed the semi-supervised event approach, SIF, and their concatenation for all tasks of common event attribution. 2. We evaluated their performance on those tasks on all the collected datas…

**Figure 6.** Statistical Analysis of Article Generation Methods - Comparison of performance using the Friedman and Nemenyi tests with a 5% significance level in the daily and monthly event attribution task.

Adjacent text: 5.2 Common Event Attribution Tasks. As described in Section 4, we ran the experiments on all 66 distinct monthly datasets we collected and processed from the GDELT project. The results of those experiments on our sug…

**Figure 7.** Statistical Analysis of Article Generation Using the spaCy Model - Comparison of performance using the Friedman and Nemenyi tests with a 5% significance level in the daily and monthly event attribution task.

Adjacent text: …tions for article embeddings. (b) Evaluating if our semi-supervised approach yields an agnostic improvement to the generated articles embedding, relative to the given basic article embedding represen…

**Figure 8.** ROC and Precision-Recall AUC results of the daily and monthly common event attribution task across all methods and datasets. Panels: (a) Precision-Recall AUC Comparison; (b) ROC AUC Comparison.

**Figure 9.** ROC and Precision-Recall AUC results of the daily and monthly common event attribution task across all methods and datasets.

Adjacent text: 6. Discussion. Upon analyzing the results presented in the previous section, we can conclude the following: First, the utilization of entities, themes, and events to construct news embeddings has shown promising results, as reflected in Tables 1 and 3. Even though the task of shared …

**Figure 10.** This figure presents a summary of the proposed SiameseNet Architecture.

**Figure 11.** These images provide examples of classic NER output on two US 2020 Election Debate News Articles.

**Figure 12.** These images provide examples of classic NER output on two Iran Meddling with US 2020 Election News Articles.
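The captions of Figures 6 and 7 mention Friedman and Nemenyi tests over methods and datasets. As a hedged illustration of what the Friedman part computes, the sketch below ranks methods within each dataset and derives the chi-square statistic from the rank sums. It omits the tie correction and the Nemenyi post-hoc step, and the function names and any scores fed to it are hypothetical, not values from the paper.

```python
def average_ranks(row):
    """Rank scores within one dataset (1 = best, higher score is better).

    Tied scores share the average of the ranks they span.
    """
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[i][j]: metric of method j on dataset i.

    Returns (chi-square statistic without tie correction, mean rank per method).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [r / n for r in rank_sums]
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; the mean ranks are what a Nemenyi critical-difference comparison would then use.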

Original abstract

Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and making news recommendations. However, existing news embedding methods are not optimized to capture the latent context of news events. Most embedding methods rely on full-text information and neglect time-relevant embedding generation. In this paper, we propose a novel lightweight method that optimizes news embedding generation by focusing on entities and themes mentioned in articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract events, entities, and themes from the given news articles. Second, we generate periodic time embeddings for themes and entities by training time-separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. We leveraged over 850,000 news articles and 1,000,000 events from the GDELT project to test and evaluate our method. We conducted a comparative analysis of different news embedding generation methods for validation. Our experiments demonstrate that our approach can both improve and outperform state-of-the-art methods on shared event detection tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The Siamese network is probably trained supervised on GDELT event pairs, so the outperformance on shared event detection does not demonstrate better latent context capture.

The paper puts together event/entity/theme extraction, time-separated GloVe models, SIF, and a Siamese network to produce news embeddings tuned to events rather than full text. The time-separated embeddings for themes and entities are a straightforward way to bring in historical context, and the overall pipeline is a new assembly even if the pieces are known. Using the large GDELT corpus is also practical for this domain. That is the main positive: a concrete, lightweight recipe aimed at media analysis tasks.

The central claim is that this beats SOTA on shared event detection. The abstract gives no numbers, no baselines, no metrics, and no training details, so the claim cannot be checked from the abstract alone.

The stress-test concern looks right on the description given. The Siamese component is added specifically for nuanced event information, and the whole thing runs on GDELT articles and events. If the network is trained with a supervised objective on pairs labeled by whether they share a GDELT event, then the embeddings receive direct supervision from the evaluation signal. Plain GloVe or Sentence-BERT baselines do not get that signal, so measured gains cannot be attributed to the event-modeling steps. This directly undercuts the assumption that the composition itself captures latent event context more effectively.

The work is aimed at people doing news recommendation, bias detection, or event tracking in NLP. A reader could pull the pipeline and try it on their own data, but would have to re-implement the Siamese part carefully to avoid the supervision problem. It deserves peer review so the authors can show the exact training objective, the full experimental setup, and fair unsupervised baselines.
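One way a re-implementation might guard against the supervision problem described in the note is to split labeled pairs by event, so that no event used to train the Siamese network also appears in evaluation. The sketch below is an assumption-laden illustration, not the paper's protocol: the name `split_by_event`, the hash-based assignment, and the 20% evaluation fraction are all hypothetical choices.

```python
import hashlib


def split_by_event(pairs, eval_fraction=0.2):
    """pairs: iterable of (article_a, article_b, event_id) positive pairs.

    Every pair belonging to a given event lands on the same side, so the
    training and evaluation sets never share an event.
    """
    train, evaluation = [], []
    for a, b, event_id in pairs:
        # Stable hash of the event id, mapped to [0, 1).
        digest = hashlib.sha256(str(event_id).encode("utf-8")).digest()
        bucket = digest[0] / 256.0
        target = evaluation if bucket < eval_fraction else train
        target.append((a, b, event_id))
    return train, evaluation
```

This only separates events. The same article can still appear on both sides if it is linked to several events, so a stricter split would also partition by article or by time period.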

Referee Report

2 major / 2 minor

Summary. The paper proposes a three-stage method for news article embeddings focused on event context: extract events/entities/themes from articles, train time-separated GloVe models on current/historical data for periodic embeddings of themes and entities, then concatenate SIF article-level vectors with Siamese NN outputs for nuanced event information. Using 850k GDELT articles and 1M events, it claims the approach both improves upon and outperforms SOTA methods on shared event detection tasks.

Significance. If the outperformance claim holds under fair, unsupervised comparisons, the method could offer a lightweight alternative to full-text embeddings by incorporating historical event-entity-theme connections, with potential utility in media bias or recommendation systems. The use of GDELT for both training and evaluation is a strength for scale, but the absence of reproducible details and possible supervision issues reduce the assessed impact.

major comments (2)

[Abstract / Experimental section] Abstract and method description: the central claim that the three-stage pipeline (event extraction + time-separated GloVe + SIF/Siamese concatenation) captures latent event context more effectively than full-text SOTA is load-bearing but unverifiable, as no experimental setup, metrics (e.g., precision/recall/F1), baselines (e.g., BERT, plain GloVe), or statistical tests are provided to support outperformance on shared event detection.
[Method / Siamese NN description] Method composition (Siamese NN stage): if the Siamese network is trained with a supervised contrastive or classification objective on GDELT-labeled event-sharing pairs (as implied by its use 'for embeddings with nuanced event-related information' on the same corpus used for evaluation), this introduces task-specific supervision unavailable to unsupervised baselines, directly undermining attribution of gains to the proposed event-context modeling rather than label leakage.
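For context on the objective under discussion: the figure captions refer to a "Triplet Siamese Network" and a "Train Triplet Loss", which suggests a triplet-style objective. A minimal sketch of a standard triplet margin loss follows. The Euclidean distance, the margin of 1.0, and the function names are assumptions for illustration; the paper's exact formulation may differ (some variants use squared distances).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss on three embedding vectors.

    anchor/positive: embeddings of two articles that share an event.
    negative: embedding of an article that does not share that event.
    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

The positive and negative labels come from event co-membership, which is why the choice of training and evaluation splits matters for the leakage question raised in this comment.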

minor comments (2)

[Abstract] The abstract states 'we conducted a comparative analysis' but provides zero specifics on the number of baselines, dataset splits, or evaluation protocol; this should be expanded with a dedicated experiments subsection including tables of results.
[Method] Notation for the concatenation step (SIF + Siamese) is not formalized with an equation; adding one would clarify the final embedding construction.
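A formalization of the kind requested in the last comment might look like the following. The symbols are assumptions introduced here, not the paper's notation: $d$ is an article, $\mathbf{v}_{\mathrm{SIF}}(d) \in \mathbb{R}^{m}$ its SIF vector, $f_\theta$ the trained Siamese encoder with output dimension $k$, and $[\,\cdot\,;\,\cdot\,]$ vector concatenation. Feeding the SIF vector into the encoder follows the text adjacent to Figure 1, which says the SIF embeddings pass through the Siamese networks.

```latex
\mathbf{e}(d) \;=\; \bigl[\, \mathbf{v}_{\mathrm{SIF}}(d) \,;\; f_\theta\!\bigl(\mathbf{v}_{\mathrm{SIF}}(d)\bigr) \bigr] \;\in\; \mathbb{R}^{m+k}
```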

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address the two major comments below, agreeing that additional experimental details and methodological clarifications are needed. Revisions will be made to the manuscript to provide these.

Referee: [Abstract / Experimental section] Abstract and method description: the central claim that the three-stage pipeline (event extraction + time-separated GloVe + SIF/Siamese concatenation) captures latent event context more effectively than full-text SOTA is load-bearing but unverifiable, as no experimental setup, metrics (e.g., precision/recall/F1), baselines (e.g., BERT, plain GloVe), or statistical tests are provided to support outperformance on shared event detection.

Authors: We agree that the current version does not provide sufficient detail on the experimental setup, metrics, baselines, or statistical tests. The abstract summarizes the comparative analysis performed on shared event detection using the GDELT dataset (850k articles, 1M events), but the Experimental section will be expanded in revision to include the full setup, evaluation metrics (precision, recall, F1), all baselines compared (including BERT and plain GloVe), and any statistical tests used to support the outperformance claims. revision: yes

Referee: [Method / Siamese NN description] Method composition (Siamese NN stage): if the Siamese network is trained with a supervised contrastive or classification objective on GDELT-labeled event-sharing pairs (as implied by its use 'for embeddings with nuanced event-related information' on the same corpus used for evaluation), this introduces task-specific supervision unavailable to unsupervised baselines, directly undermining attribution of gains to the proposed event-context modeling rather than label leakage.

Authors: The Siamese network is trained with a supervised contrastive objective on GDELT-labeled event-sharing pairs to incorporate nuanced event information, as described. This is an intentional component of the method rather than an unintended leakage. In the revision, we will explicitly state the training objective, clarify the data splits used for training versus evaluation to mitigate leakage concerns, and add comparisons against purely unsupervised baselines to better attribute performance gains to the event-context components (entity/theme extraction and periodic GloVe). revision: yes

Circularity Check

0 steps flagged

No circularity: method and evaluation remain independent of self-defined inputs

The paper describes a three-stage pipeline (event/entity/theme extraction, time-separated GloVe embeddings, then SIF + Siamese concatenation) and reports comparative results on the external GDELT corpus for shared-event detection. No equation or claim reduces a derived quantity to a fitted parameter by construction, no uniqueness theorem is imported via self-citation, and the evaluation signal is not shown to be identical to the training objective in a way that collapses the reported outperformance. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no specific free parameters, axioms, or invented entities detailed beyond standard NLP components like GloVe and neural networks. Time periods for embeddings may involve ad hoc choices but are unspecified.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · 8 internal anchors

[1]

The impact of digital platforms on news and journalistic content

Derek Wilding, Peter Fray, Sacha Molitorisz, and Elaine McKewon. The impact of digital platforms on news and journalistic content. Digital Platforms Inquiry, 2018

work page 2018
[2]

Measuring the media agenda

Mary Layton Atkinson, John Lovett, and Frank R Baum- gartner. Measuring the media agenda. Political Commu- nication, 31(2):355–380, 2014

work page 2014
[3]

We can detect your bias: Predicting the political ideology of news articles

Ramy Baly, Giovanni Da San Martino, James Glass, and Preslav Nakov. We can detect your bias: Predicting the political ideology of news articles. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4982–4991, 2020

work page 2020
[4]

The effect of fox news on health behavior during covid-19

Elliott Ash, Sergio Galletta, Dominik Hangartner, Yotam Margalit, and Matteo Pinna. The effect of fox news on health behavior during covid-19. Available at SSRN 3636762, 2020

work page 2020
[5]

Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, chal- lenges and opportunities

Priyanka Meel and Dinesh Kumar Vishwakarma. Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, chal- lenges and opportunities. Expert Systems with Applica- tions, 153:112986, 2020

work page 2020
[6]

An exploration of how fake news is taking over social media and putting public health at risk

Salman Bin Naeem, Rubina Bhatti, and Aqsa Khan. An exploration of how fake news is taking over social media and putting public health at risk. Health Information & Libraries Journal, 38(2):143–149, 2021

work page 2021
[7]

Estimating coun- tries’ peace index through the lens of the world news as monitored by gdelt

Vasiliki V oukelatou, Luca Pappalardo, Ioanna Miliou, Lorenzo Gabrielli, and Fosca Giannotti. Estimating coun- tries’ peace index through the lens of the world news as monitored by gdelt. In 2020 IEEE 7th International Con- ference on Data Science and Advanced Analytics (DSAA), pages 216–225, 2020

work page 2020
[8]

The evolution of geo-relations between china and southeast asian countries based on gdelt

LI Bing and Peng Fei. The evolution of geo-relations between china and southeast asian countries based on gdelt. World Regional Studies, 30(6):1127, 2021

work page 2021
[9]

Using the gdelt dataset to analyse the italian sovereign bond market

Sergio Consoli, Luca Tiozzo Pezzoli, and Elisa Tosetti. Using the gdelt dataset to analyse the italian sovereign bond market. In Machine Learning, Optimization, and Data Science: 6th International Conference, LOD 2020, Siena, Italy, July 19–23, 2020, Revised Selected Papers, Part I 6, pages 190–202. Springer, 2020

work page 2020
[10]

Comparisons of the city brand influence of global cities: Word-embedding based semantic mining and clustering analysis on the big data of gdelt global news knowledge graph

Chenyu Zheng. Comparisons of the city brand influence of global cities: Word-embedding based semantic mining and clustering analysis on the big data of gdelt global news knowledge graph. Sustainability, 12(16):6294, 2020

work page 2020
[11]

Event prediction in the big data era: A systematic survey

Liang Zhao. Event prediction in the big data era: A systematic survey. ACM Computing Surveys (CSUR) , 54(5):1–37, 2021

work page 2021
[12]

Analyzing in- ternational event data: a handbook of computer-based techniques

Philip A Schrodt and Deborah J Gerner. Analyzing in- ternational event data: a handbook of computer-based techniques. University of Kansas, Online Manuscript, http://www. ku. edu/keds/papers. dir/automated. html , 2000

work page 2000
[13]

The conflict and peace data bank (copdab) project

Edward E Azar. The conflict and peace data bank (copdab) project. Journal of Conflict Resolution , 24(1):143–152, 1980

work page 1980
[14]

World-event-interaction-survey: A research project on the theory and measurement of international interaction and transaction

Charles A McClelland. World-event-interaction-survey: A research project on the theory and measurement of international interaction and transaction. University of Southern California, 1967

work page 1967
[15]

Gdelt: Global data on events, location, and tone, 1979–2012

Kalev Leetaru and Philip A Schrodt. Gdelt: Global data on events, location, and tone, 1979–2012. In ISA annual convention, volume 2, pages 1–49. Citeseer, 2013

work page 1979
[16]

Cameo: Conflict and mediation event observations event and actor codebook

Philip A Schrodt. Cameo: Conflict and mediation event observations event and actor codebook. Pennsylvania State University, 610:35, 2012. A Novel Method for News Article Event-Based Embedding — 20/28

work page 2012
[17]

Clark, Jeffrey R

Tom S. Clark, Jeffrey R. Lax, and Douglas Rice. Measur- ing the political salience of supreme court cases. Journal of Law and Courts, 3(1):37–65, 2015

work page 2015
[18]

big data

Robert A Blair and Nicholas Sambanis. Forecasting civil wars: Theory and structure in an age of “big data” and machine learning. Journal of Conflict Resolution , 64(10):1885–1915, 2020

work page 1915
[19]

News2vec: News network embedding with subnode in- formation

Ye Ma, Lu Zong, Yikang Yang, and Jionglong Su. News2vec: News network embedding with subnode in- formation. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 4843–4852, 2019

work page 2019
[20]

Tackling fake news detection by continually im- proving social context representations using graph neural networks

Nikhil Mehta, Mar ´ıa Leonor Pacheco, and Dan Gold- wasser. Tackling fake news detection by continually im- proving social context representations using graph neural networks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1363–1380, 2022

work page 2022
[21]

Context-aware graph embedding for session-based news recommendation

Heng-Shiou Sheu and Sheng Li. Context-aware graph embedding for session-based news recommendation. In Proceedings of the 14th ACM Conference on Recom- mender Systems, RecSys ’20, page 657–662, New York, NY , USA, 2020. Association for Computing Machinery

work page 2020
[22]

Event2vec: Neural embed- dings for news events

Vinay Setty and Katja Hose. Event2vec: Neural embed- dings for news events. In The 41st International ACM SIGIR Conference on Research & Development in Infor- mation Retrieval, pages 1013–1016, 2018

work page 2018
[23]

News recommender system: a review of recent progress, challenges, and opportunities

Shaina Raza and Chen Ding. News recommender system: a review of recent progress, challenges, and opportunities. Artificial Intelligence Review, pages 1–52, 2022

work page 2022
[24]

Unbert: User- news matching bert for news recommendation

Qi Zhang, Jingjie Li, Qinglin Jia, Chuyuan Wang, Jiem- ing Zhu, Zhaowei Wang, and Xiuqiang He. Unbert: User- news matching bert for news recommendation. In IJCAI, volume 21, pages 3356–3362, 2021

work page 2021
[25]

Understanding graph embedding methods and their applications

Mengjia Xu. Understanding graph embedding methods and their applications. SIAM Review, 63(4):825–853, 2021

work page 2021
[26]

Glove: Global vectors for word representa- tion

Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representa- tion. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014

work page 2014
[27]

A sim- ple but tough-to-beat baseline for sentence embeddings

Sanjeev Arora, Yingyu Liang, and Tengyu Ma. A sim- ple but tough-to-beat baseline for sentence embeddings. In International conference on learning representations, 2017

work page 2017
[28]

Learning deep structure-preserving image-text embeddings

Liwei Wang, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5005–5013, 2016

work page 2016
[29]

De- tecting the magnitude of events from news articles

Ameeta Agrawal, Raghavender Sahdev, Heidar Davoudi, Forouq Khonsari, Aijun An, and Susan McGrath. De- tecting the magnitude of events from news articles. In 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 177–184, 2016

work page 2016
[30]

Be- yond word embeddings: A survey

Francesca Incitti, Federico Urli, and Lauro Snidaro. Be- yond word embeddings: A survey. Information Fusion, 89:418–436, 2023

work page 2023
[31]

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[32]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional trans- formers for language understanding. arXiv preprint arXiv:1810.04805, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[33]

Distributed representations of sentences and documents

Quoc Le and Tomas Mikolov. Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196. PMLR, 2014

work page 2014
[34]

Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks

Jason Phang, Thibault F ´evry, and Samuel R Bowman. Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks. arXiv preprint arXiv:1811.01088, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[35]

The evolution of topic modeling

Rob Churchill and Lisa Singh. The evolution of topic modeling. ACM Computing Surveys, 54(10s):1–35, 2022

work page 2022
[36]

Topic modeling: a comprehensive review

Pooja Kherwa and Poonam Bansal. Topic modeling: a comprehensive review. EAI Endorsed transactions on scalable information systems, 7(24), 2019

work page 2019
[37]

The evolution of topic modeling

Rob Churchill and Lisa Singh. The evolution of topic modeling. ACM Comput. Surv., 54(10s), nov 2022. A Novel Method for News Article Event-Based Embedding — 21/28

work page 2022
[38]

Modeling the evolution of climate change assess- ment research using dynamic topic models and cross- domain divergence maps

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane. Modeling the evolution of climate change assess- ment research using dynamic topic models and cross- domain divergence maps. In 2017 AAAI Spring Sympo- sium Series, 2017

work page 2017
[39]

Words that matter: How the news and social media shaped the 2016 Presidential campaign

Leticia Bode, Ceren Budak, Jonathan M Ladd, Frank Newport, Josh Pasek, Lisa O Singh, Stuart N Soroka, and Michael W Traugott. Words that matter: How the news and social media shaped the 2016 Presidential campaign. Brookings Institution Press, 2019

work page 2016
[40]

Improving topic models with latent fea- ture word representations

Dat Quoc Nguyen, Richard Billingsley, Lan Du, and Mark Johnson. Improving topic models with latent fea- ture word representations. Transactions of the Associa- tion for Computational Linguistics, 3:299–313, 2015

work page 2015
[41]

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Christopher E Moody. Mixing dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[42]

ERNIE: Enhanced Language Representation with Informative Entities

Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu. Ernie: Enhanced language representation with informative entities. arXiv preprint arXiv:1905.07129, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1905
[43]

Named entity recognition in query

Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. Named entity recognition in query. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval , pages 267–274, 2009

work page 2009
[44]

Domain-specific knowledge graph construction

Mayank Kejriwal. Domain-specific knowledge graph construction. Springer, 2019

work page 2019
[45]

Named entity resources-overview and outlook

Maud Ehrmann, Damien Nouvel, and Sophie Rosset. Named entity resources-overview and outlook. Language Resources and Evaluation, 2016

work page 2016
[46]

No Starch Press, 2020

Yuli Vasiliev.Natural language processing with Python and spaCy: A practical introduction . No Starch Press, 2020

work page 2020
[47]

Natural language processing: python and NLTK

Nitin Hardeniya, Jacob Perkins, Deepti Chopra, Nisheeth Joshi, and Iti Mathur. Natural language processing: python and NLTK. Packt Publishing Ltd, 2016

work page 2016
[48]

Flair: An easy-to-use framework for state-of-the-art nlp

Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland V ollgraf. Flair: An easy-to-use framework for state-of-the-art nlp. In Pro- ceedings of the 2019 conference of the North American chapter of the association for computational linguistics (demonstrations), pages 54–59, 2019

work page 2019
[49]

Medieval spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information

Mª Luisa D ´ıez Platas, Salvador Ros Munoz, Elena Gonz´alez-Blanco, Pablo Ruiz Fabo, and Elena Al- varez Mellado. Medieval spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information. Journal of the Associa- tion for Information Science and Technology, 72(2):224– 238, 2021

work page 2021
[50]

Named entity recognition and classification in historical documents: A survey

Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Matteo Romanello, and Antoine Doucet. Named entity recognition and classification in historical documents: A survey. ACM Computing Surveys, 2023

work page 2023
[51] Jason PC Chiu and Eric Nichols. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 4:357–370, 2016.
[52] Alexis Conneau, Douwe Kiela, Holger Schwenk, Loic Barrault, and Antoine Bordes. Supervised learning of universal sentence representations from natural language inference data. arXiv preprint arXiv:1705.02364, 2017.
[53] Wei Xiang and Bang Wang. A survey of event extraction from text. IEEE Access, 7:173111–173137, 2019.
[54] Huyen Trang Phan, Ngoc Thanh Nguyen, and Dosam Hwang. Fake news detection: A survey of graph neural network methods. Applied Soft Computing, page 110235, 2023.
[55] Shyam Upadhyay, Christos Christodoulopoulos, and Dan Roth. “Making the news”: Identifying noteworthy events in news articles. In Proceedings of the Fourth Workshop on Events, pages 1–7, 2016.
[56] Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, et al. Efficient methods for natural language processing: A survey. Transactions of the Association for Computational Linguistics, 11:826–860, 2023.
[57] Kalev Leetaru. Data mining methods for the content analyst: An introduction to the computational analysis of content. Routledge, 2012.
[58] Abraham Bookstein, Vladimir A Kulyukin, and Timo Raita. Generalized Hamming distance. Information Retrieval, 5:353–375, 2002.
[59] Abu Quwsar Ohi, M.F. Mridha, Farisa Benta Safir, Md. Abdul Hamid, and Muhammad Mostafa Monowar. Autoembedder: A semi-supervised DNN embedding system for clustering. Knowledge-Based Systems, 204:106190, 2020.
[60] Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Säckinger, and Roopak Shah. Signature verification using a “Siamese” time delay neural network. Advances in Neural Information Processing Systems, 6, 1993.
[61] Fuji Ren and Siyuan Xue. Intention detection based on Siamese neural network with triplet loss. IEEE Access, 8:82242–82254, 2020.
[62] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[63] Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Pascal Fua, and Francesc Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In Proceedings of the IEEE International Conference on Computer Vision, pages 118–126, 2015.
[64] Alexander Hermans, Lucas Beyer, and Bastian Leibe. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737, 2017.
[65] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[66] Debaditya Roy, C Krishna Mohan, and K Sri Rama Murty. Action recognition based on discriminative embedding of actions using Siamese networks. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 3473–3477. IEEE, 2018.
[67] Jesse Davis and Mark Goadrich. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, pages 233–240, 2006.
[68] Milton Friedman. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200):675–701, 1937.
[69] Peter Bjorn Nemenyi. Distribution-free multiple comparisons. Princeton University, 1963.
[70] Shanto Iyengar and Kyu S Hahn. Red media, blue media: Evidence of ideological selectivity in media use. Journal of Communication, 59(1):19–39, 2009.
[71] Linh Phuong Nguyen, Do Dinh Tung, Duong Thanh Nguyen, Hong Nhung Le, Toan Quoc Tran, Ta Van Binh, and Dung Thuy Nguyen Pham. The utilization of machine learning algorithms for assisting physicians in the diagnosis of diabetes. Diagnostics, 13(12):2087, 2023.
[72] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7:1–30, 2006.
[73] Research Notes. In the appendix, we present the detailed implementation notes of our method, including the running environment and the parameter settings for the deployed algorithms and neural networks. Finally, we provide further references to the GDELT raw data. A.1 Implementation Notes. Running Environment. The experiments are conducted on a single Lin...
[74] The table showcases two examples of how data-reliant the GloVe model is. The single monthly trained GloVe model was not able to represent the 9.11 entity. It was, however, able to successfully encapsulate the Vladimir Putin entity, while including Alexander Litvinenko as a close neighbor. Alexander Litvinenko was a former officer of the Russian Federal Sec...
