A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams

Avishek Bose; Carlos Aguirre; Vahid Behzadan; William H. Hsu

arXiv: 1907.07768 · v1 · pith:QUPX243Knew · submitted 2019-07-12 · 💻 cs.IR · cs.CR · cs.LG · cs.SI · stat.ML

Pith reviewed 2026-05-24 21:57 UTC · model grok-4.3

classification 💻 cs.IR · cs.CR · cs.LG · cs.SI · stat.ML

keywords cyber threat detection · Twitter streams · event detection · unsupervised machine learning · named entity extraction · user influence · novelty detection · trend ranking

The pith

An unsupervised machine learning method detects novel and developing cyber threat events in Twitter streams and ranks them by importance score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an unsupervised machine learning approach combined with text information extraction to identify cyber threat events on Twitter that are novel, meaning previously non-extant, or developing, meaning they gain significance through similarity to earlier events. It distinguishes these categories via similarity measures and produces rankings by extracting named entities and keywords from tweets, then weighting noun phrases according to the influence of the posting users. The approach treats novelty and trendiness together rather than as separate criteria. Evaluation measures the method's efficiency and detection error rate over time intervals against labels from human annotators.
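To make the first stage concrete, here is a minimal sketch of unsupervised event grouping. The abstract does not name the clustering method, so single-pass clustering with Jaccard similarity is swapped in purely for illustration. The names `jaccard` and `single_pass_cluster`, the whitespace tokenizer, and the 0.3 threshold are all assumptions, not the authors' implementation.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass_cluster(tweets, threshold=0.3):
    """Assign each tweet to the most similar existing cluster, or open a new one.

    `threshold` is a hypothetical cut-off; it is not taken from the paper.
    """
    clusters = []  # each cluster: {"terms": set of tokens, "members": tweet indices}
    for idx, text in enumerate(tweets):
        tokens = set(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(tokens, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(idx)
            best["terms"] |= tokens  # grow the cluster vocabulary
        else:
            clusters.append({"terms": set(tokens), "members": [idx]})
    return clusters


# Toy stream; the tweets are invented for illustration.
stream = [
    "new ransomware hits hospital network",
    "hospital network down after ransomware attack",
    "chrome zero-day patch released today",
    "google releases chrome zero-day patch",
]
groups = [c["members"] for c in single_pass_cluster(stream)]
print(groups)
```

Each resulting group is a candidate event; a real system would need proper tokenization and a tuned threshold.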

Core claim

The central claim is that an unsupervised machine learning approach can detect both novel cyber threat events (previously non-extant) and developing ones (marked by significance with respect to similarity with a previously detected event) in Twitter streams, while enabling ranking of events based on an importance score derived from tweet terms characterized as named entities, keywords, or both, with noun phrases weighted in proportion to user influence.

What carries the argument

Unsupervised machine learning for event detection that uses similarity measures to classify events as novel or developing, paired with named entity and keyword extraction weighted by imputed user influence to produce ranked importance scores.
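The novel-versus-developing split can be sketched as a threshold on similarity to earlier detections. The abstract does not state the similarity measure, so cosine similarity over term counts is an assumption here, as are the function names and the 0.5 threshold.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_event(terms, previous_events, threshold=0.5):
    """Label an event 'developing' if it resembles an earlier one, else 'novel'.

    `threshold` is hypothetical; the paper's actual criterion is not in the abstract.
    """
    vec = Counter(terms)
    best = max((cosine(vec, Counter(p)) for p in previous_events), default=0.0)
    return ("developing" if best >= threshold else "novel"), best


# Invented term lists standing in for previously detected events.
previous = [
    ["ransomware", "hospital", "attack"],
    ["chrome", "zero-day", "patch"],
]
label_a, sim_a = classify_event(["ransomware", "hospital", "outage"], previous)
label_b, sim_b = classify_event(["phishing", "bank", "campaign"], previous)
print(label_a, round(sim_a, 3), label_b, sim_b)
```

With no previous events, every detection is labeled novel, which matches the "previously non-extant" reading of novelty.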

If this is right

Events can be ranked by an importance score that incorporates both content extraction and user influence.
Novel and developing events are identified together as a holistic measure rather than independent criteria.
Detection performance can be quantified by efficiency and error rate relative to human ground truth over specified time intervals.
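The ranking idea in the first point can be illustrated with a small scoring function. The set names follow Figure 1 (keywordSet, namedEntitySet, commonSet), but the additive formula, the term weights, and the `Tweet` structure are assumptions made for this sketch; the paper's actual scoring equations are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    user_influence: float      # hypothetical influence score for the posting user
    noun_phrases: list[str]    # noun phrases extracted from the tweet text


def event_score(tweets, keyword_set, named_entity_set,
                w_keyword=1.0, w_entity=1.0, w_common=2.0):
    """Sum term weights over noun phrases, scaled by each poster's influence.

    The weights are illustrative defaults, not values from the paper.
    """
    common_set = keyword_set & named_entity_set  # terms that are both
    score = 0.0
    for tweet in tweets:
        for phrase in tweet.noun_phrases:
            if phrase in common_set:
                weight = w_common
            elif phrase in named_entity_set:
                weight = w_entity
            elif phrase in keyword_set:
                weight = w_keyword
            else:
                weight = 0.0
            score += tweet.user_influence * weight
    return score


def rank_events(events, keyword_set, named_entity_set):
    """Return (event_id, score) pairs sorted from most to least important."""
    scored = [(eid, event_score(tw, keyword_set, named_entity_set))
              for eid, tw in events.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented example data.
events = {
    "generic": [Tweet(0.1, ["malware"]), Tweet(0.1, ["malware", "update"])],
    "wannacry": [Tweet(0.9, ["wannacry", "ransomware"]), Tweet(0.2, ["ransomware"])],
}
keyword_set = {"ransomware", "malware", "wannacry"}
named_entity_set = {"wannacry"}
ranking = rank_events(events, keyword_set, named_entity_set)
print(ranking)
```

Giving terms in both sets a higher weight is one simple way to reward phrases that are simultaneously keywords and named entities.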

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the similarity-based distinction works, the same pipeline could be tested on other social media streams for non-cyber events such as product launches or public health signals.
The ranking mechanism suggests a way to prioritize alerts for security teams by combining textual features with user reach.
Extending the time-interval evaluation to live streaming data would test whether the approach scales to real-time use.

Load-bearing premise

Similarity measures applied to previously detected events can reliably distinguish novel events from developing ones, and human annotator labels supply accurate ground truth for measuring performance.

What would settle it

A controlled Twitter stream containing known cyber threat events where the method's novelty-versus-developing classifications and importance rankings disagree with consensus labels from multiple independent cybersecurity experts reviewing the same data.
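Such a test could be scored roughly as follows: take the majority label from the experts for each event and measure how often the method disagrees. The function names, the strict-majority rule, and the example labels are assumptions for illustration only.

```python
from collections import Counter


def consensus(labels):
    """Return the strict-majority label, or None when no label exceeds half."""
    (top, count), = Counter(labels).most_common(1)
    return top if count > len(labels) / 2 else None


def disagreement_rate(predicted, expert_labels):
    """Share of events where the method's label differs from expert consensus.

    Events without a strict expert majority are skipped.
    """
    compared = 0
    disagreements = 0
    for event_id, label in predicted.items():
        gold = consensus(expert_labels[event_id])
        if gold is None:
            continue
        compared += 1
        disagreements += label != gold
    return disagreements / compared if compared else float("nan")


# Invented labels for three events and three experts.
predicted = {"e1": "novel", "e2": "developing", "e3": "novel"}
experts = {
    "e1": ["novel", "novel", "developing"],
    "e2": ["novel", "novel", "novel"],
    "e3": ["novel", "developing", "neither"],  # no majority, skipped
}
rate = disagreement_rate(predicted, experts)
print(rate)
```

A high disagreement rate on events with clear expert consensus would count against the similarity-based distinction.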

Figures

Figures reproduced from arXiv: 1907.07768 by Avishek Bose, Carlos Aguirre, Vahid Behzadan, William H. Hsu.

**Figure 1.** Graphical representation of commonSet, keywordSet and namedEntitySet.

**Table I.** Summary result of five time intervals. NT: Number of Tweets; JT: Just Trendy; TN: Trendy and Novel; FS: First Story; TE: Total Number of Events.

| Interval | NT | JT | TN | FS | TE |
|---|---|---|---|---|---|
| 1 | 145 | 0 | 1 | 14 | 15 |
| 2 | 314 | 0 | 0 | 50 | 50 |
| 3 | 812 | 1 | 7 | 37 | 45 |
| 4 | 1239 | 0 | 9 | 18 | 27 |
| 5 | 297 | 4 | 0 | 5 | 11 |

[...] the result of five time intervals collectively from 2018-08-30 23:00:08 to 2018-09-…
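As a quick consistency check on the Table I counts given above, the sketch below totals the tweets and events and tests whether TE equals JT + TN + FS in each interval. That additive relationship is an assumption; the page does not state how TE is defined.

```python
# Counts copied from Table I as transcribed on this page.
TABLE_I = [
    # (interval, NT, JT, TN, FS, TE)
    (1, 145, 0, 1, 14, 15),
    (2, 314, 0, 0, 50, 50),
    (3, 812, 1, 7, 37, 45),
    (4, 1239, 0, 9, 18, 27),
    (5, 297, 4, 0, 5, 11),
]

total_tweets = sum(row[1] for row in TABLE_I)
total_events = sum(row[5] for row in TABLE_I)

# Assumption: TE should be the sum of the three event categories.
mismatches = [
    (interval, jt + tn + fs, te)
    for interval, _nt, jt, tn, fs, te in TABLE_I
    if jt + tn + fs != te
]
print(total_tweets, total_events, mismatches)
```

As transcribed, interval 5 is the only row where the three categories do not add up to TE (9 versus 11). Whether that reflects an extraction error or a different definition of TE cannot be determined from this page.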

**Figure 2.** Flowchart of the proposed approach.

**Figure 3.** Event plot of the second time interval.

Original abstract

We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a previously detected event). While some existing approaches to event detection measure novelty and trendiness, typically as independent criteria and occasionally as a holistic measure, this work focuses on detecting both novel and developing events using an unsupervised machine learning approach. Furthermore, our proposed approach enables the ranking of cyber threat events based on an importance score by extracting the tweet terms that are characterized as named entities, keywords, or both. We also impute influence to users in order to assign a weighted score to noun phrases in proportion to user influence and the corresponding event scores for named entities and keywords. To evaluate the performance of our proposed approach, we measure the efficiency and detection error rate for events over a specified time interval, relative to human annotator ground truth.
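The evaluation described in the abstract, a detection error rate per time interval against annotator labels, could be computed along the following lines. The abstract does not define the error rate, so the formula here (missed plus spurious events over all events seen by either side) is an assumption, as are the names and the example data.

```python
def interval_error_rate(detected: set, annotated: set) -> float:
    """Fraction of events on which detector and annotators disagree.

    Counts events found by only one side (missed or spurious) over the
    union of both sets. This definition is hypothetical.
    """
    union = detected | annotated
    if not union:
        return 0.0
    return len(detected ^ annotated) / len(union)


# Invented event IDs per interval: (detected by method, labeled by annotators).
intervals = {
    1: ({"e1", "e2", "e3"}, {"e1", "e2", "e4"}),
    2: ({"e5"}, {"e5"}),
}
error_by_interval = {
    k: interval_error_rate(detected, annotated)
    for k, (detected, annotated) in intervals.items()
}
print(error_by_interval)
```

Reporting the rate per interval, rather than pooled, shows whether errors concentrate in busy or quiet periods.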

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper outlines a Twitter pipeline that detects cyber threat events with unsupervised ML, splits them into novel versus developing based on similarity, and ranks them via named entities plus user influence, but it stays at a high-level description without specifics or numbers.

The main takeaway is a combined system for spotting both brand-new cyber threat discussions on Twitter and ones that are growing in significance. It starts with unsupervised learning to find events, uses similarity to prior detections to label them novel or developing, pulls named entities and keywords for ranking, and weights those by user influence scores. Evaluation is planned around efficiency and error rate against human labels over time windows.

This is a practical engineering assembly rather than a new theoretical result. The integration of novelty and trend detection in one unsupervised flow is a reasonable choice for threat monitoring work, and weighting by user influence adds a sensible filter for prioritizing real signals.

The abstract does not spell out the clustering method, the exact similarity measure, or any quantitative outcomes, which leaves the actual performance unclear. Human ground truth for event significance can be noisy, and without baseline comparisons or reported error rates it is hard to judge whether the distinction between novel and developing events holds up better than simpler approaches.

The paper is aimed at applied people in cybersecurity and information retrieval who build monitoring tools. A reader wanting formal proofs, large-scale reproducible benchmarks, or a clear advance over prior event detection work will not get much here. The idea is coherent enough on its own terms to go to peer review if the full manuscript supplies the missing method details and results; otherwise it reads as an incremental pipeline description.

Referee Report

0 major / 1 minor

Summary. The paper presents an unsupervised machine learning and text information extraction pipeline for detecting cyber threat events in Twitter streams. It identifies events, classifies them as novel (previously non-extant) or developing (significant similarity to prior detections), ranks them via an importance score derived from named entities, keywords, and user-influence-weighted noun phrases, and evaluates efficiency plus detection error rate against human annotator ground truth over a time interval.

Significance. If the pipeline performs as described, it offers a practical, integrated system for real-time monitoring of emerging cyber threats on social media by jointly handling novelty assessment, trend detection, and ranked output; this could be useful for operational cybersecurity applications where unsupervised operation and human-comparable error rates are priorities.

minor comments (1)

The abstract provides only high-level descriptions of the machine learning approach, similarity measures, and ranking formulas; without the specific algorithms, distance functions, or weighting equations from the full manuscript, the claims cannot be fully assessed for internal consistency or reproducibility.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of the manuscript and for noting the potential operational value of an integrated unsupervised pipeline that jointly handles novelty assessment, trend detection, and ranked output for cyber threat events. We are pleased that the significance assessment highlights the practical utility for real-time monitoring where unsupervised operation and human-comparable error rates are priorities. No specific major comments were listed in the report, so we have no revisions to propose at this stage. We remain available to provide any additional clarifications or details that would help resolve the 'uncertain' recommendation.

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline with external evaluation

The paper presents an unsupervised ML pipeline for Twitter event detection that classifies events as novel or developing via similarity to prior detections and ranks them using named-entity/keyword extraction plus user influence weighting. Evaluation relies on efficiency metrics and error rates against independent human annotator ground truth. No equations, parameter fits, derivations, or self-citation chains appear in the abstract or described method; the central claims do not reduce to inputs by construction. This is a standard applied pipeline whose validity rests on external benchmarks rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; full methods, equations, and data details unavailable, so ledger is necessarily incomplete.

axioms (1)

domain assumption: Human annotator ground truth provides reliable labels for evaluating event detection performance over time intervals.
Stated in abstract as basis for measuring efficiency and detection error rate.
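One common way to probe this assumption is to measure agreement between annotators before treating their labels as ground truth. The sketch below computes Cohen's kappa for two annotators; the page does not say whether the authors report any agreement statistic, and the labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels from two hypothetical annotators over four tweets.
annotator_1 = ["event", "event", "noise", "noise"]
annotator_2 = ["event", "noise", "noise", "noise"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

Low kappa would suggest the ground truth itself is noisy, which limits how much weight any reported error rate can bear.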

