arxiv: 1906.08693 · v1 · pith:CSNEH6PYnew · submitted 2019-06-20 · 💻 cs.IR · cs.SI

Citizens' Emotion on GST: A Spatio-Temporal Analysis over Twitter Data

Pith reviewed 2026-05-25 19:10 UTC · model grok-4.3

classification 💻 cs.IR cs.SI
keywords GST · Twitter · emotion analysis · sentiment analysis · spatio-temporal analysis · NRC lexicon · public policy · India

The pith

Over 142,000 tweets classified by NRC lexicon map emotional responses to GST rollout over time and space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper collects tweets about the Goods and Services Tax posted in India during July 2017 and applies the NRC emotion lexicon to label them for eight basic emotions plus positive and negative sentiment. It runs a temporal analysis across 142,508 tweets and a spatial analysis across 58,613 tweets, both gathered via the Twitter streaming API. The goal is to trace how public emotions shifted in the weeks after GST implementation. This produces a record of citizen reaction that can be examined for patterns by date and by location.

Core claim

We have performed temporal analysis and spatial analysis on 1,42,508 and 58,613 tweets respectively using the National Research Council Canada (NRC) emotion Lexicon for eight basic emotions and two sentiments on tweets posted during the post-GST implementation period from July 04, 2017 to July 25, 2017.

What carries the argument

NRC emotion Lexicon applied to tweets collected via Twitter streaming API to assign scores for joy, trust, anticipation, surprise, fear, sadness, anger, disgust, positive, and negative.
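
The lookup step described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `TOY_LEXICON`, `score_tweet`, and the regex tokenizer are hypothetical names and choices, and the toy word associations are invented for the example rather than copied from the NRC file.

```python
import re
from collections import Counter

# The eight emotions and two sentiments named in the paper.
CATEGORIES = [
    "joy", "trust", "anticipation", "surprise",
    "fear", "sadness", "anger", "disgust",
    "positive", "negative",
]

# Hypothetical miniature lexicon: word -> associated categories.
# Illustrative entries only; the real NRC lexicon would be loaded from its file.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "hope": {"anticipation", "joy", "positive", "trust"},
    "fraud": {"anger", "negative"},
}


def score_tweet(text, lexicon):
    """Count, per category, how many tokens of the tweet are lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category in lexicon.get(token, ()):
            counts[category] += 1
    # Report every category, including those with zero hits.
    return {category: counts[category] for category in CATEGORIES}


scores = score_tweet("Happy about GST, hope it ends fraud", TOY_LEXICON)
```

A count-based score is only one possible design; normalizing by tweet length or using a binary present/absent flag per category would be equally plausible readings of "assign scores".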

If this is right

  • Policy makers obtain a dated record of emotional reaction that can be checked against specific GST rule changes.
  • Regional differences in emotion scores become visible when tweets are grouped by location.
  • The same lexicon pipeline can be rerun on later periods to measure whether emotions stabilized after the initial rollout.
  • Negative emotions such as anger or disgust can be tracked as early indicators of public resistance to the tax.
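
Grouping scored tweets by date or by location, as the points above assume, could look like the sketch below. The record layout (`date`, `state`, `scores`) and the function name `mean_scores_by_key` are assumptions made for illustration; the paper does not specify its data structures or its aggregation statistic.

```python
from collections import defaultdict


def mean_scores_by_key(records, key):
    """Average each category score over all records that share rec[key]."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for rec in records:
        group = rec[key]
        counts[group] += 1
        for category, value in rec["scores"].items():
            sums[group][category] += value
    return {
        group: {cat: total / counts[group] for cat, total in cats.items()}
        for group, cats in sums.items()
    }


# Hypothetical scored tweets (values invented for the example).
records = [
    {"date": "2017-07-04", "state": "Delhi", "scores": {"anger": 2, "joy": 0}},
    {"date": "2017-07-04", "state": "Kerala", "scores": {"anger": 0, "joy": 1}},
    {"date": "2017-07-05", "state": "Delhi", "scores": {"anger": 1, "joy": 1}},
]

by_date = mean_scores_by_key(records, "date")    # temporal view
by_state = mean_scores_by_key(records, "state")  # spatial view
```

The same function serves both views because only the grouping key changes; rerunning it on a later collection period would give a directly comparable series.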

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be applied to other policy events mentioned in the abstract, such as demonetization, to compare emotional signatures.
  • Location metadata in the spatial subset allows testing whether urban versus rural areas showed different emotion distributions.
  • If lexicon accuracy proves low on informal text, replacing it with a domain-specific emotion dictionary would be a direct next step.

Load-bearing premise

The NRC lexicon, developed on general text, correctly identifies the emotions expressed in short, informal tweets about a specific Indian tax policy, and the collected tweets represent the broader public's views.

What would settle it

A random sample of several hundred GST tweets manually labeled for the same eight emotions and two sentiments shows low agreement with the NRC lexicon outputs.
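
One way to quantify "agreement" in such a check is Cohen's kappa, computed per emotion on binary present/absent labels. The choice of statistic is an assumption here, since neither the paper nor this review names one, and the label lists below are invented for the example.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for one emotion (1 = present, 0 = absent).
human_anger = [1, 1, 0, 0]
lexicon_anger = [1, 0, 0, 0]
kappa = cohen_kappa(human_anger, lexicon_anger)
```

Kappa corrects raw agreement for chance, which matters when an emotion is rare and both labelers mostly say "absent".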

Figures

Figures reproduced from arXiv: 1906.08693 by Ankit Rai, Deepak Uniyal.

Figure 1: Showing the Data Preprocessing and Sentiment Analysis Process on Sample
Figure 2: Showing the varying Sentiments, Emotions and Hourly Frequency of Tweets over
Figure 3: Showing the Word Cloud for Top 40 Hashtags and Top 40 Mentions
Figure 4: Showing the Variation of Sentiments and Emotions On Tweets Addressed To Mr.

Original abstract

People might not be close-at-hand but they still are - by virtue of the social network. The social network has transformed lives in many ways. People can express their views, opinions and life experiences on various platforms be it Twitter, Facebook or any other medium there is. Such events consist of reviewing a product or service, conveying views on political banters, predicting share prices or giving feedback on the government policies like Demonetization or GST. These social platforms can be used to investigate the insights of the emotional curve that the general public is generating. This kind of analysis can help make a product better, predict the future prospects and also to implement the public policies in a better way. Such kind of research on sentiment analysis is increasing rapidly. In this research paper, we have performed temporal analysis and spatial analysis on 1,42,508 and 58,613 tweets respectively and these tweets were posted during the post-GST implementation period from July 04, 2017 to July 25, 2017. The tweets were collected using the Twitter streaming API. A well-known lexicon, National Research Council Canada (NRC) emotion Lexicon is used for opinion mining that exhibits a blend of eight basic emotions i.e. joy, trust, anticipation, surprise, fear, sadness, anger, disgust and two sentiments i.e. positive and negative for 6,554 words.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes tweets posted between 4 and 25 July 2017 and collected via the Twitter streaming API: 142,508 tweets for the temporal analysis and 58,613 for the spatial analysis. It applies the NRC emotion lexicon to extract eight basic emotions (joy, trust, anticipation, surprise, fear, sadness, anger, disgust) plus positive/negative sentiment for spatio-temporal analysis of public reaction to GST implementation.

Significance. If the lexicon outputs were shown to be reliable on this corpus, the work would supply a concrete, large-scale example of lexicon-based emotion tracking on policy-related social media, potentially useful for monitoring public response to fiscal reforms. The scale of the tweet collection is a modest strength, but the absence of any reported results, validation, or error analysis means the manuscript currently contributes only a methods sketch rather than a supported empirical finding.

major comments (2)
  1. [Abstract] The text states that temporal and spatial analyses were performed on the cited tweet volumes, yet it supplies no quantitative results, no emotion time-series, no spatial maps, no summary statistics, and no comparison to any baseline or ground truth. The central claim therefore reduces to a description of data collection and lexicon choice rather than a demonstrated outcome.
  2. [Abstract / Methods (lexicon application)] The NRC lexicon was constructed on general English text; the manuscript provides no domain adaptation, no held-out accuracy evaluation against human labels on GST tweets, no handling of tweet-specific artifacts (hashtags, abbreviations, Hinglish, sarcasm), and no error analysis. Because the temporal curves and spatial maps rest entirely on these unvalidated labels, any systematic mismatch between lexicon and domain would render the reported patterns indistinguishable from noise.
minor comments (2)
  1. [Abstract] The Indian-style thousands separator (1,42,508) is used once and then omitted; adopt consistent international notation throughout.
  2. [Abstract] The sentence describing the NRC lexicon ends abruptly after '6,554 words' without stating how many of those words actually appear in the collected tweets or how ties and zero-count words are handled.
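
The coverage figures requested in minor comment 2 could be computed along these lines. This is a hedged sketch: `lexicon_coverage`, its output keys, and the regex tokenizer are assumptions, and the sample tweets and word set are invented, not drawn from the paper's corpus or the NRC file.

```python
import re


def lexicon_coverage(tweets, lexicon_words):
    """Report how much of a tweet collection the lexicon actually touches."""
    words_seen = set()
    total_tokens = 0
    matched_tokens = 0
    zero_hit_tweets = 0
    for text in tweets:
        hits_in_tweet = 0
        for token in re.findall(r"[a-z']+", text.lower()):
            total_tokens += 1
            if token in lexicon_words:
                matched_tokens += 1
                hits_in_tweet += 1
                words_seen.add(token)
        if hits_in_tweet == 0:
            zero_hit_tweets += 1
    return {
        "lexicon_words_seen": len(words_seen),
        "token_match_rate": matched_tokens / total_tokens if total_tokens else 0.0,
        "zero_hit_tweets": zero_hit_tweets,
    }


# Hypothetical inputs for illustration only.
sample_tweets = ["GST is a fraud", "happy happy day", "no comment"]
sample_words = {"fraud", "happy", "hope"}
coverage = lexicon_coverage(sample_tweets, sample_words)
```

Reporting the number of zero-hit tweets alongside the match rate shows how many tweets contribute nothing to the emotion curves.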

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments accurately identify that the submitted manuscript describes data collection and lexicon application but does not present the analysis results or any validation of the emotion labels. We will revise the manuscript accordingly to strengthen the empirical contribution.

Point-by-point responses
  1. Referee: [Abstract] The text states that temporal and spatial analyses were performed on the cited tweet volumes, yet it supplies no quantitative results, no emotion time-series, no spatial maps, no summary statistics, and no comparison to any baseline or ground truth. The central claim therefore reduces to a description of data collection and lexicon choice rather than a demonstrated outcome.

    Authors: We agree with the observation. The abstract and body state that temporal and spatial analyses were performed on the collected tweets, yet the submitted manuscript contains no quantitative results, time-series, maps, statistics, or baseline comparisons. This was an omission during preparation. In the revised manuscript we will add a dedicated results section containing the emotion time-series, spatial distribution maps, summary statistics on emotion frequencies, and any feasible comparisons to baselines or prior work.

    Revision: yes

  2. Referee: [Abstract / Methods (lexicon application)] The NRC lexicon was constructed on general English text; the manuscript provides no domain adaptation, no held-out accuracy evaluation against human labels on GST tweets, no handling of tweet-specific artifacts (hashtags, abbreviations, Hinglish, sarcasm), and no error analysis. Because the temporal curves and spatial maps rest entirely on these unvalidated labels, any systematic mismatch between lexicon and domain would render the reported patterns indistinguishable from noise.

    Authors: The referee correctly identifies a core limitation. The NRC lexicon was applied without domain adaptation, without accuracy evaluation on GST tweets, and without explicit handling of tweet artifacts or error analysis. We will revise the methods and add a new evaluation subsection that reports results from manual annotation of a random sample of tweets (e.g., precision/recall against human labels) and a discussion of limitations arising from Hinglish, sarcasm, and abbreviations. Simple preprocessing steps for common hashtags and abbreviations will also be described.

    Revision: yes
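
The preprocessing and precision/recall evaluation promised in this response might look like the sketch below. Everything here is an assumption for illustration: the `ABBREVIATIONS` map, the CamelCase hashtag splitting rule, and the function names are hypothetical, and the simulated rebuttal does not commit to any of them.

```python
import re

# Hypothetical abbreviation map; a real one would be built from the corpus.
ABBREVIATIONS = {"govt": "government", "pls": "please", "thx": "thanks"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
CAMEL_RE = re.compile(r"(?<=[a-z])(?=[A-Z])")  # boundary inside "TaxReform"


def normalize_tweet(text):
    """Strip URLs and mentions, split hashtags, expand abbreviations."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Drop the '#' and split CamelCase hashtags into separate words.
    text = re.sub(r"#(\w+)", lambda m: CAMEL_RE.sub(" ", m.group(1)), text)
    tokens = re.findall(r"[a-z']+", text.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def precision_recall(predicted, actual):
    """Precision and recall of boolean predictions against human labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


tokens = normalize_tweet("@FinMin pls fix #TaxReform https://t.co/abc govt")
precision, recall = precision_recall(
    [True, True, False, False],  # lexicon says the emotion is present
    [True, False, True, False],  # human annotator's label
)
```

This sketch does not address Hinglish or sarcasm, which simple token rules cannot handle and which the response lists only as limitations to discuss.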

Circularity Check

0 steps flagged

No circularity: purely descriptive application of external lexicon


The paper performs temporal and spatial analysis by applying the pre-existing NRC emotion lexicon (an external resource developed independently) to a collected set of tweets. No parameters are fitted, no predictions are generated from the data itself, no self-citations form the load-bearing justification, and no derivations reduce to the inputs by construction. The work is an application of an off-the-shelf tool to new data, with all core steps (lexicon lookup, aggregation over time/space) independent of the target results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on the untested assumption that the NRC lexicon transfers accurately to informal tweets about GST and that the sampled tweets represent public opinion. No free parameters or invented entities are introduced.

axioms (2)
  • Domain assumption: The NRC emotion lexicon accurately captures emotions in short, informal tweets about Indian tax policy.
    The lexicon is applied directly without domain-specific validation or adaptation mentioned in the abstract.
  • Domain assumption: Tweets collected via the Twitter streaming API during the stated period are representative of citizens' emotions on GST.
    The abstract states the collection method and counts but offers no sampling-bias discussion.
