
arxiv: 1907.01642 · v1 · pith:KB4S7VXDnew · submitted 2019-06-28 · 💻 cs.IR · cs.CL · cs.DL

Introducing MathQA -- A Math-Aware Question Answering System

Pith reviewed 2026-05-25 13:38 UTC · model grok-4.3

classification 💻 cs.IR · cs.CL · cs.DL
keywords math-aware question answering · Wikidata · SymPy · formula retrieval · natural language query · computable expressions

The pith

MathQA answers natural language questions with a single computable formula drawn from Wikidata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an open-source question answering system that accepts math questions in English or Hindi and returns one formula from Wikidata. It connects the formula to SymPy so that users can supply values for its variables and obtain numeric results, while constants are loaded directly from the knowledge base. A user study showed the system beating a commercial computational engine by 13 percent. The authors also added a process for suggesting new formulas to Wikidata editors and found that a simple rule identified the correct formula 80 percent of the time.

Core claim

MathQA is built on Ask Platypus and returns a single mathematical formula from Wikidata for a given natural language question. The system then converts the formula into a computable expression using SymPy, lets users enter numeric values for variables, and pulls constant values from Wikidata. In a user study the system outperformed a commercial computational mathematical knowledge engine by 13 percent. Because only a few Wikidata items contained formulas at the start, the authors created a suggestion workflow for editors and showed that the heuristic of taking the first formula in an article produced correct suggestions 80 percent of the time.

What carries the argument

The Ask Platypus pipeline extended with Wikidata formula lookup and SymPy evaluation, which converts a natural-language query into a single executable mathematical expression.
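
The lookup-then-evaluate step can be sketched without external dependencies. This is a minimal illustration, not the authors' implementation. It uses Python's standard `ast` module as a stand-in for SymPy. The `FORMULAS` and `CONSTANTS` dictionaries are hypothetical placeholders for Wikidata lookups. The formula is written as a Python expression rather than the LaTeX-style string a knowledge base would hold.

```python
import ast
import operator

# Hypothetical stand-ins for knowledge-base lookups (not real Wikidata calls).
FORMULAS = {"mass-energy equivalence": "m * c**2"}
CONSTANTS = {"c": 299792458}  # speed of light in m/s

# Arithmetic operators the evaluator accepts; everything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def free_variables(formula):
    """Names in the formula that the user must supply (not known constants)."""
    tree = ast.parse(formula, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - set(CONSTANTS)


def evaluate(formula, values):
    """Evaluate the formula using user values plus the stored constants."""
    bindings = {**CONSTANTS, **values}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in bindings:
                raise KeyError(f"missing value for variable {node.id!r}")
            return bindings[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval"))
```

Calling `free_variables(FORMULAS["mass-energy equivalence"])` reports which inputs the user still has to enter, and `evaluate(..., {"m": 2})` fills in `c` from the constants table. Walking the syntax tree rather than calling `eval` keeps arbitrary code out of the evaluation step. A symbolic engine such as SymPy would offer the same substitution plus simplification and solving.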

If this is right

  • Users receive a formula they can immediately evaluate by supplying variable values.
  • Constants in the formula are replaced with values taken from Wikidata.
  • The same pipeline works for questions written in Hindi.
  • Formula suggestions generated with the first-formula heuristic are correct about 80 percent of the time, which makes them usable as proposals for Wikidata editors to review.
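
The first-formula heuristic from the last bullet can be sketched as follows. The sketch assumes the article is available as MediaWiki markup with formulas in `<math>` tags. The extraction procedure the authors actually used is not described here, and the function names are illustrative.

```python
import re

# Matches a <math>...</math> element in MediaWiki markup, with optional attributes.
_MATH_TAG = re.compile(r"<math\b[^>]*>(.*?)</math>", re.DOTALL | re.IGNORECASE)


def first_formula(wikitext):
    """Return the first <math> formula in the article text, or None."""
    match = _MATH_TAG.search(wikitext)
    return match.group(1).strip() if match else None


def heuristic_accuracy(judgements):
    """Fraction of suggestions that reviewers marked as correct."""
    judgements = list(judgements)
    return sum(judgements) / len(judgements) if judgements else 0.0


# Made-up article snippet: only the first formula becomes the suggestion.
sample = "Mass and energy are related by <math>E = mc^2</math>, and at rest <math>E_0 = m_0 c^2</math>."
suggestion = first_formula(sample)
```

The heuristic ignores every formula after the first, so it fails on articles that open with a definition or an auxiliary expression before the formula the article is about.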

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Increasing the number of formulas stored in Wikidata would raise the fraction of questions the system can answer.
  • The same lookup-plus-SymPy pattern could be applied to other public knowledge bases that store mathematical expressions.
  • The first-formula heuristic, with its 80 percent hit rate, offers a low-cost starting point for importing more mathematical content into Wikidata.

Load-bearing premise

The performance of the system heavily depends on the size and quality of the formula data available in Wikidata.

What would settle it

A user study on the same or similar question set in which MathQA does not outperform the commercial engine by 13 percent.
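
A replication would first need to fix what "13 percent" means, since the summary does not say whether the margin is absolute (percentage points) or relative to the baseline. The sketch below shows how the two readings diverge. The success rates are hypothetical and chosen only for illustration.

```python
def absolute_margin(system, baseline):
    """Difference in percentage points between two success rates in [0, 1]."""
    return (system - baseline) * 100


def relative_margin(system, baseline):
    """Improvement over the baseline, as a percentage of the baseline rate."""
    if baseline == 0:
        raise ValueError("relative margin is undefined for a zero baseline")
    return (system - baseline) / baseline * 100


# Hypothetical success rates, not figures from the paper.
system_rate, baseline_rate = 0.63, 0.50
points = absolute_margin(system_rate, baseline_rate)
ratio = relative_margin(system_rate, baseline_rate)
```

With these made-up rates the absolute margin is about 13 percentage points while the relative margin is about 26 percent, so the same data supports or refutes the claim depending on the reading.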

Figures

Figures reproduced from arXiv: 1907.01642 by Bela Gipp, Felix Hamborg, Kaushal Dudhat, Moritz Schubotz, Philipp Scharpf, Yash Nagar.

Figure 1: Screenshot of MathQA. […] the user to perform arithmetic operations using the retrieved formula. We developed three modules: The Question Parsing Module (1) transforms questions into a triple representation and produces a simplified dependency tree. The Formula Retrieval Module (2) then queries the Wikidata knowledge-base for the requested formula and presents the result to the user. The user can subsequently …
Figure 2: Wikidata statement terminology illustrated by an …
Figure 3: Workflow of our extraction and loading process.
Figure 4: MathQA GUI for English (a) and Hindi (b) questions …

Original abstract

We present an open source math-aware Question Answering System based on Ask Platypus. Our system returns as a single mathematical formula for a natural language question in English or Hindi. This formulae originate from the knowledge-base Wikidata. We translate these formulae to computable data by integrating the calculation engine sympy into our system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata. In a user study, our system outperformed a commercial computational mathematical knowledge engine by 13%. However, the performance of our system heavily depends on the size and quality of the formula data available in Wikidata. Since only a few items in Wikidata contained formulae when we started the project, we facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is significant for the article, 80% of the suggestions were correct.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents MathQA, an open-source math-aware QA system extending Ask Platypus. For English or Hindi natural-language questions it returns a single mathematical formula drawn from Wikidata; SymPy is integrated so that users can supply numeric values for variables while constants are populated from Wikidata. The central empirical claim is that the system outperformed a commercial computational mathematical knowledge engine by 13 % in a user study. The authors also report that a simple heuristic (first formula is the significant one) achieved 80 % accuracy when suggesting formula edits to Wikidata editors.

Significance. If the user-study result is reproducible, the work would illustrate a practical route for open, structured knowledge bases to support specialized mathematical QA while adding symbolic computation. The open-source release and the explicit data-contribution effort to Wikidata are positive contributions. The acknowledged dependence on Wikidata coverage, however, limits generalizability.

major comments (2)
  1. [Abstract] Abstract: the headline claim that the system 'outperformed a commercial computational mathematical knowledge engine by 13 %' in a user study supplies no information on study size, participant recruitment, question-selection protocol, how queries were issued to the baseline engine, scoring rubric (correctness vs. preference vs. latency), blinding, or any statistical test. Because this 13 % margin is the primary performance result, the absence of these details renders the central claim only weakly supported.
  2. [Evaluation (implied by abstract claim)] The manuscript contains no error analysis, confusion matrix, or per-question breakdown comparing MathQA successes and failures against the commercial baseline; without such data it is impossible to determine whether the reported margin reflects answer correctness, response time, or user preference.
minor comments (2)
  1. [Abstract] Abstract, sentence 3: 'This formulae originate …' is grammatically incorrect; 'These formulae originate …' or 'The formulae originate …' is required.
  2. The description of SymPy integration does not specify how variable names are extracted from Wikidata formulas or how type mismatches between Wikidata constants and SymPy expectations are handled.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our evaluation. We address the two major comments below and will revise the manuscript to strengthen the presentation of the user study results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the headline claim that the system 'outperformed a commercial computational mathematical knowledge engine by 13 %' in a user study supplies no information on study size, participant recruitment, question-selection protocol, how queries were issued to the baseline engine, scoring rubric (correctness vs. preference vs. latency), blinding, or any statistical test. Because this 13 % margin is the primary performance result, the absence of these details renders the central claim only weakly supported.

    Authors: We agree that the manuscript provides insufficient methodological detail to fully support the reported 13% margin. In the revision we will expand the abstract and add a dedicated subsection describing the user study, including participant count and recruitment, question selection criteria, the protocol for submitting queries to the commercial baseline, the preference-based scoring rubric, any blinding procedures used, and the statistical test applied. This will allow readers to better assess the result. revision: yes

  2. Referee: [Evaluation (implied by abstract claim)] The manuscript contains no error analysis, confusion matrix, or per-question breakdown comparing MathQA successes and failures against the commercial baseline; without such data it is impossible to determine whether the reported margin reflects answer correctness, response time, or user preference.

    Authors: We concur that a more granular analysis would improve the paper. We will add a qualitative error analysis section that categorizes the questions according to user preference patterns and discusses factors such as formula coverage and computation support that influenced outcomes. A full quantitative confusion matrix or exhaustive per-question breakdown cannot be supplied, because the study collected only aggregate preference scores rather than item-level logs; we will nevertheless include the most detailed breakdown feasible from the existing data. revision: partial
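
The rebuttal promises a statistical test for the preference-based comparison without naming one. One common choice for paired preferences is an exact two-sided sign test, sketched below with the standard library only. This choice is an assumption, not something the authors state, and any counts passed to the function would have to come from the study itself.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Exact two-sided sign test for paired preferences (ties dropped beforehand).

    Under the null hypothesis each non-tied comparison favours either
    system with probability 0.5.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # One-tail probability of a split at least as lopsided as the observed one.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # Double for the two-sided test and cap at 1.
    return min(1.0, 2 * tail)
```

The test needs per-comparison outcomes (which system was preferred for each question or participant). Aggregate scores alone, which the authors say are all that was collected, would not be enough to run it.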

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external user study and Wikidata data

The paper describes a system architecture (Ask Platypus base + Wikidata formula retrieval + SymPy integration) and reports two empirical observations: (1) 80% correctness of a simple heuristic for formula suggestions, and (2) 13% outperformance versus a commercial engine in a user study. Neither observation is derived from the system definition itself; both are presented as measured outcomes against external baselines. No equations, fitted parameters renamed as predictions, self-citation chains, or uniqueness theorems appear. The explicit caveat that performance depends on external Wikidata quality further confirms the claims are not self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the external availability and quality of formulas in Wikidata plus the correctness of the user-study comparison; no new free parameters, axioms beyond standard domain assumptions, or invented entities are introduced.

axioms (1)
  • domain assumption: Wikidata contains or can be extended with mathematical formulas of sufficient quality for QA use
    Explicitly stated in the abstract as the factor on which system performance depends.

pith-pipeline@v0.9.0 · 5714 in / 1150 out tokens · 25653 ms · 2026-05-25T13:38:01.093897+00:00 · methodology


Venue (from the paper's running header): JCDL ’18, June 3–7, 2018, Fort Worth, TX, USA.