Neural Machine Translating from Natural Language to SPARQL

Dagmar Gromann; Sebastian Rudolph; Xiaoyu Yin

arxiv: 1906.09302 · v1 · pith:C7VNBMLAnew · submitted 2019-06-21 · 💻 cs.CL

Pith reviewed 2026-05-25 18:37 UTC · model grok-4.3

classification 💻 cs.CL

keywords neural machine translation · SPARQL · natural language to query · convolutional neural network · knowledge graphs · linked data · query generation · deep learning

The pith

A CNN-based neural machine translation model converts natural language questions into SPARQL queries with up to 94 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates eight neural machine translation models on the task of turning natural language questions into SPARQL queries for knowledge graphs. It reports that a convolutional neural network architecture outperforms the others when trained on large, high-quality datasets. This matters because SPARQL requires knowledge of both domain entities and query syntax that most web users lack. Success here would let non-experts directly access linked data resources without learning the query language. The work focuses on closing the gap between powerful structured query tools and everyday users.

Core claim

The paper claims that neural machine translation techniques can be applied to translate from natural language to SPARQL, with a CNN-based architecture showing the strongest results among the eight models tested, reaching a BLEU score of up to 98 and accuracy of up to 94 percent when sufficient high-quantity and high-quality training data are available.
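To make the two headline metrics concrete, here is a minimal sketch of how BLEU and accuracy could be computed over tokenized SPARQL output. It rests on assumptions not stated in the text above: one reference query per question, whitespace tokenization, and accuracy read as exact token-for-token match. The function names are illustrative, and the paper's own evaluation scripts may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
            # Counter intersection clips each count by the reference count.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def exact_match_accuracy(references, hypotheses):
    """Percentage of hypotheses equal to their reference, token for token."""
    hits = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * hits / len(references)
```

Under these assumptions the two metrics can diverge: a query whose triple-pattern terms are emitted in the wrong order still shares most of its n-grams with the reference, so it keeps a fairly high BLEU score while counting as a miss for exact-match accuracy.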

What carries the argument

CNN-based neural machine translation architecture performing sequence-to-sequence mapping from natural language input to SPARQL output.
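The sequence-to-sequence framing treats both the question and the query as token sequences over separate vocabularies. The sketch below shows only that data-preparation step, not the CNN itself. The question, the query, the prefixed names, and the special tokens are illustrative assumptions and are not taken from the paper's datasets.

```python
def build_vocab(sequences):
    """Map each distinct token to an integer id; ids 0-2 are reserved."""
    vocab = {"<pad>": 0, "<s>": 1, "</s>": 2}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Wrap a token list in start/end markers and convert it to ids."""
    return [vocab["<s>"]] + [vocab[t] for t in tokens] + [vocab["</s>"]]


# Hypothetical training pair, whitespace-tokenized for simplicity.
question = "what is the population of berlin".split()
query = "SELECT ?x WHERE { dbr:Berlin dbo:population ?x }".split()

src_vocab = build_vocab([question])  # natural language side
tgt_vocab = build_vocab([query])     # SPARQL side

src_ids = encode(question, src_vocab)
tgt_ids = encode(query, tgt_vocab)
```

A translation model of any architecture then learns to map `src_ids` to `tgt_ids`. Note that the repeated variable `?x` receives the same id both times it appears in the target sequence.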

If this is right

Non-expert users could query linked data resources without learning SPARQL syntax or domain entity names.
Automated translation would reduce syntax errors that occur when people write SPARQL queries by hand.
The same modeling approach could scale to other structured query languages if comparable training datasets are created.
Knowledge graphs would become more usable as everyday web resources rather than tools limited to specialists.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same CNN translation setup could be retrained on pairs of natural language and SQL or other query languages.
Voice assistants could incorporate the model to let users ask database questions in spoken language.
Further gains might come from combining the model with pre-trained language models on larger general text corpora.
The method might handle more complex SPARQL features such as aggregations or optional patterns if datasets are extended accordingly.

Load-bearing premise

The datasets used for training and evaluation are of sufficiently high quantity and quality to support the claimed performance levels.

What would settle it

A test set of natural language questions phrased differently from the training data or drawn from a different knowledge graph would produce substantially lower BLEU scores and accuracy than reported.
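One way to approximate the "phrased differently" condition is to hold out whole question templates rather than individual pairs. The sketch below assumes each pair is a dict carrying a `template_id` field; that field and the function name are hypothetical and are not described in the text above.

```python
import random


def template_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split pairs so that no template id appears in both train and test."""
    templates = sorted({p["template_id"] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(templates)
    # Hold out at least one template, even for very small inputs.
    n_test = max(1, round(len(templates) * test_fraction))
    held_out = set(templates[:n_test])
    train = [p for p in pairs if p["template_id"] not in held_out]
    test = [p for p in pairs if p["template_id"] in held_out]
    return train, test
```

Comparing scores on such a split with scores on a random split would indicate how much of the reported performance depends on seeing the same phrasings during training.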

Figures

Figures reproduced from arXiv: 1906.09302 by Dagmar Gromann, Sebastian Rudolph, Xiaoyu Yin.

**Figure 1.** The comparison between three NSpM models on test BLEU scores. […] the other two attention-based models. However, when looking at accuracy, all but the attention-based and the ConvS2S models experience serious problems in producing a sequentially correctly ordered query. While we still believe that the DBNQA dataset is the best choice for training NMT models to translate from NL to SPARQL, the dataset also has o…

Original abstract

SPARQL is a highly powerful query language for an ever-growing number of Linked Data resources and Knowledge Graphs. Using it requires a certain familiarity with the entities in the domain to be queried as well as expertise in the language's syntax and semantics, none of which average human web users can be assumed to possess. To overcome this limitation, automatically translating natural language questions to SPARQL queries has been a vibrant field of research. However, to this date, the vast success of deep learning methods has not yet been fully propagated to this research problem. This paper contributes to filling this gap by evaluating the utilization of eight different Neural Machine Translation (NMT) models for the task of translating from natural language to the structured query language SPARQL. While highlighting the importance of high-quantity and high-quality datasets, the results show a dominance of a CNN-based architecture with a BLEU score of up to 98 and accuracy of up to 94%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The headline BLEU and accuracy numbers cannot be assessed because the abstract gives no information on dataset size, complexity, or construction.

The key takeaway is that this paper reports strong results for a CNN-based NMT model on natural language to SPARQL translation, but the abstract supplies no details on the datasets, so those numbers are hard to trust. The comparison of eight models is new in its specifics but follows a standard approach.

The work applies existing neural machine translation techniques to the problem of turning English questions into SPARQL queries. It tests eight different architectures and finds the CNN one on top, with BLEU scores up to 98 and accuracy up to 94 percent. It also correctly calls out that dataset quality matters a lot for this kind of task. What stands out as positive is the direct comparison across multiple models rather than just picking one. That gives more insight than single-model papers.

The main weakness is the complete lack of information on the data. There is no mention of how many examples were used for training, how the natural language and SPARQL pairs were created or checked for correctness, or what kinds of queries were involved. The stress-test concern follows directly from that gap: high performance on a narrow or synthetic set would not demonstrate real capability for general use. Without error analysis or baseline comparisons beyond the eight models, it is difficult to see what the results actually mean.

This paper would mainly interest people working on query interfaces for linked data. A reader looking for practical tools or solid benchmarks might find the missing pieces frustrating. I would not recommend sending this to peer review in its current state. The authors need to add the dataset descriptions, sizes, and some analysis of where the model succeeds or fails before it would be worth a referee's time.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates eight Neural Machine Translation architectures for the task of translating natural language questions into SPARQL queries. It reports that a CNN-based model dominates the others, reaching BLEU scores of up to 98 and accuracy of up to 94%, and stresses that high-quantity, high-quality datasets are critical to achieving these results.

Significance. If the headline performance numbers prove reproducible on datasets whose size, complexity distribution, and construction are documented and non-trivial, the work would provide concrete evidence that modern NMT techniques can be applied to semantic-web query generation, lowering the barrier for non-expert users of Linked Data resources.

major comments (2)

[Abstract] Abstract: the central claim that the CNN architecture 'dominates' the other seven models with BLEU up to 98 and accuracy up to 94% is presented without any description of the training-set size, test-set size, query-complexity distribution, or the procedure used to generate and validate the NL–SPARQL pairs. Because the abstract itself identifies dataset quality and quantity as decisive, the absence of these numbers renders the performance figures unverifiable.
[Abstract] Abstract: no information is supplied on the other seven NMT architectures, the training regimen, hyper-parameters, or any comparison against previously published NL-to-SPARQL baselines, so the reported dominance cannot be assessed for methodological soundness or incremental contribution.

minor comments (1)

The abstract would be clearer if it named the specific datasets or domains employed, even at a high level.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and agree that the abstract can be strengthened for greater clarity and verifiability while preserving conciseness.

Referee: [Abstract] Abstract: the central claim that the CNN architecture 'dominates' the other seven models with BLEU up to 98 and accuracy up to 94% is presented without any description of the training-set size, test-set size, query-complexity distribution, or the procedure used to generate and validate the NL–SPARQL pairs. Because the abstract itself identifies dataset quality and quantity as decisive, the absence of these numbers renders the performance figures unverifiable.

Authors: We agree that including brief dataset statistics in the abstract would improve verifiability of the headline results. The full manuscript already documents training/test set sizes, query complexity distribution, and the NL–SPARQL pair generation/validation procedure in the Datasets and Experiments sections. In revision we will add a short clause to the abstract noting the scale of the high-quality datasets used (while retaining the existing emphasis on their importance). revision: yes
Referee: [Abstract] Abstract: no information is supplied on the other seven NMT architectures, the training regimen, hyper-parameters, or any comparison against previously published NL-to-SPARQL baselines, so the reported dominance cannot be assessed for methodological soundness or incremental contribution.

Authors: The eight architectures, training regimen, hyper-parameters, and comparisons to prior NL-to-SPARQL baselines are fully described in Sections 3 (Models), 4 (Experimental Setup), and 5 (Results). We acknowledge that the abstract could more explicitly signal the scope of the evaluation. We will revise the abstract to note that eight NMT models were compared and that the CNN variant outperformed the others, directing readers to the detailed methodology in the body. revision: yes

Circularity Check

0 steps flagged

Empirical ML evaluation contains no derivation chain or circular steps

The paper reports experimental results from training and testing eight NMT architectures (including a CNN-based one) on NL-to-SPARQL translation tasks. No equations, first-principles derivations, parameter fits presented as predictions, or uniqueness theorems appear. Performance metrics (BLEU up to 98, accuracy up to 94%) are direct outputs of model training and evaluation on the chosen datasets; they are not reduced to inputs by construction. Self-citations, if present, are not load-bearing for any claimed result. This is a standard empirical study whose central claims are falsifiable via replication on external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical application of existing neural models; the abstract introduces no new free parameters, axioms, or invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "results show a dominance of a CNN-based architecture with a BLEU score of up to 98 and accuracy of up to 94%"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "While highlighting the importance of high-quantity and high-quality datasets"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 3 internal anchors

[1] Bahdanau, D., Cho, K., Bengio, Y.: Neural Machine Translation by Jointly Learning to Align and Translate. In: Proc. 6th Int. Conf. on Learning Representations (2015)
[2] Cai, R., Xu, B., Zhang, Z., Yang, X., Li, Z., Liang, Z.: An Encoder-Decoder Framework Translating Natural Language to Database Queries. In: Lang, J. (ed.) Proc. 27th Int. Joint Conf. on Artificial Intelligence. pp. 3977–3983 (2018)
[3] Dong, L., Lapata, M.: Language to Logical Form with Neural Attention. In: Proc. 54th Annual Meeting of the Association for Computational Linguistics. pp. 33–43 (2016)
[4] Dubey, M., Dasgupta, S., Sharma, A., Höffner, K., Lehmann, J.: AskNow: A Framework for Natural Language Query Formalization in SPARQL. In: Sack, H., Blomqvist, E., Matthieu, Ghidini, C., Ponzetto, S., Lange, C. (eds.) Proc. 13th Extended Semantic Web Conf. (2016)
[5] Ferré, S.: squall2sparql: a Translator from Controlled English to Full SPARQL 1.1. In: Cabrio, E., Cimiano, P., Lopez, V., Ngomo, A.C.N., Unger, C., Walter, S. (eds.) Work. Multilingual Question Answering over Linked Data. Valencia, Spain (2013)
[6] Gehring, J., Auli, M., Grangier, D., Yarats, D., Dauphin, Y.N.: Convolutional Sequence to Sequence Learning. In: Proc. 34th Int. Conf. on Machine Learning. vol. 70, pp. 1243–1252 (2017)
[7] Hartmann, A.K., Soru, T., Marx, E.: Generating a Large Dataset for Neural Question Answering over the DBpedia Knowledge Base (2018), preprint at ResearchGate
[8] Luong, M.T., Pham, H., Manning, C.D.: Effective Approaches to Attention-based Neural Machine Translation. In: Proc. 2015 Conf. on Empirical Methods in Natural Language Processing. pp. 1412–1421 (2015)
[9] Luong, T., Kayser, M., Manning, C.D.: Deep Neural Language Models for Machine Translation. In: Proc. 19th Conf. on Computational Natural Language Learning. pp. 305–309 (2015)
[10] Luz, F.F., Finger, M.: Semantic Parsing Natural Language into SPARQL: Improving Target Language Representation with Neural Attention. CoRR abs/1803.04329 (2018)
[11] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a Method for Automatic Evaluation of Machine Translation. In: Proc. 40th Annual Meeting on Association for Computational Linguistics. pp. 311–318 (2002)
[12] Schreiber, G., Raimond, Y. (eds.): RDF 1.1 Primer. W3C Recommendation (24 February 2014), available at http://www.w3.org/TR/rdf11-primer/
[13] Soru, T., Marx, E., Moussallem, D., Publio, G., Valdestilhas, A., Esteves, D., Neto, C.B.: SPARQL as a Foreign Language. CoRR abs/1708.07624 (2017)
[14] Soru, T., Marx, E., Valdestilhas, A., Esteves, D., Moussallem, D., Publio, G.: Neural Machine Translation for Query Construction and Composition. In: ICML Workshop on Neural Abstract Machines & Program Induction (2018)
[15] Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Proc. 27th Ann. Conf. on Neural Information Processing Systems. pp. 3104–3112 (2014)
[16] The W3C SPARQL Working Group (ed.): SPARQL 1.1 Overview. W3C Recommendation (21 March 2013), available at http://www.w3.org/TR/sparql11-overview/
[17] Trivedi, P., Maheshwari, G., Dubey, M., Lehmann, J.: Lc-quad: A Corpus for Complex Question Answering over Knowledge Graphs. In: d’Amato, C. (ed.) Proc. 16th Int. Semantic Web Conf. pp. 210–218 (2017)
[18] Usbeck, R., Ngomo, A.C.N., Haarmann, B., Krithara, A., Röder, M., Napolitano, G.: 7th Open Challenge on Question Answering over Linked Data. In: Dragoni, M., Solanki, M., Blomqvist, E. (eds.) Semantic Web Evaluation Challenge. SemWebEval. Communications in Computer and Information Science, vol. 769, pp. 59–69 (2017)
[19] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is All You Need. In: Proc. 30th Ann. Conf. on Neural Information Processing Systems. pp. 5998–6008 (2017)
[20] Wu, Y., Schuster, M., et al.: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. CoRR abs/1609.08144 (2016)
[21] Zhong, V., Xiong, C., Socher, R.: Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. CoRR abs/1709.00103 (2017)
