pith.

arxiv: 1907.04149 · v1 · pith:WLXCCBPAnew · submitted 2019-07-04 · 💻 cs.CL · cs.IR

Answer Extraction for Why Arabic Questions Answering Systems: EWAQ

Pith reviewed 2026-05-25 09:11 UTC · model grok-4.3

classification 💻 cs.CL cs.IR
keywords Arabic question answering · why questions · textual entailment · answer extraction · EWAQ · Arabic QA systems

The pith

Textual entailment metrics improve accuracy in extracting answers to Arabic why-questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops EWAQ, a system that uses entailment-based similarity to extract answers to Arabic why-questions from passages retrieved by search engines. It re-ranks the passages and scores candidate answers with textual entailment to pick the most likely correct one. This approach addresses the scarcity of Arabic QA systems focused on why-questions. A sympathetic reader would care because it offers a way to get direct, accurate answers to explanatory questions in Arabic, where general search engines fall short. The reported results show higher accuracy than Yahoo, Google, and Ask.com on a manual test set.

Core claim

The EWAQ system extracts answers to why-questions only: it scores each candidate answer with entailment metrics and ranks the candidates by score. Compared with the search engines Yahoo, Google, and Ask.com on a manual test set, EWAQ achieves higher accuracy by applying textual entailment both to re-rank the retrieved relevant passages and to decide the correct answer.

What carries the argument

Entailment metrics for re-ranking retrieved passages and selecting the highest-scoring answer for why-questions.
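The page does not say which entailment metric EWAQ uses or how it preprocesses Arabic text. The following is therefore only a minimal Python sketch of the re-rank-then-select idea under stated assumptions. It swaps in a directional lexical-overlap score, a common textual-entailment baseline, in which a passage or sentence is the text T and the question is the hypothesis H: score(T, H) = |W(T) ∩ W(H)| / |W(H)|, where W(·) is the set of normalized content words. The normalization rules, the tiny stop-word list, the top_k cutoff, the tie rule, and all function names (tokens, content_words, entailment_score, rerank, best_answer) are illustrative assumptions, not the author's method.

```python
import re
from typing import Optional

# ASSUMPTION: light Arabic orthographic normalization; the page does not
# describe EWAQ's actual preprocessing.
_DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef madda, hamza above, hamza below


def tokens(text: str) -> list[str]:
    """Normalize Arabic spelling variants, then split into word tokens."""
    text = _DIACRITICS_AND_TATWEEL.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # -> bare alef
    text = text.replace("\u0629", "\u0647")    # ta marbuta -> ha
    text = text.replace("\u0649", "\u064A")    # alef maqsura -> ya
    return re.findall(r"\w+", text)


# HYPOTHETICAL, deliberately tiny stop-word list, normalized like the text.
STOP_WORDS = set(tokens("لماذا في من على إلى عن أن هو هي"))


def content_words(text: str) -> set[str]:
    """Normalized tokens minus stop words."""
    return {w for w in tokens(text) if w not in STOP_WORDS}


def entailment_score(text: str, hypothesis: str) -> float:
    """Directional lexical overlap: the share of hypothesis (question)
    content words that also occur in the text."""
    h = content_words(hypothesis)
    if not h:
        return 0.0
    return len(h & content_words(text)) / len(h)


def rerank(question: str, passages: list[str]) -> list[str]:
    """Order passages by entailment score, best first. sorted() is stable,
    so ties keep the search engine's original order (assumed tie rule)."""
    return sorted(passages, key=lambda p: entailment_score(p, question), reverse=True)


def best_answer(question: str, passages: list[str], top_k: int = 3) -> Optional[str]:
    """Split the top_k re-ranked passages into sentences and return the
    sentence with the highest entailment score (the first one wins ties)."""
    candidates = [
        s.strip()
        for p in rerank(question, passages)[:top_k]
        for s in re.split("[.!?\u061F]", p)  # \u061F is the Arabic question mark
        if s.strip()
    ]
    return max(candidates, key=lambda s: entailment_score(s, question), default=None)


if __name__ == "__main__":
    question = "لماذا تتساقط أوراق الأشجار في الخريف؟"
    passages = [
        "الخريف فصل من فصول السنة.",
        "الطقس جميل اليوم.",
        "تتساقط أوراق الأشجار في الخريف لأن الشجرة توقف إنتاج الكلوروفيل.",
    ]
    # Prints the third passage's sentence, the only one containing every question word.
    print(best_answer(question, passages))
```

Because the score is normalized by the question length only, a passage that contains every question word scores 1.0 regardless of what else it says; that is the usual weakness of overlap-based entailment, and a real system would likely add synonym resources or a stronger entailment model.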

If this is right

  • The accuracy of answer extraction for why-questions in Arabic increases with the use of entailment-based similarity.
  • EWAQ outperforms established web-based QA systems like Yahoo, Google, and Ask.com on manual test data.
  • Textual entailment can serve as the basis of the answer extraction module in Arabic QA systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same entailment re-ranking step might be tested on other Arabic question types, such as how- or what-questions.
  • Similar scoring could be applied to other languages that lack specialized QA resources.
  • Applying entailment scoring earlier in the pipeline, before full passage retrieval, might change the accuracy gains.

Load-bearing premise

Textual entailment metrics applied to passages from general search engines can reliably identify the correct answers for why-questions in Arabic.

What would settle it

Running EWAQ and the search engines on a new independent set of Arabic why-questions and finding no accuracy improvement over the baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 1907.04149 by Fatima T. AL-Khawaldeh.

Figure 1. The obtained accuracy results of the Yahoo, Google, Ask, and EWAQ systems.
Text extracted alongside the figure (VI. Conclusions): In this paper, it has been presented an approach for enhancing accuracy of Arabic why questions answering systems called EWAQ. The main objectives of EWAQ system is to improving the re-ranking passages relevant and retrieved by search engines (Yahoo, Google, Ask). The process of re-ranking the retrieved passages is based o…
Original abstract

With the increasing amount of web information, questions answering systems becomes very important to allow users to access to direct answers for their requests. This paper presents an Arabic Questions Answering Systems based on entailment metrics. The type of questions which this paper focuses on is why questions. There are many reasons lead us to develop this system: generally, the lack of Arabic Questions Answering Systems and scarcity Arabic Questions Answering Systems which focus on why questions. The goal of the proposed system in this research is to extract answers from re-ranked retrieved passages which are retrieved by search engines. This system extracts the answer only to why questions. This system is called by EWAQ: Entailment based Why Arabic Questions Answering. Each answer is scored with entailment metrics and ranked according to their scores in order to determine the most possible correct answer. EWAQ is compared with search engines: yahoo, google and ask.com, the well-established web-based Questions Answering systems, using manual test set. In EWAQ experiments, it is showed that the accuracy is increased by implementing the textual entailment in re-ranking the retrieved relevant passages by search engines and deciding the correct answer. The obtained results show that using entailment based similarity can help significantly to tackle the why Answer Extraction module in Arabic language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper presents EWAQ, an Arabic QA system focused on why-questions. It retrieves candidate passages via general web search engines (Google, Yahoo, Ask.com), then applies textual entailment metrics to re-rank the passages and extract the highest-scoring answer; the abstract claims this yields higher accuracy than the search engines alone on a manual test set.

Significance. If the performance claims were substantiated with quantitative results, the work would address a documented gap in Arabic QA resources for explanatory questions by showing that off-the-shelf entailment can improve answer extraction over raw retrieval. The approach is simple and leverages existing tools, which could be useful if the Arabic-specific issues are handled.

major comments (3)
  1. [Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.
  2. [Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.
  3. [Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.
minor comments (3)
  1. [Abstract] Abstract contains multiple grammatical and phrasing errors ('questions answering systems becomes', 'scarcity Arabic Questions Answering Systems', 'access to direct answers for their requests').
  2. [Abstract] The system acronym expansion 'Entailment based Why Arabic Questions Answering' is inconsistent with the title phrasing.
  3. [Introduction] No references to prior Arabic QA or RTE work are mentioned in the provided text, leaving the novelty claim unanchored.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the submitted manuscript lacks sufficient detail to support its claims and will revise accordingly.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.

    Authors: The referee correctly notes that the abstract and body provide no numerical results, test-set details, baselines or error analysis. We will revise the abstract and add a full Experiments section reporting accuracy figures, test-set size and construction, direct baseline comparisons, and error analysis. revision: yes

  2. Referee: [Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.

    Authors: We acknowledge the method section omits the entailment model, Arabic preprocessing steps and exact scoring formula. The revised manuscript will supply these details, including the metric employed, any language-specific adaptations, and the passage-ranking formula. revision: yes

  3. Referee: [Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.

    Authors: The evaluation description is indeed limited. We will expand it to explain answer extraction from re-ranked passages, tie-breaking and non-entailment handling, and any statistical tests of improvement over the search-engine baselines. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical system evaluation on external test set with no self-referential reductions.

Full rationale

The paper describes an Arabic why-QA pipeline that retrieves passages via third-party search engines (Google, Yahoo, Ask.com) then re-ranks them with off-the-shelf textual entailment metrics. The central claim is an observed accuracy lift on a manually constructed test set. No equations, fitted parameters, or uniqueness theorems are presented; the performance numbers are direct measurements against an external baseline rather than quantities defined by the same data or by prior self-citations. The derivation chain therefore contains no self-definitional, fitted-input, or self-citation-load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the central claim rests on the untested premise that entailment metrics will work for Arabic why-answer extraction; no free parameters, invented entities, or additional axioms are stated.

axioms (1)
  • Domain assumption: textual entailment metrics can be used to rank candidate answers for why-questions in Arabic.
    This premise is required for the re-ranking and answer selection step to succeed.

pith-pipeline@v0.9.0 · 5758 in / 1180 out tokens · 49841 ms · 2026-05-25T09:11:33.786925+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

  1. [1]

    What do You Mean? Finding Answers to Complex Questions

    Diekema A., Yilmazel O., Chen J., Harwell S., Liddy E. and He L., "What do You Mean? Finding Answers to Complex Questions", In Maybury, M.T. (Ed.) New Directions in Question Answering. The MIT Press, pp. 141-152, 2004

  2. [2]

    Natural language QA: the view from here

    Hirschman L. and Gaizauskas R., "Natural language QA: the view from here", Natural Language Engineering, vol. 7(4), pp. 275-300, 2001

  3. [3]

    The TREC-8 QA Track Report

    Voorhees E., "The TREC-8 QA Track Report", In Proceedings of the Eighth Text Retrieval Conference (TREC-8), 2000

  4. [4]

    Finding an answer based on the recognition of the issue focus

    Ferret O., Grau B., Huraults-Plantet M., Illouz G., Monceaux L., Robba I. and Vilnat A., "Finding an answer based on the recognition of the issue focus", In Proceedings of TREC-10, 2001

  5. [5]

    Cross Lingual QA using QRISTAL for CLEF 2006

    Laurent D., Seguela P. and Nègre S., "Cross Lingual QA using QRISTAL for CLEF 2006", Lecture Notes in Computer Science, Vol. 4730, pp. 339-350, 2007

  6. [6]

    ASK website: http://www.Ask.com, last visited April 2015

  7. [7]

    A knowledge-based Arabic QA System (AQAS)

    Mohammed F., Nasser K. and Harb H., "A knowledge-based Arabic QA System (AQAS)", In Proceedings of ACM SIGART Bulletin, pp. 21-33, 1993

  8. [8]

    QARAB: A QA System to Support the Arabic Language

    Hammo B., Abu-Salem H. and Lytinen S., "QARAB: A QA System to Support the Arabic Language", In Proceedings of the workshop on computational approaches to Semitic languages, pp. 55-65, Philadelphia, 2002

  9. [9]

    Implementation of the ArabiQA QA System's components

    Benajiba Y., Rosso P. and Lyhyaoui A., "Implementation of the ArabiQA QA System's components", In Proceedings of Workshop on Arabic Natural Language Processing, 2nd Information Communication Technologies Int. Symposium, ICTIS, 2007

  10. [10]

    nooj website: http://www.nooj4nlp, last visited April 2015

  11. [11]

    An Arabic Question-Answering system for factoid questions

    Brini W., Ellouze M., Mesfar S. and Belguith L., "An Arabic Question-Answering system for factoid questions", In Proceedings of IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2009

  12. [12]

    ARQA High-Performance Arabic QA System

    Badawy O., Shaheen M. and Hamadene A., "ARQA High-Performance Arabic QA System", In Proceedings of Arabic Language Technology International Conference, pp. 129-136, 2011

  13. [13]

    QArabPro: A Rule Based Question Answering System for Reading Comprehension Tests in Arabic

    Akour M., Abufardeh S., Magel K. and Al-Radaideh Q., "QArabPro: A Rule Based Question Answering System for Reading Comprehension Tests in Arabic", American Journal of Applied Sciences, vol. 8(6), pp. 652-661, 2011

  14. [14]

    Development of Yes/No ArabicQA System

    Bdour W. and Gharaibeh N., "Development of Yes/No ArabicQA System", International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4(1), pp. 51-63, 2013

  15. [15]

    The Pascal recognizing textual entailment challenge

    Dagan I., Oren G. and Magnini B., "The Pascal recognizing textual entailment challenge", In Proceedings of the PASCAL Challenges Workshop on Recognizing Textual Entailment, 2005

  16. [16]

    Question Answering via Bayesian Inference on Lexical Relations

    Ramakrishnan G., Jadhav A., Joshi A., Chakrabarti S. and Bhattacharyya P., "Question Answering via Bayesian Inference on Lexical Relations", In Proceedings of ACL Workshop Multilingual Summarization and Question Answering, pp. 1-10, 2003

  17. [17]

    Global WordNet website: http://globalwordnet.org/arabic-wordnet, last visited April 2015

  18. [18]

    Effects of stop words elimination for arabic information retrieval: a comparative study

    Abu-Elkhair I., "Effects of stop words elimination for arabic information retrieval: a comparative study", International Journal of Computing and Information Science, vol. 4(3), pp. 119-133, 2006

  19. [19]

    Implementation of a new hybrid method for stemming of Arabic text

    Dilekh T. and Behloul A., "Implementation of a new hybrid method for stemming of Arabic text", International Journal of Computer Applications, vol. 46(8), 2012

  20. [20]

    Lexical Cohesion and Entailment based Segmentation for Arabic Text Summarization (LCEAS)

    AL-Khawaldeh F. and Samawi V., "Lexical Cohesion and Entailment based Segmentation for Arabic Text Summarization (LCEAS)", The World of Computer Science and Information Technology Journal (WSCIT), Vol. 5(3), pp. 51-60, 2015

  21. [21]

    Entailment–based Linear Segmentation in Summarization

    Tatar D., Mihis A. and Lupsa D., "Entailment-based Linear Segmentation in Summarization", International Journal of Software Engineering and Knowledge Engineering, vol. 19(80), pp. 1023-1038, 2009

[Figure 1 chart residue: accuracy of the systems (y-axis 56 to 70) for Ask, Google, Yahoo, EWAQ; extracted bar values 63.27, 66.19, 61.48, 68.53]