Answer Extraction for Why Arabic Questions Answering Systems: EWAQ

Fatima T. AL-Khawaldeh

arxiv: 1907.04149 · v1 · pith:WLXCCBPAnew · submitted 2019-07-04 · 💻 cs.CL · cs.IR

Pith reviewed 2026-05-25 09:11 UTC · model grok-4.3

keywords Arabic question answering · why questions · textual entailment · answer extraction · EWAQ · Arabic QA systems

The pith

Textual entailment metrics improve accuracy in extracting answers to Arabic why-questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops EWAQ, a system that uses entailment-based similarity to extract answers for why-questions in Arabic from passages retrieved by search engines. It re-ranks the passages and scores potential answers using textual entailment to pick the most likely correct one. This approach addresses the scarcity of Arabic QA systems focused on why-questions. A sympathetic reader would care because it offers a way to get direct, accurate answers to explanatory questions in Arabic, where general search engines fall short. The results show higher accuracy compared to Yahoo, Google, and Ask.com on a manual test set.

Core claim

The EWAQ system extracts answers only to why-questions, scoring each candidate answer with entailment metrics and ranking the candidates by score. Compared with Yahoo, Google, and Ask.com on a manual test set, EWAQ shows higher accuracy by using textual entailment both to re-rank the retrieved passages and to decide the correct answer.

What carries the argument

Entailment metrics for re-ranking retrieved passages and selecting the highest-scoring answer for why-questions.
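
As a rough illustration of the re-ranking idea, the sketch below scores each retrieved passage against the question and sorts by score. The scoring function is an assumed stand-in (a directional lexical-overlap proxy), since the text here does not specify the paper's entailment metric; the names `tokenize`, `entailment_score`, and `rerank` are hypothetical.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Whitespace tokenizer; a real Arabic system would likely add
    normalization, stemming, and stop-word removal here."""
    return text.split()


def entailment_score(hypothesis: str, text: str) -> float:
    """ASSUMED proxy, not the paper's metric: the share of hypothesis
    tokens that also occur in the text (directional overlap)."""
    hyp = set(tokenize(hypothesis))
    if not hyp:
        return 0.0
    return len(hyp & set(tokenize(text))) / len(hyp)


def rerank(question: str, passages: List[str]) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs ordered from highest to lowest score."""
    scored = [(entailment_score(question, p), p) for p in passages]
    # sorted() is stable, so equal-score passages keep search-engine order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    question = "لماذا السماء زرقاء"
    passages = ["الطقس اليوم جميل", "السماء زرقاء بسبب تشتت الضوء"]
    for score, passage in rerank(question, passages):
        print(f"{score:.2f}  {passage}")
```

Pure token overlap rewards topical similarity rather than causal explanation, which is exactly the weakness a genuine entailment metric would need to overcome.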

If this is right

The accuracy of answer extraction for why-questions in Arabic increases with the use of entailment-based similarity.
EWAQ outperforms established web-based QA systems like Yahoo, Google, and Ask.com on manual test data.
Textual entailment can be used to tackle the answer extraction module in Arabic language QA systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same entailment re-ranking step might be tested on other question types such as how or what in Arabic.
Similar scoring could be applied to other languages that lack specialized QA resources.
Placing entailment scoring earlier in the pipeline before full passage retrieval might change the accuracy gains.

Load-bearing premise

Textual entailment metrics applied to passages from general search engines can reliably identify the correct answers for why-questions in Arabic.

What would settle it

Running EWAQ and the search engines on a new independent set of Arabic why-questions and finding no accuracy improvement over the baselines would falsify the central claim.
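
A minimal sketch of what such a check could look like, assuming each system can be wrapped as a function from question to answer and that a judge function decides correctness. The names `accuracy`, `compare`, and `Judge` are hypothetical, and the paper's own judging procedure is not described in the text here.

```python
from typing import Callable, Dict, List, Tuple

# (predicted answer, gold answer) -> is the prediction correct?
Judge = Callable[[str, str], bool]
System = Callable[[str], str]


def accuracy(system: System, test_set: List[Tuple[str, str]], judge: Judge) -> float:
    """Fraction of test questions for which the system's answer is judged correct."""
    if not test_set:
        raise ValueError("test set is empty")
    hits = sum(1 for question, gold in test_set if judge(system(question), gold))
    return hits / len(test_set)


def compare(systems: Dict[str, System],
            test_set: List[Tuple[str, str]],
            judge: Judge) -> Dict[str, float]:
    """Accuracy of every system on the same held-out question set."""
    return {name: accuracy(fn, test_set, judge) for name, fn in systems.items()}
```

Running every system on the same questions matters: it keeps the comparison paired, so a later significance test can look at per-question disagreements rather than only at the aggregate figures.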

Figures

Figures reproduced from arXiv: 1907.04149 by Fatima T. AL-Khawaldeh.

**Figure 1.** The obtained accuracy results of the Yahoo, Google, Ask, and EWAQ systems.

Original abstract

With the increasing amount of web information, questions answering systems becomes very important to allow users to access to direct answers for their requests. This paper presents an Arabic Questions Answering Systems based on entailment metrics. The type of questions which this paper focuses on is why questions. There are many reasons lead us to develop this system: generally, the lack of Arabic Questions Answering Systems and scarcity Arabic Questions Answering Systems which focus on why questions. The goal of the proposed system in this research is to extract answers from re-ranked retrieved passages which are retrieved by search engines. This system extracts the answer only to why questions. This system is called by EWAQ: Entailment based Why Arabic Questions Answering. Each answer is scored with entailment metrics and ranked according to their scores in order to determine the most possible correct answer. EWAQ is compared with search engines: yahoo, google and ask.com, the well-established web-based Questions Answering systems, using manual test set. In EWAQ experiments, it is showed that the accuracy is increased by implementing the textual entailment in re-raking the retrieved relevant passages by search engines and deciding the correct answer. The obtained results show that using entailment based similarity can help significantly to tackle the why Answer Extraction module in Arabic language.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EWAQ re-ranks search-engine passages for Arabic why-questions via textual entailment, but the abstract gives no numbers, test-set details, or baselines to support the accuracy claim.

The paper describes EWAQ, a system that pulls passages from general search engines and then scores them with entailment metrics to pick answers for Arabic why-questions. It targets a narrow but real gap: Arabic QA resources are thin, and why-questions need explanatory rather than factoid matching. The core move is to treat entailment as a re-ranker on top of raw web results, which is a direct extension of existing RTE techniques to this language and question type. That part is clear and addresses a documented scarcity of Arabic why-QA work. The execution, however, stays at the level of description. The abstract states that accuracy increased and that entailment helps significantly, yet supplies no quantitative scores, no test-set size, no construction method, no error analysis, and no direct comparison numbers against the three search engines. Without those, the central claim cannot be checked. The assumption that off-the-shelf entailment will reliably surface causal explanations rather than topical overlap also sits untested in the given text, especially for Arabic morphology and noisy web passages. Readers already working on Arabic or non-English QA might note the entailment angle as a possible direction, but the lack of evaluation data makes the paper too thin for serious use or citation right now. It does not look ready for peer review until the results section is added with concrete numbers and a reproducible test set.
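
To make the morphology concern concrete, here is a minimal sketch of the light orthographic normalization often applied before Arabic tokens are compared. It is an assumption about what a preprocessing step might look like, not something the paper is known to do; `normalize_arabic` is a hypothetical name.

```python
import re

# Short-vowel and related marks (U+064B..U+0652) plus tatweel (U+0640).
_MARKS = re.compile("[\u064B-\u0652\u0640]")
# Alef with madda, hamza above, hamza below -> bare alef (U+0627).
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then unify common letter variants."""
    text = _MARKS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    # Alef maksura (U+0649) -> yeh (U+064A).
    return text.replace("\u0649", "\u064A")
```

Without a step like this, two spellings of the same word in noisy web passages count as different tokens, and any overlap-based entailment score drops for reasons unrelated to meaning.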

Referee Report

3 major / 3 minor

Summary. The paper presents EWAQ, an Arabic QA system focused on why-questions. It retrieves candidate passages via general web search engines (Google, Yahoo, Ask.com), then applies textual entailment metrics to re-rank the passages and extract the highest-scoring answer; the abstract claims this yields higher accuracy than the search engines alone on a manual test set.

Significance. If the performance claims were substantiated with quantitative results, the work would address a documented gap in Arabic QA resources for explanatory questions by showing that off-the-shelf entailment can improve answer extraction over raw retrieval. The approach is simple and leverages existing tools, which could be useful if the Arabic-specific issues are handled.

major comments (3)

[Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.
[Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.
[Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.
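
One way the requested statistical test could be run on paired per-question outcomes is an exact McNemar test, sketched below. This is an illustrative choice, not a test the paper reports; `mcnemar_exact` is a hypothetical helper that takes two equal-length lists of correct/incorrect flags for the same questions.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for two systems on the same questions.

    Only discordant questions (one system right, the other wrong) carry
    information; under the null they split 50/50 between the systems.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("both systems must be scored on the same questions")
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never disagree
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A small p-value would indicate that the accuracy difference is unlikely to come from chance disagreement on a handful of questions, which matters when the gap between systems is only a few points.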

minor comments (3)

[Abstract] Abstract contains multiple grammatical and phrasing errors ('questions answering systems becomes', 'scarcity Arabic Questions Answering Systems', 'access to direct answers for their requests').
[Abstract] The system acronym expansion 'Entailment based Why Arabic Questions Answering' is inconsistent with the title phrasing.
[Introduction] No references to prior Arabic QA or RTE work are mentioned in the provided text, leaving the novelty claim unanchored.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the submitted manuscript lacks sufficient detail to support its claims and will revise accordingly.

Referee: [Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.

Authors: The referee correctly notes that the abstract and body provide no numerical results, test-set details, baselines or error analysis. We will revise the abstract and add a full Experiments section reporting accuracy figures, test-set size and construction, direct baseline comparisons, and error analysis. revision: yes
Referee: [Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.

Authors: We acknowledge the method section omits the entailment model, Arabic preprocessing steps and exact scoring formula. The revised manuscript will supply these details, including the metric employed, any language-specific adaptations, and the passage-ranking formula. revision: yes
Referee: [Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.

Authors: The evaluation description is indeed limited. We will expand it to explain answer extraction from re-ranked passages, tie-breaking and non-entailment handling, and any statistical tests of improvement over the search-engine baselines. revision: yes
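
For illustration only, one possible policy for the tie-breaking and non-entailment handling mentioned in the last response is sketched below. The threshold value and the rule that earlier-retrieved candidates win ties are assumptions, not details taken from the paper; `select_answer` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def select_answer(scored: List[Tuple[float, str]],
                  threshold: float = 0.5) -> Optional[str]:
    """Pick the best candidate from (score, answer) pairs in retrieval order.

    Candidates scoring below the threshold are treated as non-entailing and
    skipped. On equal scores the earlier-retrieved candidate is kept.
    Returns None when no candidate reaches the threshold.
    """
    best: Optional[str] = None
    best_score = threshold
    for score, candidate in scored:
        if score < threshold:
            continue
        # Strict '>' means a later candidate never displaces an equal earlier one.
        if best is None or score > best_score:
            best, best_score = candidate, score
    return best
```

Returning None instead of a low-scoring answer lets an evaluation distinguish "no answer" from "wrong answer", which changes how accuracy should be reported.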

Circularity Check

0 steps flagged

No circularity: empirical system evaluation on external test set with no self-referential reductions.

The paper describes an Arabic why-QA pipeline that retrieves passages via third-party search engines (Google, Yahoo, Ask.com) then re-ranks them with off-the-shelf textual entailment metrics. The central claim is an observed accuracy lift on a manually constructed test set. No equations, fitted parameters, or uniqueness theorems are presented; the performance numbers are direct measurements against an external baseline rather than quantities defined by the same data or by prior self-citations. The derivation chain therefore contains no self-definitional, fitted-input, or self-citation-load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on the abstract alone, the central claim rests on the untested premise that entailment metrics will work for Arabic why-answer extraction; no free parameters, invented entities, or additional axioms are stated.

axioms (1)

domain assumption: Textual entailment metrics can be used to rank candidate answers for why-questions in Arabic.
This premise is required for the re-ranking and answer selection step to succeed.

pith-pipeline@v0.9.0 · 5758 in / 1180 out tokens · 49841 ms · 2026-05-25T09:11:33.786925+00:00 · methodology

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] Diekema A., Yilmazel O., Chen J., Harwell S., Liddy E. and He L., "What do You Mean? Finding Answers to Complex Questions", In Maybury, M.T. (Ed.), New Directions in Question Answering, The MIT Press, pp. 141-152, 2004

[2] Hirschman L. and Gaizauskas R., "Natural language QA: the view from here", Natural Language Engineering, vol. 7(4), pp. 275-300, 2001

[3] Voorhees E., "The TREC-8 QA Track Report", In Proceedings of the Eighth Text Retrieval Conference (TREC-8), 2000

[4] Ferret O., Grau B., Huraults-Plantet M., Illouz G., Monceaux L., Robba I. and Vilnat A., "Finding an answer based on the recognition of the issue focus", In Proceedings of TREC-10, 2001

[5] Laurent D., Seguela P. and Nègre S., "Cross Lingual QA using QRISTAL for CLEF 2006", Lecture Notes in Computer Science, vol. 4730, pp. 339-350, 2007

[6] ASK website: http//:www.Ask.com, last visited April 2015

[7] Mohammed F., Nasser K. and Harb H., "A knowledge-based Arabic QA System (AQAS)", In Proceedings of ACM SIGART Bulletin, pp. 21-33, 1993

[8] Hammo B., Abu-Salem H. and Lytinen S., "QARAB: A QA System to Support the Arabic Language", In Proceedings of the workshop on computational approaches to Semitic languages, pp. 55-65, Philadelphia, 2002

[9] Benajiba Y., Rosso P. and Lyhyaoui A., "Implementation of the ArabiQA QA System's components", In Proceedings of Workshop on Arabic Natural Language Processing, 2nd Information Communication Technologies Int. Symposium, ICTIS, 2007

[10] nooj website: http://www.nooj4nlp, last visited April 2015

[11] Brini W., Ellouze M., Mesfar S. and Belguith L., "An Arabic Question-Answering system for factoid questions", In Proceedings of IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2009

[12] Badawy O., Shaheen M. and Hamadene A., "ARQA High-Performance Arabic QA System", In Proceedings of Arabic Language Technology International Conference, pp. 129-136, 2011

[13] Akour M., Abufardeh S., Magel K. and Al-Radaideh Q., "QArabPro: A Rule Based Question Answering System for Reading Comprehension Tests in Arabic", American Journal of Applied Sciences, vol. 8(6), pp. 652-661, 2011

[14] Bdour W. and Gharaibeh N., "Development of Yes/No ArabicQA System", International Journal of Artificial Intelligence & Applications (IJAIA), vol. 4(1), pp. 51-63, 2013

[15] Dagan I., Oren G. and Magnini B., "The Pascal recognizing textual entailment challenge", In Proceedings of the PASCAL Challenges Workshop on Recognizing Textual Entailment, 2005

[16] Ramakrishnan G., Jadhav A., Joshi A., Chakrabarti S. and Bhattacharyya P., "Question Answering via Bayesian Inference on Lexical Relations", In Proceeding of ACL Workshop Multilingual Summarization and Question Answering, pp. 1-10, 2003

[17] Global WordNet website: http://globalwordnet.org/arabic-wordnet, last visited April 2015

[18] Abu-Elkhair I., "Effects of stop words elimination for arabic information retrieval: a comparative study", International Journal of Computing and Information Science, vol. 4(3), pp. 119-133, 2006

[19] Dilekh T. and Behloul A., "Implementation of a new hybrid method for stemming of Arabic text", International Journal of Computer Applications, vol. 46(8), 2012

[20] AL-Khawaldeh F. and Samawi V., "Lexical Cohesion and Entailment based Segmentation for Arabic Text Summarization (LCEAS)", The World of Computer Science and Information Technology Journal (WSCIT), vol. 5(3), pp. 51, 60, 2015

[21] Tatar D., Mihis A. and Lupsa D., "Entailment–based Linear Segmentation in Summarization", International Journal of Software Engineering and Knowledge Engineering, vol. 19(80), pp. 1023–1038, 2009

Figure 1 chart residue (as extracted, in source order): values 63.27, 66.19, 61.48, 68.53; axis ticks 56 to 70; axis titles "The accuracy" and "The systems"; categories ASK, GOOGLE, YAHOO, EWAQ.
