Answer Extraction for Why Arabic Questions Answering Systems: EWAQ
Pith reviewed 2026-05-25 09:11 UTC · model grok-4.3
The pith
Textual entailment metrics improve accuracy in extracting answers to Arabic why-questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The EWAQ system answers only why-questions. It scores each candidate answer with textual entailment metrics and ranks the candidates by score. Compared with the search engines Yahoo, Google, and Ask.com on a manually constructed test set, EWAQ achieves higher accuracy by using textual entailment to re-rank the retrieved passages and select the correct answer.
What carries the argument
Entailment metrics for re-ranking retrieved passages and selecting the highest-scoring answer for why-questions.
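The review does not say which entailment metric EWAQ uses. As a minimal sketch of the re-ranking step, assume a simple lexical-overlap proxy for entailment: the share of the question's words (the hypothesis) that also occur in a retrieved passage (the text). The names `entailment_score`, `rerank`, and `QUESTION_WORDS` are hypothetical, and a real system would substitute an Arabic RTE model or metric.

```python
import re

# Hypothetical interrogative words removed from the question before scoring.
QUESTION_WORDS = {"لماذا", "why"}

def tokenize(text: str) -> list[str]:
    """Lower-case and split on non-word characters (Unicode-aware, so Arabic letters count)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def entailment_score(passage: str, hypothesis: str) -> float:
    """Lexical-overlap entailment proxy (an assumption, not EWAQ's metric):
    the fraction of hypothesis tokens that also appear in the passage."""
    hyp = set(tokenize(hypothesis)) - QUESTION_WORDS
    if not hyp:
        return 0.0
    return len(hyp & set(tokenize(passage))) / len(hyp)

def rerank(passages: list[str], question: str) -> list[tuple[float, str]]:
    """Return (score, passage) pairs, highest entailment score first."""
    scored = [(entailment_score(p, question), p) for p in passages]
    # sorted() stays stable with reverse=True, so ties keep the search engine's order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because the sort is stable, passages with equal scores keep the order the search engine returned. That is one plausible tie-breaking rule; the paper does not specify one.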
If this is right
- The accuracy of answer extraction for why-questions in Arabic increases with the use of entailment-based similarity.
- EWAQ outperforms the established web-based QA systems Yahoo, Google, and Ask.com on a manually constructed test set.
- Textual entailment can be used to build the answer-extraction module of Arabic QA systems.
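How EWAQ turns the top-ranked passage into a final answer is not described. One plausible approach for why-questions, sketched here purely as an assumption, is to prefer sentences that contain an Arabic causal connective and then pick the one that overlaps most with the question. The cue list `CAUSAL_CUES` and the helper names are hypothetical, and substring matching on cues is deliberately crude.

```python
import re
from typing import Optional

# Hypothetical Arabic causal cues: "because", "because of", "therefore", "result".
CAUSAL_CUES = ("لأن", "بسبب", "لذلك", "نتيجة")

def split_sentences(passage: str) -> list[str]:
    """Split on Latin and Arabic sentence-final punctuation (. ! ? and U+061F)."""
    return [s.strip() for s in re.split(r"[.!?\u061F]+", passage) if s.strip()]

def overlap(sentence: str, question: str) -> float:
    """Fraction of question tokens that also appear in the sentence."""
    q = set(re.findall(r"\w+", question))
    if not q:
        return 0.0
    return len(q & set(re.findall(r"\w+", sentence))) / len(q)

def extract_why_answer(passage: str, question: str) -> Optional[str]:
    sentences = split_sentences(passage)
    if not sentences:
        return None
    # Prefer sentences with a causal cue; fall back to every sentence if none has one.
    causal = [s for s in sentences if any(cue in s for cue in CAUSAL_CUES)]
    pool = causal or sentences
    # max() keeps the first of equally scored sentences, i.e. the earliest in the passage.
    return max(pool, key=lambda s: overlap(s, question))
```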
Where Pith is reading between the lines
- The same entailment re-ranking step might be tested on other question types such as how or what in Arabic.
- Similar scoring could be applied to other languages that, like Arabic, lack specialized QA resources.
- Placing entailment scoring earlier in the pipeline before full passage retrieval might change the accuracy gains.
Load-bearing premise
Textual entailment metrics applied to passages from general search engines can reliably identify the correct answers for why-questions in Arabic.
What would settle it
Running EWAQ and the search engines on a new independent set of Arabic why-questions and finding no accuracy improvement over the baselines would falsify the central claim.
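The falsification test above amounts to comparing per-system accuracy on a shared, independent question set. A minimal sketch, assuming each system's output has already been judged correct or incorrect per question; the function names and any example data are hypothetical, not results from the paper.

```python
def accuracy(correct: list[bool]) -> float:
    """Percentage of questions answered correctly."""
    if not correct:
        raise ValueError("empty result list")
    return 100.0 * sum(correct) / len(correct)

def beats_all_baselines(results: dict[str, list[bool]], system: str = "EWAQ") -> bool:
    """True if `system` has strictly higher accuracy than every other system.
    All systems must be judged on the same questions, in the same order."""
    n = len(results[system])
    if any(len(v) != n for v in results.values()):
        raise ValueError("systems were not evaluated on the same question set")
    target = accuracy(results[system])
    return all(target > accuracy(v) for name, v in results.items() if name != system)
```

A strictly higher point estimate is only the first check; whether the gap is larger than chance needs a paired significance test.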
Original abstract
With the increasing amount of web information, questions answering systems becomes very important to allow users to access to direct answers for their requests. This paper presents an Arabic Questions Answering Systems based on entailment metrics. The type of questions which this paper focuses on is why questions. There are many reasons lead us to develop this system: generally, the lack of Arabic Questions Answering Systems and scarcity Arabic Questions Answering Systems which focus on why questions. The goal of the proposed system in this research is to extract answers from re-ranked retrieved passages which are retrieved by search engines. This system extracts the answer only to why questions. This system is called by EWAQ: Entailment based Why Arabic Questions Answering. Each answer is scored with entailment metrics and ranked according to their scores in order to determine the most possible correct answer. EWAQ is compared with search engines: yahoo, google and ask.com, the well-established web-based Questions Answering systems, using manual test set. In EWAQ experiments, it is showed that the accuracy is increased by implementing the textual entailment in re-raking the retrieved relevant passages by search engines and deciding the correct answer. The obtained results show that using entailment based similarity can help significantly to tackle the why Answer Extraction module in Arabic language.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents EWAQ, an Arabic QA system focused on why-questions. It retrieves candidate passages via general web search engines (Google, Yahoo, Ask.com), then applies textual entailment metrics to re-rank the passages and extract the highest-scoring answer; the abstract claims this yields higher accuracy than the search engines alone on a manual test set.
Significance. If the performance claims were substantiated with quantitative results, the work would address a documented gap in Arabic QA resources for explanatory questions by showing that off-the-shelf entailment can improve answer extraction over raw retrieval. The approach is simple and leverages existing tools, which could be useful if the Arabic-specific issues are handled.
major comments (3)
- [Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.
- [Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.
- [Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.
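Arabic-specific preprocessing is one of the details the Method comment asks for. The sketch below shows a common pipeline under stated assumptions: diacritic and tatweel removal, alef normalization, a tiny hypothetical stop list, and a simplified light stemmer that strips at most one prefix and one suffix. It is not the hybrid stemmer cited as [19], and the affix lists are assumed rather than taken from the paper.

```python
import re

# Arabic diacritics (fathatan..sukun, U+064B-U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Unify alef variants and map alef maqsura to yaa.
NORMALIZE_MAP = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ى": "ي"})

def normalize(text: str) -> str:
    return DIACRITICS.sub("", text).translate(NORMALIZE_MAP)

# Hypothetical, deliberately tiny stop list, normalized the same way as the text.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه", "لماذا")}
# Assumed light-stemming affixes, longer prefixes tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")

def light_stem(token: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[: -len(s)]
            break
    return token

def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"\w+", normalize(text))
    return [light_stem(t) for t in tokens if t not in STOP_WORDS]
```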
minor comments (3)
- [Abstract] Abstract contains multiple grammatical and phrasing errors ('questions answering systems becomes', 'scarcity Arabic Questions Answering Systems', 'access to direct answers for their requests').
- [Abstract] The system acronym expansion 'Entailment based Why Arabic Questions Answering' is inconsistent with the title phrasing.
- [Introduction] No references to prior Arabic QA or RTE work are mentioned in the provided text, leaving the novelty claim unanchored.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We agree that the submitted manuscript lacks sufficient detail to support its claims and will revise accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that 'the accuracy is increased by implementing the textual entailment' and that entailment 'can help significantly' is unsupported; no accuracy numbers, no test-set size or construction details, no baseline scores, and no error analysis are supplied anywhere in the manuscript.
  Authors: The referee correctly notes that the abstract and body provide no numerical results, test-set details, baselines or error analysis. We will revise the abstract and add a full Experiments section reporting accuracy figures, test-set size and construction, direct baseline comparisons, and error analysis. Revision: yes.
- Referee: [Method] Method / Experiments: no description is given of the textual entailment model or metric applied to Arabic text, any Arabic-specific preprocessing or adaptation, or the precise scoring formula used to rank passages; without these the re-ranking claim cannot be reproduced or evaluated.
  Authors: We acknowledge the method section omits the entailment model, Arabic preprocessing steps and exact scoring formula. The revised manuscript will supply these details, including the metric employed, any language-specific adaptations, and the passage-ranking formula. Revision: yes.
- Referee: [Experiments] Evaluation: the comparison to search engines is described only at the level of naming the engines; there is no account of how answers are extracted from the re-ranked passages, how ties or non-entailing passages are handled, or any statistical test of improvement.
  Authors: The evaluation description is indeed limited. We will expand it to explain answer extraction from re-ranked passages, tie-breaking and non-entailment handling, and any statistical tests of improvement over the search-engine baselines. Revision: yes.
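For the statistical test the authors promise, one standard choice for two systems judged on the same questions is McNemar's exact test, which uses only the questions where exactly one system is correct. This is a suggested technique, not one the authors name, and `mcnemar_exact` is a hypothetical helper.

```python
from math import comb

def mcnemar_exact(system: list[bool], baseline: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired per-question correctness."""
    if len(system) != len(baseline):
        raise ValueError("both systems must be judged on the same questions")
    # Discordant pairs: only these carry information about which system is better.
    b = sum(s and not t for s, t in zip(system, baseline))  # system right, baseline wrong
    c = sum(t and not s for s, t in zip(system, baseline))  # baseline right, system wrong
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 each discordant pair favours either system with probability 1/2.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With three baselines (Yahoo, Google, Ask.com) the test would be run once per baseline, so a multiple-comparison correction such as Bonferroni is worth applying.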
Circularity Check
No circularity: empirical system evaluation on external test set with no self-referential reductions.
Full rationale
The paper describes an Arabic why-QA pipeline that retrieves passages via third-party search engines (Google, Yahoo, Ask.com) then re-ranks them with off-the-shelf textual entailment metrics. The central claim is an observed accuracy lift on a manually constructed test set. No equations, fitted parameters, or uniqueness theorems are presented; the performance numbers are direct measurements against an external baseline rather than quantities defined by the same data or by prior self-citations. The derivation chain therefore contains no self-definitional, fitted-input, or self-citation-load-bearing steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Textual entailment metrics can be used to rank candidate answers for why-questions in Arabic.
Reference graph
Works this paper leans on
[1] Diekema A., Yilmazel O., Chen J., Harwell S., Liddy E. and He L., "What do You Mean? Finding Answers to Complex Questions", In Maybury M.T. (Ed.), New Directions in Question Answering, The MIT Press, pp. 141-152, 2004.
[2] Hirschman L. and Gaizauskas R., "Natural language QA: the view from here", Natural Language Engineering, vol. 7(4), pp. 275-300, 2001.
[3] Voorhees E., "The TREC-8 QA Track Report", In Proceedings of the Eighth Text Retrieval Conference (TREC-8), 2000.
[4] Ferret O., Grau B., Hurault-Plantet M., Illouz G., Monceaux L., Robba I. and Vilnat A., "Finding an answer based on the recognition of the issue focus", In Proceedings of TREC-10, 2001.
[5] Laurent D., Seguela P. and Nègre S., "Cross Lingual QA using QRISTAL for CLEF 2006", Lecture Notes in Computer Science, vol. 4730, pp. 339-350, 2007.
[6] Ask website: http://www.Ask.com, last visited April 2015.
[7] Mohammed F., Nasser K. and Harb H., "A knowledge-based Arabic QA System (AQAS)", In Proceedings of ACM SIGART Bulletin, pp. 21-33, 1993.
[8] Hammo B., Abu-Salem H. and Lytinen S., "QARAB: A QA System to Support the Arabic Language", In Proceedings of the Workshop on Computational Approaches to Semitic Languages, pp. 55-65, Philadelphia, 2002.
[9] Benajiba Y., Rosso P. and Lyhyaoui A., "Implementation of the ArabiQA QA System's components", In Proceedings of the Workshop on Arabic Natural Language Processing, 2nd Information Communication Technologies Int. Symposium (ICTIS), 2007.
[10] NooJ website: http://www.nooj4nlp, last visited April 2015.
[11] Brini W., Ellouze M., Mesfar S. and Belguith L., "An Arabic Question-Answering system for factoid questions", In Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, 2009.
[12] Badawy O., Shaheen M. and Hamadene A., "ARQA High-Performance Arabic QA System", In Proceedings of the Arabic Language Technology International Conference, pp. 129-136, 2011.
[13] Akour M., Abufardeh S., Magel K. and Al-Radaideh Q., "QArabPro: A Rule Based Question Answering System for Reading Comprehension Tests in Arabic", American Journal of Applied Sciences, vol. 8(6), pp. 652-661, 2011.
[14] Bdour W. and Gharaibeh N., "Development of Yes/No ArabicQA System", International Journal of Artificial Intelligence & Applications (IJAIA), vol. 4(1), pp. 51-63, 2013.
[15] Dagan I., Glickman O. and Magnini B., "The Pascal recognizing textual entailment challenge", In Proceedings of the PASCAL Challenges Workshop on Recognizing Textual Entailment, 2005.
[16] Ramakrishnan G., Jadhav A., Joshi A., Chakrabarti S. and Bhattacharyya P., "Question Answering via Bayesian Inference on Lexical Relations", In Proceedings of the ACL Workshop on Multilingual Summarization and Question Answering, pp. 1-10, 2003.
[17] Global WordNet website: http://globalwordnet.org/arabic-wordnet, last visited April 2015.
[18] Abu-Elkhair I., "Effects of stop words elimination for Arabic information retrieval: a comparative study", International Journal of Computing and Information Science, vol. 4(3), pp. 119-133, 2006.
[19] Dilekh T. and Behloul A., "Implementation of a new hybrid method for stemming of Arabic text", International Journal of Computer Applications, vol. 46(8), 2012.
[20] Al-Khawaldeh F. and Samawi V., "Lexical Cohesion and Entailment based Segmentation for Arabic Text Summarization (LCEAS)", The World of Computer Science and Information Technology Journal (WSCIT), vol. 5(3), pp. 51-60, 2015.
[21] Tatar D., Mihis A. and Lupsa D., "Entailment-based Linear Segmentation in Summarization", International Journal of Software Engineering and Knowledge Engineering, vol. 19(8), pp. 1023-1038, 2009.
[Figure: bar chart "The accuracy" by system (x-axis "The systems": Ask, Google, Yahoo, EWAQ; y-axis 56 to 70); extracted bar values, in order: 63.27, 66.19, 61.48, 68.53]