Combining Q&A Pair Quality and Question Relevance Features on Community-based Question Retrieval

Dong Li; Lin Li

arXiv: 1907.02031 · v1 · pith:3QED3XERnew · submitted 2019-07-03 · 💻 cs.IR · cs.AI · cs.CL

Pith reviewed 2026-05-25 09:34 UTC · model grok-4.3

keywords: question retrieval, community QA, topic translation model, convolutional neural networks, MAP, term weighting, Q&A quality features, relevance features

The pith

T2LM+ term weighting and a CNN model raise MAP in community question retrieval by 4.91% and 6.31% over advanced baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing translation models for retrieving similar questions in online communities fail to account for query-specific semantics when weighting terms. The paper extends the topic translation model into T2LM+ by adding features for the quality of question-answer pairs and for question relevance. It also introduces a separate convolutional neural network approach to the same retrieval task. Experiments on community datasets show both methods deliver measurable gains in mean average precision.

Core claim

The authors establish that T2LM+, a term-weighting model that augments the traditional topic translation model with Q&A pair quality characteristics and question relevance, and a separate convolutional neural network retrieval method each produce higher MAP than relatively advanced prior methods, with reported gains of 4.91% and 6.31% respectively.
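Since the whole claim is stated in terms of MAP, a minimal sketch of how mean average precision is usually computed may help when reading the numbers. The function names and the toy rankings below are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP for one query: precision@k averaged over the ranks k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    # Dividing by the number of relevant items penalises relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: the unweighted mean of AP over every query in `runs`."""
    if not runs:
        return 0.0
    total = sum(average_precision(ranking, qrels.get(query_id, set()))
                for query_id, ranking in runs.items())
    return total / len(runs)


# Hypothetical toy data: two queries with hand-made relevance judgements.
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
print(mean_average_precision(runs, qrels))
```

In this toy case the per-query AP values are 5/6 and 1/2, so the MAP is 2/3.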

What carries the argument

T2LM+ term weighting model that fuses Q&A pair quality features and question relevance into the topic translation framework, plus a convolutional neural network architecture for direct question retrieval.
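The exact T2LM+ formula is not reproduced on this page, so the following is only a rough sketch of the general idea: a translation-based language model scores a candidate question, and a Q&A pair quality score is then fused with that relevance score. It omits the topic component entirely, and the mixing weights `lam`, `beta`, `alpha`, the log-linear fusion, and the toy translation table are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List

# trans[w][t] is an assumed translation probability P(w | t); the values are made up.
TransTable = Dict[str, Dict[str, float]]


def translation_lm_log_prob(query: List[str], question: List[str],
                            collection: List[str], trans: TransTable,
                            lam: float = 0.2, beta: float = 0.5) -> float:
    """log P(query | question): exact-match plus translated evidence, smoothed by the collection."""
    if not question:
        raise ValueError("question must contain at least one term")
    q_counts, c_counts = Counter(question), Counter(collection)
    q_len, c_len, vocab = len(question), len(collection), len(c_counts)
    log_prob = 0.0
    for w in query:
        p_exact = q_counts[w] / q_len
        # Mass that reaches w by "translating" each term t of the candidate question.
        p_trans = sum(trans.get(w, {}).get(t, 0.0) * n / q_len
                      for t, n in q_counts.items())
        # Add-one smoothed collection model keeps the log argument strictly positive.
        p_coll = (c_counts[w] + 1) / (c_len + vocab + 1)
        log_prob += math.log((1 - lam) * (beta * p_exact + (1 - beta) * p_trans)
                             + lam * p_coll)
    return log_prob


def fused_score(relevance_log_prob: float, quality: float, alpha: float = 0.7) -> float:
    """Assumed log-linear fusion of relevance with a quality score in (0, 1]."""
    return alpha * relevance_log_prob + (1 - alpha) * math.log(quality)


# Hypothetical toy data, not from the paper.
collection = "how to fix repair laptop notebook screen battery".split()
trans = {"fix": {"repair": 0.8}, "laptop": {"notebook": 0.7}}
query, candidate = ["fix", "laptop"], ["repair", "notebook"]
relevance = translation_lm_log_prob(query, candidate, collection, trans)
print(fused_score(relevance, quality=0.9) > fused_score(relevance, quality=0.3))
```

The candidate shares no surface terms with the query, so all of its relevance beyond collection smoothing comes from the translation table. At equal relevance, the higher-quality pair then ranks first.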

If this is right

Retrieval systems in Q&A communities can achieve higher precision by incorporating pair quality signals into term weighting.
The CNN architecture offers an alternative route to the same task that also exceeds prior MAP levels.
Query-specific semantics become usable inside translation models once quality and relevance features are added.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the gains hold across more domains, search engines for forums and knowledge bases could adopt similar quality-aware weighting as a default step.
The two methods might be combined or used to rerank each other's outputs for further accuracy.
The approach leaves open whether the same features would help in related tasks such as answer ranking or duplicate detection.
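One way the suggested combination of the two methods could look is a simple score fusion: normalise each method's scores over the candidate set and interpolate. This is a sketch of a generic technique, not something the paper proposes. The `weight` parameter, the min-max normalisation, and the toy scores are assumptions.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so that two differently scaled rankers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(first: Dict[str, float], second: Dict[str, float],
         weight: float = 0.5) -> List[str]:
    """Rank candidates by a weighted sum of the two normalised scores."""
    a, b = min_max(first), min_max(second)
    fused = {c: weight * a.get(c, 0.0) + (1 - weight) * b.get(c, 0.0)
             for c in set(a) | set(b)}
    # Ties are broken by candidate id so the output order is deterministic.
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical scores: log-probabilities from one ranker, probabilities from another.
translation_scores = {"q1": -2.0, "q2": -5.0, "q3": -3.0}
cnn_scores = {"q1": 0.2, "q2": 0.9, "q3": 0.8}
print(fuse(translation_scores, cnn_scores))
```

Here candidate `q3` comes first because it is near the top of both lists, while `q1` and `q2` are each favoured by only one ranker.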

Load-bearing premise

The chosen evaluation datasets and comparison baselines fairly represent typical community question retrieval performance without selection effects that would inflate the reported gains.

What would settle it

Re-evaluating both proposed methods on a fresh, independent community Q&A collection and observing no MAP improvement or a reversal of the gains would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 1907.02031 by Dong Li, Lin Li.

**Figure 1.** A search example of T2LM. Adjacent text from the source (Section II, Related Work): "In response to the shortcomings of the word-based exact matching question relevance model, the researchers introduced the statistical machine translation model [17] into the field of information retrieval, and used the translation …"

**Figure 2.** Fusion Q&A on the quality of sorting learning model framework. Adjacent text from the source: "Learning sorting [learning to rank] is a supervised machine learning method that can easily fuse multiple features with fewer artificial parameters. From the current research methods, there are three strategies for learning sorting, namely pointwise, pairwise and listwise. The pointwise method converts the sorting problem into a multi-class classification or regre…"

**Figure 3.** Q&A based on user information for quality assessment algorithm.

**Figure 4.** Community-based question retrieval method based on fusion question and answer on quality and question relevance. Adjacent text from the source (Section IV.A, Experimental data): "We used the data set from NDBC CUP 2016 as experimental data. There are 578,608 questions and 1,729,263 answers in the data set. Since there is no correlation mark between the query and the candidate in the dataset, and there is currently …"

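**Figure 5.** TextCNN-Attention model framework. Adjacent text from the source (Section IV.C, Experimental results and analysis) is Table I, "Comparison of T2LM+ with existing advanced models". Rows below the MAP row give the column model's MAP minus the row model's MAP, in percentage points; the extract is cut off in the IBLM row.

| | VSM | BM25 | LM | TLM | IBLM | T2LM | T2LM+ |
|---|---|---|---|---|---|---|---|
| MAP | 0.3475 | 0.3506 | 0.3583 | 0.3746 | 0.3916 | 0.4361 | 0.4695 |
| VSM | N/A | +0.31 | +1.08 | +2.71 | +4.41 | +8.86 | +12.20 |
| BM25 | N/A | N/A | +0.77 | +2.40 | +4.10 | +8.55 | +11.89 |
| LM | N/A | N/A | N/A | +1.63 | +3.33 | +7.78 | +11.12 |
| TLM | N/A | N/A | N/A | N/A | +1.70 | +6.15 | +9.49 |
| IBLM | N/A | N/A | N/A | N/A | N/A | … | … |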
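The difference entries of Table I can be re-derived from its MAP row, which also makes the unit explicit: they are absolute differences in MAP × 100 (percentage points), not relative changes. The short sketch below uses only the MAP values quoted in the Figure 5 text. The helper names are hypothetical, and the relative-gain function is included only to show the other common reading of a "% improvement".

```python
# MAP row of Table I, as quoted in the Figure 5 caption text.
MAP = {"VSM": 0.3475, "BM25": 0.3506, "LM": 0.3583, "TLM": 0.3746,
       "IBLM": 0.3916, "T2LM": 0.4361, "T2LM+": 0.4695}


def point_gain(baseline: str, model: str) -> float:
    """Absolute MAP difference in percentage points (MAP x 100)."""
    return round((MAP[model] - MAP[baseline]) * 100, 2)


def relative_gain(baseline: str, model: str) -> float:
    """Relative MAP change, as a percentage of the baseline's MAP."""
    return round((MAP[model] - MAP[baseline]) / MAP[baseline] * 100, 2)


# Reproduce the first surviving difference row of the table (baseline = VSM).
for model in ("BM25", "LM", "TLM", "IBLM", "T2LM", "T2LM+"):
    print(f"VSM -> {model}: {point_gain('VSM', model):+.2f} points")
```

Running this gives +0.31, +1.08, +2.71, +4.41, +8.86 and +12.20, matching the VSM row of the table.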

Original abstract

The Q&A community has become an important way for people to access knowledge and information from the Internet. However, existing translation-based models do not consider query-specific semantics when assigning weights to query terms in question retrieval. We therefore improve the term weighting model based on the traditional topic translation model and, further considering the quality characteristics of question and answer pairs, propose a community-based question retrieval method that combines Q&A pair quality and question relevance (T2LM+). We also propose a question retrieval method based on convolutional neural networks. The results show that, compared with relatively advanced methods, the two methods proposed in this paper increase MAP by 4.91% and 6.31%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims roughly 5-6% MAP gains for community QA retrieval from adding quality features to a translation model and from a separate CNN, but its abstract names no baselines, datasets, or statistical tests, so the numbers cannot be checked from the abstract alone.

The paper extends a topic translation model by folding in Q&A pair quality and question relevance for term weighting, and it also offers a CNN-based retrieval method. It reports MAP lifts of 4.91% and 6.31% over relatively advanced methods on community question retrieval tasks. That combination of quality signals with the translation approach is the concrete step they take beyond the models they cite. The CNN part is presented as a second, separate route. Both are straightforward applications of existing IR tools rather than a new framework.

The evaluation section is the clear gap. The abstract states the percentage improvements but names neither the baselines nor the dataset, and it gives no statistical tests or collection statistics. Without those, it is impossible to tell whether the deltas come from stronger features or from weaker comparison points and data that happen to reward the added signals. The stress-test concern about unverified baselines and representativeness holds up on the text we have.

This kind of incremental feature work is aimed at researchers already focused on community question-answering platforms. Someone in that niche might pick up the quality-feature idea for their own experiments, but the missing experimental details make the claims hard to use or build on. I would not send the paper to peer review in this form; the central numerical claims rest on information that is simply not supplied.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes two methods for community-based question retrieval: T2LM+, which extends a topic translation model by incorporating Q&A pair quality characteristics and question relevance into term weighting, and a convolutional neural network approach. It claims that these methods improve MAP by 4.91% and 6.31% respectively over relatively advanced methods.

Significance. If the reported gains are shown to hold against strong, contemporaneous baselines on standard CQA datasets with appropriate statistical testing, the work would offer a concrete demonstration that quality and relevance features can usefully augment both translation-based and neural retrieval models in community Q&A settings.

major comments (1)

[Abstract] The central claim consists of specific MAP improvements (4.91% and 6.31%) yet supplies no dataset identifiers, baseline names, experimental protocol, or statistical significance tests. Without these elements the numerical gains cannot be evaluated and the headline result remains unassessable.
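For context on the significance testing the referee asks for, one common option in retrieval evaluation is a paired randomization (sign-flip) test on per-query average precision. The sketch below assumes per-query AP values are available for both systems on the same queries. Neither the page nor the abstract says which test, if any, the authors used, and the function name, trial count, and toy AP values are illustrative.

```python
import random
from typing import List


def paired_randomization_test(ap_a: List[float], ap_b: List[float],
                              trials: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for the MAP difference between two systems.

    Under the null hypothesis the sign of each per-query AP difference is
    arbitrary, so signs are flipped at random and the resulting mean
    differences are compared with the observed one.
    """
    if len(ap_a) != len(ap_b) or not ap_a:
        raise ValueError("need two non-empty AP lists over the same queries")
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(flipped / n) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (trials + 1)


# Hypothetical per-query AP values for a proposed system and a baseline.
proposed = [0.6] * 12
baseline = [0.5] * 12
print(paired_randomization_test(proposed, baseline))
```

A small p-value here only says the per-query differences are unlikely under random sign flips; it does not address whether the baselines or the dataset are representative.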

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment. We agree that the abstract requires additional context to allow proper evaluation of the reported results and will revise it accordingly.

Referee: [Abstract] The central claim consists of specific MAP improvements (4.91% and 6.31%) yet supplies no dataset identifiers, baseline names, experimental protocol, or statistical significance tests. Without these elements the numerical gains cannot be evaluated and the headline result remains unassessable.

Authors: We agree with this observation. The current abstract is too terse and does not identify the datasets, name the baselines, outline the protocol, or reference significance testing. The body of the manuscript contains these details (standard CQA collections, the specific advanced baselines compared, the train/test splits, and the statistical tests performed). We will expand the abstract to include concise references to the datasets, the baseline methods, the evaluation protocol, and the fact that improvements were assessed for statistical significance.

Revision: yes

Circularity Check

0 steps flagged

No derivation chain; paper reports empirical MAP gains only

The provided abstract and description contain no equations, first-principles derivations, or load-bearing self-citations. The paper proposes two retrieval methods (T2LM+ and a CNN variant) and states empirical MAP improvements of 4.91% and 6.31% over unspecified advanced baselines. Because no mathematical derivation or parameter-fitting step is claimed, none of the enumerated circularity patterns can be exhibited by quote. The central claim is therefore an empirical result whose validity rests on external dataset and baseline choices rather than on any internal reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about feature quality measurement and evaluation fairness.

pith-pipeline@v0.9.0 · 5643 in / 997 out tokens · 23738 ms · 2026-05-25T09:34:37.784879+00:00 · methodology

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] Salton G, Wong A, Yang C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11):613-620

[2] Robertson S E, Walker S. Okapi/Keenbow at TREC-8[J]. TREC, 1999, 8:151-162

[3] Song F, Croft W B. A general language model for information retrieval[C]//Proceedings of the eighth international conference on Information and knowledge management. ACM, 1999:316-321

[4] Ponte J M, Croft W B. A language modeling approach to information retrieval[C]//Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1998:275-281

[6] Haocheng W. Research on Ranking Method in Community Question and Answer Search[D]. Anhui: University of Science and Technology of China, 2017

[7] Weinan Z, Yu Z, Ting L. A topic translation model for community-based question retrieval[J]. Journal of Computer, 2015, 38(2):313-321

[8] Y. Wang, X. Lin, L. Wu, et al. Robust subspace clustering for multi-view data by exploiting correlation consensus. IEEE Transactions on Image Processing, 24(11):3939-3949, 2015

[9] Y. Wang, L. Wu, X. Lin, J. Gao. Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 29(10), 4833-4843, 2018

[10] Y. Wang, W. Zhang, L. Wu et al. Iterative Views Agreement: An Iterative Low-Rank based Structured Optimization Method to Multi-View Spectral Clustering. IJCAI 2016

[11] L. Wu, Y. Wang. Beyond Low-Rank Representations: Orthogonal Clustering Basis Reconstruction with Optimized Graph Structure for Multi-view Spectral Clustering. Neural Networks, 103:1-8, 2018

[12] Y. Wang, X. Lin, L. Wu, W. Zhang. Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval. IEEE Transactions on Image Processing 26(3), 1393-1404,

[13] L. Wu, Y. Wang, X. Li, J. Gao. Deep Attention-based Spatially Recursive Networks for Fine-Grained Visual Recognition. IEEE Transactions on Cybernetics 49(5), 1791-1802, 2019

[14] L. Wu, Y. Wang, L. Shao. Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval. IEEE Transactions on Image Processing 28(4), 1602-1612, 2019

[15] Y. Wang, X. Lin, L. Wu et al. LBMCH: Learning Bridging Mapping for Cross-modal Hashing. ACM SIGIR 2015

[16] huagong_adu. Learning To Rank. https://blog.csdn.net/huagong_adu/article/details/40710305

[17] Brown P F, Pietra V J D, Pietra S A D, et al. The mathematics of statistical machine translation: parameter estimation[J]. Computational Linguistics, 1993, 19(2):263-311

[18] Ponte J M, Croft W B. A language modeling approach to information retrieval[C]//Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1998:275-281

[19] Xue X, Jeon J, Croft W B. Retrieval models for question and answer archives[C]//Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2008:475-482

[20] Lee J T, Kim S B, Song Y I, et al. Bridging lexical gaps between queries and questions on large online Q&A collections with compact translation models[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2008:410-418

[21] Bernhard D, Gurevych I. Combining lexical semantic resources with question & answer archives for translation-based answer finding[C]//Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguis...

[22] Gao J, Nie J Y, Wu G, et al. Dependence language model for information retrieval[C]. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. 2004:170-177

[23] Cao X, Cong G, Cui B, et al. Approaches to exploring category information for question retrieval in community question-answer archives[J]. ACM Transactions on Information Systems, 2012, 30(2):7
