Natural Language Understanding with the Quora Question Pairs Dataset

Lakshay Sharma , Laura Graesser , Nikita Nangia , Utku Evci

classification: cs.CL, cs.LG

keywords: dataset, models, conducted, language, natural, question, quora, understanding

This paper explores the task of Natural Language Understanding (NLU) through duplicate question detection on the Quora dataset. We explored the dataset extensively and evaluated a range of machine learning models, including linear and tree-based models. Our main finding was that a simple Continuous Bag of Words neural network model performed best, outperforming more complicated recurrent and attention-based models. We also conducted an error analysis and found some subjectivity in the labeling of the dataset.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization
cs.CR 2026-05 unverdicted novelty 6.0

FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in ...

Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation
eess.SP 2026-04 unverdicted novelty 6.0

H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.