
arxiv: 1907.01041 · v1 · submitted 2019-07-01 · 💻 cs.CL · cs.LG


Natural Language Understanding with the Quora Question Pairs Dataset

keywords: dataset, models, conducted, language, natural, question, quora, understanding

This paper explores the task of Natural Language Understanding (NLU) through duplicate question detection on the Quora Question Pairs dataset. We conducted an extensive exploration of the dataset and evaluated a range of machine learning models, including linear and tree-based models. Our main finding was that a simple Continuous Bag of Words neural network model performed best, outperforming more complicated recurrent and attention-based models. We also conducted an error analysis and found some subjectivity in the labeling of the dataset.
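As a rough illustration of the idea behind a Continuous Bag of Words approach to duplicate question detection, the sketch below averages word vectors for each question and scores the pair with a logistic unit. It is not the authors' model: the abstract does not describe the architecture, so the feature combination (`|u - v|` and `u * v`), the random untrained embeddings, the placeholder weights, and all function names here are assumptions chosen for illustration only.

```python
import math
import random


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (t.strip("?.,!") for t in text.lower().split())
    return [t for t in tokens if t]


def make_embeddings(vocab, dim, seed=0):
    """Hypothetical random embeddings; a real model would learn these."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in sorted(vocab)}


def cbow_encode(tokens, emb, dim):
    """Average the embeddings of known tokens; word order is ignored."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def pair_features(u, v):
    """Combine two sentence vectors: elementwise |u - v| followed by u * v."""
    return [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]


def predict_duplicate(q1, q2, emb, weights, bias, dim):
    """Return a score in (0, 1) from a single logistic unit."""
    u = cbow_encode(tokenize(q1), emb, dim)
    v = cbow_encode(tokenize(q2), emb, dim)
    z = sum(w * x for w, x in zip(weights, pair_features(u, v))) + bias
    return 1.0 / (1.0 + math.exp(-z))


DIM = 8
q1 = "How do I learn Python?"
q2 = "What is the best way to learn Python?"
emb = make_embeddings(set(tokenize(q1)) | set(tokenize(q2)), DIM)
weights = [0.1] * (2 * DIM)  # placeholder values, not trained
p = predict_duplicate(q1, q2, emb, weights, 0.0, DIM)
print(f"untrained duplicate score: {p:.3f}")
```

Because the encoder only averages vectors, two questions containing the same words in a different order receive the same representation. That simplicity is the defining trait of a bag-of-words encoder, in contrast to the recurrent and attention-based models mentioned in the abstract.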

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization

    cs.CR 2026-05 unverdicted novelty 6.0

    FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in ...

  2. Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation

    eess.SP 2026-04 unverdicted novelty 6.0

    H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.