Natural language understanding with the Quora question pairs dataset

· 2019 · cs.CL · arXiv 1907.01041

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

This paper explores the task of Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset. We conducted extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outdoing more complicated recurrent and attention-based models. We also conducted error analysis and found some subjectivity in the labeling of the dataset.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in low-rank factorization.

Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation

eess.SP · 2026-04-30 · unverdicted · novelty 6.0

H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.

Convex Dataset Valuation for Post-Training

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.

citing papers explorer

Showing 3 of 3 citing papers.

Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization

cs.CR · 2026-05-08 · unverdicted · none · ref 18

FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in low-rank factorization.

Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation

eess.SP · 2026-04-30 · unverdicted · none · ref 17

H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.

Convex Dataset Valuation for Post-Training

cs.LG · 2026-05-15 · unverdicted · none · ref 19 · internal anchor

A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.
