Natural Language Understanding with the Quora Question Pairs Dataset

· 2019 · cs.CL · arXiv 1907.01041

6 Pith papers cite this work. Polarity classification is still indexing.

abstract

This paper explores the task of Natural Language Understanding (NLU) by looking at duplicate question detection in the Quora dataset. We conducted extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outdoing more complicated recurrent and attention-based models. We also conducted error analysis and found some subjectivity in the labeling of the dataset.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

Post-hoc truncation of the tail of the SVD of ΔW reduces spurious-group gaps by up to 5× with <2 pp accuracy loss across 0.5B–7B models and four benchmarks.

Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

BiCo transfers task vectors across models differing in width, depth, and pre-training by estimating dual-space orthogonal Procrustes mappings from one forward-backward pass on a calibration set.

Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization

cs.CR · 2026-05-08 · unverdicted · novelty 6.0

FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in low-rank factorization.

Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation

eess.SP · 2026-04-30 · unverdicted · novelty 6.0

H-TokCom groups tokens by semantic similarity and protects cluster-level bits with higher power, raising semantic similarity from 0.206 to 0.279 at 3 dB SNR on COCO data.

Convex Dataset Valuation for Post-Training

cs.LG · 2026-05-15 · unverdicted · novelty 5.0

A convex KMM-based valuation method that accounts for both target-task alignment and inter-dataset redundancy in gradient space outperforms standard gradient-alignment baselines for LLM post-training data selection.

ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World

cs.CL · 2026-05-14 · unverdicted · novelty 5.0

ML-Embed releases open multilingual embedding models trained with a new 3D-ML framework that reportedly set new MTEB records on 9 of 17 benchmarks, especially in low-resource languages.

citing papers explorer

Improving Parameter-Efficient Federated Learning with Differentially Private Refactorization

cs.CR · 2026-05-08 · unverdicted · none · ref 18

FedPower improves the accuracy-privacy tradeoff in differentially private LoRA-based federated learning by reconstructing and clipping full-rank updates then using PowerDP to inject noise before orthonormalization in low-rank factorization.
