hub

B ool Q : Exploring the surprising difficulty of natural yes/no questions

Clark, Christopher, Lee, Kenton, Chang, Ming-Wei, Kwiatkowski, Tom, Collins, Michael, Toutanova, Kristina , booktitle = · 2019 · DOI 10.18653/v1/n19-1300

20 Pith papers cite this work. Polarity classification is still indexing.

20 Pith papers citing it

open at publisher browse 20 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.

EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

EdgeFlowerTune is a real-device benchmark that jointly assesses model quality and system costs for federated LLM fine-tuning on edge hardware using three protocols: Quality-under-Budget, Cost-to-Target, and Robustness.

Layer Collapse in Diffusion Language Models

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

SimDiff: Depth Pruning via Similarity and Difference

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

cs.CL · 2024-05-07 · unverdicted · novelty 7.0

DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.

The Power of Scale for Parameter-Efficient Prompt Tuning

cs.CL · 2021-04-18 · unverdicted · novelty 7.0

Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.

Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

Extremely quantized LLMs degrade in smoothness, sparsifying the decoding tree and hurting generation quality; a smoothness-preserving principle delivers gains beyond numerical fitting.

Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

Fisher information from target data provides a better criterion than weight geometry for choosing LoRA subspaces, yielding consistent performance gains on downstream tasks.

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.

Representation-Guided Parameter-Efficient LLM Unlearning

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.

TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models

cs.LG · 2026-04-07 · unverdicted · novelty 6.0

TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.

Titans: Learning to Memorize at Test Time

cs.LG · 2024-12-31 · unverdicted · novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

cs.CL · 2023-05-23 · conditional · novelty 6.0

UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.

BloombergGPT: A Large Language Model for Finance

cs.LG · 2023-03-30 · conditional · novelty 6.0

BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling

cs.CL · 2026-04-28 · unverdicted · novelty 5.0

Marco-MoE delivers open multilingual MoE models with 5% activation sparsity that outperform similarly sized dense models on English and multilingual benchmarks through efficient upcycling.

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

cs.RO · 2026-04-21 · unverdicted · novelty 5.0

VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

cs.CL · 2024-01-05 · unverdicted · novelty 4.0

DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.

citing papers explorer

Showing 20 of 20 citing papers.

TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment cs.CL · 2026-05-13 · unverdicted · none · ref 123
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints cs.CL · 2026-05-09 · unverdicted · none · ref 6
EdgeFlowerTune is a real-device benchmark that jointly assesses model quality and system costs for federated LLM fine-tuning on edge hardware using three protocols: Quality-under-Budget, Cost-to-Target, and Robustness.
Layer Collapse in Diffusion Language Models cs.LG · 2026-05-07 · unverdicted · none · ref 5 · 2 links
Diffusion language models develop early-layer collapse around an indispensable super-outlier due to overtraining, resulting in higher compressibility and reversed optimal sparsity patterns versus autoregressive models.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 70
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
SimDiff: Depth Pruning via Similarity and Difference cs.AI · 2026-04-21 · unverdicted · none · ref 23
SimDiff uses similarity and difference metrics to prune LLM layers more effectively than cosine similarity alone, retaining over 91% performance at 25% pruning on LLaMA2-7B.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model cs.CL · 2024-05-07 · unverdicted · none · ref 128
DeepSeek-V2 delivers top-tier open-source LLM performance using only 21B active parameters by compressing the KV cache 93.3% and cutting training costs 42.5% via MLA and DeepSeekMoE.
GAIA: a benchmark for General AI Assistants cs.CL · 2023-11-21 · unverdicted · none · ref 86
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
The Power of Scale for Parameter-Efficient Prompt Tuning cs.CL · 2021-04-18 · unverdicted · none · ref 5
Prompt tuning matches full model tuning performance on large language models while tuning only a small fraction of parameters and improves robustness to domain shifts.
Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs cs.CL · 2026-05-09 · unverdicted · none · ref 6
Extremely quantized LLMs degrade in smoothness, sparsifying the decoding tree and hurting generation quality; a smoothness-preserving principle delivers gains beyond numerical fitting.
Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning cs.LG · 2026-05-01 · unverdicted · none · ref 22
Fisher information from target data provides a better criterion than weight geometry for choosing LoRA subspaces, yielding consistent performance gains on downstream tasks.
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models cs.AI · 2026-04-21 · unverdicted · none · ref 64
GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.
Representation-Guided Parameter-Efficient LLM Unlearning cs.CL · 2026-04-19 · unverdicted · none · ref 176
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization to outperform prior methods on forget-retain trade-off in LLM benchmarks.
TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models cs.LG · 2026-04-07 · unverdicted · none · ref 5
TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
Titans: Learning to Memorize at Test Time cs.LG · 2024-12-31 · unverdicted · none · ref 21
Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations cs.CL · 2023-05-23 · conditional · none · ref 122
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 23
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling cs.CL · 2026-04-28 · unverdicted · none · ref 3
Marco-MoE delivers open multilingual MoE models with 5% activation sparsity that outperform similarly sized dense models on English and multilingual benchmarks through efficient upcycling.
VLA Foundry: A Unified Framework for Training Vision-Language-Action Models cs.RO · 2026-04-21 · unverdicted · none · ref 14
VLA Foundry provides a single training stack for VLA models and releases open models that match prior closed-source performance or outperform baselines on multi-task manipulation in simulation.
Gated Delta Networks: Improving Mamba2 with Delta Rule cs.CL · 2024-12-09 · unverdicted · none · ref 100
Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism cs.CL · 2024-01-05 · unverdicted · none · ref 132
DeepSeek LLM 67B exceeds LLaMA-2 70B on code, mathematics and reasoning benchmarks after pre-training on 2 trillion tokens and alignment via SFT and DPO.

B ool Q : Exploring the surprising difficulty of natural yes/no questions

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer