hub

Data mining and knowledge discovery , 33(4):917–963

Johannes Welbl, Nelson F · 2017 · DOI 10.18653/v1/w17-4413

15 Pith papers cite this work. Polarity classification is still indexing.

15 Pith papers citing it

open at publisher browse 15 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SemGrad is a gradient-based uncertainty quantification technique for free-form LLM generation that operates in semantic space using a Semantic Preservation Score to select stable embeddings.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Answer tokens show forward drift and key-anchor focus when reading correct reasoning traces; a geometric-plus-semantic SRQ steering method boosts quantitative reasoning accuracy without training.

RWKV: Reinventing RNNs for the Transformer Era

cs.CL · 2023-05-22 · unverdicted · novelty 7.0

RWKV uses a linear attention mechanism to deliver Transformer-level performance with RNN-style inference efficiency, demonstrated at up to 14 billion parameters.

Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions

cs.CL · 2026-05-22 · unverdicted · novelty 6.0

LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.

Scaling Laws for Mixture Pretraining Under Data Constraints

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.

GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

cs.AI · 2025-07-01 · conditional · novelty 6.0

Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.

Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models

cs.CL · 2025-02-20 · unverdicted · novelty 6.0

Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.

Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models

cs.CL · 2024-08-20 · unverdicted · novelty 6.0

A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

cs.CL · 2022-04-14 · accept · novelty 6.0

GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

cs.CL · 2026-04-17 · unverdicted · novelty 5.0

SIVR detects LLM hallucinations by learning from token-wise and layer-wise variance patterns in internal hidden states, outperforming baselines with better generalization and less training data.

(How) Learning Rates Regulate Catastrophic Overtraining

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.

Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?

cs.CL · 2025-07-21 · unverdicted · novelty 4.0

LLM accuracy on reasoning tasks differs significantly by question type, with step-by-step reasoning accuracy often uncorrelated to final answer selection.

citing papers explorer

Showing 15 of 15 citing papers.

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models cs.CL · 2026-05-06 · unverdicted · none · ref 20
SemGrad is a gradient-based uncertainty quantification technique for free-form LLM generation that operates in semantic space using a Semantic Preservation Score to select stable embeddings.
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 81
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning cs.CL · 2026-04-21 · unverdicted · none · ref 22
Answer tokens show forward drift and key-anchor focus when reading correct reasoning traces; a geometric-plus-semantic SRQ steering method boosts quantitative reasoning accuracy without training.
RWKV: Reinventing RNNs for the Transformer Era cs.CL · 2023-05-22 · unverdicted · none · ref 8
RWKV uses a linear attention mechanism to deliver Transformer-level performance with RNN-style inference efficiency, demonstrated at up to 14 billion parameters.
Multilingual Knowledge Transfer under Data Constraints via Lexical Interventions cs.CL · 2026-05-22 · unverdicted · none · ref 38
LINK improves cross-lingual knowledge transfer via lexical substitutions in English pretraining data, yielding notable downstream gains and up to 2x training speedup across eight languages and five model sizes.
Scaling Laws for Mixture Pretraining Under Data Constraints cs.LG · 2026-05-12 · unverdicted · none · ref 50
Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts cs.LG · 2026-05-07 · unverdicted · none · ref 53
A shared global expert pool in MoE improves validation loss over per-layer experts and allows sublinear expert-parameter growth with depth.
GRASPrune: Global Gating for Budgeted Structured Pruning of Large Language Models cs.AI · 2026-04-21 · unverdicted · none · ref 65
GRASPrune removes 50% of parameters from LLaMA-2-7B via global gating and projected straight-through estimation, reaching 12.18 WikiText-2 perplexity and competitive zero-shot accuracy after four epochs on 512 calibration sequences.
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning cs.AI · 2025-07-01 · conditional · none · ref 12
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models cs.CL · 2025-02-20 · unverdicted · none · ref 60
Adapts multi-layer token-level Mahalanobis distance with supervised linear regression to yield improved uncertainty scores for LLM truthfulness tasks.
Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models cs.CL · 2024-08-20 · unverdicted · none · ref 42
A regression model using attention features and recurrent uncertainty scores improves selective generation in LLMs over unsupervised and supervised baselines on ten datasets and three models.
GPT-NeoX-20B: An Open-Source Autoregressive Language Model cs.CL · 2022-04-14 · accept · none · ref 101
GPT-NeoX-20B is a publicly released 20B parameter autoregressive language model trained on the Pile that shows strong gains in five-shot reasoning over similarly sized prior models.
Learning Uncertainty from Sequential Internal Dispersion in Large Language Models cs.CL · 2026-04-17 · unverdicted · none · ref 55
SIVR detects LLM hallucinations by learning from token-wise and layer-wise variance patterns in internal hidden states, outperforming baselines with better generalization and less training data.
(How) Learning Rates Regulate Catastrophic Overtraining cs.LG · 2026-04-15 · unverdicted · none · ref 21
Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.
Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked? cs.CL · 2025-07-21 · unverdicted · none · ref 36
LLM accuracy on reasoning tasks differs significantly by question type, with step-by-step reasoning accuracy often uncorrelated to final answer selection.

Data mining and knowledge discovery , 33(4):917–963

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer