Title resolution pending

· 1910 · arXiv 1910.13267

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

cs.LG · 2023-09-01 · conditional · novelty 6.0

Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

Efficient Training of Language Models to Fill in the Middle

cs.CL · 2022-07-28 · unverdicted · novelty 6.0

Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.

Re-Triggering Safeguards within LLMs for Jailbreak Detection

cs.CR · 2026-05-11 · unverdicted · novelty 5.0

Embedding disruption re-triggers LLM internal safeguards to detect jailbreak prompts more effectively than standalone defenses.

Which Pieces Does Unigram Tokenization Really Need?

cs.CL · 2025-12-14 · unverdicted · novelty 5.0

Unigram tokenization can be implemented more accessibly and simplified to a variant that improves compression at the cost of slightly higher training loss.

Lessons from the Trenches on Reproducible Evaluation of Language Models

cs.CL · 2024-05-23

citing papers explorer

Showing 5 of 5 citing papers.

Baseline Defenses for Adversarial Attacks Against Aligned Language Models cs.LG · 2023-09-01 · conditional · none · ref 45
Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.
Efficient Training of Language Models to Fill in the Middle cs.CL · 2022-07-28 · unverdicted · none · ref 129
Autoregressive language models trained on data with middle spans relocated to the end learn infilling without degrading left-to-right perplexity or sampling quality.
Re-Triggering Safeguards within LLMs for Jailbreak Detection cs.CR · 2026-05-11 · unverdicted · none · ref 10
Embedding disruption re-triggers LLM internal safeguards to detect jailbreak prompts more effectively than standalone defenses.
Which Pieces Does Unigram Tokenization Really Need? cs.CL · 2025-12-14 · unverdicted · none · ref 1
Unigram tokenization can be implemented more accessibly and simplified to a variant that improves compression at the cost of slightly higher training loss.
Lessons from the Trenches on Reproducible Evaluation of Language Models cs.CL · 2024-05-23 · unreviewed · ref 119

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer