Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, et al. · 2024 · arXiv 2411.17713

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

cs.CR · 2025-10-09 · unverdicted · novelty 7.0

CREST-Search is a red-teaming framework that crafts seemingly benign search queries to induce unsafe citations from web-augmented LLMs, backed by a new WebSearch-Harm dataset for fine-tuning a specialized attacker model.

LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails

cs.CR · 2026-05-17 · conditional · novelty 6.0

LPG compresses policy deliberation into 10 latent tokens to reach 84.5% safety accuracy and 11x speedup over explicit reasoning baselines on guardrail benchmarks.

To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

LLMs propagate misinformation more in lower-resource languages and lower-HDI countries, with input safety classifiers and retrieval-augmented fact-checking showing cross-lingual and regional gaps.

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · novelty 6.0

MobileLLM-Flash creates 350M-1.4B parameter LLMs via latency-guided search and attention skipping, delivering up to 1.8x faster prefill and 1.6x faster decode on mobile CPUs with comparable or better quality.

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

cs.CL · 2025-11-16 · unverdicted · novelty 6.0

EvoSynth evolves code-based jailbreak algorithms via multi-agent self-correction, reaching 85.5% ASR on Claude-Sonnet-4.5 and 95.9% average across targets with greater diversity.

citing papers explorer

Showing 5 of 5 citing papers.

When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models

cs.CR · 2025-10-09 · unverdicted · none · ref 11

LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails

cs.CR · 2026-05-17 · conditional · none · ref 4

To Lie or Not to Lie? Investigating The Biased Spread of Global Lies by LLMs

cs.CL · 2026-04-08 · unverdicted · none · ref 14

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · none · ref 5

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

cs.CL · 2025-11-16 · unverdicted · none · ref 11
