ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection

Tool reference. 80% of classified Pith citations use this work as a method, library, or software dependency rather than as support for a substantive claim.

19 Pith papers citing it

citation-role summary

dataset: 4, background: 1

representative citing papers

Boiling the Frog: A Multi-Turn Benchmark for Agentic Safety

cs.CL · 2026-05-21 · unverdicted · novelty 7.0 · 2 refs

Boiling the Frog is a new stateful multi-turn benchmark that finds an aggregate 44.4% strict attack success rate for incremental safety violations across nine AI models, with rates ranging from 20.5% to 92.9%.

Leveraging RAG for Training-Free Alignment of LLMs

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

RAG-Pref is a training-free, RAG-based alignment technique that conditions LLMs on contrastive preference samples at inference time. Combined with offline alignment methods, it yields an average improvement of more than 3.7x in refusals of agentic attacks across five LLMs.

Textbooks Are All You Need II: phi-1.5 technical report

cs.CL · 2023-09-11 · unverdicted · novelty 6.0

phi-1.5 is a 1.3B parameter model trained on synthetic textbook data that matches the reasoning performance of models five times larger on natural language, math, and basic coding tasks.

TrustLLM: Trustworthiness in Large Language Models

cs.CL · 2024-01-10 · unverdicted · novelty 5.0

TrustLLM defines eight trustworthiness principles, builds a benchmark spanning six dimensions, and evaluates 16 LLMs, finding that proprietary models generally lead, that some open-source models come close, and that over-calibration toward trustworthiness can hurt utility.

Baichuan 2: Open Large-scale Language Models

cs.CL · 2023-09-19 · unverdicted · novelty 4.0

Baichuan 2 presents 7B and 13B LLMs trained on 2.6T tokens that match or exceed similar open models on MMLU, CMMLU, GSM8K, and HumanEval, and that excel in medicine and law.