Qwen2.5 Technical Report

arXiv preprint arXiv:2412 · 2024 · cs.CL · arXiv 2412.15115

Mixed citation behavior. Most common role is background (64%).

906 Pith papers citing it

abstract

In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

citation-role summary

background 89 · method 21 · baseline 13 · other 8 · dataset 7
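As a quick consistency check, the 64% background share quoted at the top of the page appears to follow from these counts. The sketch below assumes the percentage is taken over the sum of the five listed role categories; the page does not state the denominator, and `role_counts` is an illustrative name only.

```python
# Role counts copied from the citation-role summary above.
role_counts = {"background": 89, "method": 21, "baseline": 13, "other": 8, "dataset": 7}

# Assumption: shares are computed over the sum of the listed categories,
# not over all 906 citing papers.
total = sum(role_counts.values())
shares = {role: round(100 * n / total) for role, n in role_counts.items()}
top_role = max(role_counts, key=role_counts.get)

print(total, top_role, shares[top_role])  # prints: 138 background 64
```

Under that assumption, 138 citations carry a role label, and background accounts for 89 of them.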

citation-polarity summary

background 89 · use method 20 · baseline 13 · unclear 9 · use dataset 7

representative citing papers

Entropy-Gated Latent Recursion

cs.LG · 2026-06-15 · unverdicted · novelty 8.0 · 2 refs

EGLR adds a deterministic layer-recursion axis gated by entropy that is complementary to temperature sampling, raising joint oracle accuracy on MATH-500 from 83.4% to 91.6% for a 3B model.

EnergyAgentBench: Benchmarking LLM Agents on Live Energy Infrastructure Data

econ.EM · 2026-05-13 · accept · novelty 8.0

EnergyAgentBench is a new benchmark with 70 task variants that evaluates LLM agents on live energy data for datacenter siting, long-horizon optimization, and causal grid diagnosis.

Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims

cs.CR · 2026-05-11 · unverdicted · novelty 8.0

Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

FormalRewardBench: A Benchmark for Formal Theorem Proving Reward Models

cs.AI · 2026-05-11 · conditional · novelty 8.0

FormalRewardBench is the first benchmark for reward models in formal theorem proving, consisting of 250 Lean 4 preference pairs that show frontier LLMs scoring 59.8% while specialized provers score only 24.4%.

Unifying Scientific Communication: Fine-Grained Correspondence Across Scientific Media

cs.CV · 2026-05-07 · unverdicted · novelty 8.0 · 2 refs

Creates the first benchmark dataset integrating papers, slides, videos, and presentations for evaluating AI models on fine-grained multimodal correspondences in science.

ArgBench: Benchmarking LLMs on Computational Argumentation Tasks

cs.CL · 2026-04-19 · unverdicted · novelty 8.0

ArgBench unifies 33 existing datasets into a standardized benchmark for testing LLMs across 46 argumentation tasks and analyzes the impact of prompting techniques and model factors on performance.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks

cs.CR · 2025-09-25 · conditional · novelty 8.0

RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

Large Language Diffusion Models

cs.CL · 2025-02-14 · unverdicted · novelty 8.0

LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

TRIAGE augments GRPO with role-typed segment rewards derived from a judge that detects regression and exploration, yielding higher success rates and fewer turns on ALFWorld, Search-QA, and WebShop.

FlexiSLM: A Dynamic and Controllable Frame Rate Spoken Language Model

cs.SD · 2026-06-30 · unverdicted · novelty 7.0

FlexiSLM is the first spoken language model supporting dynamic and controllable frame rates on speech input and output, outperforming fixed-rate 7B models at high quality and enabling faster inference at lower rates like 6.25 Hz.

CORTEX: High-Quality Cross-Domain Organization of Web-Scale Corpora through Ontological Corpus Graph

cs.CL · 2026-06-29 · unverdicted · novelty 7.0

CORTEX uses an Ontological Corpus Graph to structure web-scale corpora, creating a refined 24.14B-token corpus and a new benchmark validated on eight LLMs.

Fuzzing Large Language Models to Elicit Hidden Behaviours

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Fuzzing via Gaussian noise on weights or residual activations elicits hidden backdoor behaviors more often than temperature sampling on four of six models, with proxy-task hyperparameter selection via Thompson sampling improving results over uniform sweeps.

Anisotropy Decides Cosine vs. Rank Metrics for Text Embeddings

cs.CL · 2026-06-28 · conditional · novelty 7.0

Anisotropy, quantified by dominant-dimension variance fraction, determines the best parameter-free similarity metric for text embeddings, with rank-based metrics gaining ~20% relative where cosine is weakest.

CRAFT: Counterfactual Credit Assignment from Free Sibling Rollouts for Self-Distilled Agentic Reinforcement Learning

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

CRAFT is a three-pillar credit assignment scheme that uses counterfactual token importance from GRPO sibling rollouts to provide signed per-token distillation signals in self-distilled agentic RL.

Turn-Averaged SAEs for Feature Discovery and Long-Context Attribution

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

Turn-averaged SAEs reconstruct average activations over conversation turns to represent high-level turn characteristics with a fixed number of features, simplifying long-context interpretability compared to per-token SAEs.

Ko-WideSearch: A Korean Breadth-Search Benchmark for Exhaustive Set Enumeration by Web Agents

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

Ko-WideSearch is a new Korean breadth-search benchmark spanning 16 categories and three difficulty tiers that evaluates web agents on full set membership plus per-item attributes, showing consistent gaps between set recovery and row completion.

Large Language Model Teaches Visual Students: Cross-Modality Transfer of Fine-Grained Conceptual Knowledge

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

LaViD distills LLM conceptual knowledge to vision models via LLM-generated MCQ soft labels, outperforming vision-language distillation baselines on fine-grained benchmarks while improving robustness on spurious correlation datasets.

Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents

cs.CL · 2026-06-25 · unverdicted · novelty 7.0

The supersession gap in LLM agents—failing to use current facts and discard superseded ones—is a distinct failure not fixed by scale or memory size, but improvable via RL training on a new environment.

Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents

cs.AI · 2026-06-15 · unverdicted · novelty 7.0

Attention analysis shows that LLM tool selection failures occur at the readout/decision stage, not because the model fails to attend to the correct tool definition.

Toward Calibrated, Fair, and accurate Deepfake Detection

cs.LG · 2026-06-03 · unverdicted · novelty 7.0

Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

Locality Does Not Imply Reachability: Boundary Repair in Block-Sparse Causal Attention

cs.LG · 2026-06-01 · conditional · novelty 7.0

Fixed block causal masks create reachability boundaries where representations depend only on block prefixes, formalized via dependency sets and phase-conditioned coverage functions, with a parameter-free boundary bridge repair.

Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above joint training on multimodal and text mixtures.

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

cs.CL · 2026-05-31 · unverdicted · novelty 7.0

PolySpeech-100 is a new benchmark for native-level speech comprehension across 110 linguistic variants that evaluates 22 models and reports E2E advantages on dialects, robustness gaps on low-resource languages, and degradation from Chain-of-Thought prompting.

Citation Grounding: Detecting and Reducing LLM Citation Hallucinations via Legal Citation Graphs

cs.CL · 2026-05-30 · unverdicted · novelty 7.0

Citation Grounding metric and CG-DPO training method detect and reduce hallucinations in LLM-generated legal citations using a graph from 100.8 million court decisions.

citing papers explorer

Showing 38 of 38 citing papers after filters.

Acceptance Cards: A Four-Diagnostic Standard for Safe Fine-Tuning Defense Claims
cs.CR · 2026-05-11 · unverdicted · none · ref 22 · internal anchor
Acceptance Cards is a new four-diagnostic standard for safe fine-tuning defense claims that requires statistical reliability, fresh semantic generalization, mechanism alignment, and cross-task transfer; under this protocol SafeLoRA fails the full-card pass on Gemma-2-2B-it.

RLCracker: Evaluating the Worst-Case Vulnerability of LLM Watermarks with Adaptive RL Attacks
cs.CR · 2025-09-25 · conditional · none · ref 28 · internal anchor
RLCracker is a reinforcement learning attack that erases LLM watermarks at 98.5% success rate with minimal data and generalizes across ten schemes and multiple model sizes.

MRMMIA: Membership Inference Attacks on Memory in Chat Agents
cs.CR · 2026-05-27 · unverdicted · none · ref 21 · internal anchor
MRMMIA is a multi-recall-probe membership inference attack that extracts signals from chat agent memory and outperforms baselines in black-, gray-, and white-box settings.

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
cs.CR · 2026-05-20 · conditional · none · ref 62 · internal anchor
Compilation optimizations can be exploited to create stealthy backdoors in LLMs that remain dormant without optimization but achieve ~90% attack success while preserving clean accuracy near 100%.

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
cs.CR · 2026-05-08 · conditional · none · ref 51 · internal anchor
A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.

Stateful Agent Backdoor
cs.CR · 2026-05-07 · unverdicted · none · ref 27 · internal anchor
A stateful backdoor for LLM agents, modeled as a Mealy machine with a decomposition framework, enables incremental malicious actions across sessions and achieves 80-95% attack success rate on four models.

Needle-in-RAG: Prompt-Conditioned Character-Level Traceback of Poisoned Spans in Retrieved Evidence
cs.CR · 2026-05-03 · unverdicted · none · ref 55 · internal anchor
RAGCharacter localizes poisoned character spans in RAG evidence via prompt-conditioned counterfactual masking and achieves the best accuracy-over-attribution trade-off across tested attacks and models.

Tracing the Dynamics of Refusal: Exploiting Latent Refusal Trajectories for Robust Jailbreak Detection
cs.CR · 2026-05-02 · unverdicted · none · ref 6 · 2 links · internal anchor
Causal tracing reveals a persistent Refusal Trajectory in LLM hidden states; SALO detector using sparse activations from a layer window improves jailbreak detection across Qwen, Llama, and Mistral models.

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models
cs.CR · 2026-04-30 · unverdicted · none · ref 38 · internal anchor
VOW formulates LLM watermark detection as a secure two-party computation using a Verifiable Oblivious Pseudorandom Function to achieve private and cryptographically verifiable detection.

ReTokSync: Self-Synchronizing Tokenization Disambiguation for Generative Linguistic Steganography
cs.CR · 2026-04-28 · unverdicted · none · ref 22 · internal anchor
ReTokSync resolves tokenization ambiguity in generative linguistic steganography via targeted self-synchronizing resets, achieving over 99.7% extraction accuracy and 100% recovery with an auxiliary channel while matching baseline security and quality.

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks
cs.CR · 2026-04-20 · unverdicted · none · ref 90 · internal anchor
LLM tutors leak answers under adversarial student attacks, but a fine-tuned jailbreak agent and simple defenses can benchmark and improve robustness.

RLSpoofer: A Lightweight Evaluator for LLM Watermark Spoofing Resilience
cs.CR · 2026-04-13 · unverdicted · none · ref 21 · internal anchor
RLSpoofer trains a 4B model on 100 watermarked paraphrase pairs to spoof PF watermarks at 62% success rate, far exceeding baselines trained on up to 10,000 samples.

Security--Fidelity Tradeoffs: The Hidden Cost of Prompt Injection Defense
cs.CR · 2026-06-29 · unverdicted · none · ref 10 · internal anchor
Prompt injection defenses create a security-fidelity tradeoff with no model or defense achieving both high security and high fidelity on the SecFid benchmark across 1,168 examples.

TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking
cs.CR · 2026-05-29 · unverdicted · none · ref 3 · internal anchor
TRACE is a task-aware adaptive self-evolving jailbreaking framework that achieves up to 100% bypass rates on LLM agents via subtask decomposition and scenario evolution.

Echoes within the Reasoning: Stealthy and Effective Watermarking via Chain of Thought
cs.CR · 2026-05-27 · unverdicted · none · ref 58 · internal anchor
BiCoT embeds watermarks into the internal geometry of Chain-of-Thought reasoning traces in LLMs via private signature subspace alignment and introduces Robust Subspace Registration for black-box verification under attacks.

Ellipsoid Control: A White-list Jailbreak Defense via Benign Latent Modeling
cs.CR · 2026-05-23 · unverdicted · none · ref 27 · internal anchor
Ellipsoid Control is a white-list test-time jailbreak defense that fits an anisotropic ellipsoid from benign activations to constrain projected gradient descent updates, aiming to improve the safety-utility tradeoff over black-list RepE methods.

EVA: Editing for Versatile Alignment against Jailbreaks
cs.CR · 2026-05-14 · unverdicted · none · ref 76 · internal anchor
EVA applies direct model editing to surgically neutralize jailbreak vulnerabilities in LLMs and VLMs by targeting specific neurons while preserving general capabilities.

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
cs.CR · 2026-05-09 · unverdicted · none · ref 38 · internal anchor
A truly benign DPO attack using 10 harmless preference pairs jailbreaks frontier LLMs by suppressing refusal behavior, achieving up to 81.73% attack success rate on GPT-4.1-nano at low cost.

CI-Work: Benchmarking Contextual Integrity in Enterprise LLM Agents
cs.CR · 2026-04-23 · unverdicted · none · ref 3 · internal anchor
A new benchmark shows enterprise LLM agents violate contextual integrity at rates of 15.8-50.9% with leakage up to 26.7%, and higher task performance correlates with more privacy breaches that model scaling does not fix.

Text Steganography with Dynamic Codebook and Multimodal Large Language Model
cs.CR · 2026-04-22 · unverdicted · none · ref 31 · internal anchor
A black-box text steganography method using a dynamic codebook generated by multimodal LLMs and reject-sampling feedback achieves higher embedding capacity and text quality than prior white-box and fixed-codebook black-box approaches.

Compiling Activation Steering into Weights via Null-Space Constraints for Stealthy Backdoors
cs.CR · 2026-04-14 · unverdicted · none · ref 8 · internal anchor
A method compiles a behavioral steering vector into persistent weight edits via null-space projection, enabling stealthy and reliable backdoors in LLMs that trigger only on specific inputs.

Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models
cs.CR · 2026-04-01 · conditional · none · ref 21 · internal anchor
A new benchmark exposes food-safety gaps in current LLMs and guardrails, and a fine-tuned 4B model is offered as a domain-specific fix.

SWaRL: Safeguard Code Watermarking via Reinforcement Learning
cs.CR · 2026-01-05 · unverdicted · none · ref 26 · internal anchor
SWaRL trains code LLMs with RL using compiler correctness signals and a confidential verifier reward to embed robust, functionality-preserving watermarks that resist refactoring attacks.

Intrinsic Fingerprint of LLMs: Continue Training is NOT All You Need to Steal A Model!
cs.CR · 2025-07-02 · unverdicted · none · ref 14 · internal anchor
Standard deviation distributions of attention matrices in LLMs remain distinctive and stable after continued training, enabling fingerprinting to trace model lineage and detect potential plagiarism such as in Pangu Pro MoE.

Logit-Gap Steering: A Forward-Pass Diagnostic for Alignment Robustness
cs.CR · 2025-06-30 · unverdicted · none · ref 22 · internal anchor
The refusal-affirmation logit gap quantifies alignment safety margins on toxic prompts, and logit-gap steering efficiently discovers transferable in-distribution suffixes that close the gap and yield 38-96% True ASR with low perplexity.

Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem
cs.CR · 2025-06-17 · unverdicted · none · ref 50 · internal anchor
Formalizes the jailbreak oracle problem for LLMs and introduces Boa, a two-phase breadth-first then depth-first search system to solve it efficiently.

Benchmarking Misuse Mitigation Against Covert Adversaries
cs.CR · 2025-06-06 · unverdicted · none · ref 22 · internal anchor
Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

To trust or not to trust: Attention-based Trust Management for LLM Multi-Agent Systems
cs.CR · 2025-06-03 · unverdicted · none · ref 51 · internal anchor
Introduces six-dimension trustworthiness definition and attention-based A-Trust score with a TMS to improve LLM-MAS robustness against malicious or unreliable messages.

Can Large Language Models Really Recognize Your Name?
cs.CR · 2025-05-20 · unverdicted · none · ref 58 · internal anchor
LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.

Manufactured Confidence: How Memory Consolidation Turns Hearsay into Confident Facts
cs.CR · 2026-06-28 · unverdicted · none · ref 32 · internal anchor
LLM memory consolidation turns casual hedged statements into confident facts that agents obey regardless of source or verification.

DataShield: Safety-degrading Data Filtering for LLM Benign Instruction Fine-Tuning
cs.CR · 2026-05-29 · unverdicted · none · ref 40 · internal anchor
DataShield scores training samples by their contribution to increased LLM response compliance and filters high-risk ones using a compliance vector and layer-specific CAS metric.

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
cs.CR · 2026-05-17 · unverdicted · none · ref 33 · internal anchor
Empirical comparison of alignment ablation methods on a 60-prompt security evaluation suite shows task-only LoRA achieves 0.87 mean security score with 0.13 unsafe compliance.

SoK: Robustness in Large Language Models against Jailbreak Attacks
cs.CR · 2026-05-06 · accept · none · ref 86 · internal anchor
The paper taxonomizes jailbreak attacks and defenses for LLMs, introduces the Security Cube multi-dimensional evaluation framework, benchmarks 13 attacks and 5 defenses, and identifies open challenges in LLM robustness.

FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization
cs.CR · 2026-04-08 · unverdicted · none · ref 19 · internal anchor
FedDetox uses on-device knowledge-distilled classifiers to sanitize toxic data in federated SLM training, preserving safety alignment comparable to centralized baselines.

No Data? No Problem: Synthesizing Security Graphs for Better Intrusion Detection
cs.CR · 2025-06-06 · unverdicted · none · ref 66 · internal anchor
PROVSYN synthesizes high-fidelity security provenance graphs via graph generation and LLMs to augment imbalanced datasets, improving downstream APT detection accuracy by up to 38% on benchmarks.

Probe Choice Changes Canary-Memorization Verdicts: Three Post-Hoc Disagreement Case Studies in a Text-Dominant LoRA-Tuned Autoregressive Testbed
cs.CR · 2026-06-30 · unverdicted · none · ref 18 · internal anchor
A prefix-window mean-NLL memorization probe disagrees with full-span NLL and exact-recall in three cases on a controlled autoregressive testbed, leading to recommendations for multi-probe reporting.

Position: LLM Watermarking Should Align Stakeholders' Incentives for Practical Adoption
cs.CR · 2025-10-21 · unverdicted · none · ref 64 · internal anchor
LLM watermarking adoption is limited by misaligned stakeholder incentives; incentive-aligned approaches such as in-context watermarking can enable practical use in targeted domains like education and peer review.

SelfGrader: LLM Jailbreak Detection via Anchored Token-Level Logits
cs.CR · 2026-04-01 · unreviewed · ref 24 · internal anchor
