GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

Zihong Chen, Wanli Jiang, Jinzhe Li, Zhonghang Yuan, Huanjun Kong, Wanli Ouyang, Nanqing Dong · 2025 · arXiv 2505.20416

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

UniShield: Unified Face Attack Detection via KG-Informed Multimodal Reasoning

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

UniShield introduces a knowledge-graph-informed multimodal framework that improves unified detection of physical and digital face attacks through instruction tuning and consistency-optimized reasoning.

EmbGen: Teaching with Reassembled Corpora

cs.CL · 2026-05-19 · unverdicted · novelty 6.0

EmbGen creates synthetic QA data by entity decomposition, embedding-based reassembly into clusters, and multi-level sampling with cluster-specific prompts, yielding up to 88.9% higher Binary Accuracy than baselines on heterogeneous datasets under fixed token budgets.

TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications?

cs.AI · 2026-05-18 · conditional · novelty 6.0

TeleCom-Bench reveals LLMs reach 90% on telecom intent and entity tasks but drop to 30% on solution generation and root cause analysis in live network scenarios.

MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs

cs.CL · 2026-02-13 · unverdicted · novelty 4.0

MedXIAOHE is a medical MLLM that claims state-of-the-art benchmark performance through specialized pretraining to cover long-tail diseases and RL-based reasoning training.

citing papers explorer

UniShield: Unified Face Attack Detection via KG-Informed Multimodal Reasoning · cs.CV · 2026-05-09 · unverdicted · none · ref 4
EmbGen: Teaching with Reassembled Corpora · cs.CL · 2026-05-19 · unverdicted · none · ref 4
TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications? · cs.AI · 2026-05-18 · conditional · none · ref 6
MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs · cs.CL · 2026-02-13 · unverdicted · none · ref 10
