UniXcoder: Unified Cross-Modal Pre-training for Code Representation
12 Pith papers cite this work, alongside 459 external citations. Polarity classification is still indexing.
hub tools
citation-role summary: background (1)
citation-polarity summary: still indexing
citing papers explorer
-
Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks
The two main benchmarks for instructed code editing with LLMs over-represent Python, miss common real-world domains and edit types, and have test-coverage issues that limit what they actually measure.
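A minimal sketch of the kind of composition audit the paper describes, assuming a hypothetical benchmark stored as JSONL with illustrative `language`, `domain`, and `tests` fields (not the paper's tooling):
```python
import json
from collections import Counter

def audit_benchmark(path):
    """Tally language/domain mix and test presence for an editing benchmark.

    Assumes one JSON object per line, e.g.
    {"language": "python", "domain": "cli", "tests": ["test_foo.py"]}.
    """
    languages, domains, with_tests, total = Counter(), Counter(), 0, 0
    with open(path) as f:
        for line in f:
            task = json.loads(line)
            total += 1
            languages[task.get("language", "unknown")] += 1
            domains[task.get("domain", "unknown")] += 1
            with_tests += bool(task.get("tests"))
    total = total or 1  # avoid division by zero on an empty file
    return {
        "language_share": {k: v / total for k, v in languages.items()},
        "domain_share": {k: v / total for k, v in domains.items()},
        "test_coverage": with_tests / total,
    }
```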
-
Deep Graph-Language Fusion for Structure-Aware Code Generation
CGFuse enables deep token-level fusion of graph-derived structural features into language models, yielding 10-16% BLEU and 6-11% CodeBLEU gains on code generation tasks.
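A minimal sketch of one way to fuse graph-derived features into token embeddings at the token level (a gated residual fusion); shapes and the gating form are assumptions, not CGFuse's exact architecture:
```python
import numpy as np

def gated_token_fusion(token_emb, graph_emb, w_gate, b_gate):
    """Fuse per-token graph features into token embeddings with a learned gate.

    token_emb: (T, d) token embeddings from the language model.
    graph_emb: (T, d) graph-derived features aligned to the same tokens
               (e.g. pooled from AST/DFG nodes covering each token).
    w_gate:    (2d, d) gate projection; b_gate: (d,) bias.
    """
    x = np.concatenate([token_emb, graph_emb], axis=-1)        # (T, 2d)
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate + b_gate)))        # sigmoid gate, (T, d)
    return token_emb + gate * graph_emb                        # gated residual fusion

# toy usage with random weights
T, d = 4, 8
rng = np.random.default_rng(0)
fused = gated_token_fusion(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                           rng.normal(size=(2 * d, d)) * 0.1, np.zeros(d))
```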
-
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL
Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.
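A minimal sketch of the data-mixing idea, assuming parallel solutions to the same task keyed by language; the sampling scheme is illustrative, not the paper's exact recipe:
```python
import random

def mix_parallel_sft(parallel_tasks, per_task_langs=2, seed=0):
    """Build an SFT mixture that pairs each task with several language variants.

    parallel_tasks: list of dicts like
        {"prompt": "reverse a list",
         "solutions": {"python": "...", "java": "...", "rust": "..."}}
    Returns a flat, shuffled list of (prompt, language, solution) examples in
    which each task contributes solutions from multiple languages.
    """
    rng = random.Random(seed)
    examples = []
    for task in parallel_tasks:
        langs = list(task["solutions"])
        rng.shuffle(langs)
        for lang in langs[:per_task_langs]:
            examples.append((task["prompt"], lang, task["solutions"][lang]))
    rng.shuffle(examples)
    return examples
```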
-
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
RepoBench is a new benchmark with retrieval, completion, and pipeline tasks to evaluate code auto-completion systems on entire repositories instead of single files.
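A minimal sketch of the retrieve-then-complete pipeline setting, using a toy lexical retriever over snippets from other files; the retriever and prompt format are assumptions, not RepoBench's implementation:
```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two code snippets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def retrieve_then_complete(in_file_context, repo_snippets, complete_fn, k=3):
    """Pipeline task: pick the k most similar cross-file snippets, then complete.

    repo_snippets: code snippets drawn from other files in the repository.
    complete_fn:   any next-line completion function (e.g. an LLM call).
    """
    ranked = sorted(repo_snippets, key=lambda s: jaccard(in_file_context, s),
                    reverse=True)
    prompt = "\n\n".join(ranked[:k]) + "\n\n" + in_file_context
    return complete_fn(prompt)
```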
-
Do not copy and paste! Rewriting strategies for code retrieval
Full natural-language rewriting of code and queries boosts retrieval on code benchmarks while corpus-only rewriting often hurts, with token entropy difference serving as a cheap predictor of gains.
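A minimal sketch of a token-entropy-difference signal, assuming Shannon entropy over whitespace tokens of the original code versus its natural-language rewrite; the tokenization and sign convention are assumptions:
```python
import math
from collections import Counter

def token_entropy(text):
    """Shannon entropy (bits) of the empirical token distribution."""
    counts = Counter(text.split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_difference(code, rewrite):
    """Positive when the code's token distribution is more spread out than its
    natural-language rewrite; used here as a cheap predictor of rewriting gains."""
    return token_entropy(code) - token_entropy(rewrite)
```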
-
SAGE: Signal-Amplified Guided Embeddings for LLM-based Vulnerability Detection
SAGE uses sparse autoencoders to boost vulnerability signals in LLMs, raising internal SNR 12.7x and delivering up to 318% MCC gains on vulnerability detection benchmarks.
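A minimal sketch of a sparse autoencoder over LLM hidden states with an L1 sparsity penalty; SAGE's actual layer choice, training setup, and feature-amplification rule are not reproduced, and the shapes below are assumptions:
```python
import numpy as np

def sae_forward(h, w_enc, b_enc, w_dec, b_dec, l1=1e-3):
    """One forward pass of a sparse autoencoder on hidden states h: (N, d).

    w_enc: (d, m), w_dec: (m, d) with m > d (overcomplete dictionary).
    Features whose codes correlate with vulnerability labels can later be
    scaled up before decoding to amplify the signal.
    Returns sparse codes, reconstruction, and reconstruction + L1 loss.
    """
    z = np.maximum(h @ w_enc + b_enc, 0.0)          # ReLU codes, (N, m)
    h_hat = z @ w_dec + b_dec                        # reconstruction, (N, d)
    loss = np.mean((h - h_hat) ** 2) + l1 * np.mean(np.abs(z))
    return z, h_hat, loss
```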
-
MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection
MARGIN reduces geometric distortions in imbalanced vulnerability embeddings by dynamically regularizing margins with von Mises-Fisher concentration estimates and hyperspherical prototypes.
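A minimal sketch of estimating a per-class von Mises-Fisher concentration from L2-normalized embeddings (Banerjee et al.'s approximation) and using it to scale a class margin; the exact margin rule in MARGIN is not reproduced here:
```python
import numpy as np

def vmf_concentration(unit_vectors):
    """Approximate vMF concentration kappa for unit vectors of shape (N, d),
    using kappa ~= r_bar * (d - r_bar**2) / (1 - r_bar**2)."""
    d = unit_vectors.shape[1]
    r_bar = np.linalg.norm(unit_vectors.mean(axis=0))
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2 + 1e-12)

def class_margin(unit_vectors, base_margin=0.3):
    """Illustrative rule: give loosely concentrated (often minority) classes a
    larger angular margin around their hyperspherical prototype."""
    kappa = vmf_concentration(unit_vectors)
    prototype = unit_vectors.mean(axis=0)
    prototype /= np.linalg.norm(prototype) + 1e-12
    return prototype, base_margin / (1.0 + np.log1p(kappa))
```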
-
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection
Reasoning-oriented knowledge distillation from DeepSeek-R1, combined with response stabilization, improves the reliability and often the performance of compact models for cross-language code clone detection on pairs such as Python-Java and Rust-Java.
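A minimal sketch of one form of response stabilization before distillation: sample several teacher rationales per pair, keep only pairs where the teacher's verdict is consistent, and distill from those. This is an illustrative scheme, not necessarily the paper's exact procedure:
```python
from collections import Counter

def stabilized_targets(pairs, teacher_fn, n_samples=5, min_agreement=0.8):
    """Build distillation targets for cross-language clone detection.

    pairs:      list of (code_a, code_b) snippets in different languages.
    teacher_fn: callable returning ("clone" | "not_clone", rationale); assumed
                stochastic (e.g. sampled from a reasoning model).
    Keeps only pairs whose sampled verdicts agree at least `min_agreement`.
    """
    targets = []
    for code_a, code_b in pairs:
        samples = [teacher_fn(code_a, code_b) for _ in range(n_samples)]
        label, votes = Counter(lbl for lbl, _ in samples).most_common(1)[0]
        if votes / n_samples >= min_agreement:
            rationale = next(r for lbl, r in samples if lbl == label)
            targets.append({"pair": (code_a, code_b),
                            "label": label,
                            "rationale": rationale})
    return targets
```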
-
How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection
Text fine-tuning of 8B LLMs on C/C++ vulnerability data inflates cross-language false-positive rates through surface-cue memorization, which an AST inference probe can partially reverse while direct AST fine-tuning cannot.
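A minimal sketch of feeding a model an AST-serialized view of code at inference time instead of raw text, shown here with Python's built-in `ast` module for illustration (the paper's setting is C/C++, and how the probe is combined with the fine-tuned model is not specified here):
```python
import ast

def ast_view(source):
    """Serialize Python source into a structural, identifier-light AST string.

    Surface cues such as formatting and keyword patterns are less prominent in
    this view, which is the kind of signal an inference-time AST probe targets.
    """
    tree = ast.parse(source)
    return ast.dump(tree, annotate_fields=False)

print(ast_view("x = input(); eval(x)"))
# e.g. Module([Assign([Name('x', Store())], Call(Name('input', Load()), [], [])), ...])
```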
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
GTE_base is a compact text embedding model trained with multi-stage contrastive learning on diverse data; it outperforms OpenAI's embedding API and models 10x its size on the Massive Text Embedding Benchmark, and it handles code by treating it as text.
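A minimal sketch of the in-batch contrastive objective that multi-stage contrastive training typically builds on (InfoNCE with in-batch negatives); GTE's exact staging, batching, and temperature are not reproduced:
```python
import numpy as np

def info_nce(query_emb, doc_emb, temperature=0.05):
    """In-batch contrastive loss: row i of query_emb should match row i of doc_emb.

    query_emb, doc_emb: (B, d) L2-normalized embeddings; every other document
    in the batch serves as a negative for each query.
    """
    logits = query_emb @ doc_emb.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))                 # cross-entropy on the diagonal
```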
-
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
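A minimal sketch of one stage such a compression pipeline can include, magnitude pruning of weight matrices; CTT's actual combination of pruning, quantization, and distillation stages is not reproduced here:
```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude entries of a 2-D weight matrix.

    sparsity: fraction of entries to remove. Pruned (and later quantized)
    weights are what drive the memory, latency, and energy reductions that
    compression pipelines report.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask
```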
-
LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification
LoRA-MME ensembles LoRA-adapted UniXcoder, CodeBERT, GraphCodeBERT, and CodeBERTa with learned weights to reach 0.7906 weighted F1 and 0.6867 macro F1 on code comment classification.
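A minimal sketch of combining per-encoder class probabilities with learned, softmax-normalized weights; the encoders, their LoRA adapters, and how the weights are learned are outside this sketch:
```python
import numpy as np

def weighted_ensemble(prob_list, weight_logits):
    """Combine class probabilities from several encoders into one prediction.

    prob_list:     list of (N, C) probability arrays, one per encoder
                   (e.g. UniXcoder, CodeBERT, GraphCodeBERT, CodeBERTa).
    weight_logits: (M,) learned logits, softmax-normalized into ensemble weights.
    """
    w = np.exp(weight_logits - weight_logits.max())
    w /= w.sum()
    stacked = np.stack(prob_list, axis=0)                 # (M, N, C)
    return np.tensordot(w, stacked, axes=1)               # (N, C) weighted mixture
```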