arXiv preprint arXiv:1901.11196
9 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1). Citation polarities: still indexing.
Citing papers explorer
- ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation
The ClassEval-Pro benchmark shows that frontier LLMs achieve at most 45.6% Pass@1 on class-level code tasks, with logic errors (56%) and dependency errors (38%) as the dominant failure modes.
- Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations
KL regularization aligning model predictions with empirical transition patterns improves macro-F1 by 9-42% in next dialogue act prediction on German counselling data and transfers to other datasets.
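The regularizer summarized above can be sketched as a KL term pulling the model's predicted next-act distribution toward the empirical transition row for the previous act. This is a minimal single-example sketch: the function name, the mixing weight `lam`, and the use of raw probability vectors are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_regularized_loss(probs, target, prev_act, transitions, lam=0.5):
    """Cross-entropy on the gold next act plus lam * KL(empirical || predicted),
    where `transitions[prev_act]` is the empirical P(next act | previous act)."""
    eps = 1e-12  # guard against log(0)
    ce = -np.log(probs[target] + eps)
    empirical = transitions[prev_act]
    kl = np.sum(empirical * (np.log(empirical + eps) - np.log(probs + eps)))
    return float(ce + lam * kl)
```

A prediction that matches the empirical transition row incurs no extra penalty; one that contradicts it pays both a higher cross-entropy and a KL surcharge.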
- From Pre-trained Models to Large Language Models: A Comprehensive Survey of AI-Driven Psychological Computing
The paper introduces a new taxonomy that groups AI-driven psychological computing tasks by their underlying computational patterns into four categories and reviews over 300 works from the pre-trained model to LLM eras.
- ART: Automatic multi-step reasoning and tool-use for large language models
ART automatically generates multi-step reasoning programs with tool integration for LLMs, yielding substantial gains over few-shot and auto-CoT prompting on BigBench and MMLU while matching hand-crafted CoT on most tasks.
- Text and Code Embeddings by Contrastive Pre-Training
Contrastive pre-training on unsupervised data at scale creates text and code embeddings that set new state-of-the-art results on classification and semantic search benchmarks.
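Contrastive pre-training of the kind summarized above is typically an InfoNCE objective over paired embeddings with in-batch negatives. The sketch below is a one-directional (text-to-code) toy version; the temperature `tau` and the function name are assumptions, not the paper's exact setup.

```python
import numpy as np

def info_nce(text_emb, code_emb, tau=0.1):
    """InfoNCE loss: row i of text_emb should match row i of code_emb;
    the other rows in the batch serve as negatives."""
    # Normalize so dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = (t @ c.T) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Correctly aligned pairs drive the loss toward zero; permuting the pairing makes the positives look like negatives and the loss grows accordingly.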
- Model-Agnostic Meta Learning for Class Imbalance Adaptation
HAMR combines meta-learning with hardness-aware weighting and neighborhood resampling to improve minority-class performance on imbalanced NLP datasets.
- What Are Adversaries Doing? Automating Tactics, Techniques, and Procedures Extraction: A Systematic Review
A systematic review of 80 papers shows TTP extraction shifting to transformer- and LLM-based methods, but finds progress limited by narrow datasets, a single-label focus, and low reproducibility.
- Duluth at SemEval-2026 Task 6: DeBERTa with LLM-Augmented Data for Unmasking Political Question Evasions
DeBERTa-V3-base with focal loss, discourse features, and LLM-augmented data for minority classes achieves 0.76 macro-F1 on clarity-level classification of political QA pairs, ranking 8th in SemEval-2026 Task 6.
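Focal loss, one ingredient of the system above, down-weights well-classified examples so the gradient concentrates on hard, typically minority-class, cases. This single-example sketch with `gamma=2.0` is an illustrative assumption, not the Duluth team's exact configuration.

```python
import numpy as np

def focal_loss(probs, target, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)**gamma: confident correct
    predictions contribute almost nothing, while low-probability
    targets keep near-full cross-entropy weight."""
    p_t = probs[target]
    return float(-((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12))
```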
- Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models