LLMSR @ XLLM 25: SWRV: Empowering Self-Verification of Small Language Models through Step-wise Reasoning and Verification

Danchun Chen · 2025 · Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025) · DOI 10.18653/v1/2025.xllm-1.29

11 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

A matched benchmark shows GUI computer-use agents at 59.1% full pass rate versus 48.2% for original-skill CLI agents, rising to 69.3% with verifier-guided augmentation, indicating modality-specific execution bottlenecks.

Misaligned by Reward: Socially Undesirable Preferences in LLMs

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

Reward models for LLMs frequently select socially undesirable options across four social domains, show no overall best performer, and exhibit a bias-avoidance versus context-sensitivity trade-off.

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison-Eliciting Posts They Fail to Detect

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

LLMs generate Xiaohongshu-style posts that elicit social comparison but show stable failures in prompt-based detection of the same reader-grounded signal.

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

cs.CL · 2026-06-02 · unverdicted · novelty 5.0

Lexical richness is a robust linguistic signal for AI-generated text detection across models and domains, while most other features are context-dependent.

Why Low-Resource NLP Needs More Than Cross-Lingual Transfer: Lessons Learned from Luxembourgish

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Cross-lingual transfer and language-specific data efforts are interdependent and complementary for effective low-resource NLP, as demonstrated through Luxembourgish case studies and synthesis.

From If-Statements to ML Pipelines: Revisiting Bias in Code-Generation

cs.CL · 2026-04-23 · unverdicted · novelty 4.0

LLM-generated ML pipelines show higher bias (87.7% sensitive attributes) than conditional statements (59.2%), indicating that simple if-statement tests underestimate bias risk in practical code generation.

LLM Consumer Behavior Theory: Foundations of a Novel Research Field

cs.AI · 2026-06-16 · unverdicted · novelty 3.0

Introduces LLM Consumer Behavior Theory to analyze consumer behavior when LLMs serve as autonomous decision-making agents in markets.

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

cs.CL · 2026-05-05 · unverdicted · novelty 3.0

A feature-based decision tree with parsing-derived signals and heuristics detects LLM-generated code in a lightweight, CPU-only setup for SemEval-2026 Task 13.

mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection

cs.CL · 2026-05-04 · unverdicted · novelty 2.0

Finetuning Qwen3-32B with data augmentation and self-training achieves competitive 8th-place ranking on SemEval-2026 conspiracy detection.

mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection

cs.CL · 2026-05-04 · unverdicted · novelty 2.0

Finetuning LLMs with QLoRA and multilingual data augmentation for polarization detection, type, and manifestation in SemEval-2026 Task 9.

mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

cs.LG · 2026-04-23 · unverdicted · novelty 2.0

Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.

