AEON: A Method for Automatic Evaluation of NLP Test Cases

doi: 10 · 2022 · arXiv 3767.353439

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms

cs.SE · 2026-05-15 · unverdicted · novelty 7.0

LLM-based merge conflict resolution performs well on imbalanced conflicts but struggles with large or non-English inputs, while search-based methods show better generalization and strength on balanced conflicts.

Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Raven automates Scratch program assessment by having instructors specify task-level video generation rules and using LLMs to analyze resulting videos for behavioral compliance, outperforming prior tools on real student submissions.

SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT

cs.SE · 2026-04-07 · unverdicted · novelty 6.0

SemLink applies a Siamese SBERT model to detect semantic drift in hyperlinks, achieving 96% recall at 47.5 times the speed of GPT-5.2 using a new 60k-pair dataset.

citing papers explorer

Showing 3 of 3 citing papers.

LLM-based vs. Search-based Merge Conflict Resolution: An Empirical Study of Competing Paradigms cs.SE · 2026-05-15 · unverdicted · none · ref 20
Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation cs.SE · 2026-04-20 · unverdicted · none · ref 71
SemLink: A Semantic-Aware Automated Test Oracle for Hyperlink Verification using Siamese Sentence-BERT cs.SE · 2026-04-07 · unverdicted · none · ref 23
