Shreya Shankar, J. Zamfirescu-Pereira, Bjoern Hartmann, Aditya Parameswaran, and Ian Arawjo · 2024 · arXiv 4777.367645

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI

cs.HC · 2025-07-29 · unverdicted · novelty 7.0

IdeaBlocks modularizes divergent intents into Exploration Blocks with multi-level reuse options, enabling 2.13 times more images explored and 12.5% greater visual diversity than baseline in a comparative user study.

ZORO: Active Rules for Reliable Vibe Coding

cs.HC · 2026-04-17 · unverdicted · novelty 6.0

ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.

On Cost-Effective LLM-as-a-Judge Improvement Techniques

cs.CL · 2026-04-15 · conditional · novelty 5.0

Ensemble scoring plus task-specific criteria injection raises LLM judge accuracy to 85.8 percent on RewardBench 2, a 13.5-point gain over baseline, with small models gaining the most.

Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics

cs.CY · 2026-04-02 · unverdicted · novelty 5.0 · 2 refs

Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image model outputs.

Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications

cs.SE · 2026-03-13 · unverdicted · novelty 5.0

An automated self-testing framework with evidence-based quality gates for LLM application releases was evaluated in a longitudinal case study of a multi-agent conversational AI system, identifying rollback builds and supporting stable quality over four weeks.

VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

cs.CL · 2025-06-17 · unverdicted · novelty 5.0

VIDEE introduces a human-in-the-loop system using Monte-Carlo Tree Search for task decomposition, executable pipeline generation, and LLM-based evaluation with visualizations to support non-expert text analytics.

citing papers explorer

Showing 6 of 6 citing papers.

IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI · cs.HC · 2025-07-29 · unverdicted · none · ref 76
ZORO: Active Rules for Reliable Vibe Coding · cs.HC · 2026-04-17 · unverdicted · none · ref 40
On Cost-Effective LLM-as-a-Judge Improvement Techniques · cs.CL · 2026-04-15 · conditional · none · ref 11
Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics · cs.CY · 2026-04-02 · unverdicted · none · ref 102 · 2 links
Automated Self-Testing as a Quality Gate: Evidence-Driven Release Management for LLM Applications · cs.SE · 2026-03-13 · unverdicted · none · ref 27
VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents · cs.CL · 2025-06-17 · unverdicted · none · ref 47
