hub

arXiv preprint arXiv:2307.09009 , year=

Lingjiao Chen, Matei Zaharia, James Zou · 2023 · arXiv 2307.09009

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

dataset 2 background 1

citation-polarity summary

use dataset 2 background 1

representative citing papers

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark

cs.CV · 2026-05-02 · unverdicted · novelty 8.0

VISTA is the first large-scale interaction-aware benchmark that decomposes videos into entities, actions, and relations to diagnose spatio-temporal biases in vision-language models.

Guidelines for Empirical Studies in Software Engineering involving Large Language Models

cs.SE · 2025-08-21 · accept · novelty 7.0 · 2 refs

The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

cs.CL · 2024-06-06 · accept · novelty 7.0

This systematic survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 others, supplies a shared vocabulary, and offers guidelines for state-of-the-art models.

The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.

AgentSPEX: An Agent SPecification and EXecution Language

cs.CL · 2026-04-14 · unverdicted · novelty 6.0

AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.

Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems

cs.HC · 2026-03-06 · unverdicted · novelty 6.0

LLM chat systems show large differences in reference quantity and quality, but users rarely click or engage with them.

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

cs.CL · 2023-10-17 · unverdicted · novelty 6.0

Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.

GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

cs.AI · 2023-09-19 · unverdicted · novelty 6.0

GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.

From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

cs.AI · 2026-05-07 · conditional · novelty 5.0

Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.

A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study

cs.SE · 2026-05-13 · unverdicted · novelty 4.0

A metadata framework modernizes legacy SAS clinical reporting for AI by adding a non-destructive wrapper layer, achieving 92% code reduction on consolidation and high report parity in validations.

citing papers explorer

Showing 10 of 10 citing papers.

VISTA: Video Interaction Spatio-Temporal Analysis Benchmark cs.CV · 2026-05-02 · unverdicted · none · ref 14
VISTA is the first large-scale interaction-aware benchmark that decomposes videos into entities, actions, and relations to diagnose spatio-temporal biases in vision-language models.
Guidelines for Empirical Studies in Software Engineering involving Large Language Models cs.SE · 2025-08-21 · accept · none · ref 22 · 2 links
The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques cs.CL · 2024-06-06 · accept · none · ref 6
This systematic survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 others, supplies a shared vocabulary, and offers guidelines for state-of-the-art models.
The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference cs.LG · 2026-04-16 · unverdicted · none · ref 4
FP16 KV caching in transformers causes deterministic token divergence versus cache-free inference due to non-associative floating-point accumulation orderings.
AgentSPEX: An Agent SPecification and EXecution Language cs.CL · 2026-04-14 · unverdicted · none · ref 1
AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.
Analyzing the Presentation, Content, and Utilization of References in LLM-powered Conversational AI Systems cs.HC · 2026-03-06 · unverdicted · none · ref 11
LLM chat systems show large differences in reference quantity and quality, but users rarely click or engage with them.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection cs.CL · 2023-10-17 · unverdicted · none · ref 156
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts cs.AI · 2023-09-19 · unverdicted · none · ref 11
GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work cs.AI · 2026-05-07 · conditional · none · ref 37
Execution lineage models AI-native work as a DAG of computations with explicit dependencies, achieving perfect state preservation in controlled update tasks where loop-based agents introduce churn and contamination.
A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study cs.SE · 2026-05-13 · unverdicted · none · ref 22
A metadata framework modernizes legacy SAS clinical reporting for AI by adding a non-destructive wrapper layer, achieving 92% code reduction on consolidation and high report parity in validations.

arXiv preprint arXiv:2307.09009 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer