Title resolution pending

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever · 2021

8 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 2 · baseline: 1

citation-polarity summary

background: 2 · baseline: 1

representative citing papers

ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval

cs.AI · 2026-05-05 · unverdicted · novelty 8.0 · 2 refs

ReasonAudio benchmark reveals that state-of-the-art text-audio retrieval models struggle with reasoning tasks like negation and duration, and multimodal LLMs lose reasoning ability after contrastive fine-tuning.

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

cs.CR · 2026-04-14 · unverdicted · novelty 7.0

SET detects input-level backdoors in T2I diffusion models by learning a benign cross-attention response space from clean samples and flagging deviations under multi-scale perturbations.

IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI

cs.HC · 2025-07-29 · unverdicted · novelty 7.0

IdeaBlocks modularizes divergent intents into Exploration Blocks with multi-level reuse options, enabling 2.13 times more images explored and 12.5% greater visual diversity than baseline in a comparative user study.

Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

RankElastor mitigates embedding collapse via spectrum-robust token mixing and GLU-based P-FFNs, yielding better performance and scaling on industrial recommendation datasets.

VISOR: A Vision-Language Model-based Test Oracle for Testing Robots

cs.SE · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie

GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

GTPBD-MM is the first multimodal benchmark for global terraced parcel extraction, integrating image, text, and DEM data with experiments showing that textual and terrain cues improve delineation accuracy over image-only approaches.

CIR: Lightweight Container Image for Cross-Platform Deployment

cs.DC · 2026-04-12 · unverdicted · novelty 6.0

CIR is a cross-platform container image format for Python/R-style apps that defers dependency assembly to deployment, cutting image size by 95% and deployment time by 40-60% versus traditional bundled images.

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

cs.AI · 2026-04-04

citing papers explorer

Showing 8 of 8 citing papers.

ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval
cs.AI · 2026-05-05 · unverdicted · none · ref 24 · 2 links

Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling
cs.CR · 2026-04-14 · unverdicted · none · ref 21

IdeaBlocks: Expressing and Reusing Divergent Intents for Graphic Design Exploration using Generative AI
cs.HC · 2025-07-29 · unverdicted · none · ref 50

Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation
cs.LG · 2026-05-22 · unverdicted · none · ref 26

VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
cs.SE · 2026-05-11 · unverdicted · none · ref 47 · 2 links

GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality
cs.CV · 2026-04-14 · unverdicted · none · ref 30

CIR: Lightweight Container Image for Cross-Platform Deployment
cs.DC · 2026-04-12 · unverdicted · none · ref 41

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
cs.AI · 2026-04-04 · unreviewed · ref 38
