ViperGPT: Visual Inference via Python Execution for Reasoning

ViperGPT generates executable Python code to compose pre-trained vision-and-language modules into programs that answer visual queries, reaching state-of-the-art results with no additional training.
4 Pith papers cite this work.
citing papers
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
-
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification
An inpainting auxiliary task improves the clustering of embeddings for identifying individual zebrafish from their skin patterns.
-
E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features
E-PCN reaches 94.67% macro-accuracy on 10-class jet tagging by weighting graphs with angular separation, transverse momentum, momentum fraction, and invariant mass; Grad-CAM analysis shows the first two features account for 76% of decisions, and the weighted model improves over the baseline PCN.
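To make the ViperGPT summary above concrete, here is a minimal, self-contained sketch of the idea: a language model emits a short Python program that composes pretrained perception modules to answer a visual query. The module names (`ImagePatch`, `find`, `simple_query`) loosely follow the paper's API, but the implementations below are stubs over hand-written annotations so the sketch runs without any vision models; the scene data and method signatures are illustrative assumptions, not the paper's actual implementation.

```python
class ImagePatch:
    """Stub for an image region with attached vision modules.

    In ViperGPT, methods like find() and simple_query() would call
    pretrained models (an open-vocabulary detector, a VQA model);
    here they read from a hand-written annotation dict instead.
    """

    def __init__(self, objects):
        # objects: mapping from category name -> list of attribute dicts
        self.objects = objects

    def find(self, category):
        # Stand-in for an object detector: return one patch per instance.
        return [ImagePatch({category: [attrs]})
                for attrs in self.objects.get(category, [])]

    def simple_query(self, question):
        # Stand-in for a VQA model answering a simple attribute question.
        (attrs,) = next(iter(self.objects.values()))
        return attrs.get(question, "unknown")


def execute_command(image):
    # The kind of program ViperGPT would generate for the query
    # "What color is the mug?": detect, then query the detected patch.
    mugs = image.find("mug")
    return mugs[0].simple_query("color") if mugs else "no mug found"


# Illustrative scene annotation standing in for a real image.
scene = ImagePatch({"mug": [{"color": "blue"}], "laptop": [{"color": "grey"}]})
print(execute_command(scene))  # blue
```

The point of the design is that the composition logic lives in plain, inspectable Python rather than in a monolithic end-to-end network, which is why no additional training is needed: only the generated glue code changes from query to query.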