Canonical reference

Efficient attention: Attention with linear complexities

Shen, Z · 2021 · arXiv 8630.2021

Canonical reference. 80% of citing Pith papers cite this work as background.

16 Pith papers citing it

citation-role summary

background: 7 · dataset: 3

citation-polarity summary

background: 8 · use dataset: 2

representative citing papers

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation

cs.CV · 2026-05-11 · accept · novelty 8.0

Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.

Field-Localized Forgery Detection for Digital Identity Documents

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

FLiD is a field-localized forgery detection method for identity documents that outperforms full-document baselines and general detectors with significantly fewer parameters.

Transformer Neural Processes - Kernel Regression

cs.LG · 2024-11-19 · unverdicted · novelty 7.0

TNP-KR adds a kernel regression transformer block, kernel attention bias, scan attention for translation invariance, and deep kernel attention to achieve lower complexity and state-of-the-art results on meta-regression and related benchmarks.

PaLI: A Jointly-Scaled Multilingual Language-Image Model

cs.CV · 2022-09-14 · conditional · novelty 7.0

PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.

From Vulnerable Data Subjects to Vulnerabilizing Data Practices: Navigating the Protection Paradox in AI-Based Analyses of Platformized Lives

cs.CY · 2026-04-17 · unverdicted · novelty 6.0

The authors propose a reflexive ethics protocol for AI analyses of platform data that maps how technical choices at four pipeline stages can enact new vulnerabilities, illustrated by quantifying child presence in monetized family vlogs.

AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

AnomalyAgent uses tool-augmented reinforcement learning with self-reflection to generate realistic industrial anomalies, achieving better metrics than zero-shot methods on MVTec-AD.

Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification

stat.ML · 2026-04-07 · unverdicted · novelty 6.0

Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.

Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information

cs.CV · 2025-12-29 · unverdicted · novelty 6.0

Holi-DETR improves fashion item detection by integrating co-occurrence probabilities, inter-item spatial arrangements, and body keypoint relationships into the DETR architecture.

SmolVLM: Redefining small and efficient multimodal models

cs.AI · 2025-04-07 · unverdicted · novelty 6.0

SmolVLM-256M outperforms a 300-times larger model using under 1 GB GPU memory, while the 2.2B version matches state-of-the-art VLMs at half the memory cost.

Position: Age Estimation Models Do Not Process Biometric Data

cs.CY · 2026-05-17 · unverdicted · novelty 5.0

Empirical evaluation shows age estimation models perform orders of magnitude below identification thresholds on face verification benchmarks, indicating they do not extract identity-discriminative representations.

Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

A text-guided fusion method for RGB-IR object detection aligns modalities via semantic bridging and incorporates both consensus and discrepancy cues through dynamic recalibration.

Protecting and Preserving Protest Dynamics for Responsible Analysis

cs.CV · 2026-04-06 · unverdicted · novelty 5.0

A responsible computing framework substitutes real protest imagery with labeled synthetic reproductions from conditional image synthesis to enable privacy-aware analysis of collective action patterns.

A Resource-Efficient Hybrid CNN-LSTM network for image-based bean leaf disease classification

cs.CV · 2026-04-15 · unverdicted · novelty 4.0

A lightweight hybrid CNN-LSTM network classifies bean leaf diseases at 94.38% accuracy and 1.86 MB size on the ibean dataset, with reported state-of-the-art F1 scores using EfficientNet-B7+LSTM.

Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions

cs.RO · 2025-03-05 · unverdicted · novelty 4.0

A survey of trajectory prediction techniques for autonomous vehicles that proposes a taxonomy, overviews the prediction pipeline, and highlights remaining research gaps.

Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding

cs.CV · 2025-08-28 · unverdicted · novelty 3.0 · 2 refs

A literature survey on abstract concept recognition in videos that catalogs prior tasks and datasets while advocating for foundation models and reuse of decades of community experience.

A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends

cs.CV · 2025-07-14 · unverdicted · novelty 3.0

A survey of MLLM-based Visually Rich Document Understanding covering feature integration techniques, training paradigms, challenges like data scarcity, and emerging trends such as RAG and agentic frameworks.

citing papers explorer

Showing 16 of 16 citing papers.

Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation · cs.CV · 2026-05-11 · accept · none · ref 26
Field-Localized Forgery Detection for Digital Identity Documents · cs.CV · 2026-05-09 · unverdicted · none · ref 13
Transformer Neural Processes - Kernel Regression · cs.LG · 2024-11-19 · unverdicted · none · ref 29
PaLI: A Jointly-Scaled Multilingual Language-Image Model · cs.CV · 2022-09-14 · conditional · none · ref 143
From Vulnerable Data Subjects to Vulnerabilizing Data Practices: Navigating the Protection Paradox in AI-Based Analyses of Platformized Lives · cs.CY · 2026-04-17 · unverdicted · none · ref 6
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning · cs.CV · 2026-04-09 · unverdicted · none · ref 41
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification · stat.ML · 2026-04-07 · unverdicted · none · ref 12
Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information · cs.CV · 2025-12-29 · unverdicted · none · ref 28
SmolVLM: Redefining small and efficient multimodal models · cs.AI · 2025-04-07 · unverdicted · none · ref 28
Position: Age Estimation Models Do Not Process Biometric Data · cs.CY · 2026-05-17 · unverdicted · none · ref 26
Bridging the RGB-IR Gap: Consensus and Discrepancy Modeling for Text-Guided Multispectral Detection · cs.CV · 2026-04-13 · unverdicted · none · ref 60
Protecting and Preserving Protest Dynamics for Responsible Analysis · cs.CV · 2026-04-06 · unverdicted · none · ref 40
A Resource-Efficient Hybrid CNN-LSTM network for image-based bean leaf disease classification · cs.CV · 2026-04-15 · unverdicted · none · ref 47
Trajectory Prediction for Autonomous Driving: Progress, Limitations, and Future Directions · cs.RO · 2025-03-05 · unverdicted · none · ref 118
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding · cs.CV · 2025-08-28 · unverdicted · none · ref 51 · 2 links
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends · cs.CV · 2025-07-14 · unverdicted · none · ref 41
