In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp

doi: 10 · 2023 · arXiv 2729.2023

Canonical reference. 76% of citing Pith papers cite this work as background.

200 Pith papers citing it

Background 76% of classified citations

citation-role summary

background 36 · baseline 7 · method 4 · dataset 2

citation-polarity summary

background 37 · baseline 7 · use method 4 · use dataset 1

representative citing papers

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife

cs.CV · 2026-06-19 · unverdicted · novelty 8.0 · 2 refs

WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.

Vision-language models for chest radiography do not always need the image

cs.CV · 2026-06-16 · accept · novelty 8.0

A causal audit with image interventions shows text-only models reach within 5.7 accuracy points of top multimodal VLMs on chest radiography, with some large multimodal models statistically indistinguishable from small text-only baselines.

MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons

cs.CV · 2026-04-30 · unverdicted · novelty 8.0

MoCapAnything V2 presents the first end-to-end learnable Video-to-Pose and Pose-to-Rotation framework for monocular arbitrary-skeleton motion capture by conditioning on a reference pose-rotation pair.

SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 2 refs

SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.

RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception

cs.CV · 2026-06-30 · accept · novelty 7.0

RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.

Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.

Learning to Deny: Action Denial in Multimodal Large Language Models

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.

Diffusion-Based Material Regularization for Physics-Based Inverse Rendering

cs.CV · 2026-06-30 · unverdicted · novelty 7.0 · 2 refs

A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

MIRAGE: Protecting against Malicious Image Editing via False Moderation

cs.CR · 2026-06-24 · unverdicted · novelty 7.0 · 2 refs

MIRAGE immunizes images by crafting perturbations that align them with policy-violating concepts in open-source moderation models, triggering refusals in closed-source commercial image editors at over 88% success rate.

FUTO Swipe: Layout-Agnostic Neural Swipe Decoding

cs.HC · 2026-06-24 · unverdicted · novelty 7.0

Neural swipe decoder trained with geometric augmentations on 1M+ swipes generalizes to unseen keyboard layouts by predicting per-point character locations and mapping via inference-time layout.

MATCH: Flow Matching for Multi-View Anomaly Detection

cs.CV · 2026-06-23 · unverdicted · novelty 7.0

MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.

Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.

4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking

cs.CV · 2026-06-21 · conditional · novelty 7.0 · 2 refs

The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.

TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.

SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection

cs.CV · 2026-06-10 · unverdicted · novelty 7.0

SpikeTAD proposes the first SNN-based end-to-end TAD model, reporting 67.2% mAP on THUMOS14 and 37.42% on ActivityNet-1.3 with extremely low power consumption.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation

cs.CV · 2026-06-05 · unverdicted · novelty 7.0 · 2 refs

An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.

WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.

Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

DivIn samples initial noise from a guidance potential posterior via Langevin dynamics to improve diversity in class-to-image and text-to-image generation.

GLENS: Global Search via Learning from Solver Iterates with Diffusion Models

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.

ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation

cs.GR · 2026-05-27 · unverdicted · novelty 7.0

ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.

RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.

3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes

cs.CV · 2026-05-21 · conditional · novelty 7.0

Introduces NMCA-aligned L1/L2 LULC schemes and the Loosdorf-MSL benchmark dataset, with Point Transformer V3 reaching 79.4% mIoU on 8 classes and 58.9% on 20 classes, plus gains from multispectral inputs.

citing papers explorer

Showing 50 of 200 citing papers.

WildBox: A Dataset and Benchmark for Aerial Monocular 3D Detection of African Savanna Wildlife cs.CV · 2026-06-19 · unverdicted · none · ref 9 · 2 links
WildBox provides over 237k 3D wildlife annotations from drone video and benchmarks reveal zero-shot 3D detection at 0 AP but fine-tuned performance of 8.68 AP-BEV and 13.17 AP3D, with depth estimation causing most errors.
Vision-language models for chest radiography do not always need the image cs.CV · 2026-06-16 · accept · none · ref 42
A causal audit with image interventions shows text-only models reach within 5.7 accuracy points of top multimodal VLMs on chest radiography, with some large multimodal models statistically indistinguishable from small text-only baselines.
MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons cs.CV · 2026-04-30 · unverdicted · none · ref 3
MoCapAnything V2 presents the first end-to-end learnable Video-to-Pose and Pose-to-Rotation framework for monocular arbitrary-skeleton motion capture by conditioning on a reference pose-rotation pair.
SpheRoPE: Zero-Shot Optimization-Free 360 Panorama Generation with Spherical RoPE cs.CV · 2026-06-30 · unverdicted · none · ref 42 · 2 links
SpheRoPE modifies rotary position embeddings in diffusion transformers to enforce spherical topology for zero-shot 360 panorama generation across multiple backbones.
RESOLVE: A Multi-Resolution and Multi-Modal Dataset for Roadside Cooperative Perception cs.CV · 2026-06-30 · accept · none · ref 21
RESOLVE provides a controlled multi-resolution LiDAR and camera benchmark for evaluating 3D detection and tracking under point sparsity variations in roadside cooperative perception.
Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs cs.CV · 2026-06-30 · unverdicted · none · ref 48
An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.
Learning to Deny: Action Denial in Multimodal Large Language Models cs.CV · 2026-06-30 · unverdicted · none · ref 61
MLLMs drop from over 85% accuracy on action presence to under 50% on matched action-denial videos, exposing a causal verification gap that causal graph prompts partially close.
Diffusion-Based Material Regularization for Physics-Based Inverse Rendering cs.CV · 2026-06-30 · unverdicted · none · ref 12 · 2 links
A regularization technique that treats diffusion model outputs as a similarity kernel during material optimization in inverse rendering, enabling joint reconstruction of geometry, materials, and illumination that satisfies the rendering equation and generalizes to new lighting.
Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction cs.CV · 2026-06-28 · unverdicted · none · ref 15
Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.
MIRAGE: Protecting against Malicious Image Editing via False Moderation cs.CR · 2026-06-24 · unverdicted · none · ref 15 · 2 links
MIRAGE immunizes images by crafting perturbations that align them with policy-violating concepts in open-source moderation models, triggering refusals in closed-source commercial image editors at over 88% success rate.
FUTO Swipe: Layout-Agnostic Neural Swipe Decoding cs.HC · 2026-06-24 · unverdicted · none · ref 23
Neural swipe decoder trained with geometric augmentations on 1M+ swipes generalizes to unseen keyboard layouts by predicting per-point character locations and mapping via inference-time layout.
MATCH: Flow Matching for Multi-View Anomaly Detection cs.CV · 2026-06-23 · unverdicted · none · ref 39
MATCH is the first flow matching method for multi-view anomaly detection, reporting SOTA results on Real-IAD and the first comprehensive evaluation on MANTA-Tiny while enabling real-time use by omitting the divergence term.
Arbor: Explicit Geometric Conditioning for Controllable 3D Asset Generation cs.CV · 2026-06-22 · unverdicted · none · ref 21
Arbor attaches constraint mesh tokens to a frozen text-to-3D denoiser to enable controllable generation obeying hull, avoidance, and touch constraints.
4DVLT: Dynamic Scene Understanding with Worldline-Centered Vision-Language Tracking cs.CV · 2026-06-21 · conditional · none · ref 9 · 2 links
The paper defines the 4DVLT task for worldline-centered 4D scene understanding, releases Instruct-4D with 129.4K QA pairs, and presents 4DTrack achieving 62.68 TGA_Top1, outperforming adapted baselines by 19.62 points.
TopoCap: Learning Topology-Agnostic Motion Priors for Monocular Video-to-Animation cs.CV · 2026-06-10 · unverdicted · none · ref 52
A two-stage generative model (Graph CVAE + flow matching) learns topology-agnostic motion codes from a new 5k-topology dataset and retargets video motion to arbitrary unseen skeletons.
SpikeTAD: Spiking Neural Networks for End-to-End Temporal Action Detection cs.CV · 2026-06-10 · unverdicted · none · ref 18
SpikeTAD proposes the first SNN-based end-to-end TAD model, reporting 67.2% mAP on THUMOS14 and 37.42% on ActivityNet-1.3 with extremely low power consumption.
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning cs.CV · 2026-06-08 · unverdicted · none · ref 24
FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.
Mind the Gap: Disentangling Performance Bottlenecks in Video Instance Segmentation cs.CV · 2026-06-05 · unverdicted · none · ref 8 · 2 links
An ILP-based oracle applied to seven VIS methods on YouTube-VIS and OVIS shows tracking instability as the dominant bottleneck, producing gaps exceeding 20 AP under occlusion while classification impact is secondary.
WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory cs.CV · 2026-06-03 · unverdicted · none · ref 23
WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.
Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior cs.CV · 2026-06-01 · unverdicted · none · ref 28
DivIn samples initial noise from a guidance potential posterior via Langevin dynamics to improve diversity in class-to-image and text-to-image generation.
GLENS: Global Search via Learning from Solver Iterates with Diffusion Models cs.LG · 2026-05-29 · unverdicted · none · ref 58
GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.
ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation cs.GR · 2026-05-27 · unverdicted · none · ref 9
ClothTransformer is a unified latent-space Transformer for cloth simulation that handles body-driven garments, robotic manipulation, and free-fall collisions in one model with 4-9x lower error than prior methods and mesh-resolution independence.
RS2AD-LiDAR: End-to-End Autonomous Driving LiDAR Data Generation from Roadside Sensor Observations cs.CV · 2026-05-22 · unverdicted · none · ref 17
RS2AD-LiDAR reconstructs vehicle LiDAR data from roadside observations via coordinate transformation, virtual LiDAR modeling and resampling, claimed as the first such method, with experiments showing improved object detection when mixed with real data.
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemes cs.CV · 2026-05-21 · conditional · none · ref 31
Introduces NMCA-aligned L1/L2 LULC schemes and the Loosdorf-MSL benchmark dataset, with Point Transformer V3 reaching 79.4% mIoU on 8 classes and 58.9% on 20 classes, plus gains from multispectral inputs.
AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding cs.CV · 2026-05-21 · accept · none · ref 7
AgroVG is a new multi-source benchmark for agricultural visual grounding formulated as generalized set prediction, with protocols for box and mask grounding across single-target, multi-target, and target-absent queries from six object families.
iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance cs.CV · 2026-05-20 · unverdicted · none · ref 4
iTryOn is a diffusion-based framework that adds spatial 3D hand guidance and semantic action-aware embeddings to handle complex garment deformations during human-clothing interactions in videos.
Oracle Supervision Transfers for Hyperparameter Prediction in Model-Based Image Denoising cs.CV · 2026-05-19 · conditional · none · ref 22
HyperDn is a configuration-conditioned predictor that transfers oracle supervision across denoising paradigms to achieve near-oracle hyperparameter prediction with few or zero target labels.
Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation cs.CV · 2026-05-19 · conditional · none · ref 10 · 4 links
PPaint fuses expert pairwise preferences and ratings into ground truth; PSDistill converts VLM pairwise judgments into calibrated pseudo-scores via Elo and trains the same VLM to produce a single-pass aesthetic scorer that improves SRCC across categories.
LMM-Track4D: Eliciting 4D Dynamic Reasoning in LMMs via Trajectory-Grounded Dialogue cs.CV · 2026-05-19 · unverdicted · none · ref 26
LMM-Track4D formulates a trajectory-grounded dialogue task, releases Track4D-Bench with 526 samples, and proposes RTGE encoding, TRK state token, and OSK-RA decoder to elicit better 4D spatiotemporal reasoning in LMMs.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 102 · 4 links
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
Martingale-Consistent Self-Supervised Learning cs.LG · 2026-05-12 · unverdicted · none · ref 2
The paper develops a martingale-consistent SSL framework enforcing expected coherence between coarse and refined predictions via new objectives and a Monte Carlo estimator, improving robustness under partial observations.
Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation cs.GR · 2026-05-11 · unverdicted · none · ref 47
GRCA uses emitter-centric geometric culling of rays per triangle to accelerate LiDAR simulation in arbitrarily dynamic scenes, reporting up to 14.55x speedup over Embree and 7.97x over OptiX.
AnomalyClaw: A Universal Visual Anomaly Detection Agent via Tool-Grounded Refutation cs.CV · 2026-05-11 · conditional · none · ref 8
AnomalyClaw turns single-step VLM anomaly judgments into a multi-round tool-grounded refutation process, delivering consistent macro-AUROC gains of 3.5-7.9 percentage points over direct inference across 12 cross-domain datasets.
Field-Localized Forgery Detection for Digital Identity Documents cs.CV · 2026-05-09 · unverdicted · none · ref 10 · 2 links
FLiD is a field-localized forgery detection method for identity documents that outperforms full-document baselines and general detectors with significantly fewer parameters.
ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data cs.LG · 2026-05-07 · unverdicted · none · ref 30
ProtoSSL discovers generalizable prototypes from unlabeled time-series via self-supervision and assigns them to new tasks for interpretable predictions, outperforming supervised baselines in low-data regimes on ECG datasets.
Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting cs.CV · 2026-05-04 · unverdicted · none · ref 54
Text-guided class-agnostic counting models exhibit significant weaknesses in grounding textual prompts to visual objects, as demonstrated by new negative-label and distractor tests on a multi-category dataset.
ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue cs.RO · 2026-05-02 · unverdicted · none · ref 42
ESARBench is the first unified benchmark for MLLM-driven UAV agents that must explore, locate clues, and decide on victim positions in photorealistic simulated SAR environments.
LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction cs.IR · 2026-04-21 · unverdicted · none · ref 27
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
Divide-and-Conquer Approach to Holistic Cognition in High-Similarity Contexts with Limited Data cs.CV · 2026-04-21 · unverdicted · none · ref 6 · 2 links
DHCNet improves ultra-fine-grained visual categorization by progressively building holistic cognition from local discrepancies using self-shuffling and refinement on limited data.
BasketHAR: A Multimodal Dataset for Human Activity Recognition and Sport Analysis in Basketball Training Scenarios cs.CV · 2026-04-18 · conditional · none · ref 7
BasketHAR is a publicly released multimodal dataset of professional basketball training activities captured with inertial sensors, physiological signals, and video, accompanied by a baseline alignment method.
Efficient Video Diffusion Models: Advancements and Challenges cs.CV · 2026-04-17 · unverdicted · none · ref 186
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
DiV-INR: Extreme Low-Bitrate Diffusion Video Compression with INR Conditioning eess.IV · 2026-04-09 · unverdicted · none · ref 13
DiV-INR integrates implicit neural representations as conditioning signals for diffusion models to achieve better perceptual quality than HEVC, VVC, and prior neural codecs at extremely low bitrates under 0.05 bpp.
On the Decompositionality of Neural Networks cs.LO · 2026-04-09 · unverdicted · none · ref 14
Neural decompositionality is defined via decision-boundary semantic preservation, and language transformers largely satisfy it under SAVED while vision models often do not.
Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models cs.CV · 2026-04-03 · unverdicted · none · ref 41
UCGP is a universal physical adversarial patch that compromises cross-modal semantic alignment in IR-VLMs through curved-grid parameterization and representation-space disruption.
LumaFlux: Lifting 8-Bit Worlds to HDR Reality with Physically-Guided Diffusion Transformers cs.CV · 2026-04-03 · unverdicted · none · ref 3
LumaFlux is a physically and perceptually guided diffusion transformer for SDR-to-HDR conversion that introduces PGA, PCM, and HDR Residual Coupler modules plus a new training corpus and benchmark, outperforming prior ITM methods.
CORP: Closed-Form One-shot Representation-Preserving Structured Pruning for Transformers cs.LG · 2026-02-05 · unverdicted · none · ref 8
CORP performs one-shot structured pruning of Transformers by modeling removed components as affine functions of retained ones and solving closed-form ridge regressions on calibration data to fold compensation into weights, retaining 83.27% Top-1 accuracy on DeiT-Huge after 50% pruning.
Learning to Build Shapes by Extrusion cs.GR · 2026-01-30 · unverdicted · none · ref 18
Text Encoded Extrusions (TEE) lets LLMs generate and edit manifold 3D meshes by learning sequences of face extrusions from decomposed quadrilateral meshes.
FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing cs.CV · 2025-06-26 · unverdicted · none · ref 12
FaSTA* combines LLM fast planning with A* search and inductive subroutine mining to create an efficient agent for multi-turn image editing tasks.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems cs.MA · 2025-06-05 · accept · none · ref 60
A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
High Volume Rate 3D Ultrasound Reconstruction with Diffusion Models eess.IV · 2025-05-28 · unverdicted · none · ref 41 · 2 links
Diffusion models reconstruct high-resolution 3D cardiac ultrasound volumes from heavily undersampled elevation planes and outperform traditional interpolation and supervised deep learning baselines.
