LoRA: Low-rank adaptation of large language models. ICLR, 1(2):3
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role and citation-polarity summary
verdicts: UNVERDICTED 9
roles: method 1
polarities: use method 1
citing papers explorer
- PolySLGen: Online Multimodal Speaking-Listening Reaction Generation in Polyadic Interaction
  PolySLGen generates contextually appropriate and temporally coherent multimodal speaking and listening reactions for polyadic interactions by fusing group motion and social cues.
- Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards
  SOLACE improves text-to-image generation by using intrinsic self-confidence rewards from noise reconstruction accuracy during reinforcement learning post-training without external supervision.
- Seeking Consensus: Geometric-Semantic On-the-Fly Recalibration for Open-Vocabulary Remote Sensing Semantic Segmentation
  SeeCo is a training-free on-the-fly recalibration method using multi-view geometric consistency and adaptive textual calibration to improve open-vocabulary semantic segmentation in remote sensing images.
- FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval
  FIRE-CIR improves composed image retrieval accuracy on Fashion IQ by using generated visual questions for explicit attribute reasoning and candidate re-ranking instead of pure embedding similarity.
- Saliency-R1: Enforcing Interpretable and Faithful Vision-language Reasoning via Saliency-map Alignment Reward
  Saliency-R1 uses a novel saliency map technique and GRPO with human bounding-box overlap as reward to improve VLM reasoning faithfulness and interpretability.
- Pointer-CAD: Unifying B-Rep and Command Sequences via Pointer-based Edges & Faces Selection
  Pointer-CAD unifies B-Rep geometry with command sequences via pointer-based entity selection, allowing LLMs to perform complex CAD edits while cutting topological errors from quantization.
- Open Set Face Forgery Detection via Dual-Level Evidence Collection
  DLED reformulates open-set face forgery detection as an uncertainty estimation task and uses dual-level spatial-frequency evidence collection to identify novel fake categories, claiming 20% average gains over baselines.
- VC-Inspector: Advancing Reference-free Evaluation of Video Captions with Factual Analysis
  VC-Inspector introduces a lightweight open-source LMM and a controllable factual-error generation framework that achieves state-of-the-art correlation with human judgments on reference-free video caption evaluation.
- Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins
  DLC inserts lightweight classifier-proximal plugins into distillation-based continual learning to achieve 8% accuracy gains on large benchmarks with only 4% extra backbone parameters.