NeurIPS , year=
6 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 6
citing papers explorer
- AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation
  AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.
- ESLD (External Surrogate Latent Defense): A Latent-Space Architecture for Faster, Stronger Prompt-Injection Defense
  ESLD extracts safety signals directly from the latent space of any guard model to enable faster and more accurate prompt-injection detection without retraining.
- Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning
  Distinguishable Deletion unifies knowledge erasure and refusal for LLM unlearning via an energy index that enforces boundaries during training and enables refusal at inference.
- Learning to See What You Need: Gaze Attention for Multimodal Large Language Models
  Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
- MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
  VPiT enables pretrained LLMs to perform both visual understanding and generation by predicting discrete text tokens and continuous visual tokens, with understanding data proving more effective than generation-specific data.
- EMMA: End-to-End Multimodal Model for Autonomous Driving
  EMMA is an end-to-end multimodal LLM that converts camera data into trajectories, objects, and road graphs via text prompts and reports state-of-the-art motion planning on nuScenes plus competitive detection results on Waymo.