Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to retain 96.9% accuracy at 63% FLOPs reduction on ViT-Large ImageNet-1K.
hub
" * write output.state after.block = add.period write newline
78 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
claims ledger
- method It compares the model's prediction for the entire trajec- tory against the ground-truth trajectory label, Ltraj(s, a): Ltraj = LBCE Rϕ(s, a | x), Rtraj (12) where σ(·) denotes the sigmoid function, which converts the model's raw logit outputs into probabilities. LBCE(·, ·) denotes the BCE loss function. For a ground-truth label L ∈ { 0, 1} and a model logit output Rϕ, it is defined as LBCE(Rϕ, L) = −[L log σ(Rϕ) + (1− L) log(1 − σ(Rϕ))], By jointly optimizing this objective, Fin-PRM is train
co-cited works
roles
method 1polarities
use method 1representative citing papers
DTDR dynamically retrieves relevant tools by modeling dependencies from demonstrations and conditioning on the evolving agent plan, improving function calling success rates by 23-104% over static retrievers across benchmarks.
An incremental rank-lifting algorithm updates winning regions and policies in data-driven stochastic game abstractions by exploiting monotonic growth of under-approximations and shrinkage of over-approximations.
UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.
MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.
OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.
TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.
Counterspeech reduces the likelihood that hate-speech-using newcomers continue posting in hate subreddits, though toxic counterspeech raises the chance of continued hostility in the thread.
MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.
Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.
Introduces a four-axis difficulty taxonomy integrated into an enterprise RAG benchmark to systematically diagnose multi-dimensional challenges like reasoning complexity and retrieval difficulty.
TeCQR retrieves related questions in cQA by generating tag-enhanced clarifying questions, using noise-tolerant semantic matching, and two-stage training to learn fine-grained representations of queries, questions, and tags.
STORM enables robust reference-conditioned 6D pose tracking from one image via hierarchical spatial fusion attention and a BCE-trained verifier that detects drift for automatic re-initialization.
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
Quantization of VLMs improves multiple reliability metrics beyond accuracy by damping high-rank spectral components and promoting reliance on robust low-rank features.
PeCL applies token-level dynamic differential privacy and privacy-guided memory sculpting to achieve superior privacy-utility balance in continual learning.
LoFT uses parameter-efficient fine-tuning of foundation models for long-tailed semi-supervised learning, supported by proofs that this reduces hypothesis complexity to minimize balanced posterior error and compresses outlier acceptance regions, with LoFT-OW handling open-world OOD cases.
Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.
TeRA parametrizes high-rank LLM weight updates via a random Tucker-like tensor network with shared frozen factors and layer-specific scaling vectors, matching high-rank adapter performance at vector-level parameter counts.
CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.
GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.
Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.
citing papers explorer
-
Why Training-Free Token Reduction Collapses: The Inherent Instability of Pairwise Scoring Signals
Pairwise scoring signals in Vision Transformer token reduction are inherently unstable due to high perturbation counts and degrade in deep layers, causing collapse, while unary signals with triage enable CATIS to retain 96.9% accuracy at 63% FLOPs reduction on ViT-Large ImageNet-1K.
-
Dynamic Tool Dependency Retrieval for Lightweight Function Calling
DTDR dynamically retrieves relevant tools by modeling dependencies from demonstrations and conditioning on the evolving agent plan, improving function calling success rates by 23-104% over static retrievers across benchmarks.
-
Incremental Data-Driven Policy Synthesis via Game Abstractions
An incremental rank-lifting algorithm updates winning regions and policies in data-driven stochastic game abstractions by exploiting monotonic growth of under-approximations and shrinkage of over-approximations.
-
Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker
UWE is a task-agnostic bi-encoder that uses many-to-many InfoNCE and token-level soft late interaction to achieve zero-shot ranking across unseen work-related target spaces while using far fewer parameters than Qwen3-8B and improving MAP by 4.4 points.
-
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.
-
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
-
VoteGCL: Enhancing Graph-based Recommendations with Majority-Voting LLM-Rerank Augmentation
VoteGCL augments graph-based recommendation systems with high-confidence synthetic interactions generated via majority-voting LLM reranks and integrates them into graph contrastive learning to improve accuracy and reduce popularity bias.
-
One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
OSPO trains optimal order dispatch policies for homogeneous AV fleets using only one-step group rewards, outperforming GRPO on a real ride-hailing dataset.
-
TRAM: Test-Time Risk Adaptation with Mixture of Agents
TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.
-
Assessing How Hate, Counterspeech, and Toxicity Affect Hate Group Newcomers
Counterspeech reduces the likelihood that hate-speech-using newcomers continue posting in hate subreddits, though toxic counterspeech raises the chance of continued hostility in the thread.
-
Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples
MDSE attack uses dynamic multi-surrogate gradient estimation to create adversarial examples that simultaneously fool SNNs, ViTs, and CNNs, with reported gains up to 91.4% on ensembles and 3x on adversarially trained SNNs versus Auto-PGD.
-
When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.
-
Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework
Introduces a four-axis difficulty taxonomy integrated into an enterprise RAG benchmark to systematically diagnose multi-dimensional challenges like reasoning complexity and retrieval difficulty.
-
Beyond Static: Related Questions Retrieval Through Conversations in Community Question Answering
TeCQR retrieves related questions in cQA by generating tag-enhanced clarifying questions, using noise-tolerant semantic matching, and two-stage training to learn fine-grained representations of queries, questions, and tags.
-
STORM: Segment, Track, and Object Re-Localization from a Single Image
STORM enables robust reference-conditioned 6D pose tracking from one image via hierarchical spatial fusion attention and a BCE-trained verifier that detects drift for automatic re-initialization.
-
Routing-Based Continual Learning for Multimodal Large Language Models
Routing architecture for MLLMs enables continual learning with constant compute, matching multi-task learning performance and supporting cross-modal transfer.
-
Less Precise Can Be More Reliable: A Systematic Evaluation of Quantization's Impact on VLMs Beyond Accuracy
Quantization of VLMs improves multiple reliability metrics beyond accuracy by damping high-rank spectral components and promoting reliance on robust low-rank features.
-
Forget What's Sensitive, Remember What Matters: Token-Level Differential Privacy in Memory Sculpting for Continual Learning
PeCL applies token-level dynamic differential privacy and privacy-guided memory sculpting to achieve superior privacy-utility balance in continual learning.
-
LoFT: Parameter-Efficient Fine-Tuning for Long-tailed Semi-Supervised Learning in Open-World Scenarios
LoFT uses parameter-efficient fine-tuning of foundation models for long-tailed semi-supervised learning, supported by proofs that this reduces hypothesis complexity to minimize balanced posterior error and compresses outlier acceptance regions, with LoFT-OW handling open-world OOD cases.
-
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal
Sparse autoencoders plus greedy filtering and factorization-machine interaction modeling identify minimal sets of features in Gemma-2-2B-IT and LLaMA-3.1-8B-IT whose ablation produces jailbreaks by flipping refusal to compliance.
-
TeRA: Vector-based Random Tensor Network for High-Rank Adaptation of Large Language Models
TeRA parametrizes high-rank LLM weight updates via a random Tucker-like tensor network with shared frozen factors and layer-specific scaling vectors, matching high-rank adapter performance at vector-level parameter counts.
-
CogDriver: Integrating Cognitive Inertia for Temporally Coherent Planning in Autonomous Driving
CogDriver-Agent with sparse temporal memory and spatiotemporal distillation on CogDriver-Data achieves 22% higher closed-loop Driving Score on Bench2Drive and 21% lower mean L2 error on nuScenes.
-
Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs
GSR jointly trains LLMs to generate candidate solutions and refine a superior final answer from them, achieving state-of-the-art performance on five mathematical benchmarks while transferring across model scales.
-
Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models
Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.
-
League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
League of LLMs organizes LLMs into a self-governed mutual evaluation league using dynamic, transparent, objective, and professional criteria to distinguish model capabilities with 70.7% top-k ranking stability.
-
PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking
PRISM decomposes harmful instructions into benign visual gadgets and directs LVLMs via prompts to compose them through reasoning into harmful outputs, achieving ASR over 0.90 on SafeBench.
-
The Ratchet Effect in Silico through Interaction-Driven Cumulative Intelligence in Large Language Models
Populations of 1-4B parameter LLMs using peer verification and shared cultural memory achieve 8.8-18.9 point gains on mathematical reasoning tasks and close much of the gap to 70B+ single models.
-
Perception-Aware Policy Optimization for Multimodal Reasoning
PAPO integrates perception-aware supervision via a KL-based loss into RLVR methods like GRPO, yielding 4.4-17.5% gains on multimodal benchmarks and 30.5% fewer perception errors, with larger gains on vision-heavy tasks.
-
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Introduces OCR+PAGE-1 and OCR+PAGE-N prompting strategies that improve zero-shot multi-page handwritten document transcription by sharing context across pages.
-
ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation
ExPath is a subgraph inference framework that classifies bio-networks with experimental data and uses explanations to identify targeted pathways, reporting up to 4.5x higher Fidelity+ and 14x lower Fidelity- than baselines on 301 networks.
-
ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.
-
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
Motion-aware contrastive learning on mask tubes improves temporal panoptic scene graph generation over pooling-based methods on video and 4D datasets.
-
Multi-Scale Contrastive Learning for Video Temporal Grounding
A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
-
Explainable Representation of Finite-Memory Policies for POMDPs using Decision Trees
A translation method converts finite-state-controller policies for POMDPs into a decision-tree-plus-Mealy-machine form that is typically smaller and more explainable, with further simplifications for attractor-based policies.
-
SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks
SkillTree reduces continuous action spaces to discrete skills via a differentiable decision tree in a hierarchical policy, achieving comparable performance to neural skill methods with added skill-level explainability in robotic arm tasks.
-
Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories
A Bayesian method uses near-optimality constraints from expert trajectories to estimate transition dynamics in offline model-based reinforcement learning.
-
Uncovering the Internet's Hidden Values: An Empirical Study of Desirable Behavior Using Highly-Upvoted Content on Reddit
LLM analysis of highly-upvoted Reddit comments yields 64-72 macro/meso/micro values per year; existing prosocial measures capture only 18% on average while the method also recovers and extends prior qualitative taxonomies.
-
MVIGER: Multi-View Variational Integration of Complementary Knowledge for Generative Recommender
MVIGER integrates complementary knowledge from diverse prompts and indices in generative recommenders via a variational model with learnable prior over latent sources, showing superior performance on three datasets.
-
LaMSUM: Amplifying Voices Against Harassment through LLM Guided Extractive Summarization of User Incident Reports
LaMSUM is a novel multi-level LLM framework with voting methods for extractive summarization of large incident report collections that outperforms prior extractive methods.
-
READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling
READ recurrent adapters with partial video-language alignment via optimal transport outperform standard fine-tuning on low-resource temporal grounding and summarization tasks.
-
Studying Lobby Influence in the European Parliament
NLP comparison of lobby papers and MEP speeches discovers influence links validated indirectly via retweets and meetings, achieving AUC 0.77 and ideological alignment in aggregate analysis.
-
Image Captioning via Compact Bidirectional Architecture
Compact bidirectional transformer integrates L2R and R2L flows with sentence-level ensemble and two-flow self-critical training to achieve SOTA on MSCOCO without vision-language pretraining.
-
End-to-end PDDL Planning with Hardcoded and Dynamic Agents
An end-to-end LLM framework refines natural language into valid PDDL domains and problems via hardcoded and dynamic agents, generates plans with standard engines, and returns readable output.
-
End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering
CLSR is an end-to-end contrastive language-speech retriever using an intermediate text-like conversion step to improve retrieval of relevant segments from long audio for spoken question answering.
-
COGNOS: Universal Enhancement for Time Series Anomaly Detection via Constrained Gaussian-Noise Optimization and Smoothing
COGNOS improves reconstruction-based time series anomaly detection by enforcing Gaussian white noise residuals through training regularization and applying adaptive Kalman smoothing to produce more stable anomaly scores.
-
MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains
MULTIBENCH++ is a new large-scale benchmark integrating over 30 datasets across 15 modalities and 20 tasks, accompanied by an open-source automated evaluation pipeline that establishes new performance baselines for multimodal fusion.
-
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
A two-stage distillation plus reinforced fine-tuning approach produces a single humanoid locomotion controller that adapts across skills and irregular terrains.
-
Bridging the phenotype-target gap for molecular generation via multi-objective reinforcement learning
SmilesGEN uses dual VAEs to jointly model drug structures and transcriptional responses, generating molecules with higher validity, novelty, and similarity to known ligands than prior methods.
-
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
LLM agents calibrated on Italian election data produce coherent posts and realistic network structure but show less tone and toxicity variation than real users, with opinion changes resembling traditional mathematical models.
-
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
LifeAlign uses focalized preference optimization and short-to-long memory consolidation via dimensionality reduction to let LLMs align with new preferences while retaining prior knowledge.